Open an interactive online version by clicking the badge

or download the notebook.

HRTFs: Identification and Normalization#

Fabian Brinkmann
Audio Communication Group, Technische Universität Berlin
Contact: fabian.brinkmann@tu-berlin.de

The task of this notebook is (i) to perform a system identification, that is, to estimate HRTFs based on raw data from sweep-based acoustic measurements in a first step, and (ii) to normalize the HRTFs in a second step, that is, to apply a standardized post-processing with the goal to minimize differences between HRTFs of the same subject that were measured at different facilities.

Duration: 180-360 Minutes

Requirements: Basic knowledge of HRTFs and coordinate conventions. Good knowledge of digital signal processing and pyfar.

References
[1] H. Møller, “Fundamentals of binaural technology,” Appl. Acoust., vol. 36, pp. 171–218, 1992, doi: 10.1016/0003-682x(92)90046-u.
[2] H. Bahu et al., “Towards improved consistency between databases of head-related transfer functions,” J. Audio Eng. Soc., 2025.
[3] F. Brinkmann et al., “A High Resolution and Full-Spherical Head-Related Transfer Function Database for Different Head-Above-Torso Orientations,” J. Audio Eng. Soc., vol. 65, no. 10, pp. 841–848, Oct. 2017, doi: 10.17743/jaes.2017.0033.
[4] B. Xie, “On the low frequency characteristics of head-related transfer function,” Chinese J. Acoust., vol. 28, no. 2, pp. 1–13, 2009.
[5] B. Bernschütz, “A spherical far field HRIR/HRTF compilation of the Neumann KU 100,” in AIA-DAGA 2013, International Conference on Acoustics, Merano, Italy, Mar. 2013, pp. 592–595.
[6] R. O. Duda and W. L. Martens, “Range dependence of the response of a spherical head model,” J. Acoust. Soc. Am., vol. 104, no. 5, pp. 3048–3058, 1998.

Dependencies

pip install pyfar>=0.7 pooch nbgrader ipykernel watermark

[ ]:

import pyfar as pf
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import Audio, display
import pooch
import os
%matplotlib ipympl

1. Load example data#

The following downloads the data for this task: recorded sine sweeps for five source positions from the FABIAN HRTF database [3].

[ ]:

# adjust this path to your needs. Using `None` will download the file to your
# system cash.
path = None

# Leave this as it is: This is the URL from which the data will be downloaded
# and a hash for checking if the download worked.
url = 'https://github.com/pyfar/files/raw/refs/heads/main/education/VAR_TUB/hrtf_post_processing_and_normalization.far?download='
hash = '1f4e7ad698ce65c1e359d914918fa9f4f81ca611eb9812243b55798fb2462732'

file = pooch.retrieve(
    url, hash, fname='hrtf_post_processing_and_normalization.far', path=path)

This reads the data as pyfar Signal and Coordinate objects.

[ ]:

# read data
data = pf.io.read(file)

# sweeps recorded at the ears and corresponding microphone positions
ear_pressure = data['ear_pressure']
ear_positions = data['ear_positions']

# source positions for which the sweeps were recorded
source_positions = data['source_positions']

# sweeps recorded at the center of the loudspeaker array with the dummy
# being absent and the corresponding microphone position
reference_pressure = data['reference_pressure']
reference_position = data['reference_position']

Next, we download and import Python code for a spherical head model that will be used in the normalization process

[ ]:

# Leave this as it is: This is the URL from which the data will be downloaded
url = 'https://raw.githubusercontent.com/pyfar/open-educational-resources/refs/heads/main/docs/oer/courses/Virtual_Acoustic_Reality_TUB/hrtf_processing/spherical_head.py'
hash = '8ed6f4ec104f22864046243e7fe102a0cf0bac00337bddb24171ebee9cef3b97'

# Download to the directory of this notebook for importing
_ = pooch.retrieve(
    url, hash, fname='spherical_head.py', path=os.getcwd())

from spherical_head import spherical_head

2. Get familiar with the data#

It is always good to get to know the data you are working with. Start by plotting the scene geometry defined by the positions loaded as pyfar coordinate objects above. You should know the following:

For how many and which source position are the data available?
Where were the microphones located during the HRIR and reference measurements?

[ ]:

# YOUR CODE HERE
raise NotImplementedError()

Plot a recorded sweeps and use pyfar plot shortcuts to inspect the sweep in the time and frequency domain.

[ ]:

# YOUR CODE HERE
raise NotImplementedError()

Play back the same sweep via headphones.

[ ]:

# YOUR CODE HERE
raise NotImplementedError()

3. Deconvolution#

The first step is to obtain the raw HRTF by means of deconvolution, i.e.,

\[H = \frac{P_\text{ear}}{P_\text{reference}} \, \mathrm{e}^{-\mathrm{j}\omega\tau} = P_\text{ear} \, \frac{1}{P_\text{reference}} \, \mathrm{e}^{-\mathrm{j}\omega\tau}\]

Where \(H\) is the HRTF (complex spectrum) and \(\tau\) a delay to force causality. You will soon see, why this is required.

NOTE: From now on, you should always visualize the processing steps for all data to inspecting if things went according to plan.

a) Invert reference#

Note that regularization is often used to compute the inverse \(1/P_\text{reference}\) to avoid excessive gains when inverting band-limited signals. This is done to not boost out-of-band noise. You can realize this and most other processing steps can be done with the pyfar.dsp module.

The regularization will later act on the HRIRs as a band pass. Bahu et al. [2, Sec. 1.1] suggest to low-pass at 18 kHz to normalize measurements across different datasets. If measurements are valid beyond this range, it does not harm to increase the low-pass frequency.

[ ]:

# sweep inversion
# YOUR CODE HERE
raise NotImplementedError()

b) Deconvolve#

Next, perform a frequency domain multiplication of the sweeps recorded at the ear channels with the inverse computed above.

[ ]:

# YOUR CODE HERE
raise NotImplementedError()

Did you carefully inspect the result? You might have noticed that some of the time domain signals have significant energy at the end of the impulse response. This comes from the deconvolution that can be written as

\[\frac{P_\text{ear}}{P_\text{reference}} = \frac{|P_\text{ear}|}{|P_\text{reference}|} \mathrm{e}^{\mathrm{j}\omega(\tau_\text{ear} - \tau_\text{reference})}\,,\]

which shows that the group delay of the reference \(\tau_\text{reference}\) is subtracted from the group delay of the ear signals \(\tau_\text{reference}\). Hence, the HRIRs become acausal if the source position is closer to the ear than the reference position.

c) Force causality#

Correct this with a cyclic time shift. For now, make sure the shift is large enough. You will refine the temporal alignment in a later step.

[ ]:

# YOUR CODE HERE
raise NotImplementedError()

4 Temporal alignment#

As suggested by Bahu et al. [2, Sec. 1.2] you should now align the HRIRs by

Estimating the onsets of the HRIR for the frontal source position. It is suggested to use leading-edge detection with a threshold of 20 dB. This can be done with pyfar.dsp.find_impulse_response_start.
Shift all HRIRs so to make sure the frontal HRIR starts after 1 ms. Use the smaller onset, if they differ across the left and right ear.

Start by detecting the onset time.

[ ]:

# YOUR CODE HERE
raise NotImplementedError()

Now shift HRIRs to make sure that the frontal HRIR starts at 1 ms.

[ ]:

# YOUR CODE HERE
raise NotImplementedError()

In case you focused on inspecting the frontal HRIR only it is now time to check how all aligned HRIRs look like.

[ ]:

# YOUR CODE HERE
raise NotImplementedError()

5. Windowing#

Acoustic measurements usually contain reflections, even if they were done in an anechoic chamber. In this case reflections may come from the measurement equipment itself (other loudspeakers, supporting construction, etc.) or from the room (door, floor, etc.).

Reflections show up in the impulse response as peaks that follow the direct sound. In the spectrum they cause a ripple (comb-filter) effect.

Reflections are commonly discarded by applying a time window to IRs and Bahu et al. [2, Sec. 1.3] suggest an asymmetric Hann window with a length of 5.8 ms, a fade in of 0.25 ms and a fade out of 1 ms. The fade in should start 0.25 ms before the earliest onset detected in the HRIR dataset.

Note: The length and fade-in are recommendations that should work in many cases. In some cases a longer window might be possible, or a shorter window might be required.

[ ]:

# find the earliest onset in the HRIR dataset
# YOUR CODE HERE
raise NotImplementedError()

# window the HRIRs
# YOUR CODE HERE
raise NotImplementedError()

# plot windowed HRIRs and window
# YOUR CODE HERE
raise NotImplementedError()

# plot HRTFs before and after windowing
# YOUR CODE HERE
raise NotImplementedError()

If you inspected the HRIRs, you might have noticed that

the HRIRs become much smoother after the reflections are windowed away, especially for HRIRs in the head shadow zone
the frequency resolution changed, if you truncated the HRIRs after 5.8 ms
the low frequency response changed, which can best be seen on a linear frequency axis

6. Low-frequency extrapolation#

Due to the limited frequency range of loudspeakers commonly used to measure HRTFs, the HRTF is usually invalid at low-frequencies and must be estimated by means of extrapolation.

At low frequencies, the HRTF magnitude is close to 0 dB. It is not exactly 0 dB due to the difference between the distances from the source positions to the ear and to the reference position at the center of the head. Bahu et al. [2, Sec. 1.5] suggest to interpolate between the magnitude of a spherical head model at 0 Hz and the HRTF magnitude at a cut-off frequency \(f_c\), above which the measured HRTFs are valid.

More extrapolation approaches have been suggested from which you could chose as well

Xie [4] suggest linear extrapolation of the magnitude and unwrapped phase response to 0.
Bernschütz [5] suggests a Linkwitz-Riley crossover network with a time-aligned low-pass.
Numerically simulated HRTFs [3] or a spherical head model [6] can be used to replace the HRTF at invalid frequencies.

[ ]:

# Compute spherical head transfer functions using `spherical_head()`
# YOUR CODE HERE
raise NotImplementedError()

You should now carefully inspect the data to decide above which frequency it is still valid. A possibility to determine this is to see above which frequency the aligned HRTFs have approximately the same magnitude as the windowed HRTFs. Note that this frequency limit can depend on the source position and the time window applied in a previous step.

Chose the frequency above which the HRTFs are left unchanged and extrapolate below this frequency.

Note: The magnitude of a spherical head HRTF with a radius of 8.75 cm is approximately constant below 200 Hz. You could apply the target values from this frequency downwards. Or find a frequency for which extrapolation best matches HRTFs before windowing.

[ ]:

# YOUR CODE HERE
raise NotImplementedError()

7. Far-field extrapolation#

The previous sections showed that the low-frequency HRTF magnitude depends on the distance at which the HRTFs were measured, which might not always be desired.

To correct this, Bahu et al. [2, Sec. 1.6] suggest to compute distance distance variation functions (DVFs) using Spherical Head Transfer Functions (SHTF)

\[\mathrm{DVF}(r_\text{m}, r_\text{ref}, \Omega) = \frac{\mathrm{SHTF}(r_\text{ref}, \Omega)}{\mathrm{SHTF}(r_\text{m}, \Omega)}\]

where \(r_\text{m}\) is the distance at which the HRTFs were measured, \(r_\text{ref}=100\) m a distance in the far-field, and \(\Omega\) the source position given by azimuth and elevation.

Compute the DVS wuth the spherical_head function.

[ ]:

# YOUR CODE HERE
raise NotImplementedError()

Apply the far-field extrapolation by applying the DVSs to the HRTFs

\[\mathrm{HRTF}(r_\text{ref}, \Omega) = \mathrm{HRTF}(r_\text{m}, \Omega) * \mathrm{DVF}(r_\text{m}, r_\text{ref}, \Omega)\]

[ ]:

# YOUR CODE HERE
raise NotImplementedError()

8. Truncate to final length#

HRIRs should be as short as possible, usually they can be shortened to 256 or even 128 samples after proper low-frequency extrapolation.

You can optionally truncate the HRIR and compare it to the full-length version. Although you windowed the HRIRs before, the low-frequency extrapolation most likely changed the time signal. It might hence be useful to apply a second time window including fade-in and fade-out.

[ ]:

# window to final length
# YOUR CODE HERE
raise NotImplementedError()

# compare against full length (pad zeros to increase FFT resolution)
# YOUR CODE HERE
raise NotImplementedError()

9. Diffuse field equalization#

HRTFs are diffuse-field equalized by a spectral division of each HRTF by the HRTF averaged across source positions, also called Diffuse-Field HRTF or Common Transfer Function (CTF). After equalization the HRTFs are often referred to as Directional Transfer Functions (DTFs).

There are multiple ways to average across source positions, of which the RMS is one possibility

\[\mathrm{CTF}_{l,r}(f) = \sqrt{\frac{1}{Q} \sum_q^{Q-1} |w_q \, \mathrm{HRTF}_q(f)|^2}\]

with indices \(l\) and \(r\) denoting the left and right ear, \(w_q\) optional weights for averaging, the frequency \(f\), and the number of sources \(Q\). The diffuse-field equalization is often done separately for the left and right ear. This removes natural differences between the left and right ear, but could also mitigate measurement errors, e.g., an incorrect microphone placement. If this is not desired the above should also average across ears.

Note:

dividing by the CTF equals multiplying by its inverse. It might be best to again use regulated inversion
the CTF is zero phase by definition and making it minimum-phase is suggested by Bahu et al. [2, 1.4]

[ ]:

# compute the CTF
# YOUR CODE HERE
raise NotImplementedError()

# plot CTF and CTF inverse
# YOUR CODE HERE
raise NotImplementedError()

Now, apply the inverse CTF in a final processing step to obtain the directional transfer functions.

[ ]:

# compute CTFs
# YOUR CODE HERE
raise NotImplementedError()

# compare HRTFs and DTFs
# YOUR CODE HERE
raise NotImplementedError()

Challenge#

Note that no measurement is like the other. The parameters that worked here, must not necessarily work for other data. Always check all your processing steps.

License notice#

This notebook is licensed under CC BY 4.0

Watermark#

The following watermark might help others to install specific package versions that might be required to run the notebook. Please give at least the versions of Python, IPython, numpy , and scipy, major third party packagers (e.g., pytorch), and all used pyfar packages.

[ ]:

%load_ext watermark
%watermark -v -m -p numpy,scipy,pyfar,sofar,nbgrader,watermark

Python implementation: CPython
Python version       : 3.13.4
IPython version      : 9.1.0

numpy    : 2.3.3
scipy    : 1.15.3
pyfar    : 0.7.3
sofar    : 1.2.2
nbgrader : 0.9.5
watermark: 2.5.0

Compiler    : Clang 14.0.6
OS          : Darwin
Release     : 24.6.0
Machine     : arm64
Processor   : arm
CPU cores   : 8
Architecture: 64bit