How to create sounds that only you can hear in a venue

Jacob Donley, University of Wollongong and Christian Ritz, University of Wollongong
The Conversation

Picture your typical busy cafe or restaurant that’s full of people. The diners are usually all forced to listen to the same music that’s pumped into the venue via the speakers.

What if you could create sound that was tailored to each table’s taste so the people there could listen to their own music, sports event, news or just enjoy the silence?

It might sound impossible but it’s closer to becoming a reality than you think. And there are many potential uses for these customised zones of sound.

For example, open plan offices could potentially be quieter as the sounds from watching online videos or conferencing could be custom designed so as not to annoy your co-workers. It could be possible to watch movies at the cinema or at home in different languages by simply sitting in the zone that suits you best.

Personalised advertisement in shopping malls could become a reality, tailored to individuals. Sports stadiums could have various commentary and a quieter field to play on. Maybe art installations and museums could provide audible content in front of exhibits.

The list of possibilities is endless.

How does it work?

The sounds you hear every day are waves that travel through air, just like the ripples on the surface of a pond. When the waves meet they can either stand on top of each other, cancel each other or combine to make new sounds.

But by carefully controlling the waves and where they combine, it is possible to have them cancel in some spaces and amplify in other spaces.

If this idea sounds familiar it might be because you’ve heard of noise cancelling headphones and earphones that perform a similar trick. They listen to sounds moving towards the ear and produce a wave that cancels that sound. But this only happens in one very small area at the ear.

Now think of extending this to a larger area surrounded by loudspeakers to control the ripples of sound travelling through the space inside.

In the zone

Each loudspeaker is tuned carefully. They can now produce zones of sound from standing waves and zones of quiet from cancelling waves.

By varying the output of sounds from an array of speakers surrounding a space you can target specific sounds in specific locations. The different sounds here are represented by the different colours.
The Conversation, CC BY-ND

Theoretically, with unlimited power, any size and any number of zones of sound or silence can be created. But it is more practical to create zones no smaller than approximately half a metre wide and separated by about half a metre. This size would comfortably fit a human head and so the need to go smaller is often not necessary.

A difficult obstacle to overcome is the reflections of sound from the walls of a room. Reflections introduce more sound that needs to be controlled.

The same way light reflects off a mirror, sound reflects off a wall. A mirror that is dirty or covered reflects light poorly and the walls of a room can be treated in a similar manner to reduce sound reflections.

But for walls that can’t be treated, a number of microphones in the room may be a solution to reducing the reflections. Lasers have also been used to measure how sound travels through a space and might hold potential in the future to predict room reflections.

Generally, the more loudspeakers used, the better the quality of the zones that can be created. When space is limited, the number of loudspeakers that can fit into a room is reduced.

If too few loudspeakers are used and too many zones created, sounds start to leak between the zones. This can cause issues with privacy but not all leaked sounds can be heard.

One way to improve the privacy in the zones is by using a technique called sound masking. We do this by carefully adding zones of subtle noise that acoustically covers up leaked sounds with minimal effect on others.

We can also reduce the error when creating the zones. This is done by allowing sounds to leak when people can’t hear them, which saves energy. That saved energy can then be used to improve sounds in other places where people want to hear them.

Promising research

Some promising research looks at using mathematical models of how the ear hears and processes sound.

This can provide better quality zones and reduce the number of loudspeakers required. Similar to how popular MP3 audio, MP4 video and JPG picture storage technologies work by using human perception to decrease file sizes.

There are still numerous questions to be answered, such as how do we perceive these zones of sound, is the quality of the sound any good and can reflections from the room be efficiently compensated for?

But we are still on the path to making them a reality. In the past ten years a lot of theory has been published on this topic. In the past five years some two-zone systems have been shown to work in the real world.

Researchers have shown it is possible to provide a couple of distinct zones (with about half a metre between them). One zone can contain speech (or music) at a volume equivalent to a regular conversation, the other zone can contain a sound at a volume similar to background air conditioning.

Scale this up and public places will never sound the same again.

Jacob Donley, PhD Candidate, University of Wollongong and Christian Ritz, Associate professor, University of Wollongong

This article was originally published on The Conversation. Read the original article.
This article was also featured on IFLScience and featured on

The ConversationIFLScienceIFLScience

A fast MATLAB executable (MEX) compilation of the PESQ measure

After having spent a good while looking for a working copy of the Perceptual Evaluation of Speech Quality measure implemented in MATLAB, I ended up using the readily available “wrapper” functions found on the MATLAB file exchange site. While these wrappers work a treat they are a bit slow as they require a call to the system to run the PESQ binary file. The PESQ binary file is compiled from the source code found on the ITU site for the PESQ standard. The source code is written in C which MATLAB supports for its MATLAB executable (MEX) compilations. That was what then prompt me to look for a MEX compiled version of the PESQ measure but to my surprise I couldn’t find any. I managed to get the ITU standards’ source code compiled and working in a MEX function and I will go through the few steps required to do it.

First thing’s first, we must head over to the ITU page for the PESQ standard and download the PESQ source code. In the ZIP file there is a second ZIP file under \Software, we are interested in the contents inside that second ZIP file under \P862_annex_A_2005_CD\source. Extract the source folder to a separate location where you can work on it with MATLAB. We are going to compile it using the MEX compiler function for MATLAB but it needs a few extra things added to it first so that MATLAB knows how to handle the source code and interface it to the MATLAB data types.

Open MATLAB and edit the pesqmain.c file (you can make a copy and rename it if you wish). MEX source code needs an entry point that MATLAB understands so we can’t use main() as the entry point anymore, however, we can still keep it for use as a function. MATLAB looks for an entry point called mexFunction() so we will have to add this to the code and we will do it just before the existing main() function:

#include "mex.h"
#include "matrix.h"
#include <stdlib.h>
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]){
     //Entry point code goes here

Now, a neat way to make use of the existing code would be to rearrange and convert the MATLAB data types coming in to mexFunction() and then send them off as C data type equivalents to main() in a way it was expecting them. I did this by adding the following code to mexFunction():

    double res = 0.0;
    int i;
    int argc  = nrhs + 1;
    char **argv = (char**) mxMalloc(argc * sizeof(char*));
    argv[0] = mxMalloc( sizeof("pesq") );
    strcpy(argv[0], "pesq");
    for (i=1; i<argc; i++) {
        argv[i] = mxMalloc( mxGetN(prhs[i-1]) + 1 );
        argv[i] = mxArrayToString(prhs[i-1]);
    res = main(argc,argv);
    nlhs = 1;
    plhs[0] = mxCreateDoubleScalar(res);

    for (i=0; i<argc; i++) {

The only problem with adding this code is that it is requiring a return value from main(). This is because we ideally want to return the PESQ result to the MATLAB workspace (currently I have only implemented this for the MOS-LQO), so a few more additions and we will be set. Replace the main() function and its definition with:

double main(int argc, char **argv)

Now it is capable of returning decimal numbers for the results (negative numbers for errors). We also need to find where the original PESQ code would print out the result and use the same variable to get our return value. The original code should look something like this:

if ( err_info.mode == NB_MODE )
     printf ("\nP.862 Prediction (Raw MOS, MOS-LQO):  = %.3f\t%.3f\n", (double) err_info.pesq_mos, 
          (double) err_info.mapped_mos);
     printf ("\nP.862.2 Prediction (MOS-LQO):  = %.3f\n", (double) err_info.mapped_mos);
return 0;

and we want to replace the return value with:

return err_info.mapped_mos;

Now the source code should be ready to compile as a MEX function. If you don’t want anything printed to the screen during the execution of the PESQ algorithm then you will have to comment out the printf() functions riddled throughout the source code (or create a switch).

To compile the source code as a MEX function make sure that in MATLAB you are in the directory specified earlier and run the following at the command window:

mex('pesqmain.c', 'pesqmod.c', 'pesqio.c', 'pesqdsp.c', 'dsp.c')

That’s it, all done. You can now use the pesqmain_mex.mexw64 function in MATLAB with the same arguments you would use with the binary compiled PESQ function, be careful, though, they will need to be entered as separate character arrays in MATLAB. When I ran this MEX function it was approximately 8 times faster than the MATLAB PESQ wrappers on file exchange. Hope it saves someone a little bit (or a lot) of time! =]

If for some reason you cannot build a MEX executable PESQ function then you can download my compiled version here. Additionally, if you place the MEX executable in a class folder named +Tools then you can use this helper function (alternatively you could remove Tools. in the helper function).

The Speech Transmission Index (STI) for MATLAB

There are a few robust algorithms available to objectively measure the intelligibility of speech. They compare what is heard by a listener to the information that was transmitted by the talker, however, only one of these available algorithms works in reverberant rooms and at the time I was looking for an implementation for MATLAB, I couldn’t find any, zilch. So, I decided to code it from scratch myself and will show you how to do it step by step.

The articulation index (AI) is an older measure which has evolved to the speech intelligibility index (SII). The SII uses a band-importance and band-audibility function to account for the lack of mutual information from masking and from varying presentation levels of the clean speech. There is also the short-time objective intelligibility (STOI) measure which is designed for the prediction of time-frequency weighted noisy speech and the speech transmission index (STI) which makes use of the modulation transfer function.

Now, this might all sound a little confusing but in a nutshell only the STI is suitable for measuring intelligibility in reverberant rooms. The STI in essence measures the response of the acoustic channel between the listener and the talker and from this measurement of the channel it estimates how much mutual information could be maintained across that channel. The reason it is reasonably robust to varying reverberation levels is because it analyses spectral band powers using the modulation transfer function (MTF).

Firstly, to measure the STI we will need the impulse response (IR) of the channel. This can be found a few ways for simulations and for real-world recordings. Some of these might be to transmit a valid signal, record the result and then deconvolve the recording with the original signal. Some signals that could be used are maximum length sequence (MLS), white noise, sine sweep, logarithmic sine sweep, popping of a balloon or anything else that goes bang. I will go over some nice methods in a future post.

Once we have a valid IR we will use this as the input to our MATLAB function which will compute the STI. I will be showing how to do this using the older version of the standard (still quite accurate for low presentation levels). I used this as my function definition:

function [STI_val, ALcons, STI_val_approx, ALcons_approx] = STI( ImpulseResponse, fs, OctaveFilters )

which is ready for some extra features which I will explain later.

Now, before we start to calculate the STI we want to set up a second function to find the MTF because to find the STI we must do this many many times. So I added another function definition after the end keyword for the first function and used this definition:

function MTF = getMTF(ir)

So, to find the MTF for a given IR there are five simple steps which we need to add to the getMTF function. The first step is to square the IR:

ir_2 = ir .^ 2;

We need to get the total energy of the squared IR at some point so may as well do it now. This can be done by integrating which is a summation of the discrete values of the IR array:

ir_2_int = sum(ir_2);

After this we need to transform the squared IR into the frequency domain. This can be done using a fast Fourier transform which MATLAB conveniently has a function for:

IR_2_fft = fft(ir_2);

Once we have our frequency domain signal (the envelope spectrum) we should normalise it based on the total energy we calculated earlier:

IR_MTF = IR_2_fft / ir_2_int;

Lastly, the MTF we want to return for use in our STI function should only be the absolute value of the complex envelope spectrum. This can be done like so:

MTF = abs(IR_MTF);

That is the end of the getMTF function.

Now we can get started on finding the actual STI value, the fun part. For this we need to run the entire IR (the full bandwidth IR) through octave band filters and then find the MTF of each of the filtered signals. To filter the IR with octave band filters we must create a filter for each octave band, then filter the IR and then save the result each time. This can be done with the following code:

BandsPerOctave = 1;
N = 6; % Filter Order
F0 = 1000; % Center Frequency (Hz)
f = fdesign.octave(BandsPerOctave,'Class 1','N,F0',N,F0,fs);
F0 = validfrequencies(f);
F0(F0<125)=[]; F0(F0>min([8000,fs/2]))=[]; % Only keep bands in the range required for the STI calculation
Nfc = length(F0);

for i=1:Nfc
  if nargin < 3
    f.F0 = F0(i);
    H = design(f,'butter');
    ir_filtered = filter(H, ImpulseResponse);
    ir_filtered = filter(OctaveFilters(i), ImpulseResponse);
  MTF_octband(:,i) = getMTF(ir_filtered);

If you read the code above you may have noticed the conditional statement that says that if we have three or more input arguments then we should use the third input argument, OctaveFilters. I have added this because creating the octave band filters is computationally expensive, so if we end up calling our function many times it would be better to calculate the filters once and reuse them for each function call, I will show how this can be done in a future post. Also note the function call to our function, getMTF().

Now that we have our octave band filtered signals we should get the amplitude at different modulation frequencies. For the STI there are 14 different modulation frequencies which have 1/3 octave spacing. That is to say, we should get the magnitude at 14 specific frequencies from each MTF. The following code will accomplish this:

modulation_freqs = (0.63) * ((2.0) .^ ([0:13]./3));
freqs = linspace(0, fs/2, size(MTF_octband,1)/2+1); freqs=freqs(1:end-1);%No nyquist frequency
for i=1:Nfc
  for j=1:14
    m(i,j) = interp1(freqs,MTF_octband(1:end/2,i),modulation_freqs(j));
good_freqs = ~any(isnan(m),2);

The last couple of lines here are a little bit special, I have added them so that if we have any frequencies which return bad results they will be ignored.

The amplitude values we now have are known as the “m” values and the next step is to convert them into an “apparent” signal-to-noise ratio (SNR). This is how it is done in MATLAB:

SNR_apparent = 10*log10( m ./ (1-m) );

The STI says we should limit the apparent SNR to ±15dB. I did this like so:

SNR_apparent( SNR_apparent > 15 ) = 15;
SNR_apparent( SNR_apparent < -15 ) = -15;

Once we have our apparent SNRs all we need to do is find the average for each of the octave bands. With our matrix of apparent SNR values we can do this quite simply in MATLAB:

SNR_avg = mean(SNR_apparent,2);

At this point we still have more than a single STI value, though.

To get these octave band SNR values into a single STI value we need to find a weighted average of them and then turn that weighted average into a value between 0 and 1. The weighted average means we need to know the different weights for each band, these are described in the STI standard but if you’re clever you can work them out from the following code:

W = [0.13, 0.14, 0.11, 0.12, 0.19, 0.17, 0.14];
SNR_Wavg = sum(SNR_avg' .* W(good_freqs));
SNR_Wavg_approx = sum(SNR_avg' .* W(good_freqs)/sum(W(good_freqs)));

I have also added a weighted average which is performed on only the good frequencies which we defined before and called it an approximation. This means that if, for instance, the sampling frequency is too low and we dont have enough octave bands to fulfill the proper STI standard, then we can still approximate what the result might be by ignoring the bands that are too high.
Of course, the rescaling needs to be done too:

STI_val = (SNR_Wavg + 15) / 30;
STI_val_approx = (SNR_Wavg_approx + 15) / 30;

That’s it! but of course, I think it is nice to add just a couple more convenient return values before we end our function:

ALcons = 170.5405 * exp(-5.419*STI_val);
ALcons_approx = 170.5405 * exp(-5.419*STI_val_approx);

Those calculations are for the articulation loss of consonants (ALcons) which is another useful value.

Ok, now we are done and the function can be ended with the end keyword. If you have done it all properly you should have code that works like the one you can download from here.

How do you use your newly created STI function?

Easy, just pass your IR array and the sampling frequency (fs) to the function:


Want the articulation loss of consonants instead? No problem:


What if the signal doesn’t meet the sampling requirements for the STI standard? Well, let’s approximate the STI and ALcons and compare them with the true standard-abiding values:


If you’d like to pass the octave filters to the function you can read over this code and try write a separate function to create them. I will go over this in a future post.

That’s it, you can now pass the impulse response of any audio channel through this function to find the STI value.