Swept sine deconvolution

bassik

Hello Katja,

one other comment:

gain formula should be:

GAIN = (2^15)-1/MAX for a 16-bit file.

or GAIN = (2^23)-1/MAX for a 24-bit file

CHeers

Bassik

katjav

@bassik said:

In the case of the IR it applies a gain in order to have the maximum peak digital value (0.9999999999) and then consequently the same gain to all the rest of the IR (24-bit integer).

Ah, so normalisation of the IR is done in such a way that the resolution of the integer format is maximally employed. That makes sense.

And then for convolution of the anechoic signal, a second normalisation factor should be applied, in such a way to prevent clipping. This is the difficult part. How to compute this factor from an arbitrary IR? I don't see how it could be computed from the IR array itself, since the convolution output magnitude depends on input frequency. But I can imagine a test method: convolve a sweep with the IR and find the maximum output, then scale down to have maximum output 0.999999 when expressed in floats. Is that how it is done?

All taken together, lots of routines to be written in Pd. Oh yeah, there is something else: for proper analysis, the impulse response of the test set (speaker&mic) should be factored out. Taking close-mic sweep response, invert magnitude spectrum, compensate, etc. Is this regularly done in IR measurement practice? If so, how?

Katja

katjav

@katjav said:

How to compute this factor from an arbitrary IR? I don't see how it could be computed from the IR array itself, since the convolution output magnitude depends on input frequency.

Oops, this is of course very simple to find out in frequency domain since the trimmed impulse response will easily fit in one FFT frame. This thread is now getting full with my stupid remarks, sorry.

Katja

katjav

I have now written an object class [chirp~], generating linear or logarithmic chirps calculated with 64 bit float precision. It has user parameters for start / stop frequency and chirp length. It has two outlets, one for the chirp and one for the inverse chirp.

The chirps from this object have neglegible numerical error, as opposed to the earlier chirp-generating patches. It is a pity that you can no longer see how the chirps are calculated, without reading the C-code. But this is the only road to quality IR measurement within Pd.

Attached is a zip containing chirp~.c, chirp~.pd_darwin, and chirp~-help.pd. If you happen to be on OSX, you can check [chirp~] rightaway. Otherwise you may compile the class for your platform, or wait till I have done that on someone else's computer.

Katja

http://www.pdpatchrepo.info/hurleur/chirp~files.zip

bassik

Hello Katjav,

I am finally back even if still with some problems as I don't have internet at home and work is manic....

Anyway let's put everything in the right order:

[chirp~]: I am not on mac but I can test it on mac at home but it would be good to compile it for all platforms...do you have any reference where I can understand how to do the compiling?
also when you consider linear sweep, are you considering a different inverse filter which only the time reversal of the test signal?
In attachment there is an extract of the CATT acoustics manual on calibration of IR and convolution; It is related to audio convolution for auralisation simulations but the same principle can be applied to our purposes to calibrate IR and convoluted responses.
Katjav wrote: Oh yeah, there is something else: for proper analysis, the impulse response of the test set (speaker&mic) should be factored out. Taking close-mic sweep response, invert magnitude spectrum, compensate, etc. Is this regularly done in IR measurement practice? If so, how?

Normally in IR measurement practices with Swept sine, the convoluted result will appear in the software with all the non-linearities on its left hand side. In all common software (as far as I know) the removal of the non linearities is done by hand just by changing the start point of the result IR and by windowing the end part of it....after this you start analyzing all the parameters you want to derive from the IR.
I hope I answered your question.

Calibration of IR and resulted convoluted IR
as far as I know there isn't anything in literature that explains exactly the routine for calibrations but the discussed process in the previous posts will do the job with the important task of storing the gain information in order to be able to calibrate the convoluted anechoic file later.

also please have a look at this paper: http://pcfarina.eng.unipr.it/Public/Papers/238-NordicSound2007.pdf

from page 13 to 24 it explains all the issues that can occurs with sweep sine method.

I will try to start the tests soon and to try to find more info in literature.

Hope this helps

Thank you

Bassik

http://www.pdpatchrepo.info/hurleur/CATT_acoustics_IR_calibration.pdf

katjav

Thanks Bassik for pointing to the Farina article. Particularly interesting is the section on Kirkeby inverse filters for compensating spectral deficiencies in measurement equipment.

Meanwhile, I have made a patch to inspect characteristics of [chirp~] output in detail, with horizontal and vertical zoom options on the IR plot, and spectrum magnitude and phase plots. Real-signal chirps can never have a completely white spectrum. It turns out that logarithmic chirps are quite sensitive to parameter settings, while linear chirps are less so. Therefore, a log chirp may have considerable ripple near start and stop frequency, but I have also found settings which induce low ripple (+/- 0.3 or 0.4 db). It may have been coincidence that I stumbled upon such favourable settings. There is probably no simple rule to derive them numerically.

Because the log chirp is so sensitive to parameter settings, it needs special attention. For an IR measurement tool, we could provide 'approved' chirps in stereo wave files (32 bit floating point) together with their inverse. Chirp generator / tester and IR measurement tool would then be two different Pd root patches which can be developed independantly.

So, the chirp generator / tester is attached to this post. Again, there is only an OSX binary for [chirp~] (same one as before). Sorry. Building on other platforms is easy pie if you have developer tools installed and know how to operate them, but otherwise, don't spoil your time on it. I'll do it later on a friend's computer.

Katja

http://www.pdpatchrepo.info/hurleur/chirp~test.zip

bassik

Hello Katjav,

hope you are well.

I have performed some test on a real case and I have to say that despite the first 2 measurements worked very well, after that the deconvolution process stopped to work properly.

Attached there is one of the wrong results I have obtained from the soft.

you can clearly hear that instead of the IR you get the inverse filter at the same time where the IR should be.

Is it a user problem?
Is it a multi measurement issue with the software?

I will investigate further before any conclusion.

thanks

katjav

Bassik, there may be some confusion. This last patch chirp~test.pd is only a patch to test the qualities of the generated chirp itself.

I am kind of swallowed by details of the method. Instead of quickly producing a working IR measurement tool, I am now trying to refine the method, starting with refining the chirp itself. Last time I mentioned the sensitivity for parameter settings. Now I have found a systematic method to optimize a chirp. The formula is different from the regular ones as found in texts. This has lead to yet another chirp object [expochirp~] (yes it's exponential, not logarithmic, also see Farina's 2007 article).

The regular exponential chirp formula's make a chirp start at sine phase to avoid a step, but they do not care about the phase at the end of the function, therefore it may well end with a step and produce excessive ripple at the high spectrum end. This is traditionally relieved by windowing, or by manually cutting the chirp at the last zero-crossing as Farina suggests. My new formula computes the chirp in such a way that it will start and stop in sine phase. No need for windowing or cutting, and the ripples are minimized.

Still there will always remain ripples in a chirp's spectrum, and I am now trying to produce inversion-filters to clean up a system's measured IR. For the high spectrum end, this is fairly well possible. But in the low spectrum end, where it is more important, the ripple character of exponential chirps is such that conventional methods fail. I did find some solution but it's not without compromise. Therefore I am investigating alternative methods.

I am sorry that proceedings are slow now. But my hope is to contribute to a good quality tool in the end. I'll post details on the new chirp formula soon.

Katja

bassik

Hello Katjav,

I haven't used the latest chirp~test patch but the previous version of the soft with teh old chirp computation.

So it was supposed to work at least in terms of recording the chirp and deconvolve it; but as i said it needs more testing as it was just a one off mesurements set the one i performed.

I am sorry as well I haven't contributed much in teh last time but I am full of stuff at work.

Anyway I will post a comparison between the PD measurement and a professional soft measurement in teh same positions soon.

Also I am not able to use your latest chirp~test patch as I am on windows.

thank you for keeping the study alive!!!

bassik

HEllo Katjav,

As promised I have performed some comaprisons between the IR measured by EASERA software and the one done with the PD patch we are developing.

As said, I have used the old chirp computation patch and have obtained useful IRs only for the first 2 measurements.

comparison in RT results are in the attached file showing:

RT results
Waterfall plots (where possible to compute)
Magnitude plots

Interesting is to see that EASERA is not able to compute the dB SPL of the wav file as it doesn't have any reference of the microphone calibration.
I need to investigate more if I can insert this info for the wav file so for the moment the waterfall plots are not comparable (EASERA measurements are in dB SPL while PD measurements in dB FS).

Anyway this is just a start and you can see that the RT measurements are quite similar apart for the 125 Hz results.

Hope this helps

More to come

Cheers

http://www.pdpatchrepo.info/hurleur/RT.xls

katjav

Hey Bassik, thanks so much for your test results.

Apart from the 125 Hz the found RT are comparable indeed. But the magnitude plots of the second case? Should M0002_S01_R02.etm be comparable to Test 2b proc.wav? Even without a reference, the figure should show a similar shape because it is in deciBels, no?

But well, at least it shows that the Pd patch is not complete nonsense. We're on the right track.

I wonder how the magnitude of a full IR is computed in EASERA. For a 20 second chirp the IR it is at least 40 seconds, anticausal and causal part taken together. Or is the analysis limited to the causal part? No, I would think that full IR means the whole thing with the left side included. Pd can not do such long FFT's, 131072 points is the maximum. Probably the magnitude could be computed as an average of several overlapping frames. But the phase information is lost then.

I consider writing an FFT object for Pd that operates on a named buffer, and works with 64 bit floats internally. Maybe it is then possible to do larger FFT's. This is also important for calculating inverse filters (correction filters), since the full IR spectrum is sometimes needed for that. There is also something wrong with [fft~] and [rfft~] phase interpretation: it does not do a centered FFT. So with an IR centered you get these alternating phases instead of a zero-phase representation. Therefore, complex multiplication of spectra will give wrong results. Quite annoying. A new FFT object should have an option to select centered and non-centered FFT so you could work well with centered IR, and with causal part only. Hmm, plans....

Katja

katjav

Here is my new formula / method for an exponential test chirp.

sin((((pi/2^p)*L)/ln(2^p)) * e^((n/N)*ln(2^p)))

condition: ((pi/2^p)*L/ln(2^p)) = M*2*pi

M = a positive integer
L = a decimal number (floating point number) expressing chirpsize in unity samples
N = ceil(L), that is L rounded up to an integer, the real arraysize
p = the integer number of octaves in the chirp (user argument)

With this method, you get a chirp with evenly-spaced octaves. The chirp starts in sine-phase and also ends in (approximate) sine-phase, thus minimizing the ripple height. It is even the case that each octave starts and ends in approximate sine-phase, whatever use that may have.

The crucial element in the method is the exponential curve at index 0 and N.

e^((0/N)*ln(2^p)) = e^0 = 1
e^((N/N)*ln(2^p)) = e^ln(2^p) = 2^p

Notice that ((pi/2^p)*L/ln(2^p)) is a constant. If this constant is an integer multiple of 2*pi, the exponential curve will start with 2*pi and end with an integer multiple of 2*pi, since 2^p is also an integer.

Using this method, the start frequency depends on the sampling rate and number of octaves. The start frequency is (sr/2)/2^p. For example: (44100/2)/(2^10) = 21.53 Hz. The stop frequency is always near Nyquist, with a small deviance resulting from the rounding of L to N. For large N, the deviance is negligible for all practical purposes.

If y[n] is the chirp, the inverse chirp is formulated:

y[N-n] * -ln(1/2^p) * (2^(p/N))^-n

This calculates an inverse chirp with average power equal to the forward chirp. That means, more than unity peak level in the high frequencies, and less than unity in the low frequencies.

The user supplies a maximum length argument in nr of samples, and the required number of octaves. From this, a new object [expochirp~] calculates the first ideal length L below the maximum length. It returns the integer chirp size N as a message (so you can resize arrays), and starts feeding chirp and inverse chirp to the signal outputs.

Chirp and inverse chirp can be stored in a 32 bit float stereo .wav file, but the risk is that someone will play the stereo file in a regular soundfile player, which for the inverse chirp could lead to severe clipping. The reason for normalising to amplitude instead of unity peak level is that no further information is now needed about the chirp for normalisation after deconvolution. That normalisation will be simply N/2, like it would be the case for a linear chirp. For safety, chirp&inverse could be stored as .txt files instead of .wav. I don't know yet what is wise.

Although my expochirp method so far guarantuees a straightforward chirp with minimized ripple, the ripple is still considerable, and at the low frequency side quite problematic in my view. Some speaker systems may already have their own resonance frequencies in the region of the start frequency. Therefore this is exactly the region where a flat chirp spectrum is most important, to not push eventual resonances to further height. I have experimented with correction filters on the IR, but it would be better to relieve the problem at it's origin again, the shape of the chirp itself. I am now trying to find a fade-in window for that purpose. Windowing means amplitude modulation, causing sum- and difference frequencies. A proper window may cancel ripples, but a bad window could easily amplify them, or not? For me this is a new research topic. Only when results are acceptable I will finish [expochirp~] and share it.

Katja

bassik

@katjav said:

Hey Bassik, thanks so much for your test results.

Apart from the 125 Hz the found RT are comparable indeed. But the magnitude plots of the second case? Should M0002_S01_R02.etm be comparable to Test 2b proc.wav? Even without a reference, the figure should show a similar shape because it is in deciBels, no?

But well, at least it shows that the Pd patch is not complete nonsense. We're on the right track.

I wonder how the magnitude of a full IR is computed in EASERA. For a 20 second chirp the IR it is at least 40 seconds, anticausal and causal part taken together. Or is the analysis limited to the causal part? No, I would think that full IR means the whole thing with the left side included. Pd can not do such long FFT's, 131072 points is the maximum. Probably the magnitude could be computed as an average of several overlapping frames. But the phase information is lost then.

I consider writing an FFT object for Pd that operates on a named buffer, and works with 64 bit floats internally. Maybe it is then possible to do larger FFT's. This is also important for calculating inverse filters (correction filters), since the full IR spectrum is sometimes needed for that. There is also something wrong with [fft~] and [rfft~] phase interpretation: it does not do a centered FFT. So with an IR centered you get these alternating phases instead of a zero-phase representation. Therefore, complex multiplication of spectra will give wrong results. Quite annoying. A new FFT object should have an option to select centered and non-centered FFT so you could work well with centered IR, and with causal part only. Hmm, plans....

Katja

Hello Katja,

Magnitude is expressed in dBFS and for definition our recorded signal should not go above 0 dBFS hence clipping in the analog domain.
So there is something wrong in EASERA when computing our IRs; I haven't figure out yet as the manual does not contain any reference to wav file loading and analysis.

Regarding which part is computed in Easera:
For acoustic measurements (RT, STI etc) only the casual part should be used.
The purpose of the swept sine technique is to get all the reproduction system non-linearities in the anti casual part (leftmost part) of the deconvoluted IR in order to then eliminate them.
So all the analysis should be done on a IR that has the time 0 where the peak of the IR is; also it will be then windowed to eliminate the late part that is not useful to be investigated as it might contain other artifacts of the deconvolution process due to e.g. the noise in the environment during measurements.

I hope I have understood your question.

Glad this keeps you thinking about new objects for PD.

I will review your post on the new chirp formula.

Thank you

Bassik

bassik

Hello Katja,

please have a look at the attached file!
Magnitude plots and waterfall plots with our wav file processed (arrival to IR peak moved and windowing) made in Easera; the magnitude is now about right!!!

Probably Audacity applies some processing to the file when exported after processing (just cutting) that lead EASERA to misleading the IR characteristics.

You can see that our IR has its peak normalized to 0 dBFS, did you apply already all our thoughts on IR scaling??

Still need to find out how to import mic calibration in EASERA....

Bassik

http://www.pdpatchrepo.info/hurleur/RT.xls

katjav

Hi Bassik,

These comparisons look quite promising! And this is done with the old wiggly chirp. The low frequency ripples over which I'm cracking my brains hardly play a role because the system's response starts at around 100 Hz. I'm curious, what is actually the system under test?

The promising results are of course no reason to stop the quest for the ideal chirp. I have done some fade-in window tests, and indeed they can completely remove the low frequency ripple. But the fade-in window has to be crazy long before it has good result, stretching fully over the first octave. The ripples actually result from the chirp's sudden start. A regular smooth fade-in like the half Hann window I am using now does not effectively counteract this effect. So now I'm thinking of a critically damped oscillator, applied time-reversed as a window. Or maybe a windowed sinc.

About normalization I had some more thoughts. It occurred to me that the direct response will (in most cases) be spreaded over two neighbouring indexes. Ideally there should be just one index representing the indentity diagonal in a Toeplitz convolution matrix. With the two largest peaks in the center of an FFT frame, the group delay can be estimated over a region which has linear phase. Then, the complete IR phase spectrum should be rotated, to center the IR exactly on one index. After this, peak normalization of the IR should be done. Then, the maximum spectrum magnitude must be found. The multiplicative inverse of this maximum provides the normalization factor to prevent clipping. To do all these things, we really need that FFT-XXL object yet to write.

For acoustic measurements (RT, STI etc) only the casual part should be used. The purpose of the swept sine technique is to get all the reproduction system non-linearities in the anti casual part (leftmost part) of the deconvoluted IR in order to then eliminate them.

This is clear. So 'full IR' actually means the causal part for this purpose. I can't remember having heard about STI. Found this DIY article on the topic:

http://www.synaudcon.com/site/articles/a-do-it-yourselfers-guide-to-computing-the-speech-transmission-index/

Seems rather complicated. Should we build this in Pd too?

Katja

katjav

More on chirp windowing, good news.

Yesterday I reported about a Hann-fade-in window eating so much of the chirp, before producing good result. Actually that need not be a problem. The chirp can be extended with an extra octave, and this octave is then sacrificed to the fade-in. With 11 octaves in the chirp, and the first octave fully fade-in-windowed, a beautiful spectrum results, with a fairly steep transition band climbing up to precisely the start of the second octave (around 21 Hz at sr 44K1) and with only the faintest remainders of a ripple.

1/11 of the chirp is now spent on a fade-in. For example, almost 2 seconds of a 20 seconds chirp. So what? The spectral result is excellent, and the chirp length may be extended with a few seconds to compensate the lost time. I see no disadvantages so far.

Think I'll implement this long Hann-fade-in by default in my chirp generator.

Katja

katjav

So I have this object [expochirp~] finished and done some simple IR measurement on my MacBook internal soundcard (out >> in loopback). Earlier tests had already shown a substantial dip around Nyquist but that seems reasonable as the signal must be filtered to not generate aliases in the discrete domain.

But now the surprise, at least for me. A couple of days ago I uttered the assumption that the direct impulse response will be spreaded over two neighbouring samples. After all, when a pulse is received from acoustic or analog source, it would be very coincidental if it falls exactly on a single discrete index. From my first experiments however, I learned that the pulse is not spreaded over two neighbouring samples, but... over a whole bunch of samples. It is a sin(x)/x figure! (edit: sin(pi*x)/(pi*x) actually). Like spectral smearing in frequency domain. Should have known that....

Ok then, time zero lies somewhere inbetween two indexes. And because of the sin(x)/x smearing, there is no way of exactly taking the causal part of the IR. Everywhere you try to cut it near the highest coefficient, it gives a completely different spectrum. This can not be resolved until the impulse response is time-shifted by a fraction of a sample-interval, to position the direct response exactly on a single index. It's funny that Farina did not mention this issue in his texts on the ESS method.

Katja

p.s. I'll post [expochirp~] and patches soon, whenever I find opportunity to compile on different systems.

bassik

@katjav said:

Hi Bassik,

These comparisons look quite promising! And this is done with the old wiggly chirp. The low frequency ripples over which I'm cracking my brains hardly play a role because the system's response starts at around 100 Hz. I'm curious, what is actually the system under test?

Katja

Hello Katja,

the system under test is a big exhibition hall in london (http://www.vam.ac.uk/collections/fashion/galleries/40/index.html) measured using a genelec 8030A monitor and an omni directional mic only.

Keeping it simple to understand teh results quicker.

Regarding your exp chirp computation, I am afraid this goes above my current knowledge so I need to study a bit before giving you an answer or even a comment.

But we should bear in mind that exp Sweep sine technique is done for measuring acoustic spaces (hence the storing of non linearities of the reproduction and recording system in the anti casual part of the IR) and to obtain high quality IR for convolution and auralisation use.

In this case I don't really get the point why you are researching the exact sample value of the peak value.
In my experience, analysis of simple IR don't give substantial different results if the time zero of the IR is not exactly on the peak but probably I am missing something here expecially regarding the signal processing.

base of the exponentianl or logaritmic sweep is that the instantaneous frequency vary exponentially with time (see http://www.acoustics.net/objects/pdf/article_farina02.pdf)
Hope this can be of help.

Also Farina described a method for fast convolution for ambiophonics that can maybe be of inspiration for the deconvolution engine (see http://www.acoustics.net/objects/pdf/article_farina04.pdf)

Regarding STI, it is a measure of the speech intelligibility in a given space and takes into account many factors of which th emost important are the S/N ratio and the frequency masking.

Voice is a very complex mathematical beast as it has a characteristics frequency respone that vary with spl and also a frequency modulation mechanism that we use to articulate words. this is obvioulsy different for different languags but has been standardised in a certain way.

I will send to you the ISO standard that clearly explains th emethod of calculation impplemented in any commercial software in teh next days.

Also I would like to discuss possible development after the chirp issues are resolved.

I believe that it can be useful to implent the following in the patch:

Sound editor to delete the anti casual part of the IR and move the arrival time of the peak to zero
Room acoustics parameters calculation (I will send you the relative ISO and I will write more info later)
Multichannel IR recording and processing including an Ambisonics module; IEM in Gratz (http://iem.at/Members/noisternig/bin_ambi) has done a great work on this. Our implementation would be much simpler and would include A-format to B-Format encoding and additional analysis.

Hope everything is well in life in the meantime

speak to you soon

Bassik

katjav

ExpoChirpToolbox.

This will be the name of an IR measurement toolchain, based on the [expochirp~] object. The first two patches are built now:

[expochirp-generator], generates an exponential chirp of excelllent quality, together with inverse chirp. Deconvolution-test and inspection of the chirp's spectral characteristics within the patch. Chirp & inverse chirp can be stored with metadata in a textfile.
[IRrecorder], loads a chirpfile as created with [expochirp-generator], applies the chirp to the system under test. Deconvolution, and inspection of spectral characteristics of the impulse response. The raw IR can be stored as 32 bit floating point .wav file.

Next component on my program will be [IReditor], where you can load a raw IR as produced with [IRrecorder] and edit it: trim the useful portion, and eventually window the tail if necessary. The editor should apply the IR as a filter in a soundfile player, so you can check the resulting sound. When these patches work satisfactory, we can think of additional tools: [IRinverter] to produce correction filters, [IRanalyser] for extracting acoustics parameters, [IRambisonic] for multichannel IR...

Attached is ExpoChirpToolbox.zip containing [expochirp~] binaries for OSX and Windows (I'll do Linux later) plus patches [expochirp-generator] and [IRrecorder]. Please post comments or suggestions.

edit: can't attach a file at the moment, download from here;

http://www.katjaas.nl/temp/ExpoChirpToolbox.zip

Katja

bassik

Thank you very much for this Katjav.

Much appreciated.

Hope that my post yesterday have been of some help.
I was just reading it today and it contains some incongruence.

Anyway, I will start some testing on the patch asap and let you know.

I will also prepare some more sketches for B-format encoding and more info for the analyzer.

Cheers

Bassik