I'm currently trying to figure out how to get an FFT 'freeze' effect to work in Pure Data - that is, to take and hold a snapshot of a particular moment's harmonic/spectral content. A Max version of what I'm trying to achieve is demonstrated nicely here:

Although I have a reasonable understanding of how the FFT works, I don't have any experience whatsoever with the FFT objects in Pure Data, and at the moment I'm a little confused by some aspects of how they work. Consequently, I'm having some trouble getting this going. One useful thing I have found is this post on the (very nice!) rumblesan blog: http://www.rumblesan.com/?p=223 . It describes and contains a patch that does exactly what I have in mind. However, the frozen sound seems quite distorted and not very faithful to the original - particularly when compared to the Max patch shown in the YouTube video above.

Does anyone know how I could improve the sound of this patch? Would fiddling with parameters like block size and overlap help?

Thanks in advance!

J


this topic does interest me a lot too.

Would you post your patch? It would be easier to understand that way.

Nau

the pvoc~ object is also nice for time stretching/freezing. if you use some circular buffers you can do nice time stretching of realtime input. i'll post a patch when i find the time to.

i've been trying around a lot with this kind of stuff recently, but i haven't found a way to get rid of the weird phasing/oscillations when time stretching is applied to spectral output. i guess it has to do with phase locking and channel interferences, but the maths are way over my head. i wish there were some more spectral objects that handle that kind of stuff better. the phasing sounds cool when you just freeze input though.

yeah, that patch sounds much better. I can definitely work with this. Will have to have a look into it and see if I can follow exactly how it works, as I could only half-follow maelstorm's description. I think I get why running phase is important, though. Thanks both for your help!

Another, only partly-related thing: can anyone explain the difference in practice between fft~ and rfft~? I understand that one is complex and the other is real (and quicker), but what information does the imaginary part of the data actually represent? When is it reasonable to discard it? Thanks!

With real signals, if you do a spectral analysis from 0 Hz all the way up to the sampling rate, you'll find that the spectrum between Nyquist and the sampling rate is a mirror image of the spectrum between 0 Hz and Nyquist. (When you think about it, it makes sense; a frequency above Nyquist will alias below to another frequency, so to a digital signal both those frequencies are really the same thing.) Since they are just mirror images, it is computationally pointless to calculate the second half of the spectrum after calculating the first half, because we already know it. So [rfft~] doesn't compute the frequencies above Nyquist, and [rifft~] assumes the spectrum should be mirrored and so doesn't calculate an imaginary part for the synthesized signal. In other words, they're more efficient.

Complex signals, on the other hand, may not have a mirrored spectrum. In that case, you need to use [fft~]/[ifft~] to calculate the full spectrum. Music signals are typically real, though, so for most situations it is best to use [rfft~].

Also, just to be clear, the real or complex signals I'm talking about here are in the time domain. In the frequency domain they will always be complex.
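The mirror-image point above is easy to verify numerically. Here's a small NumPy check (standing in for Pd's [fft~]/[rfft~]; the signal is just illustrative random noise) showing that the upper half of a real signal's spectrum is the conjugate mirror of the lower half, so a real FFT can discard it with no loss:

```python
import numpy as np

# For a real signal, the spectrum above Nyquist mirrors the spectrum
# below it (as complex conjugates), so rfft only returns N//2 + 1 bins.
np.random.seed(0)
N = 1024
x = np.random.randn(N)             # a real "audio" signal

full = np.fft.fft(x)               # all 1024 complex bins
half = np.fft.rfft(x)              # only 513 bins (0 Hz .. Nyquist)

# Upper half = conjugate of the mirrored lower half: no new information.
assert np.allclose(full[N//2 + 1:], np.conj(full[1:N//2][::-1]))

# rfft matches the lower half of the full transform exactly.
assert np.allclose(half, full[:N//2 + 1])

# Round trip: irfft rebuilds the real signal from the half spectrum alone.
assert np.allclose(np.fft.irfft(half), x)
```

Because the half spectrum carries all the information, the round trip through rfft/irfft is lossless, which is exactly why [rfft~]/[rifft~] are the right choice for real (i.e. ordinary audio) signals.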

www.richarddudas.com/pdfs/dudas_lippe_pvoc2006.pdf

www.richarddudas.com/pdfs/dudas_lippe_pvocII2007.pdf

i was able to put this cartesian version of the freeze patch together. because it works without the polar conversion it should be more efficient than the other version (not that it matters for a tiny patch like this, but anyway).

i also added puckette style phase locking, and a purity parameter, that can filter out the weak frequencies.

enjoy!

Another thing I am interested in is pitch-shifting these drones, to play them at different frequencies. Given that the signals are already in the frequency domain, is this a relatively easy procedure, or is it actually non-trivial?

the values are just for scaling the parameter right. the amplitudes before resynthesis have a really weird scaling, and it's hardly possible to have any kind of dB representation for the parameter. so the values are a product of trial and error and tweaking. dbtorms is for logscaling of course. it spreads out the sweetspot.

i messed up the blocksize. this one should work better.

If you don't mind using quite large FFT sizes (2^16 to 2^17), it may be useful to completely randomize the phase data. Paulstretch works that way, and I've found a lot of useful variations of this basic idea.

My patch is still a little messy, and I think I'm still pretty naive about this frequency domain stuff. I'd like to get it cleaned up more (i.e. less incompetent and embarrassing) before sharing. I'm not actually doing the time stretch/freeze here since I was going for a real time effect (albeit with latency), but I think what I did includes everything from Paulstretch that differs from the previously described phase vocoder stuff.

I actually got there from a slightly different angle: I was looking at decorrelation and reverberation after reading some stuff by Gary S. Kendall and David Griesinger. Basically, you can improve the spatial impression and apparent source width of a signal if you spread it over a ~50 ms window (the integration time of the ear). You can convolve it with some sort of FIR filter that has allpass frequency response and random phase response, something like a short burst of white noise. With several of these, you can get multiple decorrelated channels from a single source; it's sort of an ideal mono-to-surround effect. There are some finer points here, too. You'd typically want low frequencies to stay more correlated since the wavelengths are longer. This also gives a very natural sounding bass boost when multiple channels are mixed.

Of course you can do this in the frequency domain if you just add some offset signal to the phase. The resulting output signal is smeared in time over the duration of the FFT frame, and enveloped by the window function. Conveniently, 50 ms corresponds to a frame size of 2048 at 44.1 kHz. The advantage of the frequency domain approach here is that the phase offset can be arbitrarily varied over time. You can get a time variant phase offset signal with a delay/wrap and some small amount of added noise: not "running phase" as in the phase vocoder but "running phase offset". It's also sensible here to scale the amount of added noise with frequency.
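To make the frequency-domain version of the decorrelation concrete, here's a small NumPy sketch on a single windowed frame (the frame size and the input noise are illustrative choices, not taken from my patch): rotating every bin by a random phase offset is an all-pass operation, so it leaves the magnitude spectrum untouched while decorrelating the output waveform from the input.

```python
import numpy as np

# One windowed frame: rotate each bin's phase by a random offset.
# This is an all-pass operation, so the magnitude spectrum (the timbre)
# is preserved, but the waveform decorrelates from the original.
# N = 2048 at 44.1 kHz is roughly the ~50 ms window mentioned above.
np.random.seed(1)
N = 2048
x = np.random.randn(N) * np.hanning(N)     # one windowed frame of input

X = np.fft.rfft(x)
offset = np.random.uniform(-np.pi, np.pi, len(X))
offset[0] = offset[-1] = 0.0               # keep DC and Nyquist real
Y = X * np.exp(1j * offset)                # pure phase rotation
y = np.fft.irfft(Y)

assert np.allclose(np.abs(Y), np.abs(X))   # magnitudes preserved exactly
assert abs(np.corrcoef(x, y)[0, 1]) < 0.5  # waveforms decorrelated
```

Scaling the offset amount per bin (less at low frequencies) would give the frequency-dependent correlation described above; here the offset is uniform across the whole spectrum for simplicity.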

Say that you add a maximum amount of noise to the running phase offset - now the delay/wrap part is irrelevant and the phase is completely randomized for each frame. This is what Paulstretch does (though it just throws out the original phase data and replaces it with noise). This completely destroys the sub-bin frequency resolution, so small FFT sizes will sound "whispery". You need a quite large FFT of 2^16 or 2^17 for adequate "brute force" frequency resolution.
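For reference, the core of that full-randomization idea can be sketched in NumPy as a toy time stretcher: step through the input at a fraction of the output hop, keep only the magnitudes of each frame, attach fresh random phases, and overlap-add. This is only a rough sketch of the principle, not real Paulstretch (which also resamples frames and uses the much larger windows discussed above); the frame size and stretch factor here are arbitrary.

```python
import numpy as np

def stretch_randomphase(x, N=4096, stretch=4.0, seed=0):
    """Toy Paulstretch-style stretcher: magnitudes kept, phases randomized."""
    rng = np.random.default_rng(seed)
    win = np.hanning(N)
    hop_out = N // 4                      # output hop (overlap factor 4)
    hop_in = hop_out / stretch            # fractional input hop -> slower
    n_frames = int((len(x) - N) / hop_in)
    out = np.zeros(n_frames * hop_out + N)
    for i in range(n_frames):
        start = int(i * hop_in)
        frame = x[start:start + N] * win
        mag = np.abs(np.fft.rfft(frame))  # throw away the original phase
        phase = rng.uniform(0, 2 * np.pi, len(mag))
        y = np.fft.irfft(mag * np.exp(1j * phase)) * win
        out[i * hop_out:i * hop_out + N] += y   # overlap-add
    return out

x = np.sin(2 * np.pi * 440 / 44100 * np.arange(44100))  # 1 s test tone
y = stretch_randomphase(x, stretch=4.0)
assert len(y) > 3 * len(x)   # output is roughly 4x longer
```

With N=4096 this will already sound "whispery" on broadband input, for exactly the sub-bin resolution reason described above; pushing N up to 2^16 or so is what recovers the pitch.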

You can add some feedback here for a reverberation effect. You'll want to fully randomize everything here, and apply some filtering to the feedback path. The frequency resolution corresponds to the reverb's modal density, so again it's advantageous to use quite large FFTs. Nonlinearities and pitch shift can be nice here as well, for non-linear decays and other interesting effects, but this is going into a different topic entirely.

With such large FFTs you will notice a quite long Hann window shaped "attack" (again 2^16 or 2^17 represents a "sweet spot" since the time domain smearing is way too long above that). I find the Hann window is best here since it's both constant voltage and constant power for an overlap factor of 4. So the output signal level shouldn't fluctuate, regardless of how much successive frames are correlated or decorrelated (I'm not really 100% confident of my assessment here...). But the long attack isn't exactly natural sounding. I've been looking for an asymmetric window shape that has a shorter attack and more natural sounding "envelope", while maintaining the constant power/voltage constraint (with overlap factors of 8 or more). I've tried various types of flattened windows (these do have a shorter attack), but I'd prefer to use something with at least a loose resemblance to an exponential decay. But I may be going off into the Twilight Zone here...
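For what it's worth, the constant-voltage and constant-power claims do check out numerically for the Hann window at overlap factor 4, at least in its periodic form (the symmetric form is off by an O(1/N) amount). A quick NumPy verification:

```python
import numpy as np

# Periodic Hann windows at overlap factor 4 (hop = N/4) sum to a
# constant, and so do their squares, away from the edges: constant
# voltage and constant power simultaneously.
N = 2048
hop = N // 4
n = np.arange(N)
win = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)   # periodic Hann

total = np.zeros(4 * N)
power = np.zeros(4 * N)
for start in range(0, 3 * N, hop):
    total[start:start + N] += win
    power[start:start + N] += win ** 2

mid = slice(N, 2 * N)                # steady-state region, away from edges
assert np.allclose(total[mid], 2.0)  # constant voltage (sum of windows)
assert np.allclose(power[mid], 1.5)  # constant power (sum of squares)
```

The cosine terms of the shifted windows cancel exactly at hop N/4, which is why both sums come out flat; an asymmetric window would have to be constructed to satisfy the same two cancellations, which is presumably why it's hard to find one.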

Anyway I have a theory that much of what people do to make a sound "larger", i.e. an ensemble of instruments in a concert hall, multitracking, chorus, reverb, etc. can be generalized as a time variant decorrelation effect. And if an idealized sort of effect can be made that's based on the way sound is actually perceived, maybe it's possible to make an algorithm that does this (or some variant) optimally.

Anyway, here's a page where the algorithm is described (and demonstrated): http://zynaddsubfx.sourceforge.net/doc/PADsynth/PADsynth.htm

By the way, zynaddsubfx (and this algorithm) are by the same guy who built Paulstretch. He seems to have a pretty solid understanding of how to make computers sound good.

Acreil - I'll admit that some of that was a little over my head, but some aspects of it sound a little like what Zynaddsubfx (a softsynth) does in its 'padsynth' algorithm? Basically it takes a simple waveform and spreads/"smears" it, in a gaussian distribution, over a range of frequencies, with some slightly complex-looking frequency domain mathematics. As you suggest, it's quite similar to a chorus effect, really...

I'll have to admit it's partly over my head too. I don't really know that much about frequency domain stuff. I just read some papers, made some connections between them, and dicked around with example I09. But I guess I hit on something a little more unique than I expected. I'll upload the patch if I get it cleaned up a little more. It should be more efficient too...

I think as far as Padsynth goes, you can imagine playing a sound into a reverberator with an infinite decay time, then sampling and looping the output. Only it leaves out the reverberator (and the coloration, etc. that it can add) and produces the result (randomized phase) directly. The output is inherently periodic, so it loops perfectly with no additional effort. I think Image Line's Ogun uses the Padsynth algorithm, and that NOTAM Mammut program can do much the same thing (I think it actually illustrates the effect really nicely). Padsynth does smear out the frequency components a little (I guess windowing sorta does that for STFT...), but the phase randomization is the important part if you're processing arbitrary audio input.
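A rough NumPy sketch of that idea: place Gaussian "bumps" of amplitude at each harmonic of a magnitude spectrum, attach fully random phases, and take one big IFFT. The result loops perfectly because a single IFFT frame is periodic by construction. (All the parameter values below are arbitrary illustration choices, not Nasca's defaults, and his actual bandwidth formula is more elaborate than this fixed-Hz Gaussian.)

```python
import numpy as np

def padsynth_sketch(size=2**16, sr=44100, f0=110.0,
                    n_harm=16, bw_hz=20.0, seed=0):
    """PADsynth-style wavetable: Gaussian harmonic bumps + random phase."""
    rng = np.random.default_rng(seed)
    freqs = np.arange(size // 2 + 1) * sr / size   # bin center frequencies
    mag = np.zeros(size // 2 + 1)
    for h in range(1, n_harm + 1):
        amp = 1.0 / h                              # gentle harmonic rolloff
        mag += amp * np.exp(-0.5 * ((freqs - f0 * h) / bw_hz) ** 2)
    phase = rng.uniform(0, 2 * np.pi, len(mag))    # the important part
    table = np.fft.irfft(mag * np.exp(1j * phase))
    return table / np.max(np.abs(table))           # normalize

table = padsynth_sketch()

# The wrap-around jump is no bigger than an ordinary sample-to-sample
# step, i.e. the table loops seamlessly with no extra effort.
loop_jump = abs(table[-1] - table[0])
typical_jump = np.mean(np.abs(np.diff(table)))
assert loop_jump < 20 * typical_jump
```

Writing that table into a Pd array and reading it with [tabread4~] at different speeds would be the "sampling and looping the output" step, minus the reverberator.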

Here are some samples of my patch (I probably should have posted them before):

http://www.mediafire.com/?6nf525vnv1xew58 (wet and dry mix)

original material:

http://www.mediafire.com/?c62tr5ox07r4tdc (wet only, you may be able to guess the original)

Both use time variant decorrelation, feedback (reverb), pitch shift, nonlinear filtering, and also random filtering. But these don't demonstrate some of the more extreme effects since it favors very, very slow and sparse source material. I didn't use any other effects, just the one patch.

You can pitch shift the FFT data with [vd~], just like in time domain, only it works the opposite way, i.e. stretching the spectrum shifts the pitch up and compressing it shifts it down. [rifft~] ignores the second (redundant) half of the frame, but if you're doing extreme pitch shifting you have to take care not to spill over into the next frame. Or you can just write the IFFT's output to a table and transpose it like you would any sampled data (as Padsynth does). I'm no expert in this department either; I just messed around until it worked. And I'm not entirely sure what you're going for, either.
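To illustrate the "stretching the spectrum shifts the pitch up" point, here's a NumPy stand-in for reading the spectral frame at a different rate, using linear interpolation over the half spectrum (the test tone and FFT size are arbitrary; a real patch would also need phase handling across successive frames):

```python
import numpy as np

def shift_spectrum(X, ratio):
    """Resample a half spectrum: ratio > 1 stretches it (pitch up)."""
    n = len(X)
    src = np.arange(n) / ratio            # where each output bin reads from
    Y = np.zeros(n, dtype=complex)
    ok = src < n - 1                      # bins past the end stay zero
    i = src[ok].astype(int)
    frac = src[ok] - i
    Y[ok] = (1 - frac) * X[i] + frac * X[i + 1]   # linear interpolation
    return Y

sr, N = 44100, 4096
t = np.arange(N) / sr
x = np.sin(2 * np.pi * (40 * sr / N) * t)  # sine exactly on bin 40
X = np.fft.rfft(x * np.hanning(N))
Y = shift_spectrum(X, 1.5)

# The spectral peak moves from bin 40 to bin 60: a factor of 1.5 up.
assert np.argmax(np.abs(X)) == 40
assert np.argmax(np.abs(Y)) == 60
```

The `ok` mask is the spill-over guard mentioned above: for downward shifts (ratio < 1) the read position runs past the frame, and those bins have to be dropped rather than wrapped.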

https://github.com/rumblesan/Patch-A-Day-month-2010-11

And specifically FFT Freeze:

https://github.com/rumblesan/Patch-A-Day-month-2010-11/tree/master/27-FFTPitchShiftandFreeze

I made a little abstraction based on the first freeze patch ralph posted which I love for its simplicity.

What I really needed was something that produces warm textures, so I naïvely implemented Paul Nasca's idea of randomizing the phase. I didn't think it would work but it does (with large blocks) and it sounds really good!

I have a question but first I have to explain what I tried to do:

To continue with this very naïve version of the PADsynth, I wanted to record a sample out of the output that would be naturally looped, but I didn't manage to do so, this is from Paul Nasca's description of his algorithm:

"This algorithm generates some large wavetables that can played at diferent speeds to get the desired sound. This algorithm describes only how these wavetables are generated. The result is a perfectly looped wavetable".

What I did was to put a [once] under a [bang~] and write the sample into a table the length of one block. From my limited understanding of the Fourier transform, I thought that a whole window would produce a periodic signal.

What should I do to produce a looping soundfile? Is there a problem with the overlapping perhaps?

cheers

Allister

Just thought I’d share a video of this [freeze~] patch I made based on what I’ve found in this thread: thanks everyone!

Still a lot to learn, but fits my needs …
