Upsampling audio signals using automatable sampling frequency?

elden

Hey guys,

I want to build a device that can independently pitch- and formant-shift an incoming signal live, like the algorithm zplane uses in their plug-in "Elastique Pitch", or as in Melodyne.
Therefore I need to upsample the signal by an automatable factor.
I know there are objects like [block~] or [switch~], but as far as I know they don't up- or downsample in a "portamento" kind of way, right?

Any ideas?

regards

katjav

Hey Elden,

[block~] and [switch~] cannot do fractional upsampling; the factor is always a power of two, so 'portamento upsampling' is definitely impossible with the regular resampling objects. However, you could write blocks of audio in an upsampled patch and read them at fractional speed with [tabread4~]. (Edit: forgot to mention filtering of the upsampled signal.)
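
A minimal Python sketch of that idea, reading a stored block at fractional speed with 4-point interpolation. It only illustrates the principle: `read4` and `read_at_speed` are made-up names, the weights are plain cubic Lagrange weights (assumed to be comparable to what [tabread4~] does, not copied from its source), and the filtering mentioned in the edit is left out.

```python
import math

def read4(table, pos):
    """4-point (cubic Lagrange) interpolated read at fractional index pos."""
    i = int(math.floor(pos))
    f = pos - i
    n = len(table)
    # Clamp the neighbour indexes so the edges of the table stay readable.
    a = table[min(max(i - 1, 0), n - 1)]
    b = table[min(max(i, 0), n - 1)]
    c = table[min(max(i + 1, 0), n - 1)]
    d = table[min(max(i + 2, 0), n - 1)]
    # Lagrange weights for sample points at offsets -1, 0, 1, 2.
    wa = -f * (f - 1) * (f - 2) / 6
    wb = (f + 1) * (f - 1) * (f - 2) / 2
    wc = -(f + 1) * f * (f - 2) / 2
    wd = (f + 1) * f * (f - 1) / 6
    return a * wa + b * wb + c * wc + d * wd

def read_at_speed(table, speed, count, start=0.0):
    """Read `count` values, advancing `speed` table indexes per output sample."""
    return [read4(table, start + k * speed) for k in range(count)]
```

With a speed below 1 the table is read slower than it was written (lower pitch), with a speed above 1 faster (higher pitch).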

Z-plane and Melodyne: what details do you know of their techniques? Isn't it all closed source? Please share your knowledge. It would be great to have that in Pd. There's always the option to write an external if it can't be done with regular objects.

Katja

sunji

Maybe if you upsample with [switch~] and then downsample portamento with [samphold~]. I'm not familiar with the hardware you describe, YMMV.

katjav

Even though there's no practical info on the technique in Melodyne, I found this video portrait of its inventor Peter Neubäcker very inspiring:

Peter Neubäcker was originally a musical instrument maker and astrologist. Seems he developed his software without formal engineering education. His profound interest in harmony was decisive for creating one of the most innovative music tools in recent time. That's nice eh?

Katja

ShawnPD

Wow inspirational!

elden

Hey again,

sorry for the delay.
I'm absolutely convinced that Melodyne and the zplane Elastique plug-in use "Lent's" or the "Brian Charles Gibson" method for changing the timbre and/or pitch of audio signals, plus a timestretching method that separates sinusoids, noise and transients before stretching or shrinking, and afterwards mixes everything back together, leaving the transients intact.

How Direct Note Access works... I can only guess, but I think it's about bandpass-filtering integer multiples of the deepest and most present monophonic fundamental tone, where the filter frequency follows the pitch of that tone. The resulting monophonic signal is then saved to disk separately and added back, phase-inverted, to the original signal, so that its energy cancels out and leaves a residual without that one monophonic tone. The same process of finding and filtering the deepest and loudest fundamental and its integer overtones is then repeated on that residual until, for instance, no sufficiently loud tones are left in it. What then remains of the original signal should be fairly noisy and could simply be sliced wherever it has transients and made available for rhythmical editing...
As far as I know, a lot of people thought of neural network pattern tracking, but I think that Melodyne's DNA is not really hard to understand. The power of Melodyne is its pitch-formant-time handling, which sounds incredible.

Jwif

Thanks for the link Katjav, great video!

mod

yeah, wonderful! cheers

katjav

@elden said:

I'm absolutely convinced, Melodyne and the Z-Plane Elastique plug-in are using the "Lent's" or "Brian Charles Gibson method"

There happens to be an implementation of Lent's method in the open source STK library, C++ class LentPitShift:

https://ccrma.stanford.edu/software/stk/classstk_1_1LentPitShift.html

And here is the project description:

http://www.music.mcgill.ca/~francois/MUMT_618/Report/Report.pdf

Lent's method is a time domain pitch-synchronous approach based on pitch tracking by low-pass filtering and finding zero-crossings. The STK implementation is slightly more advanced: its pitch tracker is based on autocorrelation. It is stated that Lent's method preserves the formants, but I fail to understand how it can do so.
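
The tracking idea described above (low-pass filter, then look for zero-crossings) can be sketched in a few lines of Python. This is an illustration under assumptions, not Lent's or STK's actual code: the one-pole filter, the function names and the averaging over all crossings are choices made for the example only.

```python
import math

def one_pole_lowpass(x, cutoff_hz, sr):
    """Very simple one-pole low-pass, to suppress the harmonics."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sr)
    y, out = 0.0, []
    for s in x:
        y = (1.0 - a) * s + a * y
        out.append(y)
    return out

def rising_zero_crossings(x):
    """Fractional indexes of upward zero-crossings (linear interpolation)."""
    marks = []
    for n in range(1, len(x)):
        if x[n - 1] < 0.0 <= x[n]:
            marks.append(n - 1 + (-x[n - 1]) / (x[n] - x[n - 1]))
    return marks

def estimate_period(x, cutoff_hz, sr):
    """Average distance between upward crossings, in samples (None if unknown)."""
    marks = rising_zero_crossings(one_pole_lowpass(x, cutoff_hz, sr))
    if len(marks) < 2:
        return None
    return (marks[-1] - marks[0]) / (len(marks) - 1)
```

The crossings double as pitch marks for a pitch-synchronous method. The weak spot is obvious: if the filter leaves too much of a strong harmonic, extra crossings appear and the estimate jumps, which is presumably why STK prefers autocorrelation.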

Before trying to build this STK class into Pd I would like to hear examples, or read user reviews and comparisons with other pitch shift libs (notably Soundtouch, which is popular in open source software and already ported to Pd). Does anyone have a clue?

Katja

katjav

While looking for more info via Wikipedia I stumbled upon this article by Henning Thielemann:

http://arxiv.org/abs/0911.5171

This article is relatively recent (2010) and it describes how an audio stream can be interpreted as a helix, and how phase and time can be isolated by interpolation, and rearranged in order to stretch, pitch shift and more.

I am excited about it because the approach is completely different from the older, well described techniques like TDHS, SOLA, PSOLA and phase vocoder. It rather seems to reveal details of the new techniques which are still proprietary and secret. If you have seen the Neubäcker video portrait you may recall the loo roll on which he had drawn the signal samples as a helix, the concept which was the basis for Melodyne. It is this concept which is mathematically described by Thielemann.

To bring this concept to Pd will be a substantial undertaking. There's experimental sample code written in Haskell available, but I found no pointers to ready-to-use C or C++ libraries.

So where to start? A practical sinc interpolation method for fractional resampling is described in detail here:

https://ccrma.stanford.edu/~jos/resample/

There are also a few resampling C libs based on this interpolation method. By coincidence, I was already studying the fractional resampling topic last week, to see if this could be done in real time: taking an input stream and writing it upsampled or downsampled into an array, with an arbitrary sample rate not restricted to a power of two. Fractional speed reading is easy, but as it turns out, fractional speed writing is not. I feel that this stuff must be deeply understood before one could move on to the helix approach of signal manipulation.
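
A rough Python sketch of fractional resampling with a windowed sinc kernel, in the spirit of the bandlimited interpolation described at that link. It is not the code of any of those C libs: it evaluates the kernel directly instead of using a precomputed table, and the Hann window and the `half_width` of 16 zero-crossings are arbitrary assumptions. It works offline on a whole array, so it sidesteps the real-time writing problem.

```python
import math

def sinc(x):
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def resample(x, ratio, half_width=16):
    """Resample list x by ratio = output rate / input rate (any positive float)."""
    # When downsampling, lower the cutoff to the new Nyquist to limit aliasing.
    cutoff = min(1.0, ratio)
    reach = half_width / cutoff          # kernel half-length in input samples
    out = []
    for m in range(int(len(x) * ratio)):
        t = m / ratio                    # output time expressed in input samples
        lo = max(0, int(math.ceil(t - reach)))
        hi = min(len(x) - 1, int(math.floor(t + reach)))
        acc = 0.0
        for n in range(lo, hi + 1):
            d = t - n
            window = 0.5 + 0.5 * math.cos(math.pi * d / reach)   # Hann taper
            acc += x[n] * cutoff * sinc(cutoff * d) * window
        out.append(acc)
    return out
```

Because `ratio` is an ordinary float, nothing restricts it to a power of two, and it could in principle change from one output sample to the next.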

Katja

elden

The helix approach is nothing really new. Every sine wave in a spectrum oscillates at a different frequency, which causes clicks and pops when timestretching in a granular way, because no matter where you cut a signal, there will always be frequencies whose phases are not crossing zero at that position.
Therefore you need to find the right points in the signal at which to loop, so that the looped parts are the smallest pieces possible without changing fundamental frequency or timbre. The necessary loop points are derived from the helix calculation.
The big thing about the helix approach is that if you change the starting point of the loop, you change the end point too, so that the signal always plays back smoothly in the loop. If you want to time-stretch or shrink a signal you just need to move the starting point of the loop slower or faster towards the end of the sound. You could also move it back and forth or jump to different places in time (at the right phase angle); it will always play back smoothly without pops or clicks.
And if you increase the playback speed of the loop, you get pitch shifting effects, with chipmunk effect of course, which is the reason why that approach doesn't work for polyphonic material or for shifting to extremely high or deep tones.

The Melodyne DNA is a great tool to solve this polyphony problem! Because of its separation of polyphonic into monophonic material, the helix looping no longer influences the tonality of the signal. I think the helix looping is involved in Melodyne DNA too, not for time or pitch control, but for making a tone audible when clicking on the note object.

elden

The Gibson method (which is better than Lent's because it not only preserves timbre when pitch shifting, but makes timbre controllable separately from pitch):
http://www.google.de/patents?id=79oZAAAAEBAJ&printsec=frontcover&dq=brian+gibson&hl=en&sa=X&ei=-lAQT7TrDM-ZOonJ4JwD&sqi=2&ved=0CDkQ6AEwAw

katjav

@elden said:

the helix approach is nothing really new

You're right. It is only new for me. Are there more publications on the helix approach? I want to read everything about it.

I've also read through the Brian Gibson patent application. It is so detailed that one could almost code every paragraph directly into a C function. Still I fail to understand the essence of Lent's method which is part of Gibson's invention.

How could one change a pitch but preserve formants and timbre with a time domain method? I would think that spectral envelope is a contour of all frequency components, which is determined by formants for sinusoidal and stochastic components and by the source harmonic recipe for the sinusoidal components. The weight of the harmonics is reflected in every period so if you change the period length by resampling, the contour changes together with it, no? And, since pitch is determined by the fundamental period length, I see no other way for pitch shifting in time domain than resampling the signal stream and recombine fragments in one way or another. I just have a frustrating blind spot here.

Katja

katjav

@elden said:

The Melodyne DNA is a great tool to solve the polyphony problem to this! Because of its separation of polyphonic into monophonic material, the Helix-Looping is not anymore influencing the right tonality of the signal.

Yes, Peter Neubäcker states that Direct Note Access requires a different approach than the helix view. But I am under no illusion that I could reproduce the technique of Melodyne DNA, the fruit of many years of research and development, in a Pd patch overnight. I would be happy with any progress on the pitch shifting topic in Pd. The status quo is that we have this available:

G09.pitchshift.pd, most basic time domain approach featuring heavy amplitude modulations (help browser patch)
I07.phase.vocoder.pd, basic frequency domain approach suffering from phase randomisation which is tradable for other artifacts by phase locking
bsaylor/pvoc~ class, same principle as I07.phase.vocoder.pd (Pd-extended)
the efforts of Alexandre Torres Porres to create a realtime phase vocoder based on the same principles (pd-list)
soundtouch~ class, Pd-port of a decent SOLA implementation for monophonic periodic signals, but suffering from considerable latency and from clicks at pitch factor change (external not included in Pd-extended)

Both the Brian Gibson method and the monophonic helix approach seem interesting to me, as they could overcome some of the constraints in existing options for pitch shifting in Pd. Admittedly, I have no clue yet how these methods work; I have to study the texts a bit more. I seriously doubt that it can be done using Pd as the language, and rather expect it must be done for Pd, using C as the language. But a lot of subprocesses can be prototyped and tested using Pd as the language. For example, I'm now doing a patch to calculate a sinc interpolation kernel for resampling, and testing the frequency response for different cutoff frequencies. Will post this soon as it's a useful kernel generator in practice.
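
A sketch of what such a kernel generator and response test could look like, written in Python instead of as a Pd patch. The names, the Hann window and the tap count are assumptions for illustration only; cutoff and test frequency are given as a fraction of the Nyquist frequency.

```python
import cmath
import math

def sinc_kernel(cutoff, half_length):
    """Hann-windowed sinc low-pass with 2 * half_length + 1 taps.

    cutoff is a fraction of Nyquist (0..1).
    """
    taps = []
    for n in range(-half_length, half_length + 1):
        x = cutoff * n
        s = 1.0 if n == 0 else math.sin(math.pi * x) / (math.pi * x)
        w = 0.5 + 0.5 * math.cos(math.pi * n / (half_length + 1))
        taps.append(cutoff * s * w)
    return taps

def gain_db(taps, freq):
    """Magnitude response in dB at freq (fraction of Nyquist), by direct DFT."""
    acc = sum(h * cmath.exp(-1j * math.pi * freq * n) for n, h in enumerate(taps))
    return 20.0 * math.log10(max(abs(acc), 1e-12))
```

Sweeping `freq` from 0 to 1 for a few `cutoff` values gives the response curves. The gain at the cutoff itself should come out near -6 dB, since a windowed sinc passes about half the amplitude there.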

Katja

katjav

One more relevant document for this topic: Harold Hildebrand's patent application for what has become Autotune. The claim is made that its autocorrelation method for pitch tracking is very robust. Downsampling of the signal for analysis is a key element.

http://www.google.com/patents/US5973252?printsec=description&dq=5,973,252#v=onepage&q=5%2C973%2C252&f=false

Katja

Maelstorm

Why did I not know about Google Patents? Thanks, katjav!

nau

what an exciting topic ! Thank you Katjav and al. !

katjav

@Maelstorm said:

Why did I not know about Google Patents? Thanks, katjav!

Elden came up with the first google patents link in this thread.

Found another interesting one. Peter Neubäcker's patent application for Melodyne DNA:

http://www.google.com/patents/US8022286?printsec=description#v=onepage&q&f=false

The description shows that there is indeed no special trick, as Peter Neubäcker already mentioned in the video portrait. It is just a very detailed analysis, mainly of frequency domain data, where frequencies are found by the phase-delta method using 4 times or more FFT overlap and Hann windowing. For anyone who has ever done basic frequency domain pitch shifting plus pitch detection and attack detection from scratch, all the elements of the DNA technique will be very familiar. But I can imagine how it takes months or years of patient coding to get all the details right. Understanding a concept is one thing, building a robust implementation is something completely different.
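
For readers who have not met the phase-delta method: the phase of one FFT bin is compared between two overlapping Hann-windowed frames, and the deviation from the phase advance expected for the bin centre gives the exact frequency. Below is a small Python sketch of that textbook calculation for a single bin. It is not taken from the patent, the function names are invented, and a plain DFT sum stands in for the FFT to keep it self-contained.

```python
import cmath
import math

def hann(n_points):
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_points) for n in range(n_points)]

def bin_value(frame, window, k):
    """Windowed DFT of one frame, evaluated for bin k only."""
    n_points = len(frame)
    return sum(frame[n] * window[n] * cmath.exp(-2j * math.pi * k * n / n_points)
               for n in range(n_points))

def phase_delta_frequency(x, start, k, n_points, overlap, sr):
    """Frequency in Hz of the partial in bin k, from two frames one hop apart."""
    hop = n_points // overlap
    w = hann(n_points)
    p0 = cmath.phase(bin_value(x[start:start + n_points], w, k))
    p1 = cmath.phase(bin_value(x[start + hop:start + hop + n_points], w, k))
    expected = 2.0 * math.pi * k * hop / n_points     # advance of the bin centre
    deviation = (p1 - p0) - expected
    deviation -= 2.0 * math.pi * round(deviation / (2.0 * math.pi))  # wrap to +/- pi
    return (k + deviation * n_points / (2.0 * math.pi * hop)) * sr / n_points
```

The overlap matters because the wrapped deviation is only unambiguous within plus or minus `overlap / 2` bins of the bin centre; with 4 times overlap that covers the main lobe of the Hann window.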

Anyhow, a DNA approach is really suitable for studio work but it could not operate on live input. In Pd, we need a low-latency and preferably efficient pitch shifter which doesn't produce artifacts like amplitude modulation, phase randomization, or false pitch detection. For vocals, one would like to control pitch and formants independently. How to do this in Pd was the initial question of this forum thread. So let's go back to that question.

From the info so far it seems that a time domain pitch-synchronous method is most appropriate for this purpose, essentially the Brian Gibson method. This method assumes a monophonic periodic signal. What elements are needed to build this in Pd?

To start with, an accurate and robust periodicity tracker. This could be loosely modeled on Hildebrand's concept, where the signal stream is downsampled to limit the analysis frequency range and improve efficiency. Alternatively, autocorrelation could be done in frequency domain as a multiplication, where it would also be easy to restrict the frequency range. Prototyping and testing for this can be done as Pd patches.
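
As a starting point for such prototyping, here is a deliberately naive Python sketch of the first variant: downsample, then search the lag with the highest normalized autocorrelation. It borrows only the general idea of analysing a downsampled signal; the block averaging used as decimator, the function names and the default ranges are assumptions, and nothing here reproduces Hildebrand's actual procedure.

```python
import math

def decimate(x, factor):
    """Average groups of `factor` samples: a crude low-pass plus downsampling."""
    return [sum(x[i:i + factor]) / factor
            for i in range(0, len(x) - factor + 1, factor)]

def autocorr_period(x, min_lag, max_lag):
    """Return (lag, score) of the best normalized autocorrelation in the range."""
    best_lag, best_score = None, 0.0
    for lag in range(max(1, min_lag), max_lag + 1):
        a, b = x[:-lag], x[lag:]
        energy = math.sqrt(sum(s * s for s in a) * sum(s * s for s in b))
        if energy == 0.0:
            continue
        score = sum(p * q for p, q in zip(a, b)) / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score

def track_pitch(x, sr, factor=8, fmin=60.0, fmax=800.0):
    """Pitch in Hz (or None) and a confidence score between 0 and 1."""
    small = decimate(x, factor)
    small_sr = sr / factor
    lag, score = autocorr_period(small, int(small_sr / fmax), int(small_sr / fmin))
    return (small_sr / lag if lag else None), score
```

The resolution is limited to whole lags at the reduced rate, so a usable tracker would still need interpolation around the peak and some protection against octave errors.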

Next, a method for fractional resampling. Using [tabread4~], it is easy to interpolate between samples, and it's even better than the minimum requirement as proposed by Gibson (i.e. linear interpolation). The interpolation quality of [tabread4~] is good enough for moderate resampling ratios like +/- 1 octave. But the problem is that both the input and output of [tabread4~] operate at the current samplerate. Say you have read 48 values at increased speed from a 64 pt block and you're done reading: there is no easy way to store these values at index 0 - 47 in a resampled array, and continue writing at index 48 in the next signal block. There may be some trick employing [phasor~] objects to address the correct write indexes. Alternatively, a specialized Pd class could be written for this.
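
Outside Pd's fixed block size the bookkeeping is simple, which suggests what such a specialized class would have to remember between blocks: the fractional read phase, the last input sample, and the write index. A Python sketch under those assumptions follows (hypothetical class name, linear interpolation only, no anti-alias filtering):

```python
import math

class StreamResampler:
    """Reads each input block at fractional speed and appends to one array."""

    def __init__(self, speed):
        self.speed = speed    # input samples consumed per output sample
        self.phase = 0.0      # read position relative to the start of the next block
        self.last = 0.0       # final sample of the previous block
        self.written = []     # resampled array; its length is the write index

    def process(self, block):
        n = len(block)
        if n == 0:
            return len(self.written)
        # Positions from -1 up to n - 1 can be interpolated; index -1 is self.last.
        while self.phase < n - 1:
            i = math.floor(self.phase)
            f = self.phase - i
            s0 = block[i] if i >= 0 else self.last
            s1 = block[i + 1]
            self.written.append(s0 + f * (s1 - s0))
            self.phase += self.speed
        self.phase -= n       # carry the remainder over to the next block
        self.last = block[-1]
        return len(self.written)
```

Feeding it 64 pt blocks at speed 64/48 yields 48 values for the first block, and the second block continues writing at index 48, which is exactly the behaviour that is awkward to get from [tabread4~] alone.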

These things are only the very first steps towards a pitch shifter implementation. But they're interesting experiments anyway, and such subroutines can be employed for other purposes as well, if they succeed.

Katja

elden

Cool, I was right regarding Melodyne DNA! Just finding out integer multiples of the fundamental frequency and filtering them out from the input signal note by note! Wow - that's scary! :D
Of course they implemented it in a very virtuosic way. BTW, the Melodyne DNA patent states that Neubäcker uses the Gibson method!

Well, to come back to the shifting problem:

What if you pitch-shifted a signal the usual granular way and detected the frequency of that pitched signal to control a sine wave LFO that crossfades between the pitched and the original signal? Would that possibly conserve the formants partially when pitching, without any upsampling?

I mean, Gibson also said something about combining portions of the original signal with the pitched signal.
The interesting point is which attributes of a signal are essential for the perception of pitch, right?
When crossfading in an oscillating way between these two signals, you would need to find out at what position inside a single wave cycle the timbre information is oscillating (not using FFT).
If you start the oscillating crossfade at an offset that exactly fits the right position(s) inside the wave cycle, you could possibly get a shifted signal with attributes of the timbre of the original signal, or what do you guys think?

Maelstorm

@katjav said:

@Maelstorm said:

Why did I not know about Google Patents? Thanks, katjav!

Elden came up with the first google patents link in this thread.

Oh...uh...right. That's...um...that's what I meant...

Thanks, elden!

katjav

@elden said:

Cool, I was right regarding Melodyne DNA! Just finding out integer multiples of the fundamental frequency and filtering them out from the input signal note by note!

Yeah you were completely right with your educated guess about the method of Melodyne DNA. So, I have good faith that we'll figure out some method to implement in Pd as well, even if it can not be so advanced as Melodyne.

I am at the moment writing a fractional resampling class in C, as I feel it is too cumbersome or even impossible to do it as an abstraction using existing objects.

How to keep the formants when at the same time changing the pitch in time domain, that is still the question. So many authors and developers claim that it can be done in time domain, I do not despair or disbelieve. It is just that I do not see the light of day yet.

The hardest part is a robust pitch detection mechanism, they say. See for example Stephan Bernsee, section 4.2:

http://www.dspdimension.com/admin/time-pitch-overview/

In my straight-edge view, pitch is defined by the length of periodicity in seconds, and that length is inversely proportional to the number of cycles per second, so there would be no way to fool around with the ratios of perceived pitch, length and number of cycles per second.

We need to dissect Gibson's patent to understand how it is actually done. If you look at Gibson's drawing, you see block 192 [resampled pitch shifter]. Resampled data goes (Hann-windowed) into block 200 [formant preserving pitch shifter]. As it seems, windowed periods are replicated at a rate according to the desired perceived pitch. At this point I lose track. If you don't exactly copy the right amount of periods per second, you'll get ludicrous amplitude modulations, no? Or... is this exactly the trick? Do these amplitude modulations achieve the desired effect? After all, any sum of harmonics effectuates amplitude modulations; it is not by definition bad. Hmmm.
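
The replication step can at least be tried out numerically. The Python sketch below is a generic pitch-synchronous overlap-add, not Gibson's block 200: it assumes a constant, already known period in whole samples, cuts Hann-windowed grains of two periods without resampling them, and lays them down at a spacing of `period / pitch_factor`. Function name, gain compensation and rounding of grain positions are assumptions.

```python
import math

def overlap_add_shift(x, period, pitch_factor):
    """Re-space two-period Hann grains of x so they repeat every period / pitch_factor."""
    grain_len = 2 * period
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / grain_len)
              for n in range(grain_len)]
    out = [0.0] * len(x)
    last_mark = (len(x) // period - 1) * period   # last usable grain centre in x
    new_period = period / pitch_factor
    t = float(period)                             # centre of the next output grain
    while t + period < len(x):
        # Take the grain around the input pitch mark nearest to the output time.
        mark = min(int(round(t / period)) * period, last_mark)
        start_in = mark - period
        start_out = int(round(t)) - period
        for n in range(grain_len):
            # Dividing by pitch_factor roughly compensates the denser overlap.
            out[start_out + n] += x[start_in + n] * window[n] / pitch_factor
        t += new_period
    return out
```

Since every grain keeps its original shape, the resonances inside a period are not transposed, while the sum repeats at the new period. In that reading the "amplitude modulation" caused by the new grain spacing is the new pitch.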

How embarrassing: some people invent, and others rack their brains on what is already invented. And think of this: these patents were never issued to help us build a pitch shifter, but rather to prevent us from doing so.

Katja