Voice morphing/conversion. PD & voice conversion

ahmad

I want to convert person A's voice into person B's voice. Can I do this using PD?
If yes, which PD objects would be used?

For example, say I want to change the pitch of a voice to another pitch (a male voice into a female voice): which PD objects should I use?

As I am new to PD, I would highly appreciate a detailed answer.

Thanks.

nestor

It would have to be a very detailed answer... Try looking in the documentation under

3.Audio.Examples

obiwannabe

Hi Ahmad,

This is a hard problem with a bunch of other tough computer science and DSP problems as part of it. I think it's ambitious. However, nothing like a mountain to challenge the spirit of man... so here's what I know about it.....

To change the voice of a speaker in real time you need a three-stage process: analysis, resynthesis, and a magical intermediate stage of transformation in the "parametric domain".

The analysis/synthesis part is fairly easy. Mobile phone technology already uses applications of LPC (linear predictive coding) and phase vocoders that split up the voice into a set of filter coefficients and an excitation signal. These are recombined in the receiving handset by a resynthesis stage. So, something that few people realise: when you are listening to your friend talk on a mobile phone you are not hearing their real voice, you are hearing a resynthesised voice. The signal is split up this way because it reduces bandwidth and compresses the data sent, but it has another possibility...
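To make the filter/excitation split concrete, here is a minimal Python sketch of LPC analysis and resynthesis on one frame. It is an illustration only, not what any phone codec or PD object actually does: the autocorrelation method, the order of 10, the 8 kHz rate and the synthetic test frame from the hypothetical `demo_frame` helper are all assumptions made for the example.

```python
import math
import random

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion. Returns a[0..order] with a[0] = 1,
    the coefficients of the prediction-error filter A(z)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # remaining prediction error
    return a

def analyse(x, a):
    """Analysis: run the frame through A(z) to get the excitation."""
    p = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(min(p, n) + 1))
            for n in range(len(x))]

def resynthesise(e, a):
    """Resynthesis: drive the all-pole filter 1/A(z) with the excitation."""
    y = []
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, min(len(a) - 1, n) + 1):
            acc -= a[k] * y[n - k]
        y.append(acc)
    return y

def demo_frame(n=240, sr=8000, seed=0):
    """Hypothetical test frame: two sinusoids plus a little noise."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * 500 * t / sr)
            + 0.5 * math.sin(2 * math.pi * 1500 * t / sr)
            + 0.01 * rng.uniform(-1, 1)
            for t in range(n)]

x = demo_frame()
a = levinson(autocorr(x, 10), 10)   # the "filter coefficients"
e = analyse(x, a)                   # the "excitation signal"
y = resynthesise(e, a)              # should match x up to rounding error
```

With unmodified coefficients the resynthesis simply gives back the input. The interesting part starts when `a` is changed between `analyse` and `resynthesise`.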

If you alter the filter coefficients it's possible to change the voice, even to another age or gender.
It will sound artificial unless you get the mapping exactly right. Getting this part to work is at the front of research into speaker-independent speech recognition, which deals with the words as matrices in a "parameter space" rather than simply as time or frequency signals. Perry Cook and Eduardo Miranda have done some of this, but going only one way, from the physical parameters to the signal. To make a voice changer as you describe you need to do it both ways: derive the physical parameters from the signal, alter the physical parameters, and then resynthesise the voice.
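As a toy illustration of what "altering the filter coefficients" means, the Python sketch below models a single formant as a two-pole resonator and moves it by changing only the pole angle. The 700 Hz and 900 Hz centre frequencies, the 80 Hz bandwidth and the 16 kHz sample rate are made-up values for the example, and a real voice has several formants that would all need a sensible mapping.

```python
import cmath
import math

def resonator(freq_hz, bandwidth_hz, sr):
    """Denominator coefficients [1, a1, a2] of a two-pole resonator
    whose pole pair sits at radius r and angle theta."""
    r = math.exp(-math.pi * bandwidth_hz / sr)
    theta = 2 * math.pi * freq_hz / sr
    return [1.0, -2.0 * r * math.cos(theta), r * r]

def gain(a, freq_hz, sr):
    """Magnitude response of the all-pole filter 1/A(z) at freq_hz."""
    z = cmath.exp(-1j * 2 * math.pi * freq_hz / sr)
    return 1.0 / abs(sum(c * z ** k for k, c in enumerate(a)))

def peak_hz(a, sr, step=10):
    """Frequency (on a coarse grid) where the filter resonates most."""
    return max(range(0, sr // 2, step), key=lambda f: gain(a, f, sr))

sr = 16000
original = resonator(700, 80, sr)   # assumed formant position
shifted = resonator(900, 80, sr)    # same bandwidth, moved upwards
print(peak_hz(original, sr), peak_hz(shifted, sr))
```

Only the middle coefficient changes between `original` and `shifted`, because the pole radius (the bandwidth) is kept and just the angle moves. In a higher-order LPC filter the same idea means finding the pole pairs and rotating each one, which is where the hard mapping problem described above begins.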

This would make a good postdoctoral research project for a team of 2-5 programmers... Just to let you know what you're getting into! And it has no practical commercial uses other than deception, so outside an artistic context I would remain mindful of that if I were you.

A good place to start would be with the phase vocoder, goofing around with the analysis data to shift the formants. A better system is probably Linear Predictive Coding (edit: I originally wrote *wavelet analysis and Fourier resynthesis* here) because that makes it easier to transform formants independently of anything else. See the TAPESTREA software for sound design, which could have interesting applications to this. A dirty solution would be a form of cross synthesis with a limited dictionary of recognised transformations.

As a quick and practical solution you might find that certain VST plugins similar to Antares Auto-Tune can be subverted to alter speech in a way that renders the speaker unrecognisable. This is used in TV documentaries for interviews where the person wants to be anonymous. Not to be alarmist, but it is actually possible to reverse this process and obtain the original voice if you know what you are doing, and speaker identification software works by analysing the mannerisms of speech, not the exact signals. So to truly disguise a speaker it's best to get an actor to read their words.

In summary: changing a voice so that it sounds like another (generic) person - quite easy,
changing speaker A into speaker B so that a human would be fooled... very difficult.

ahmad

Your detailed answer gives me some ways to get started...

I found some examples in PD that do the pitch-shifting work.
Thanks to both of you for the replies.

hardoff

a good trick for making the harmonics of a voice get fucked up is:

pitch shift up, and then pitch shift down again..or vice versa.

obiwannabe

I lost your email, sorry. Was it Ahmad or Daisy? Anyway, hope your literature search is going well.

I found this and thought you might find it interesting.

http://www.ling.su.se/staff/hartmut/manipul.htm

PS, This is what my brain was thinking about while I was talking bollocks...it's LPC that's good for letting you shift the filter independently of the exciter (post above edited)

arif

Won't something like the FFT Fade made by Sunji (in this thread http://puredata.hurleur.com/sujet-1970-fourier-resynthesis-saving-morphing-between-tables) be exactly the kind of thing? Have real-time FFT analysis going into the array windows?

sunji

every chance I get, I will extol the virtues of ScrambleHackz by Sven Konig. That is the piece of software you want to emulate. It is a database relator for fft windows.

So you want to make the pope say something he's never said: collect a bunch of recordings of his speeches, break them all up into single fft windows and build an n-dimensional cloud out of them. Then record yourself saying whatever it is that you want the pope to say, fft the recording, and replace every window with the most similar frame from your cloud. I think that's how he does it.
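The frame-replacement idea can be sketched in a few lines of Python. This is only a guess at the general scheme described in the post, not how ScrambleHackz itself is written: it uses non-overlapping frames, a naive DFT, squared distance between magnitude spectra and a linear search, where a usable version would need windowing, overlap-add and a faster nearest-neighbour lookup. The `tone` helper and the test signals are hypothetical stand-ins for real recordings.

```python
import cmath
import math

def frames(signal, size):
    """Cut a signal into consecutive non-overlapping frames."""
    return [signal[i:i + size]
            for i in range(0, len(signal) - size + 1, size)]

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..n/2 (fine for tiny frames)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def nearest(query, cloud):
    """Index of the cloud point closest to query (squared distance)."""
    return min(range(len(cloud)),
               key=lambda i: sum((q - c) ** 2
                                 for q, c in zip(query, cloud[i])))

def mosaic(source, target, size):
    """Rebuild target using only frames taken from source."""
    src_frames = frames(source, size)
    cloud = [magnitude_spectrum(f) for f in src_frames]
    out = []
    for f in frames(target, size):
        out.extend(src_frames[nearest(magnitude_spectrum(f), cloud)])
    return out

def tone(bin_index, size, phase=0.0):
    """Hypothetical test frame: one cosine centred on a DFT bin."""
    return [math.cos(2 * math.pi * bin_index * t / size + phase)
            for t in range(size)]

size = 32
source = tone(2, size) + tone(5, size) + tone(9, size)    # the "speeches"
target = tone(9, size, phase=1.0) + tone(2, size, phase=2.0)  # "your voice"
result = mosaic(source, target, size)
```

Matching on magnitudes only means a target frame picks the source frame with the same spectral shape even when the phases differ, which is roughly why the output keeps the timbre of the source recordings while following the content of the target.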

read for yourself
http://www.wired.com/entertainment/music/commentary/listeningpost/2006/04/70664

ichabod

Wow, great thread! Now I can cancel the surgery and turn myself into a castrato the easy way.

sunji

maybe instead of cancelling you should ask for a different upgrade.

http://tech.slashdot.org/article.pl?sid=08/11/17/1737209

get something installed that will run pd natively.