Audio comparison externals
thank you!
So basically:
[==~] --> [expr~ $v1==$v2]
[||~] --> [expr~ $v1||$v2]
[&&~] --> [expr~ $v1&&$v2]
[<~] --> [expr~ $v1<$v2]
[>~] --> [expr~ $v1>$v2]
Is this right?
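(Yes, that's right: each of those outputs a signal that is 1 wherever the condition holds and 0 elsewhere. As a quick usage sketch, a crude gate that passes the input only while it exceeds an assumed threshold of 0.1 would be [expr~ $v1 * ($v1 > 0.1)].)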
Error: tabsend~: $0-hann: no such array
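(This error usually means the Hann-window table the patch expects has not been created, e.g. because the subpatch or [table] object that builds it was not copied along with the rest. A minimal fix sketch, assuming a 512-point window: create [table $0-hann 512] in the same patch and fill it on load. If I remember the cosinesum partial ordering correctly, the message below writes 0.5 - 0.5*cos(2*pi*n/512), i.e. a Hann window; check the resulting graph to be sure, and note that older Pd versions do not expand $0 inside message boxes:

[table $0-hann 512]

[loadbang]
 |
[; $0-hann cosinesum 512 0.5 -0.5(
)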
- To cross synthesise two voices you must ensure that two speakers make exactly the same utterances which are phonetically aligned. This is hard as I can tell you from experience of recording many voice artists. Even the same person will not speak a phrase the same way twice.
<<This is not possible in my experiment, as I am supposed to morph the actual conversation, so it is up to the subjects whatever they want to speak. There is some work by Oytun Türk and Levent M. Arslan, who conducted their experiment in a passive environment (not in real time).>>
- The result is not a "timbral morph" between the two speakers. The human voice is very complex. Most likely the experiment will be invalidated by distracting artifacts.
Here are some suggestions.
- Don't "morph" the voices, simply crossfade/mix them.
<<Yes, I also want to do this, crossfading and mixing, as I just want to create an illusion so that the listener starts wondering whether it is B's voice or A's voice.>> (A minimal expr~ crossfade sketch follows after this list.)
- For repeatable results (essential to an experiment) a real-time solution is probably no good. Real time processing is very sensitive to initial conditions. I would prepare all the material beforehand and carefully screen it to make sure each set of subjects hears exactly the same signals.
<<Yes, I agree, but it is a demand of the experiment; I cannot control the environment. But to create a good illusion (or to distract the listener), I may add noise to the signal in real time, which would challenge the listener's brain in identification, so I may use tricks of that kind as a success factor.>>
- If you want a hybrid voice (somewhere between A and B) then vocoding is not the best way. There are many tools that would be better than Puredata which are open source and possible to integrate into a wider system.
<<Actually, I now have a little familiarity with Pure Data, so it is better for me to stick with it for a while (due to the short time). Yes, if I continue my PhD in this domain I will explore other tools as well; for the time being this is a kind of pilot study.>>
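(A minimal crossfade sketch in expr~ terms, assuming $v1 = voice A, $v2 = voice B, and $v3 = a crossfade position between 0 and 1, e.g. from [line~]:

[expr~ $v1*(1-$v3) + $v2*$v3]

An equal-power variant, which keeps the perceived loudness steadier through the middle of the fade, would be [expr~ $v1*cos($v3*1.5708) + $v2*sin($v3*1.5708)].)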
Question 1:
Is a vocoder alone sufficient to morph/mix/crossfade between two voices, or should I also add a pitch-shifting module alongside the vocoder to get better quality results?
Question 2:
I already tried this vocoder example, but could not change it according to my requirements. In my requirements, I have a target voice (the target voice is phonetically rich); the source speaker speaks (whatever he wants to speak) and the source voice is changed into the target voice (illusion/crossfade/mix).
The changetimbre1.pd file that I attached first gives you an idea of the kind of operational interface I am looking for.
Question 3:
What should be the ideal length of the target wave file? Before the start of the experiment, I will collect voice samples from all participants.
I am highly obliged for your earlier help and am looking for more (greedy). Meanwhile I will once again study this vocoder example to change it according to my requirements (though I doubt I will manage to change it).
Thanks.
Error: tabsend~: $0-hann: no such array
Aha! This is a very interesting subject. Please forgive my presumptions.
The full FFT vocoder patch is attached. To use it you must experiment with loading different files and cross synthesising them. It is from the help files and should work as is.
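(For orientation, the core of that patch is per-bin spectral multiplication. A rough sketch of one common arrangement, assuming a 512-point FFT with 4x overlap inside a subpatch; the exact windowing and normalisation are in the help patch:

[block~ 512 4]  <- run the subpatch at the FFT size with 4x overlap
[rfft~] on the carrier voice, and [rfft~] on the modulator voice
modulator magnitude per bin: [*~] each of [rfft~]'s real and imaginary
  outputs by itself, [+~] them, then [sqrt~]  (i.e. sqrt(re*re + im*im))
[*~] the carrier's real and imaginary bins by that magnitude
[rifft~], then [*~] by the $0-hann window table and a normalisation
  factor before the [outlet~]

The "$0-hann: no such array" error earlier in the thread is what you get when that window table has not been created.)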
Some problems will present themselves very quickly...
- To cross synthesise two voices you must ensure that two speakers make exactly the same utterances which are phonetically aligned. This is hard as I can tell you from experience of recording many voice artists. Even the same person will not speak a phrase the same way twice.
- The result is not a "timbral morph" between the two speakers. The human voice is very complex. Most likely the experiment will be invalidated by distracting artifacts.
Here are some suggestions.
- Don't "morph" the voices, simply crossfade/mix them.
- For repeatable results (essential to an experiment) a real-time solution is probably no good. Real time processing is very sensitive to initial conditions. I would prepare all the material beforehand and carefully screen it to make sure each set of subjects hears exactly the same signals.
- If you want a hybrid voice (somewhere between A and B) then vocoding is not the best way. There are many tools that would be better than Puredata which are open source and possible to integrate into a wider system.
i) Csound has an LPC suite. Linear predictive coding is particularly well suited to speech.
ii) Tapestrea is a wonderful tool that uses wavelet analysis. It also has a graphical front end that makes alignment of phonemes easier.
iii) Praat (Boersma and Weenink - Amsterdam Institute of Phonetic Sciences) is a great voice synthesis system based on articulatory tract models, where you can morph speaker models. You may find that a purely synthetic method yields data more suitable for this experiment.
> It is really hard (impossible or even sin) to convince TEACHERS.
Did you mean "even cos"? sin is an odd function! Write it out 50 times!
A simple problem regarding PD patch initialization
I really don't know why it is like this.
Anyway, it has been quite a nice surprise to use expr instead of anything else, because when an expr object is banged it seems to "read" (?) the number2 objects feeding its inlets, without the number2 objects ever being touched once you open the patch.
I now use the expr object with a small bang object beside it. Every bang of this type receives a global loadbang, and it works ... while keeping the number inlet open. ...
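(For reference, a minimal sketch of the usual way to make the initialisation order explicit, with assumed initial values 3 and 4. [t b b] fires its outlets right to left, so the cold inlet is set before the hot inlet evaluates:

[loadbang]
 |
[t b b]
 |     \
[3(    [4(
 |      |
[expr $f1 * $f2]  <- $f2 receives 4 first, then 3 hits the hot inlet
                     and the expr outputs 12

This replaces the small bang object and does not depend on which inlet you last touched.)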
OK, back to work. I am just trying to catch up on something with fractional delays and block convolution, which drives me crazy because it consumes time!!!