• Jona

    @whale-av thanks, everything works fine. I was just wondering if every Pd instance needs its own port. Obviously not if it receives UDP, thanks for your explanations.

    posted in technical issues read more
  • Jona

    @whale-av thanks a lot for your explanations, I think I understand the [netsend] object better now. And it works with multiple Pd instances. It seems that each Pd instance needs its own port?
    pdtexttospeechwindows.zip
    This is the GUI instance:
    textospeechgui.PNG
    And these are two of the [netreceive] instances:
    textospeech1.PNG
    textospeech2.PNG
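    The port question can be demonstrated outside of Pd. This minimal sketch (plain Python sockets, nothing Pd-specific) shows that the OS refuses a second TCP listener on a port that is already bound, which is why two Pd instances each running a TCP [netreceive] cannot share one port:

```python
import socket

def second_listener_possible():
    """Try to open two TCP listeners on the same port.

    Illustrates why each Pd instance with a TCP [netreceive] needs
    its own port: only one socket may listen on a given TCP port.
    """
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        a.bind(("127.0.0.1", 0))   # let the OS pick a free port
        port = a.getsockname()[1]
        a.listen(1)
        try:
            b.bind(("127.0.0.1", port))
            return True
        except OSError:            # "address already in use"
            return False
    finally:
        a.close()
        b.close()
```

    UDP behaves differently because a UDP [netreceive] has no per-connection state, so one listener can accept datagrams from many senders.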

    posted in technical issues read more
  • Jona

    I recognized one problem: while the text-to-speech command prompt is running via [system], the Pure Data patch is "frozen". Is there a way to solve this (so that I can, for example, play a female and a male voice together)?
    I also tried the [sys_gui] method: https://forum.pdpatchrepo.info/topic/10168/is-it-possible-to-execute-an-exe-from-within-puredata/10
    But somehow I cannot execute an .exe file with it (message to [sys_gui]: exec "path to .exe")...
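    Since the later scripts in this thread are Python anyway, the non-blocking idea can be sketched on the Python side: subprocess.Popen starts an external program and returns immediately instead of waiting, so the caller stays responsive (the TTS .exe path below is hypothetical; a portable stand-in command is used instead):

```python
import subprocess
import sys

def launch_nonblocking(cmd):
    """Start an external program and return immediately.

    Unlike a blocking call (which freezes the caller until the
    command finishes), Popen does not wait, so e.g. two voices
    could be started at the same time.
    """
    return subprocess.Popen(cmd)

# Hypothetical Windows usage: launch_nonblocking(["ptts.exe", ...])
# Portable stand-in for demonstration:
p = launch_nonblocking([sys.executable, "-c", "print('speaking')"])
# ...the caller can keep working here while the child runs...
p.wait()
```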

    posted in technical issues read more
  • Jona

    It does work now on Windows with this tool: https://www.elifulkerson.com/projects/commandline-text-to-speech.php
    and the [system] object from the motex library.
    With this Python code:

    import speech_recognition as sr
    import socket

    s = socket.socket()
    host = socket.gethostname()
    port = 3000
    s.connect((host, port))

    while True:
        r = sr.Recognizer()
        with sr.Microphone() as source:
            r.adjust_for_ambient_noise(source, duration=1)
            print("Say something!")
            audio = r.listen(source, phrase_time_limit=5)

        try:
            # Call the recognizer once and reuse the result (calling
            # recognize_google twice doubles the API requests).
            message = r.recognize_google(audio, language="en-US")
            # Pd's FUDI protocol expects messages terminated by a semicolon.
            s.send((message + " ;").encode("utf-8"))
            print("PD Message: " + message)
        except sr.UnknownValueError:
            print("Google could not understand audio")
        except sr.RequestError as e:
            print("Google error; {0}".format(e))
    

    And this patch: speechtoquote.pd
    Here is the CSV file that I used: https://github.com/akhiltak/inspirational-quotes/blob/master/Quotes.csv
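    The word-to-quote lookup that the patch performs could be sketched in Python like this (the quote list below is a tiny hypothetical stand-in for the ~60000-line Quotes.csv; matching is a plain case-insensitive substring test):

```python
import random

def quotes_containing(word, quotes):
    """All quotes in which the word appears (case-insensitive)."""
    w = word.lower()
    return [q for q in quotes if w in q.lower()]

def random_quote(word, quotes):
    """Pick one matching quote at random, or None if there is no match."""
    matches = quotes_containing(word, quotes)
    return random.choice(matches) if matches else None

# Hypothetical miniature stand-in for the real CSV data:
quotes = [
    "Imagination is more important than knowledge.",
    "Knowledge speaks, but wisdom listens.",
    "Simplicity is the ultimate sophistication.",
]
```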


    And here is a minimal text-to-speech example (which does not need Python):
    texttospeech.PNG

    posted in technical issues read more
  • Jona

    @Johnny-Mauser exactly that (even if it is not a real-world project (yet), but about discovering the possibilities of speech recognition and Pure Data). I can probably transfer it to Windows if you (or somebody else) can give me some more hints on how it is done with Pure Data and OS X. This is what I found about speech recognition and OS X on the mailing list: https://www.mail-archive.com/pd-list@iem.at/msg45054.html

    posted in technical issues read more
  • Jona

    This is a combination of speech-to-text and text-to-speech; it is mainly copied from here: https://pythonspot.com/speech-recognition-using-google-speech-api/
    It works offline with Sphinx too, but then it is less reliable.

    import speech_recognition as sr
    from gtts import gTTS
    import vlc
    import socket

    s = socket.socket()
    host = socket.gethostname()
    port = 3000
    s.connect((host, port))

    while True:
        r = sr.Recognizer()
        with sr.Microphone() as source:
            r.adjust_for_ambient_noise(source, duration=1)
            print("Say something!")
            audio = r.listen(source, phrase_time_limit=5)

        try:
            # Call the recognizer once and reuse the result (calling
            # recognize_google twice doubles the API requests).
            message = r.recognize_google(audio, language="en-US")
            # Pd's FUDI protocol expects a trailing semicolon.
            s.send((message + " ;").encode("utf-8"))
            print("PD Message: " + message)
            tts = gTTS(text=message, lang="en-us")
            tts.save("message.mp3")
            p = vlc.MediaPlayer("message.mp3")
            p.play()
        except sr.UnknownValueError:
            print("Google could not understand audio")
        except sr.RequestError as e:
            print("Google error; {0}".format(e))
    

    What I would like to achieve: I have a CSV file with ~60000 quotes. The recognized word is sent to Pure Data (received there with [netreceive]), and the patch randomly chooses one of the quotes in which the recognized word appears. That already works.
    My question: is it possible to send the chosen quote back to Python with [netsend] and transform it into speech there?
    Somebody says a word and the computer answers with a quote in which the word appears...
    Does it make sense to use Pure Data for that task, or would it be better to use Python alone (but I do not know how to do that with Python only yet...)?
    The Universal Sentence Encoder sounds really promising for "understanding" the meaning of a sentence and finding the most similar quote. But that is far too complicated for me to implement...
    https://www.learnopencv.com/universal-sentence-encoder/
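    Sending the quote back should be feasible: [netsend] speaks Pd's FUDI protocol, where each message is terminated by a semicolon, so the Python side only needs to read from its socket and split the byte stream on semicolons before handing each message to gTTS. A minimal sketch of the message splitting (ignoring FUDI's escaped semicolons; the function name is made up):

```python
def parse_fudi(data):
    """Split a raw byte stream from Pd's [netsend] into messages.

    FUDI terminates each message with a semicolon; surrounding
    whitespace (including the newline Pd adds) is not significant.
    """
    text = data.decode("utf-8")
    return [m.strip() for m in text.split(";") if m.strip()]
```

    Each returned string could then be passed to gTTS exactly as in the script above (tts = gTTS(text=msg, ...)).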

    posted in technical issues read more
  • Jona

    @EEight that is nice. Your ten-year-old approach seems almost more reliable than mine with Google speech recognition now. What language or library did you use back then?

    posted in technical issues read more
  • Jona

    The initial idea of my "research" was to find the subtitle of a movie most similar to the spoken input and jump to the corresponding position.
    It seems to be more complicated than I naively thought, but still theoretically possible with machine learning technologies like word2vec or doc2vec ;) I do not think it is possible with Pure Data alone (perhaps I am wrong)...
    Still, it is nice that basic speech recognition is possible in Pure Data...
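    The word2vec/doc2vec idea boils down to representing each subtitle as a vector and picking the one most similar to the spoken input's vector, usually by cosine similarity. The similarity step itself is simple (the vectors below are arbitrary stand-ins, not real embeddings):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def most_similar(query, candidates):
    """Index of the candidate vector most similar to the query."""
    return max(range(len(candidates)),
               key=lambda i: cosine_similarity(query, candidates[i]))
```

    The hard part is producing good embedding vectors for the subtitles and the spoken input, which is where a library like gensim would come in.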

    posted in technical issues read more
  • Jona

    I am trying to use speech recognition in Pure Data with this Python script:

    import speech_recognition as sr
    import socket

    s = socket.socket()
    host = socket.gethostname()
    port = 3000
    s.connect((host, port))

    while True:
        r = sr.Recognizer()
        with sr.Microphone() as source:
            r.adjust_for_ambient_noise(source, duration=1)
            print("Say something!")
            audio = r.listen(source, phrase_time_limit=5)

        try:
            # Pd's FUDI protocol expects messages terminated by " ;".
            message = r.recognize_google(audio) + " ;"
            s.send(message.encode("utf-8"))
            print("Google PD Message: " + message)
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")
        except sr.RequestError as e:
            print("Could not request results from Google Speech Recognition service; {0}".format(e))
    

    This is the test patch: pythonspeech.pd
    It should change the canvas color when the patch recognizes the word red/blue/green/yellow/white/black from the microphone input while the Python script is running.

    I have two questions regarding the patch:

    It seems that the Python script sometimes stops working; does anybody know what the error is? I think it has something to do with the while loop, because it worked fine without it, but I am not sure (and I do want the loop).

    Is there something like pyext for Python 3.x and Pure Data 64-bit (Windows)?

    At the moment the script uses Google Speech Recognition and therefore needs an internet connection, but it seems to be possible to use CMU Sphinx offline instead (I do not know a lot about all that yet).

    pyspeech.PNG
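    On the first question (the script sometimes stopping): one plausible cause, not verified here, is that an unhandled socket error inside the loop (e.g. Pd being restarted, so s.send() raises OSError) terminates the whole script. A hedged sketch of a send helper that reconnects once instead of crashing (function name made up):

```python
import socket

def send_to_pd(sock, message, host, port):
    """Send one FUDI message to Pd, reconnecting once on failure.

    If the connection dropped (Pd restarted, network hiccup),
    s.send() raises OSError; catching it and reconnecting keeps
    the recognition loop alive. Returns the (possibly new) socket.
    """
    data = (message + " ;").encode("utf-8")
    try:
        sock.send(data)
    except OSError:
        sock.close()
        sock = socket.socket()
        sock.connect((host, port))
        sock.send(data)
    return sock
```

    In the main loop, `s = send_to_pd(s, message, host, port)` would replace the bare `s.send(...)` call.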

    posted in technical issues read more
  • Jona

    @Balwyn your solution is much easier :)

    posted in technical issues read more
  • Jona

    @ingox thanks, that's great. It works very well in the Markov patch and is a better solution :)

    posted in abstract~ read more
  • Jona

    I played a bit with the settings of the concatenative example from the timbreID library. Really fascinating, even if the result is far from perfect. I used a snippet of a speech by Bercow as the source and a snippet from Adorno as real-time input. Adorno's voice is synthesized with Bercow's voice, if I understand it correctly? It would be nice to improve the result.
    And it would be really nice if it were possible to recognize a whole word or a sentence, jump to the most similar position in a movie and play from there. But I do not really understand how it works yet.
    Here is the modified example:
    concatenative_video_mod.zip

    posted in Off topic read more
  • Jona

    I modded the csv_file abstraction from danomatika (https://github.com/danomatika/rc-patches/tree/master/extra) a little so that .csv data can be used with the [text] object. It is also vanilla now except for the [binfile] object from the mrpeach library.
    I made the changes for my own needs, but perhaps it could be useful for someone else.
    csv_file-mod.zip

    posted in abstract~ read more
  • Jona

    @oystersauce very nice project. It seems to be made with the timbreID library.
    I would also like to know how to do something like that.
    Regarding your questions:
    1:
    The patch is mentioned in this paper: http://williambrent.conflations.com/papers/timbreID.pdf
    "4.2. Target-based Concatenative Synthesis
    Some new challenges arise in the case of comparing a constant stream of input features against a large database in real-time. The feature database in the vowel recognition example only requires about 20 instances. To obtain interesting results from target-based concatenative synthesis, the database must be much larger, with thousands rather than dozens of instances. This type of synthesis can be achieved using the systems mentioned in section 1, and is practiced live by the artist sCrAmBlEd?HaCkZ! using his own software design [5]. The technique is to analyze short, overlapping frames of an input signal, find the most similar sounding audio frame in a pre-analyzed corpus of unrelated audio, and output a stream of the best-matching frames at the same rate and overlap as the input."
    2:
    Just discovered that Pure Data can read CSV files with [binfile] from mrpeach.
    Perhaps that is a way to read an external database?
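    The frame-matching step described in the quoted paper is essentially a nearest-neighbour search over feature vectors. A toy sketch (the two-element vectors stand in for real spectral features, which would have many more dimensions):

```python
def nearest_frame(target, corpus):
    """Index of the corpus frame whose feature vector is closest
    (by squared Euclidean distance) to the target frame's features."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(corpus)), key=lambda i: sqdist(target, corpus[i]))
```

    In the real system this lookup runs for every overlapping analysis frame, and the winning corpus frames are overlap-added back into an output stream.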

    posted in Off topic read more
  • Jona

    I improved the patch. MIDI control change, bend and touch messages, and the time between notes are now part of the information.
    I think that is all the (useful) data for this patch that I can get from a MIDI file.
    There is one CSV file preloaded in the patch; for testing you just need to bang "createMarkovMatrix" and then toggle "playMarkovChain".
    LuaMIDICSVMarkov.pd

    posted in patch~ read more
  • Jona

    Here is another update. Some bugs from the last version are gone now, and I finally managed to also include the time between notes in milliseconds. Interestingly, the timing information makes some MIDI files sound quite chaotic rhythm-wise as a Markov chain, depending on the quality of the MIDI file. To get rid of that problem it is possible to quantize the notes of the MIDI file before importing, but with a "good" MIDI file the results are sometimes surprising.
    I suggest not using the standard MIDI soundfont. I use the FluidR3 soundfont, for example, but others are good too: http://www.synthfont.com/soundfonts.html
    I think the patch now has all the features I wanted; of course it can be optimized here and there. And I think I have to write a small documentation, because it is not very intuitive.
    (updated again)
    LuaMIDICVSMarkov.pd

    Here is a version of Bob Marley's "War":

    posted in patch~ read more
  • Jona

    Basically this update is a new patch. Instead of MIDI files it needs CSV files, which for now I prepare with https://www.gnmidi.com/ (no advertisement). For loading the CSV files I use a modified CSV loader from Dan Wilcox (which is now vanilla except for [binfile] from mrpeach). This way I can get note length and rhythm information from a MIDI file, which makes the Markov chain sound much more like the original MIDI file. By rhythm information I mean the time that passes until the next note starts. There is also a Lua MIDI library that could possibly do what I did with GNMIDI (http://www.pjb.com.au/comp/lua/MIDI.html), but I have not made that work yet.
    After loading a CSV file it takes a few seconds, depending on its size, until the list is ready for the Markov generator.
    Here is the patch and a few CSV files:
    LuaMIDICVSMarkov.pd
    Bossa Nova USA - Brubeck.csv
    Mario-Sheet-Music-Overworld-Main-Theme.csv
    a_walk_in_the_black_forest_dfk.csv
    And here two more texts about that topic:
    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.52.721&rep=rep1&type=pdf
    http://oa.upm.es/48942/1/TFG_ALVARO_SANCHEZ_HIDALGO.pdf
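    The core of the Markov generation described above can be sketched in Python (the note numbers are arbitrary examples; the actual patch builds its transition table from the loaded CSV data inside Pd):

```python
import random

def build_markov(notes):
    """First-order Markov table: note -> list of observed followers.

    Repeated followers stay in the list, so choosing uniformly from
    it reproduces the transition probabilities of the input.
    """
    table = {}
    for a, b in zip(notes, notes[1:]):
        table.setdefault(a, []).append(b)
    return table

def generate(table, start, length, seed=None):
    """Walk the chain from `start`, stopping early at a dead end."""
    rng = random.Random(seed)
    out = [start]
    current = start
    for _ in range(length - 1):
        followers = table.get(current)
        if not followers:
            break
        current = rng.choice(followers)
        out.append(current)
    return out
```

    The same idea extends to tuples of (pitch, velocity, delta-time) as states, which is roughly what the patch does when it includes the time between notes.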

    posted in patch~ read more
  • Jona

    I use [l2s] and [s2l] a lot lately. @whale-av I did not know that there is a vanilla version of [l2s], thanks for that.
    Is there perhaps a vanilla [s2l] too (I am fine with using the external, but still)?

    posted in technical issues read more
  • Jona

    Here is an update of the patch.
    It is now possible to choose the GM MIDI instruments that are stored in the MIDI file, but mostly it sounds more interesting to choose different sounds.
    And it is possible to store collections of MIDI data to a file.
    So far I could not store the length of the notes: between the note-on and note-off events of one note there are often other notes in the MIDI file, and because of the Markov characteristic the note-off can happen in a Markov chain long after the note-on, or even before it.
    Because of that, notes would hang a lot of the time.
    The only way I can think of to get note length information from a MIDI file would be to count the ticks from a note-on event to the corresponding note-off event, convert them to milliseconds, and save this value together with the other MIDI values like pitch, velocity, program change, etc.
    But that seems to be quite complicated.
    I hope the patch is more or less self-explanatory, but I will try to answer any questions if that is not the case (and perhaps write a little documentation).
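    The tick-counting idea can be sketched in Python (the event tuples are a made-up stand-in for parsed MIDI data; actual MIDI file parsing is not shown). Each note-on is paired with the next note-off of the same pitch, even when other notes fall in between:

```python
def note_lengths(events, ms_per_tick):
    """Compute note durations from interleaved MIDI events.

    events: list of (tick, kind, pitch) with kind "on" or "off".
    Returns (start_tick, pitch, length_ms) tuples, pairing each
    note-on with the next note-off of the same pitch.
    """
    open_notes = {}  # pitch -> tick of its pending note-on
    out = []
    for tick, kind, pitch in events:
        if kind == "on":
            open_notes[pitch] = tick
        elif kind == "off" and pitch in open_notes:
            start = open_notes.pop(pitch)
            out.append((start, pitch, (tick - start) * ms_per_tick))
    return out
```

    The ms_per_tick factor would come from the file's tempo and ticks-per-quarter-note header; computing it from tempo change events is the complicated part alluded to above.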
    I am also happy to hear about errors.
    Here is an interesting read about markov chains and jazz improvisation:

    https://vtechworks.lib.vt.edu/bitstream/handle/10919/36831/dmfetd.pdf?sequence=1

    LuaMidiMarkovX.pd

    posted in patch~ read more