Hi @whale-av, thanks for giving your outlook on this. My performance on groovebox2 is similar to yours, a bit more CPU use, but my CPU is a bit slower, so nothing strange. My "fork" of this patch runs with similar specs. The error message I got is happening on both my RPI and my laptop, so it would explain why I got messed-up performance with pd~. I'll try to see if upgrading to a newer kernel on both OSes helps, or find another way to get rid of this. If this solves my issue with pd~ it would be nice; I'll give you guys an update as soon as I have looked into that.
-
Optimizing Pd performance to fit an RPI 3
-
@lysergik I ran a few tests, although they are a bit meaningless for your Pi as I have an i7 2.3 GHz.
And so I deleted a bit of my last post.
I suppose the GUI using 5% of an 8-core processor would translate to 60% or more on your Broadcom chip.
The Pd process at around 3% ..... about 40% maybe.
Simply minimising the GUI window removes nearly all of the GUI cpu usage as @Eeight suggests.
If you can effectively split the audio equally over 2 cores with [pd~] you might be able to get down to 20-25% per core on the RPI.
Is it possible in Linux to force the GUI...... the Tcl/Tk app..... to run on a set core..... core 1 GUI / core 3 Pd / core 4 [pd~] for example.
The wish app and Pd communicate through ports anyway, so that will be possible if there is a way to permanently assign the executables to the cores.
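A hedged sketch of how that could be done on Linux with taskset (the patch name, core numbers and the GUI process name are my assumptions, not tested on a Pi):

```
# launch Pd pinned to core 2 (-rt asks for real-time priority on Linux)
taskset -c 2 pd -rt mypatch.pd &

# move the already-running GUI process to core 0
# (the Tcl/Tk process may show up as wish, wish8.6 or pd-gui -- check with htop)
pidof wish | xargs -n1 taskset -pc 0
```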
David
-
Running groovebox2 on my T430 Core i5, I get roughly 35% of CPU consumed.
I profiled and got the following results (which look normal at first glance):
-
Hi guys, so I managed to get rid of the "output snd_pcm_delay failed" error by running Pd with JACK. It doesn't do anything for the CPU, though; I've still got my four cores running at 50%. I did some profiling on groovebox2 and on the first version of the "fork" I was working on, and I noticed that I didn't get any vfprintf calls, so the problem comes neither from Martin Brinkmann's patch nor from my first tweaks on it, but only from the implementation of pd~. Since the vfprintf isn't due to the ALSA error message, it must be something else. It's not the synth of the patch, nor the mixer and FX, since those work well elsewhere. The only things remaining in this audio-processing patch are the following: an object that lets me get all the values coming from my GUI patch via netsend/netreceive, and another object that takes the values received via netsend and sends them with a message to the [pd~] object.
My netsend/netreceive object seems to have worked fine until now; I didn't notice any CPU issues when using it. The other object may be the source of the problem, though it's strange, because the values I send from the GUI are sent properly to the audio-processing patch. So I don't know what to do. Should I pack an archive of my code and put it here so you could have a better idea of what's going on?
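In case it helps to compare notes, here is a minimal sketch of that message path (the port number, the patch name audio-main.pd and the parameter name volume are placeholders, and this relies on [pd~] forwarding non-"pd~" messages to same-named [receive] objects in the subprocess, as its help file describes):

```
#N canvas 0 0 560 360 12;
#X text 30 10 audio side: GUI values arrive over TCP;
#X obj 30 40 netreceive 3000;
#X msg 240 40 pd~ start audio-main.pd;
#X obj 30 100 pd~ -ninsig 0 -noutsig 2;
#X obj 30 160 dac~;
#X text 30 210 inside audio-main.pd a [receive volume] picks up the values;
#X connect 1 0 3 0;
#X connect 2 0 3 0;
#X connect 3 0 4 0;
#X connect 3 1 4 1;
```

On the GUI side, a [netsend] given a "connect localhost 3000" message would transmit "send volume 0.5", which [netreceive] re-emits as "volume 0.5" into [pd~]'s left inlet.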
-
groovebox2 (the groovebox2b example) has a load of about 23 percent on my 3.2 GHz Haswell i5 using integrated graphics, depending on the FX selected. I can load 6 instances of this patch before I get audio dropouts (at about 95 percent CPU load).
While there is probably not much that can be done regarding the GUI (it is updated at 1/16-step rate), it should be possible to save CPU by replacing the bandlimited oscillators with simpler ones, and by using fewer, or different, FX
(the XY one uses quite a lot of CPU), and maybe a simpler reverb. And of course using multithreading/[pd~],
which would require a bit of work though...
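As a rough illustration of the oscillator swap (my own toy example, not taken from groovebox2): a naive sawtooth built from [phasor~] is much cheaper than a bandlimited one, at the price of aliasing at high pitches.

```
#N canvas 0 0 480 300 12;
#X text 30 10 naive sawtooth: cheap but aliases at high pitches;
#X obj 30 40 phasor~ 110;
#X obj 30 80 -~ 0.5;
#X obj 30 120 *~ 0.2;
#X obj 30 160 dac~;
#X connect 1 0 2 0;
#X connect 2 0 3 0;
#X connect 3 0 4 0;
#X connect 3 0 4 1;
```
-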
@Nicolas-Danet hello, I'm mostly a non-programmer, but I'm currently trying to teach myself Pd and Pd optimization. Could you please explain what this means, and maybe whether this method might be approachable for a non-programmer to use to track CPU issues / prototype things? I'm very curious/interested based on your answers/this comment, thanks!
-
IMHO the basis of optimization is to find what is worth the cost of optimizing. Don't think you know it without measuring; intuitions are often wrong. In the groovebox2 example above, with a lot of GUI, it appears that the GUI is negligible compared to the DSP. You can benchmark objects with [cputime] and [realtime] reports to evaluate the cost. But it doesn't matter to have a slow module if it is not used a lot. On the other hand, a small gain on an object that is highly polled can save much more CPU.
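For instance, a minimal sketch of that kind of measurement (the [until]/[random] workload is just a placeholder for whatever you want to time): a bang in [cputime]'s left inlet starts the clock, a bang in its right inlet reports elapsed CPU milliseconds, and [trigger] fires right to left so the clock starts before the work.

```
#N canvas 0 0 540 320 12;
#X obj 30 30 bng;
#X obj 30 70 t b b b;
#X msg 90 110 1000;
#X obj 190 110 cputime;
#X obj 90 150 until;
#X obj 90 190 random 1000;
#X obj 190 230 print elapsed-cpu-ms;
#X connect 0 0 1 0;
#X connect 1 2 3 0;
#X connect 1 1 2 0;
#X connect 1 0 3 1;
#X connect 2 0 4 0;
#X connect 4 0 5 0;
#X connect 3 0 6 0;
```

[realtime] can be dropped in the same way to measure wall-clock time instead.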
What makes things more complex in Pure Data is that the DSP and the control commands are interleaved and computed in the same thread. That means that each time a process takes too long to complete (e.g. network, file I/O, anything that can wait), the DSP is rendered too late and you get a click. So not only the average CPU cost of each object and/or algorithm matters, but also the instantaneous cost. That undermines much of what I said in the first paragraph above! That's why tracking each individual small cost can save xruns, and why optimizing is a top-rated topic.
Is profiling something usable by a non-programmer? TBH I think it is not. To understand the measurements (and what's going on, i.e. the main loop, scheduling, clocks...) you need to look under the hood. Even for a programmer, it is not easy. You can sometimes get wrong results when the tests are not really representative (and thus make bad assumptions). It requires a bit of experience (trial and error).
Sometimes I forget that not everybody really wants to spend all that time for nothing!
-
@Nicolas-Danet said:
Sometimes I forget that not everybody really wants to spend all that time for nothing!
You are not alone!