samplerate~ delayed update
@seb-harmonik.ar said:
@alexandros in one case you're telling pd to start DSP
In the other pd is telling you that it has indeed been started
"telling pd to start DSP" = [s pd]
But here, the message is coming out of [r pd] -- so it isn't telling pd to start DSP. (Don't get confused by the "; pd dsp 1" -- to demonstrate the issue, it is of course necessary to start DSP. But the troublesome chain begins with an [r].)
@alexandros said:
So this:
[r pd] | [route dsp] | [sel 1]
gives different results than this?
[r pd-dsp-started]
I guess this should be fixed as well? Shouldn't it?
Not necessarily.
"dsp 1" / "dsp 0," as far as I can see, only reflect the state of the DSP switch.
The pd-dsp-started "bang" is sent not only when starting DSP, but also any time that the sample rate might have changed -- for example, creating or destroying a block~ object. That is, they correctly recognized that "dsp 1" isn't enough. If you have an abstraction that depends on the sample rate, and it has a [block~] where oversampling is controlled by an inlet, then it may be necessary to reinitialize when the oversampling factor changes.
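So the safe pattern is something like this (a sketch; the [s $0-sr] target is just for illustration, not from the thread):

[r pd-dsp-started]
|
[samplerate~]
|
[s $0-sr]  <-- anything that depends on the sample rate recomputes from here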
It turns out that this is documented, quite well actually! I missed it at first because [samplerate~] seems like such a basic object -- why would you need to check the help?
hjh
Autogenerate onparent gui from abstraction patch
@FFW donecanvas.txt
You can use [donecanvasdialog( to change the gop properties. When you right-click on a patch and choose Properties, all of that stuff is contained in the [donecanvasdialog( message. The format is this:
[donecanvasdialog <x-units> <y-units> <gop> <x-from> <y-from> <x-to> <y-to> <x-size> <y-size> <x-margin> <y-margin>(
The <gop> argument is 0 = gop off; 1 = gop on, show arguments; and 2 = gop on, hide arguments.
You can [send] the message to the abstraction just like you would with dynamic patching. If it's an abstraction, I'd recommend using [namecanvas] so you can give it a unique $0 name. Also, you should immediately follow it with a [dirty 0( message, otherwise you'll get some annoying "do you want to save" dialogs popping up:
[donecanvasdialog <args>, dirty 0(
|
[send $0-myabs]

...with [namecanvas $0-myabs] placed inside the abstraction itself.
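For example (values illustrative), to turn GOP on with arguments hidden, a 200x140 window, and default ranges and margins:

[donecanvasdialog 1 1 2 0 -1 1 1 200 140 100 100, dirty 0(
|
[send $0-myabs]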
AND [coords( can be more easily used than donecanvasdialog for graphs.
https://puredata.info/docs/developer/PdFileFormat
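If I'm reading that page correctly, the saved file line looks like

#X coords <x-from> <y-from> <x-to> <y-to> <width> <height> <gop>;

and a [coords( message with the same arguments can be sent to the canvas, e.g. [coords 0 1 100 -1 200 140 1( for a 200x140 graph with x running 0 to 100 and y running 1 to -1 (top to bottom).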
But you will still need to place the guis that you want to show through the gop window within its range.
I can't see how that could be automated...... but most things are possible in Pd one way or another.
Maybe using [find( [findagain( messages to the abstraction followed by an edit+move, but then the parent would need to know the names of the guis required.
The Python parse method might work better for that?
David.
20 mode pole mixing and morphing filter. Updated
@oid said:
What are you using to measure CPU?
I am just counting on Pd's native [cputime] object, as you can see in the screenshot. I also have a widget in my task bar that graphs CPU cores and memory usage, but I only use that to tell me when it's time to close some tabs in my browser and whatever programs I have left open. I just realized I can mouse-hover the widget and get a reading.
with htop I get ~8% cpu use (6% being pd's DSP) for [lop~] and ~14% for [rpole~].
That is when running the entire morphing filter, right?
Is your computer that ancient, or is the CPU's governor not kicking up to the next step with the low demands of your test patch?
Nah. Bought it last Christmas: 3.5 GHz / 4 cores / 16 GB RAM. I just have a lot of stuff going on at the same time (GIMP, GeoGebra, Pd, multiple tabs in Firefox, various documents).
I tend to test audio stuff with the governor set on performance, since most any practical audio patch will kick it up to the highest settings anyways; it gives a more accurate result.
I rarely run into problems even with all my excess of open garbage haha. I did have Pd freeze on me a couple of times yesterday while experimenting, but that was due to sheer recklessness.
s~/r~ throw~/catch~ latency and object creation order
IIRC (sorry if i'm wrong).
Building the DSP chain means doing a topological sort to linearize the graph of the DSP objects.
For that, all the DSP objects of the topmost patch are collected in reverse order of their creation. Then the first node in the collection that has no DSP entry yet is fetched. The graph is explored depth-first (outlet by outlet), appending the DSP method/operation of the connected objects to the DSP chain. For a node to be appended, all of its inlets must have been visited; otherwise the algorithm stops, fetches the next outlet (or the next object in the collection), and restarts the exploration from there. The same process is applied recursively to any subpatches discovered.
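In C-like terms, roughly this (a toy sketch of mine, not the actual Pd or Spaghettis code, which keeps much more state):

#include <stdio.h>

#define MAXNODES 8

/* A DSP node: how many signal inlets it has, how many have been
   fed so far, and which nodes its outlets connect to. */
typedef struct _node {
    const char *name;
    int inlets;
    int fed;                /* inlets visited so far */
    int scheduled;
    int outs[MAXNODES];     /* indices of downstream nodes */
    int nouts;
} t_node;

static t_node nodes[MAXNODES];
static int chain[MAXNODES], nchain = 0;

static void explore(int i)
{
    t_node *n = &nodes[i];
    if (n->scheduled || n->fed < n->inlets)
        return;             /* inputs not all ready: stop, come back later */
    n->scheduled = 1;
    chain[nchain++] = i;    /* "append its DSP operation to the chain" */
    for (int o = 0; o < n->nouts; o++) {
        nodes[n->outs[o]].fed++;
        explore(n->outs[o]);    /* depth-first, outlet by outlet */
    }
}

int main(void)
{
    /* [osc~] and [sig~] both feed [*~], which feeds [dac~] */
    nodes[0] = (t_node){ "osc~", 0, 0, 0, {2}, 1 };
    nodes[1] = (t_node){ "sig~", 0, 0, 0, {2}, 1 };
    nodes[2] = (t_node){ "*~",   2, 0, 0, {3}, 1 };
    nodes[3] = (t_node){ "dac~", 1, 0, 0, {0}, 0 };
    for (int i = 0; i < 4; i++) /* fetch each collected node in turn */
        explore(i);
    for (int i = 0; i < nchain; i++)
        printf("%s\n", nodes[chain[i]].name);   /* osc~ sig~ *~ dac~ */
    return 0;
}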
Notice that it is not easy to explain (and i don't have it fresh in mind). The stuff in Spaghettis is there. AFAIK it is more or less the same as in Pure Data (linked in the previous post).
I can bring details back in my head, if somebody really cares.
.
.
.
You can also watch the beginning of that video by Dave Rowland about Tracktion's topological processing.
EDIT: The first ten minutes (the rest is not relevant).
WARNING: the way things are done in that talk is NOT how it is done in Pd.
It is just an illustration of the problem needed to be solved!
Introducing Tracktion Graph: A Topological Processing Library for Audio - Dave Rowland - ADC20
.
.
.
@whale-av said:
@jameslo I agree. I did report a bug once, but have never dared ask a question on the list.
The list is the domain of those creating Pd. I often read what they have to say, but I would not dare to show my face.
David.
It is really sad to see experienced users intimidated by developers.
But at the same time i totally understand why (and i still have the same feeling).
Paradigms useful for teaching Pd
@ingox Ha, I guess this highlights the number 0 paradigm for Pd-- always measure first!
It looks to me like message-passing overhead dwarfs the time it takes to copy a list once into [list store]. That single full list-copy operation (a for loop if you look in the source) is still significant -- about 10% of total computation time when testing with a list of 100,000 elements.
However, if you remove all message-passing overhead and just code up [list drip] as a core class, you consistently get an order of magnitude better performance.
So we have:
1. original list-drip: slow, but I can't find the code for it. I hope it was just copying the list once before iteration, and not on every pass.
2. matju's insane recursive non-copy list-drip: faster than the original implementation.
3. [list store] implementation of [list drip]: looks to be consistently 3 times faster than number 2 above, lending credence to the theory that message-passing overhead is the significant factor in performance. (Well, as long as you aren't copying the entire list on every pass...)
4. Add a core class [list drip] to x_list.c and use a for loop to output each element from the incoming vector of atoms without copying anything. I quickly coded this up and saw a consistent order of magnitude performance improvement over the [list store] implementation. (Didn't test with gpointers, which will probably slow down every single implementation listed here.) A sketch of the idea is below.
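To make number 4 concrete, here's roughly what I mean, written against Pd's public API (m_pd.h); the names (drip_*) are mine and this is a sketch, not the actual x_list.c code:

#include "m_pd.h"

static t_class *drip_class;

typedef struct _drip {
    t_object x_obj;
} t_drip;

/* Output each atom of the incoming list, one message per element,
   without copying the vector. */
static void drip_list(t_drip *x, t_symbol *s, int argc, t_atom *argv)
{
    (void)s;
    for (int i = 0; i < argc; i++)
    {
        if (argv[i].a_type == A_FLOAT)
            outlet_float(x->x_obj.ob_outlet, argv[i].a_w.w_float);
        else if (argv[i].a_type == A_SYMBOL)
            outlet_symbol(x->x_obj.ob_outlet, argv[i].a_w.w_symbol);
        /* (gpointers left out here, as noted above) */
    }
}

static void *drip_new(void)
{
    t_drip *x = (t_drip *)pd_new(drip_class);
    outlet_new(&x->x_obj, &s_list);
    return (x);
}

void drip_setup(void)
{
    drip_class = class_new(gensym("drip"), (t_newmethod)drip_new,
        0, sizeof(t_drip), CLASS_DEFAULT, 0);
    class_addlist(drip_class, (t_method)drip_list);
}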
I think Matt Barber even had a case where removing a [trigger] object and replacing it with bare wires (and just depending on creation order) generated enough of a performance difference in a complex patch to warrant doing it. (Edit: this may have been in an abstraction, in which case the speedup would be multiplied by the total number of abstractions used...)
It's too bad that message-passing overhead makes that much of a difference. And while I can think of a potential speedup for the case of [trigger], I'm not sure there's a simple way to boost message-passing performance across the board.
Few questions about oversampling
Thanks guys for the tips and links. I will look into them to figure out the best trick to get a cleaner sound with the least CPU usage possible. I've actually understood why my subtractive/additive bandlimited oscillators had some noise/clicking, and it has nothing to do with aliasing but with bad signal use in my design, which I could fix easily.
Then, while running my oscillators with antialiasing/oversampling, I didn't notice an audible difference with it on or off, at least for the subtractive and additive synthesis. For FM synthesis I ran different tests and got a good CPU-use/antialiasing trade-off when oversampling two times. To get the best performance possible, I could apply the antialiasing method only to my FM oscillators and not to my bandlimited oscillators.
I've also tested increasing my block values and the results are interesting, though I've heard that doing so increases latency, and since I want to make a patch meant for live performance, it could become an issue if I rely on that to lower my CPU use. I might instead address the aliasing with a low-pass filter, which could offer a good alternative to the antialiasing method I used.
@gmoon I've used pd~ once, and I don't know if I implemented it poorly or if the object isn't ready yet to deliver an interesting use of multiple cores, but when I used it, pd~ managed to multiply the CPU use of the patch I was working on by four. From my experience I wouldn't recommend pd~ to anyone for CPU optimization, but maybe there's someone out there who knows how to use it properly and has successfully divided their sound processing within Pd.
Few questions about oversampling
Hi,
About a year ago I started to learn a bit of Pure Data in order to create a patch that would act as a groovebox and that should perform on limited CPU resources, since I want it to run on a Raspberry Pi. First I tried to make some kind of fork of the Martin Brinkmann groovebox patch; even if it allowed me to learn a lot about data flow, I didn't get into the core of the patch to tweak the sound generation. This led me to end this attempt at forking the MNB groovebox patch, because even if I could separate the GUI stuff from the sound generation and run it on a different thread etc., I couldn't go further in optimization to reduce the CPU use.
Then a few weeks ago I decided to start my project again from scratch, and this time I wanted to be more patient and learn everything needed to be capable of optimizing my patch as much as possible. After making a functional drum machine which runs at 2-3% of CPU with 8 different tracks, a 126-step sequencer, a bit of FX etc., I tried to find synths that would operate well alongside the drum machine. And I basically didn't find any patch that wouldn't use a massive amount of CPU time. So I created my own synths; nothing incredible, but I'm happy with what I got, though I noticed some aliasing. I read a bit of the FLOSS manual about antialiasing and applied the method used in the manual (http://write.flossmanuals.net/pure-data/antialiasing/). It works well, but my synths almost tripled their CPU use, even though I put all my oscillators in the same subpatch in order to use only one instance of oversampling.
I didn't try oversampling less than 16 times, but since oversampling is so CPU-intensive, I'm wondering if there's another option to get good sound definition at a lower CPU cost. I'm already using bandlimited waveforms, so I don't know what else I could do to limit the aliasing, especially for my FM patch, where bandlimited waveforms aren't very useful for reducing aliasing.
Since I want to have at least 4 synth tracks, with at least one synth having 5-voice polyphony, I want to know the best thing to do. Leave FM aside for this project and use switch~ to oversample my bandlimited-waveform synths 2 or 4 times? Or should I try to run a different instance of Pd for each synth and control them from a GUI/control patch with netsend (though it wouldn't bring down the CPU use, it would at least provide some kind of multithreading for my patch)? Or is there another way to get some antialiasing? Or should I lower my expectations, because there is no solution that could provide decent antialiasing for 4 or more synths running at the same time with low CPU use in Pure Data in 2021?
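Something like this is what I have in mind for the switch~ option (sketch only, untested, example values): each synth voice lives in a subpatch running 2x oversampled,

[inlet~]   [inlet on/off]
|          |
|          [switch~ 64 1 2]  <-- blocksize 64, overlap 1, upsampled x2; 0 also switches the voice off
|
[... bandlimited oscillators, filters ...]
|
[outlet~]

with [inlet~]/[outlet~] doing the up- and down-sampling at the subpatch boundary.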
Thanks to everyone who reads my topic and tries to give some advice on getting the best antialiasing/low-CPU-use solution.
CPU usage of idle patches, tabread4~?
Hello there,
I started developing a sampler/looper with Purr Data a couple of months ago (and I've used this forum a lot!), to be used with an 8*8 MIDI launcher. To avoid audio drop-outs and glitches, I spent the past 2 weeks separating the UI and DSP patches, communicating via OSC (so the UI can be launched with -nrt and -noaudio while DSP uses -rt -jack -nogui) (I'm on Debian 9.0).
To avoid dynamic patching and array creation, I then have 64 instances of the track's DSP abstraction, waiting for the UI to use them, even if only 3 or 4 tracks are actually used.
I find however that the idle DSP instance of Pd - all track abstractions loaded, but not recording or playing any audio - uses 28-29% of my CPU when monitoring with Pd's load meter, while the idle UI instance uses only 5% (I'm on an old machine, but still). I have narrowed it down to the track abstractions, since I get only 5% CPU use (DSP side) when there is only one idle track patch instead of 64.
Looking deeper into it, I find that if I delete the 2 [tabread4~] objects (each track has 2 arrays for stereo) I significantly reduce the CPU load, from ~28% to ~17% with 64 idle patches. 17% is still significant, but I couldn't identify any other object with such an important impact.
Is it known that [tabread4~] ([tabread~] does no better) uses significant CPU even when idle, whether with a [phasor~] at 0 Hz or not connected to the [phasor~] at all? Is there a way to make it "sleep" completely? Or is there no/little risk of audio drop-out if I create the [tabread4~] objects dynamically once I actually need the track?
The CPU load also increases to ~40% when I turn DSP off, I guess because of errors or confused messages with empty arguments when processing audio signals.
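One candidate for "sleeping" that I'm considering (sketch only, untested): putting each track's DSP behind a [switch~],

[r $0-active]  <-- 1 when the UI grabs the track, 0 otherwise
|
[switch~]

in the same canvas as the [phasor~] and [tabread4~] objects, so their DSP is skipped entirely while off.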
Any other advice to reduce the CPU load is of course also welcome.
Thank you!
Reblocking under the hood
- The arrows point to the place where the signal is put (during the prologue) into the subpatch buffer in, and from where the signal is pulled out (during the epilogue) from the subpatch buffer out. Note that each letter represents 64 samples (oops, i should have said this first).
- The #numbers refer to those in the log file (i.e. starting with "BLOCK FREQUENCY") for each example. It is the DSP tick tag (e.g. "#+0#").
- The exclamation marks show when the DSP is triggered (computed) in the subpatch.
- On the left, "a b c d e f g h i" is the signal in, whereas on the right "- - - A B C D E F" is the signal out. That means that for the first 64 samples labeled "a" you get 64 samples of "-" (zeros). Then for "b" you have zeros again, and for "c" also. For the fourth vector "d" you obtain "A" (that is, the computed signal of "a").
- BLOCK FREQUENCY ... OUTLET HOP: those are messages logged in my fork when the DSP graph is built. They help me understand what's going on.
< https://github.com/Spaghettis/Spaghettis/blob/054786098f340d8683efd9b5b5f2c1df4c8f1f56/src/dsp/graph/d_block.c#L85 >
Roughly:
BLOCK FREQUENCY is the number of times the child ticks for each DSP tick in the parent.
BLOCK PERIOD, on the contrary, is the number of times the parent ticks for one tick in the child.
INLET SIZE is the size of the buffer in.
INLET WRITE is the position to write into the buffer at start.
INLET HOP ... is the hop.
OUTLET SIZE is the size of the buffer out.
OUTLET HOP ... is the hop too!
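For instance (my numbers, not taken from the log above): with the parent at the usual 64-sample vectors and [block~ 256] in the subpatch, BLOCK PERIOD is 4 (the parent ticks four times per child tick), INLET SIZE is 256, and the output stays silent for three vectors before "A" appears -- exactly the "a b c d" / "- - - A" pattern shown above.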
Optimizing pd performances to fit an RPI 3
Thank you. I've read that removing GUI objects helps performance, so if you recommend it I will remove all the GUI. Since it's a lot of change to apply to my patch, I just want to know how effective this method is in terms of CPU usage optimization. Given the patch's current performance (audio processed with 35% of CPU), could I get the patch running at 20% of CPU or even less?
Then, for the use of switch~, I don't know how I could implement it. The easiest way would be to place it in my FX section, but I don't know if splitting my FX subpatch in 8 or 9 would help in terms of efficiency. I noticed that I got better performance in my mother patch when I hosted only two subpatches instead of nine (a gain of about 5% of CPU).
As for the DSP crash on the RPI, it seems that I'm not the only one to have issues with it. The problem is coming from ALSA; I've tried running the DSP on JACK and it works, so for this one, problem solved.