On Tue, Apr 21, 2015, at 03:12 PM, Marcus Müller wrote:
> By the way: This currently *is* getting more interesting: Because you
> typically don't want to copy memory needlessly in a
> performance-critical application, it's bad that blocks that wrap some
> kind of accelerator (GPU, FPGA card, DSP core...) can't define where
> their buffers are -- so there's work going on in the coprocessors
> working group (Doug Geiger is the person to ask, I guess) to allow
> single blocks to define their own special buffers.
Doug Geiger has led the CoProc working group (WG) effort for a while, but his time has been limited of late, as it is for most of the usual candidates for this work. I -might- pick up the torch in May, if/as my time allows; we'll see. If there's demand for doing this work, that would help. The CoProc work is basically to create egress and ingress base block types that provide their own specialized buffers. They would be able to use the current double-buffered type, or a single buffer if that's all that's available.

***

The concepts behind the double buffer we currently use include (assuming the request was for a buffer of N items; a rough sketch of the wrap arithmetic is appended at the end of this message):

+ good: we can always guarantee that N items are available for R/W, no matter where the R/W pointers are, by allocating the buffers somewhat larger (2x) than the request (and rounded up to the nearest pagesize() boundary);

+ good: buffer wrap -- when the R/W pointers are moved -- is a simple remainder computation; no memcpy or even branching required;

- bad: not all OSs / hardware easily provide these buffer types, or getting them requires root access, or whatnot.

***

The concepts behind the single buffer include (sketch also appended below):

+ good: really easy to guarantee that N items are available for R/W (etc);

- bad: have to use memcpy eventually for buffer wrap; but we can mitigate this by allocating the buffer to be, say, 10x larger than needed, so that the memcpy happens only about 1 time in 10; we don't in general want to allocate large buffers all of the time, but if this is the way to get GR working on some systems then it's a good option to have in place; it will also require branching somewhere in the process;

+ good: can use any memory, anywhere, which makes it more portable across OSs & hardware.

***

egress is the transport from the local CPU/memory to the CoProc. It might be as simple as using the same shared memory & not even having to memcpy / DMA; or it might be more complicated, such as using OpenCL to set up a memory map, moving the data over, then closing the map (a sketch of that case is appended below as well).

***

ingress is the transport from the CoProc to local CPU/memory; just the reverse of egress.

***

The WG decided to limit the use cases for now to just these 2; if this work does get done, and there's a need for CoProc blocks where the scheduler is involved, then we'll address those cases at that time.

Anyway, those are the basic ideas. I'd love to hear some more discussion & to hear from folks interested in having this change in place.

- MLD
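
P.S. For folks who think better in code, a few rough sketches of the above. None of this is from the GR tree or from the WG -- the names & details are mine, just to illustrate the concepts. First, the double-buffer wrap (a Linux-specific sketch):

// Rough sketch (Linux-only; *not* GR's actual vmcircbuf code) of the
// double-mapped buffer idea: the same physical pages are mapped twice,
// back to back, so any N-item access starting anywhere in the first copy
// is contiguous in virtual memory, and buffer wrap is just a remainder.
// Needs Linux >= 3.17 / glibc >= 2.27 for memfd_create(); error handling
// is mostly elided.
#include <sys/mman.h>
#include <unistd.h>
#include <cstring>
#include <cstdio>

static char* make_double_mapped(size_t size)    // size = multiple of pagesize()
{
    int fd = memfd_create("circbuf", 0);        // anonymous, file-backed pages
    if (fd < 0 || ftruncate(fd, size) < 0)
        return nullptr;

    // Reserve 2*size of contiguous address space, then map the same fd into
    // both halves, so buf[i] and buf[i + size] alias the same byte.
    char* base = (char*)mmap(nullptr, 2 * size, PROT_NONE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return nullptr;
    mmap(base,        size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    mmap(base + size, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);
    return base;
}

int main()
{
    size_t size = (size_t)sysconf(_SC_PAGESIZE);   // one page, just for the demo
    char* buf = make_double_mapped(size);
    if (!buf) { perror("make_double_mapped"); return 1; }

    size_t wr = size - 4;                 // write pointer 4 bytes from the end
    memcpy(buf + wr, "wrapped!", 8);      // 8 bytes, no branch, no split copy
    wr = (wr + 8) % size;                 // wrap = simple remainder (wr is now 4)

    printf("start of buffer now holds: %.4s\n", buf);   // "ped!" -- the wrapped tail
    printf("new write index: %zu\n", wr);
    munmap(buf, 2 * size);
    return 0;
}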
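
Next, the single-buffer concept: plain memory, over-allocated (10x here) so the wrap copy & branch happen rarely. Again, just an illustration -- the struct & method names are made up:

// Rough sketch (not actual GR code) of the single-buffer idea: any plain
// memory works, over-allocated ~10x, with an occasional copy-back and a
// branch when the write pointer gets too close to the end to satisfy a
// full contiguous request.
#include <cstring>
#include <cstddef>
#include <vector>

struct single_buffer {
    std::vector<char> mem;   // any memory, anywhere: plain heap is fine
    size_t rd = 0;           // read offset (bytes)
    size_t wr = 0;           // write offset (bytes)
    size_t max_req;          // largest contiguous request we must satisfy

    // Over-allocate (e.g. 10x) so the copy-back below happens rarely.
    single_buffer(size_t max_request, size_t factor = 10)
        : mem(max_request * factor), max_req(max_request) {}

    // Return a pointer where n <= max_req bytes can be written contiguously.
    char* get_write_ptr(size_t n)
    {
        if (wr + n > mem.size()) {             // the branch we can't avoid
            size_t pending = wr - rd;          // unread bytes still in flight
            // the wrap copy; memmove in case the two regions overlap
            std::memmove(mem.data(), mem.data() + rd, pending);
            wr = pending;
            rd = 0;
        }
        return mem.data() + wr;
    }

    void commit(size_t n) { wr += n; }         // producer wrote n bytes
    const char* read_ptr() const { return mem.data() + rd; }
    size_t available() const { return wr - rd; }
    void consume(size_t n) { rd += n; }        // consumer finished n bytes
};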
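
And the "more complicated" egress case, sketched with the stock OpenCL map / copy / unmap calls. This only shows the data movement, not any proposed block API; ingress would be the same calls with CL_MAP_READ & the copy direction reversed:

// Rough egress sketch (my illustration, not the WG's design): move a block
// of samples from host memory into an OpenCL device buffer by mapping the
// buffer, memcpy'ing into the mapping, then unmapping.  Error handling is
// mostly elided; build with e.g. `g++ egress.cc -lOpenCL`.
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <cstring>
#include <cstdio>
#include <vector>
#include <complex>

int main()
{
    const size_t nitems = 4096;
    std::vector<std::complex<float>> host(nitems, {1.0f, -1.0f});  // pretend this came from an upstream block
    const size_t nbytes = nitems * sizeof(host[0]);

    cl_int err;
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, nullptr);
    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, &err);

    // The device-side buffer a CoProc kernel would consume.
    cl_mem dev_buf = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR,
                                    nbytes, nullptr, &err);

    // egress: set up the memory map, move the data over, close the map.
    void* mapped = clEnqueueMapBuffer(q, dev_buf, CL_TRUE, CL_MAP_WRITE,
                                      0, nbytes, 0, nullptr, nullptr, &err);
    std::memcpy(mapped, host.data(), nbytes);
    clEnqueueUnmapMemObject(q, dev_buf, mapped, 0, nullptr, nullptr);
    clFinish(q);

    // ... launch the kernel here; ingress is the reverse, with CL_MAP_READ ...

    printf("moved %zu bytes to the device\n", nbytes);
    clReleaseMemObject(dev_buf);
    clReleaseCommandQueue(q);
    clReleaseContext(ctx);
    return 0;
}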