On Mon, Apr 20, 2015 at 10:21 AM, Marcus Müller <marcus.muel...@ettus.com>
wrote:

>  Hi Marco,
>
> If I may recommend something, it would be having a look at VOLK [1]. It's
> the optimizations library that comes with GNU Radio.
> If you could implement some of these algorithms in CUDA, then every block
> currently using VOLK (which is the majority of the arithmetically
> challenging blocks at the moment) could automatically make use of your
> accelerations, without having to change anything! Also, VOLK comes with
> volk_profile, which it uses to test the different implementations that work
> on your hardware, looking for the fastest one. That would be the ultimate
> benchmark for your kernels, as it directly compares the efficiency of the
> "general C" and CPU-SIMD implementations to your CUDA kernels.
>

We've never been hot on the idea of using VOLK for GPU stuff. VOLK kernels
tend to do one thing at a time and don't worry about data movement (too
much) because the SIMD registers are right there. Going to GPUs takes a lot
longer, so you want to spend more of your time there once you get the data
moved across. With VOLK, we'd be going back and forth, which is a huge
performance killer.
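
To make the back-and-forth cost concrete, here is a rough sketch in Python. It is only a toy cost model: every constant is an assumed, illustrative value rather than a measurement, and the function names (`copy_time`, `per_kernel_offload`, `fused_offload`) are hypothetical and not part of VOLK or GNU Radio. It compares calling one GPU kernel per operation, with a copy in and a copy out each time, against copying once and running the whole chain on the device.

```python
# Toy cost model; every constant below is an assumed, illustrative value.
TRANSFER_LATENCY_S = 10e-6        # assumed fixed cost per host<->device copy
TRANSFER_BW_BPS = 6e9             # assumed copy bandwidth in bytes/second
GPU_TIME_PER_SAMPLE_S = 0.05e-9   # assumed GPU compute time per sample per kernel
BYTES_PER_SAMPLE = 8              # one complex float (2 x 32-bit)


def copy_time(n_samples):
    """Time for a single host<->device copy of n_samples."""
    return TRANSFER_LATENCY_S + n_samples * BYTES_PER_SAMPLE / TRANSFER_BW_BPS


def per_kernel_offload(n_samples, n_kernels):
    """Every kernel copies in, computes, and copies out again."""
    one_call = 2 * copy_time(n_samples) + n_samples * GPU_TIME_PER_SAMPLE_S
    return n_kernels * one_call


def fused_offload(n_samples, n_kernels):
    """Copy in once, run all kernels on the device, copy out once."""
    compute = n_kernels * n_samples * GPU_TIME_PER_SAMPLE_S
    return 2 * copy_time(n_samples) + compute


if __name__ == "__main__":
    n, k = 4096, 5
    print("per-kernel: %.1f us" % (per_kernel_offload(n, k) * 1e6))
    print("fused:      %.1f us" % (fused_offload(n, k) * 1e6))
```

In this model the two strategies differ by exactly `2 * (n_kernels - 1)` copies, so the penalty grows with the length of the chain, which is the case for keeping data on the device across several operations.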


> Furthermore, gr-theano is worth a visit [2], because it actually uses CUDA
> to accelerate channel models. The point here is that GPUs, with their high
> memcpy latency (and CPU cost), aren't practical for all problems. If I just
> want to add a small number of samples, doing it on a CPU might simply pay
> off better; gr-theano, for example, offers an FFT, which might be one of the
> algorithms typically working on large vectors where the CPU/GPU boundary
> crossing might be worth it.
>
> Best regards,
> Marcus
>
> [1] http://nathanwest.us/volk/
> [2] http://www.cgran.org/pages/gr-theano.html
>
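
Marcus's point about small additions versus large FFTs can be turned into a rough break-even estimate. The sketch below is a toy model under assumed numbers: none of the constants are measured, and `cpu_time`, `gpu_time` and `break_even_samples` are hypothetical helper names. It asks how many samples a single operation needs before the GPU, including its fixed overhead and copies, beats the CPU.

```python
import math

# All constants are assumed, illustrative values, not measurements.
FIXED_OVERHEAD_S = 20e-6               # assumed launch + copy latency per offload
COPY_TIME_PER_SAMPLE_S = 2 * 8 / 6e9   # copy in + out, 8 bytes/sample, 6 GB/s
GPU_TIME_PER_SAMPLE_S = 0.05e-9        # assumed GPU compute time per sample


def cpu_time(n_samples, cpu_time_per_sample):
    """CPU cost: no transfer, only per-sample work."""
    return n_samples * cpu_time_per_sample


def gpu_time(n_samples):
    """GPU cost: fixed overhead plus per-sample copy and compute."""
    per_sample = COPY_TIME_PER_SAMPLE_S + GPU_TIME_PER_SAMPLE_S
    return FIXED_OVERHEAD_S + n_samples * per_sample


def break_even_samples(cpu_time_per_sample):
    """Smallest vector length where the GPU is at least as fast, or None."""
    gain = cpu_time_per_sample - (COPY_TIME_PER_SAMPLE_S + GPU_TIME_PER_SAMPLE_S)
    if gain <= 0:
        return None  # the copy alone costs more than doing the work on the CPU
    return math.ceil(FIXED_OVERHEAD_S / gain)


if __name__ == "__main__":
    print("cheap op (1 ns/sample):  ", break_even_samples(1e-9))
    print("heavy op (20 ns/sample): ", break_even_samples(20e-9))
```

With these assumed numbers a cheap per-sample operation never pays off on the GPU, because the copy alone costs more than the CPU work, while a heavier operation crosses over once the vector is long enough.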

I'm also not the biggest fan of CUDA for GNU Radio simply because it's too
hardware specific. I'd be more interested in seeing OpenCL implementations
-- but even that has its limitations for support. Theano looks nice from
what I've heard (mostly from Tim and his gr-theano work), and I don't
believe that it's necessarily tied to CUDA.

Tom



> On 04/20/2015 04:09 PM, marco Ribero wrote:
>
> I cannot do it.
> For my thesis, I'm trying to bring various parts of GNU Radio over to CUDA.
> My idea is to rewrite already existing blocks with CUDA, possibly without
> breaking compatibility with the current implementation of GNU Radio. In this
> way a normal user can use these blocks without problems.
>
> So far, I've become more familiar with GNU Radio, made an FM CUDA
> receiver and started to port some blocks over to CUDA. It is mandatory to
> minimize host-device memcpy.
> My current approach is: each block loads its code and communicates with
> its neighbors using async transfers, streams and other mechanisms (so I need
> to pass addresses of memory locations, lock bits, etc.).
>
>  My next step will be: at the beginning, each block will send down its
> device code and parameters; the block at the end of the chain will do a
> dynamic compilation (CUDA 7). If I have additional time, I'll also use
> warp parallelism (reducing global-shared memcpy).
>
>  Thanks in any case,
> marco
>
>
> On Mon, Apr 20, 2015 at 12:48, Marcus Müller <
> marcus.muel...@ettus.com> wrote:
>
>>  Hi Marco,
>>
>> I just realized: Things might be even easier than that:
>>
>> What you do sounds like a job for a hierarchical block; if you're not
>> used to that concept: It's just a "subflowgraph", represented as a block
>> with in- and outputs.
>> If you put both your blocks inside, you'll always have them together.
>> And: in the constructor of your hierarchical block, you can for example
>> first construct your cuda block, and then give your "downstream" block the
>> pointer to that in its constructor.
>>
>> To the user, this will look like one block, though there are two (or
>> more) inside.
>>
>> Greetings,
>> Marcus
>>
>>
>> On 04/20/2015 12:29 PM, marco Ribero wrote:
>>
>>
>> Thank you very much. Your solution is much cleaner.
>>
>> Have a good day,
>> Marco
>>
>>  On Mon, Apr 20, 2015 at 09:29, Marcus Müller <
>> marcus.muel...@ettus.com> wrote:
>>
>>>  Hi marco,
>>>
>>> what you describe as ID already exists: every block has a function
>>> alias(), giving it a string "name", which can be used with
>>> global_block_registry::block_lookup(name) [1].
>>>
>>> You will need to wrap your alias in a pmt::intern to get it into a
>>> stream tag, so use that with block_lookup, and cast the result to
>>> your_block_type::sptr.
>>>
>>> Greetings,
>>> Marcus
>>>
>>> [1]
>>> http://gnuradio.org/doc/doxygen/classgr_1_1block__registry.html#a67a83c42e2030bba463c99d51e7a8f92
>>>
>>>
>>>
>>>
>>
>>   _______________________________________________
>> Discuss-gnuradio mailing list
>> Discuss-gnuradio@gnu.org
>> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
