Johannes, you forgot to mention that you will be presenting your stuff at GRCon in Washington DC in a few weeks :)
Cheers,
Martin

On 31.07.2015 02:50, Johannes Demel wrote:
> Hey community!
>
> Here we go again. Another project update.
> I have been working with VOLK and SIMD for two weeks now. I could fix some
> hiccups with last week's pack and unpack kernels. They run just fine
> during tests now.
> Also, I added a 'volk_8u_x3_encodepolar_8u_x2' kernel. It operates on
> the assumption that there is one active bit in a byte and it is
> located in the LSB. A quick performance test with a 2^32 samples head
> block after the encoder shows that generic crunches ~160MSps. So far I
> had an encoder which operated on packed bytes and did ~300MSps. An
> unpack block was added to the flowgraph with the 'extended_encoder' in
> use. The vector optimized version does ~570MSps. So it is ~3.5x as fast
> as the generic version. Some more optimization might yield even better
> results.
> At first glance it is weird that the output signature of the encoder is
> '8u_x2'. The kernel internally needs a temporary buffer which has the
> same size as the output buffer. Instead of malloc'ing and free'ing it on
> every call, it can be created once and be used all the time.
> During the week I was struggling with VOLK tests. Finally I solved those
> issues. But I'd like to refer to the mail I sent out the other day.
> SIMD code tends to have quite a few lines of code. In order to make it
> easier to read and understand, it would be great if it was possible to
> implement multiple functions within one '#ifdef LV_HAVE_ARCH ... #endif'
> block. But so far the compiler refuses to compile when I do this. It
> is possible to add functions in the general section but that's only
> appropriate for a generic kernel or common functions.
> All the intrinsics I used so far are available on SSSE3. Although I
> created aligned and unaligned versions of those kernels, only store[u]
> and load[u] might make a difference here. My benchmarks don't show any
> significant difference.
> All benchmarks were done on a Sandy Bridge i7.
>
> I suspect the encoder was easier to optimize than the decoder will be.
> So for the upcoming week and beyond I will focus on creating kernels for
> polar decoding.
>
> More info and current project progress can be found in [1], [2] and [3].
>
> Cheers
> Johannes
>
> [1] https://github.com/jdemel/gnuradio
> [2] https://github.com/jdemel/socis-proposal
> [3] https://github.com/jdemel/volk
>
> _______________________________________________
> Discuss-gnuradio mailing list
> Discuss-gnuradio@gnu.org
> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
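
For readers who want to see the data layout discussed in the quoted update, here is a minimal scalar sketch in plain C. It is not the VOLK kernel and does not use the VOLK API: the function names `unpack_bits` and `polar_encode_unpacked`, their signatures, and the MSB-first unpack order are all assumptions made for illustration. The sketch shows two things from the update: each byte carries one bit in its LSB, and the caller hands in a scratch buffer of the same size as the output so that nothing is allocated per call. Frozen-bit insertion and any bit-reversal permutation are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a VOLK function): expand packed bytes into one
 * bit per byte, with the bit stored in the LSB. MSB-first order is an
 * assumption made for this sketch. */
static void unpack_bits(uint8_t *out, const uint8_t *in, size_t num_bytes)
{
    for (size_t i = 0; i < num_bytes; ++i) {
        for (int b = 0; b < 8; ++b) {
            out[8 * i + b] = (uint8_t)((in[i] >> (7 - b)) & 1u);
        }
    }
}

/* Hypothetical scalar polar transform on unpacked bits (not the VOLK kernel).
 * frame: input bits on entry, encoded bits on return (2^frame_exp bytes).
 * temp:  caller-owned scratch buffer of the same size, reused across calls
 *        instead of being malloc'ed and free'd every time.
 * Computes x = u * F^(kron n) over GF(2) with F = [[1,0],[1,1]], in natural
 * order (no bit reversal). */
static void polar_encode_unpacked(uint8_t *frame, uint8_t *temp, unsigned frame_exp)
{
    assert(frame != NULL && temp != NULL);

    const size_t frame_size = (size_t)1 << frame_exp;
    uint8_t *src = frame;
    uint8_t *dst = temp;

    for (unsigned stage = 0; stage < frame_exp; ++stage) {
        const size_t half = (size_t)1 << stage;
        for (size_t block = 0; block < frame_size; block += 2 * half) {
            for (size_t j = 0; j < half; ++j) {
                /* Butterfly: upper output is the XOR, lower passes through. */
                dst[block + j] = src[block + j] ^ src[block + j + half];
                dst[block + j + half] = src[block + j + half];
            }
        }
        /* Ping-pong between the two buffers for the next stage. */
        uint8_t *swap = src;
        src = dst;
        dst = swap;
    }

    /* After an odd number of stages the result sits in temp. */
    if (src != frame) {
        memcpy(frame, src, frame_size);
    }
}
```

Calling `polar_encode_unpacked(frame, temp, 2)` on the four unpacked bits 1, 0, 1, 1 leaves 1, 1, 0, 1 in `frame`. A SIMD version would process 16 such bytes per XOR on SSE registers, which is where aligned versus unaligned loads and stores would come into play.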