I would use the FFTW source from their site rather than RHEL's source RPM, unless you need to deploy it on a large number of machines. (Even then, I would pull the latest source and update the SRPM.)
You can just build FFTW from source and enable almost all of the configure options (for GNU Radio that means at least --enable-float, which builds the single-precision fftwf library, plus SIMD options such as --enable-sse2 and --enable-avx). We do this on RHEL 6 because the stock version doesn't support wisdom or take advantage of any recent CPUs. It should install under /usr/local by default. Just add the library path (/usr/local/lib) to /etc/ld.so.conf, run ldconfig, and you'll be set.

On Mar 9, 2016 8:47 AM, "devin kelly" <dwwke...@gmail.com> wrote:
> Thanks for the help. I don't think I could have figured this out on my own.
>
> This is because I'm on RHEL7 (argh!). My libfftw.so doesn't contain any
> references to AVX. For me there are a couple of options for fixing this:
>
> 1) Use Nathan's branch.
> 2) Rebuild fftw with AVX support.
> 3) Rebuild GR and Volk without AVX.
>
> I tried 2) first and noticed this in the spec file that was in the source
> RPM I was trying to rebuild:
>
> %ifarch %{ix86} x86_64
> # Enable SSE2 support for x86 and x86_64
> # (no avx as it is claimed to drastically slower)
> for((i=0;i<2;i++)); do
> prec_flags[i]+=" --enable-sse2"
> done
> %endif
>
> Is the spec file author right? Now I'm a little confused about the
> approach I should take. I'll probably just go with 1) in the meantime.
>
> Thanks again Nathan,
> Devin
>
> On Wed, Mar 9, 2016 at 1:06 AM, West, Nathan <n...@ostatemail.okstate.edu>
> wrote:
>
>> The a and c vectors come from gr::fft objects' internal buffers. These are
>> internally created with fftwf_malloc (lines 152/156 of gr-fft/lib/fft.cc).
>> fftwf_malloc is evidently not generating buffers with the 32-byte alignment
>> the AVX kernels need, so there's a 50% chance (per buffer) that this
>> segfaults. I'll note that this is also only an issue with fftwf buffers
>> when fftwf isn't built with AVX support (and therefore nothing in fftwf
>> requires a 32-byte aligned buffer).
>>
>> Andy Walls (thanks!) pointed out on IRC that we had a similar issue years
>> ago with a Qt sink.
>>
>> I have a branch that should fix this:
>> https://github.com/n-west/gnuradio/tree/fft-avx-alignment
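To make the failure mode concrete: a kernel built on aligned AVX loads (the a_ variants) faults when any buffer it is handed is not 32-byte aligned, while the u_ variants do not require that. The following is a minimal sketch, not GNU Radio's or VOLK's code, of a caller that checks the actual buffers and falls back to an unaligned path instead of assuming fftwf_malloc returned 32-byte-aligned memory. The names multiply_aligned, multiply_unaligned, is_aligned, and choose_multiply are hypothetical, and both "kernels" are plain scalar loops standing in for SIMD code.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>
#include <stdint.h>

/* Signature shared by both hypothetical kernels: c[i] = a[i] * b[i]. */
typedef void (*multiply_fn)(float complex *c, const float complex *a,
                            const float complex *b, size_t n);

/* Stand-in for an aligned kernel; real SIMD code here would use aligned
 * loads such as _mm256_load_ps, which fault on misaligned addresses. */
static void multiply_aligned(float complex *c, const float complex *a,
                             const float complex *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}

/* Stand-in for an unaligned kernel (e.g. one built on _mm256_loadu_ps). */
static void multiply_unaligned(float complex *c, const float complex *a,
                               const float complex *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}

/* Nonzero if p is a multiple of alignment; alignment must be a power of two. */
static int is_aligned(const void *p, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Use the aligned kernel only when every buffer meets the alignment. */
static multiply_fn choose_multiply(const float complex *c,
                                   const float complex *a,
                                   const float complex *b, size_t alignment) {
    if (is_aligned(c, alignment) && is_aligned(a, alignment) &&
        is_aligned(b, alignment))
        return multiply_aligned;
    return multiply_unaligned;
}
```

The cost of the check is three bitwise tests per call, which is why it is cheap to do at the call site rather than trusting the allocator.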
>>
>> I also suggest you look into getting a version of fftwf built with AVX. I
>> don't know if there's a good way to tell, but if I run readelf -a on my
>> libfftw3.so I see some functions with avx in the name.
>>
>> Cheers,
>> nw
>>
>>
>> On Tue, Mar 8, 2016 at 1:31 PM, devin kelly <dwwke...@gmail.com> wrote:
>>
>>> OK, here's my C program:
>>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <volk/volk.h>
>>> #include <stdint.h>
>>>
>>> int main() {
>>>
>>>     size_t alignment = volk_get_alignment();
>>>
>>>     uint8_t* ptr;
>>>
>>>     ptr = (uint8_t*)volk_malloc(1000 * sizeof(uint8_t), alignment);
>>>     printf("alignment = %lu, ptr = %x, *ptr = %u\n", alignment, ptr, *ptr);
>>>     volk_free((void*)ptr);
>>>     ptr = NULL;
>>>
>>>
>>>     return 0;
>>> }
>>>
>>>
>>> Compile:
>>>
>>> $ gcc volk_test.c -o volk_test -lvolk -L/local_disk/gr_3.7.9_debug/lib
>>>
>>> Its output:
>>>
>>> $ ./volk_test
>>> Using Volk machine: avx2_64_mmx_orc
>>> alignment = 32, ptr = 151b040, *ptr = 00
>>>
>>> Also, I've attached the output from the preprocessor, generated with this
>>> command:
>>>
>>> $ /usr/bin/cc -DHAVE_AVX_CVTPI32_PS -DHAVE_CPUID_H -DHAVE_DLFCN_H
>>> -DHAVE_FENV_H -DHAVE_POSIX_MEMALIGN -DHAVE_XGETBV -Wall -fvisibility=hidden
>>> -g -I/local_disk/gr_3.7.9_src/volk/build_debug/include
>>> -I/local_disk/gr_3.7.9_src/volk/include
>>> -I/local_disk/gr_3.7.9_src/volk/kernels
>>> -I/local_disk/gr_3.7.9_src/volk/build_debug/lib
>>> -I/local_disk/gr_3.7.9_src/volk/lib -I/usr/include/orc-0.4 -E -fPIC -o
>>> volk_malloc_preprocessed -c
>>> /local_disk/gr_3.7.9_src/volk/lib/volk_malloc.c
>>>
>>> I found the compiler step by running 'VERBOSE=1 make', then changed the
>>> output file and added -E. I attached volk_malloc_preprocessed as well.
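A variant of the test program that checks the alignment, rather than printing an address for someone to reduce by hand, may be easier to act on. This sketch leaves VOLK out so it builds anywhere: it calls posix_memalign directly (the call the preprocessed volk_malloc() in this build turns out to use) and prints with %zu and %p, the portable printf conversions for size_t and pointers; passing a pointer to %x, as the program above does, is undefined behavior. alloc_checked is a hypothetical helper name.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* exposes posix_memalign */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate `size` bytes aligned to `alignment` with
 * posix_memalign and print the result using portable conversions
 * (%zu for size_t, %p for pointers). Returns NULL on failure, e.g. EINVAL
 * when `alignment` is not a power-of-two multiple of sizeof(void *).
 * Release the memory with free(). */
static void *alloc_checked(size_t size, size_t alignment) {
    void *ptr = NULL;
    int err = posix_memalign(&ptr, alignment, size);
    if (err != 0) {
        fprintf(stderr, "posix_memalign: error %d: %s\n", err, strerror(err));
        return NULL;
    }
    size_t offset = (size_t)((uintptr_t)ptr % alignment);
    printf("alignment = %zu, ptr = %p, offset = %zu\n", alignment, ptr, offset);
    /* A nonzero offset would mean the allocator broke its contract. */
    assert(offset == 0);
    return ptr;
}
```

For a 32-byte request this should always report offset = 0; a nonzero offset from a real allocator would point at the allocator rather than at the SIMD kernel.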
>>>
>>> It looks like this is my volk_malloc():
>>>
>>>
>>> void *volk_malloc(size_t size, size_t alignment)
>>> {
>>>     void *ptr;
>>>
>>>     if (alignment == 1)
>>>         return malloc(size);
>>>
>>>     int err = posix_memalign(&ptr, alignment, size);
>>>     if(err == 0) {
>>>         return ptr;
>>>     }
>>>     else {
>>>         fprintf(stderr,
>>>                 "VOLK: Error allocating memory "
>>>                 "(posix_memalign: error %d: %s)\n", err, strerror(err));
>>>         return ((void *)0);
>>>     }
>>> }
>>>
>>>
>>>
>>> Devin
>>>
>>>
>>>
>>> On Tue, Mar 8, 2016 at 11:37 AM, West, Nathan <
>>> n...@ostatemail.okstate.edu> wrote:
>>>
>>>>
>>>> On Tue, Mar 8, 2016 at 10:58 AM, devin kelly <dwwke...@gmail.com>
>>>> wrote:
>>>>
>>>>> Calling 'info variables' (or args or locals) on the last few frames
>>>>> didn't give me any real info, so I built a copy of GR/Volk with debug
>>>>> symbols. I ran the FG again, this time from GDB; here's my backtrace. In
>>>>> this backtrace you can see the arguments passed in each call. I have an
>>>>> i7-5600U CPU @ 2.60GHz; the volk_profile output is appended at the
>>>>> bottom.
>>>>>
>>>>
>>>> Excellent. Thanks for going through that extra step. It really helps.
>>>>
>>>>
>>>>>
>>>>> Here are the links for the relevant code:
>>>>>
>>>>>
>>>>> https://github.com/gnuradio/volk/blob/f0b722392950bf7ede7b32f5ff60019bce7a8592/kernels/volk/volk_32fc_x2_multiply_32fc.h#L232
>>>>>
>>>>> https://github.com/gnuradio/gnuradio/blob/master/gr-filter/lib/fft_filter.cc#L323
>>>>>
>>>>> https://github.com/gnuradio/gnuradio/blob/222e0003f9797a1b92d64855bd2b93f0d9099f93/gr-digital/lib/corr_est_cc_impl.cc#L214
>>>>>
>>>>> Could the problem be that nitems is 257 and num_points is 512? Or
>>>>> should nitems really be 256 and not 257?
>>>>>
>>>>
>>>> I don't think so. I'm not familiar with the details of the fft_filter
>>>> implementations, but usually these things will take in some history if
>>>> they don't have enough points to operate on (in this case 512).
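On the 257-versus-512 question: overlap-add and overlap-save FFT filters run an N-point FFT per block and obtain L = N - M + 1 new output samples from it for an M-tap filter; the other M - 1 points are history (overlap-save) or zero padding (overlap-add). A common choice of FFT length is N = 2 * next_pow2(M). That gr-filter's fft_filter uses exactly this rule is an assumption here, not something confirmed in the thread, and the function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power of two that is >= n (n must be at least 1). */
static size_t next_pow2(size_t n) {
    assert(n >= 1);
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Assumed sizing rule: FFT length N = 2 * next_pow2(M) for M taps. */
static size_t fft_size_for_taps(size_t ntaps) {
    return 2 * next_pow2(ntaps);
}

/* New samples per FFT block: L = N - M + 1. The remaining M - 1 points of
 * each N-point transform are history or zero padding. */
static size_t samples_per_block(size_t ntaps) {
    return fft_size_for_taps(ntaps) - ntaps + 1;
}
```

Under that assumed rule, a 256-tap filter (a hypothetical tap count; the thread never states it) gives N = 512 and L = 257, matching the num_points and nitems in the backtrace, which would make 257 expected behavior rather than the bug.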
>>>>
>>>> The much more worrying thing is your vector addresses.
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Devin
>>>>>
>>>>> (gdb) bt
>>>>> #0 0x00007fffdcaccb57 in volk_32fc_x2_multiply_32fc_a_avx2_fma
>>>>> (__P=0x3b051b0)
>>>>> at /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/avxintrin.h:835
>>>>> #1 0x00007fffdcaccb57 in volk_32fc_x2_multiply_32fc_a_avx2_fma
>>>>> (cVector=0x3b1f770, aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
>>>>>
>>>>
>>>> 0x3b1f770 % 32 = 16 (bad)
>>>> 0x3b051b0 % 32 = 16 (bad)
>>>> 0x3b240e0 % 32 = 0 (good)
>>>>
>>>> Unfortunately it looks like volk_get_alignment is returning the wrong
>>>> thing or there's a bug in volk_malloc. Can you tell us what
>>>> volk_get_alignment returns? The easiest thing is probably to write a simple
>>>> C program that prints out the result (hmm, I should add that to
>>>> volk-config-info). I'd also like to know which volk_malloc implementation
>>>> you're using. Unfortunately I don't think we have an easy way to discover
>>>> that (hmm, something else that should be added to volk-config-info). I
>>>> think the best way might be to look at volk_malloc.c intermediate files
>>>> after the preprocessor has done its work.
>>>>
>>>> If you want to move on while we figure this out then you can edit
>>>> ~/.volk/volk_config and replace the avx2_fma with sse3 on the line that has
>>>> this kernel name on it.
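The check above is just the address modulo the alignment. Because 32 is a power of two, addr % 32 equals addr & 31, which is how alignment checks are usually written in C. A small sketch (misalignment and report are hypothetical helper names) reproduces those three results and also lets you test the same addresses against 16 bytes.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bytes past the previous multiple of `alignment` (a power of two).
 * For powers of two, addr % alignment == addr & (alignment - 1). */
static size_t misalignment(uintptr_t addr, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size_t)(addr & (alignment - 1));
}

/* Print one line in the form "0x3b1f770 % 32 = 16 (bad)". */
static void report(uintptr_t addr, size_t alignment) {
    size_t off = misalignment(addr, alignment);
    printf("0x%" PRIxPTR " %% %zu = %zu (%s)\n", addr, alignment, off,
           off == 0 ? "good" : "bad");
}
```

Calling report(0x3b1f770, 32), report(0x3b051b0, 32), and report(0x3b240e0, 32) prints the three lines above. misalignment(0x3b1f770, 16) and misalignment(0x3b051b0, 16) both return 0, so the two "bad" buffers are still 16-byte aligned, which is consistent with an allocator that only needed SSE alignment.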
>>>>
>>>>
>>>>> at
>>>>> /local_disk/gr_3.7.9_src/volk/kernels/volk/volk_32fc_x2_multiply_32fc.h:242
>>>>> #2 0x00007fffdc945a75 in __volk_32fc_x2_multiply_32fc_a
>>>>> (cVector=0x3b1f770, aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
>>>>> at /local_disk/gr_3.7.9_src/volk/build_debug/lib/volk.c:7010
>>>>> #3 0x00007fffd3f8e360 in
>>>>> gr::filter::kernel::fft_filter_ccc::filter(int, std::complex<float>
>>>>> const*,
>>>>> std::complex<float>*) (this=0x3b02f40, nitems=nitems@entry=257,
>>>>> input=input@entry=0x7fffc9cc7000, output=output@entry=0x3b36460)
>>>>> at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gr-filter/lib/fft_filter.cc:323
>>>>> #4 0x00007fffd42910df in gr::digital::corr_est_cc_impl::work(int,
>>>>> std::vector<void const*, std::allocator<void const*> >&,
>>>>> std::vector<void*,
>>>>> std::allocator<void*> >&) (this=0x3b01560, noutput_items=257,
>>>>> input_items=..., output_items=std::vector of length 1, capacity 1 = {...})
>>>>> at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gr-digital/lib/corr_est_cc_impl.cc:237
>>>>> #5 0x00007fffdd064907 in gr::sync_block::general_work(int,
>>>>> std::vector<int, std::allocator<int> >&, std::vector<void const*,
>>>>> std::allocator<void const*> >&, std::vector<void*, std::allocator<void*>
>>>>> >&) (this=0x3b015b8, noutput_items=<optimized out>, ninput_items=...,
>>>>> input_items=..., output_items=...)
>>>>> at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/sync_block.cc:66
>>>>> #6 0x00007fffdd02f70f in gr::block_executor::run_one_iteration()
>>>>> (this=this@entry=0x7fff83ffedb0)
>>>>> at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/block_executor.cc:438
>>>>> #7 0x00007fffdd06da8a in
>>>>> gr::tpb_thread_body::tpb_thread_body(boost::shared_ptr<gr::block>, int)
>>>>> (this=0x7fff83ffedb0, block=..., max_noutput_items=<optimized out>) at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/tpb_thread_body.cc:122
>>>>> #8 0x00007fffdd062761 in
>>>>> boost::detail::function::void_function_obj_invoker0<gr::thread::thread_body_wrapper<gr::tpb_container>,
>>>>> void>::invoke(boost::detail::function::function_buffer&) (this=0x3bc3ec0)
>>>>> at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/scheduler_tpb.cc:44
>>>>> #9 0x00007fffdd062761 in
>>>>> boost::detail::function::void_function_obj_invoker0<gr::thread::thread_body_wrapper<gr::tpb_container>,
>>>>> void>::invoke(boost::detail::function::function_buffer&) (this=0x3bc3ec0)
>>>>> at
>>>>> /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/include/gnuradio/thread/thread_body_wrapper.h:51
>>>>> #10 0x00007fffdd062761 in
>>>>> boost::detail::function::void_function_obj_invoker0<gr::thread::thread_body_wrapper<gr::tpb_container>,
>>>>> void>::invoke(boost::detail::function::function_buffer&)
>>>>> (function_obj_ptr=...)
>>>>> at
>>>>> /usr/include/boost/function/function_template.hpp:153
>>>>> #11 0x00007fffdd016cd0 in
>>>>> boost::detail::thread_data<boost::function0<void> >::run()
>>>>> (this=<optimized
>>>>> out>)
>>>>> at /usr/include/boost/function/function_template.hpp:767
>>>>> #12 0x00007fffdd016cd0 in
>>>>> boost::detail::thread_data<boost::function0<void> >::run()
>>>>> (this=<optimized
>>>>> out>)
>>>>> at /usr/include/boost/thread/detail/thread.hpp:117
>>>>> #13 0x00007fffdbe4f24a in thread_proxy () at
>>>>> /lib64/libboost_thread-mt.so.1.53.0
>>>>> #14 0x00007ffff7800dc5 in start_thread () at /lib64/libpthread.so.0
>>>>> #15 0x00007ffff6e2528d in clone () at /lib64/libc.so.6
>>>>>
>>>>> Here are the locals on the last few frames:
>>>>>
>>>>> (gdb) f 0
>>>>> #0 0x00007fffdcaccb57 in _mm256_load_ps (__P=0x3b051b0) at
>>>>> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/avxintrin.h:835
>>>>> 835 return *(__m256 *)__P;
>>>>> (gdb) info locals
>>>>> No locals.
>>>>> (gdb) f 1
>>>>> #1 volk_32fc_x2_multiply_32fc_a_avx2_fma (cVector=0x3b1f770,
>>>>> aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
>>>>> at
>>>>> /local_disk/gr_3.7.9_src/volk/kernels/volk/volk_32fc_x2_multiply_32fc.h:242
>>>>> 242 const __m256 x = _mm256_load_ps((float*)a); // Load the ar
>>>>> + ai, br + bi as ar,ai,br,bi
>>>>> (gdb) info locals
>>>>> y = {-4.87433296e+17, 4.59163468e-41, -3.92813517e+17, 4.59163468e-41,
>>>>> 5.15677835e-43, 0, 5.26888223e-43, 0}
>>>>> tmp2x = {6.389921e-43, 0, -512.314453, 4.59163468e-41, 1.26116862e-44,
>>>>> 0, -4.87433296e+17, 4.59163468e-41}
>>>>> x = {-512.314453, 4.59163468e-41, 0, 0, 2.76102662, -3.64918089,
>>>>> -4.92134571, -1.06491208}
>>>>> yl = {4.14784345e-43, 0, 1.26116862e-44, 0, -4.87442367e+17,
>>>>> 4.59163468e-41, -4.87439343e+17, 4.59163468e-41}
>>>>> yh = {-1674752, 4.59163468e-41, 0, 0, -1.50397414e-36, 4.59163468e-41,
>>>>> -3.31452625e+17, 4.59163468e-41}
>>>>> tmp2 = {6.72623263e-44, 1.2751816e-43, 2.24207754e-44, 0,
>>>>> 7.17464814e-43, 0,
>>>>> -3.31440427e+17, 4.59163468e-41}
>>>>> z = {0.794147611, 0, 0.263988227, 0, -0.380019426, 0, -0.953325868, 0}
>>>>> number = 0
>>>>> quarterPoints = 128
>>>>> c = 0x3b1f770
>>>>> a = 0x3b051b0
>>>>> b = 0x3b240e0
>>>>> (gdb) f 2
>>>>> #2 0x00007fffdc945a75 in __volk_32fc_x2_multiply_32fc_a
>>>>> (cVector=0x3b1f770, aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
>>>>> at /local_disk/gr_3.7.9_src/volk/build_debug/lib/volk.c:7010
>>>>> 7010 volk_32fc_x2_multiply_32fc_a(cVector, aVector, bVector,
>>>>> num_points);
>>>>> (gdb) info locals
>>>>> No locals.
>>>>> (gdb) f 3
>>>>> #3 0x00007fffd3f8e360 in gr::filter::kernel::fft_filter_ccc::filter
>>>>> (this=0x3b02f40, nitems=nitems@entry=257,
>>>>> input=input@entry=0x7fffc9cc7000, output=output@entry=0x3b36460)
>>>>> at /local_disk/gr_3.7.9_src/gnuradio/gr-filter/lib/fft_filter.cc:323
>>>>> 323 volk_32fc_x2_multiply_32fc_a(c, a, b, d_fftsize);
>>>>> (gdb) info locals
>>>>> a = <optimized out>
>>>>> b = <optimized out>
>>>>> c = <optimized out>
>>>>> i = 0
>>>>> dec_ctr = 0
>>>>> j = <optimized out>
>>>>> ninput_items = 257
>>>>>
>>>>> My volk profile results:
>>>>>
>>>>> $ volk_profile -R 32fc_x2_multiply
>>>>> Using Volk machine: avx2_64_mmx_orc
>>>>> RUN_VOLK_TESTS: volk_32fc_x2_multiply_32fc(131071,1987)
>>>>> u_avx2_fma completed in 220ms
>>>>> u_avx completed in 220ms
>>>>> u_sse3 completed in 240ms
>>>>> generic completed in 2810ms
>>>>> a_avx2_fma completed in 200ms
>>>>> a_avx completed in 220ms
>>>>> a_sse3 completed in 230ms
>>>>> a_generic completed in 2810ms
>>>>> u_orc completed in 280ms
>>>>> Best aligned arch: a_avx2_fma
>>>>> Best unaligned arch: u_avx2_fma
>>>>> RUN_VOLK_TESTS: volk_32fc_x2_multiply_conjugate_32fc(131071,1987)
>>>>> u_avx completed in 230ms
>>>>> u_sse3 completed in 230ms
>>>>> generic completed in 2790ms
>>>>> a_avx completed in 220ms
>>>>> a_sse3 completed in 230ms
>>>>> a_generic completed in 2800ms
>>>>> Best aligned arch: a_avx
>>>>> Best unaligned arch: u_avx
>>>>> Writing "/home/devin/.volk/volk_config"...
>>>>>
>>>>>
>>>> Well, I'm both jealous and happy that AVX2 is actually an improvement on
>>>> newer processors. It also matches the folklore that these new technologies
>>>> are usually not faster in the first silicon products they come out in.
>>>>
>>>
>>>
>>> _______________________________________________
>>> Discuss-gnuradio mailing list
>>> Discuss-gnuradio@gnu.org
>>> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio