On Fri, Feb 21, 2014 at 10:12 PM, Kelly Boswell <krbosw...@gmail.com> wrote: > I'm encountering the same problem on maint. And I did remember to rebuild. > I removed the build directory, recreated it, and started over with cmake > just to be sure. It's the same stack trace.
Yeah, false alarm on volk_malloc. Turns out my issue was coming from qtgui::freq_sink_c because we've introduced a new AVX proto-kernel and that block was using the aligned kernel always (not the dispatcher). There's one input that is a std::vector<float> which only guarantees alignment to 16-bytes, so non-AVX calls were safe. I'm going to push a patch for this soon. So this will not address the problem you're having, which I cannot reproduce (but hopefully Nathan can figure out since he can reproduce the issue). Tom > On Fri, Feb 21, 2014 at 7:54 PM, West, Nathan <n...@ostatemail.okstate.edu> > wrote: >> >> If you just want to get back to a system that passes QA you should >> just be able to build off of maint. >> >> On Fri, Feb 21, 2014 at 6:05 PM, Kelly Boswell <krbosw...@gmail.com> >> wrote: >> > I removed the implementation of volk_malloc that uses posix_menacing by >> > commenting everything from the #if to #else and the final #endif but the >> > segmentation fault remains. I noticed it's being called in a few other >> > files >> > as well. Do I need to remove those, too? Thanks in advance. >> > >> > On Feb 21, 2014 10:21 AM, "Kelly Boswell" <krbosw...@gmail.com> wrote: >> >> >> >> Thank you, Tom. I'll try that after I'm off of work tonight. And thank >> >> you >> >> for the great ideas, Nathan. >> >> >> >> On Fri, Feb 21, 2014 at 2:39 AM, West, Nathan >> >> <n...@ostatemail.okstate.edu> wrote: >> >> > On Thu, Feb 20, 2014 at 11:25 PM, Kelly Boswell <krbosw...@gmail.com> >> >> > wrote: >> >> >> After the make test failed for this module, I decided to poke around >> >> >> to >> >> >> see >> >> >> if there is an easy fix. I made a script that simply executes the >> >> >> test >> >> >> over >> >> >> and over until it seg faults and exits after the core file is >> >> >> created. >> >> >> >> >> >> xxxxx@xxxx:~/src/gnuradio/build/gr-digital/python/digital$ >> >> >> ./runtests.sh >> >> >> Using Volk machine: avx_64_mmx >> >> >> Segmentation fault (core dumped) >> >> >> >> >> >> xxxxx@xxxx:~/src/gnuradio/build/gr-digital/python/digital$ gdb >> >> >> /usr/bin/python2.7 core >> >> >> (gdb) bt >> >> >> (gdb) bt >> >> >> #0 0x00007fe8f627fb17 in volk_32fc_32f_dot_prod_32fc_a_avx () >> >> >> from /home/kelly/src/gnuradio/build/volk/lib/libvolk.so.0.0.0 >> >> >> #1 0x00007fe8f52dd25f in >> >> >> gr::filter::kernel::fir_filter_ccf::filter(std::complex<float> >> >> >> const*) >> >> >> () >> >> >> from >> >> >> >> >> >> >> >> >> /home/kelly/src/gnuradio/build/gr-filter/lib/libgnuradio-filter-3.8git.so.0.0.0 >> >> >> #2 0x00007fe8f143c45b in >> >> >> gr::digital::pfb_clock_sync_ccf_impl::general_work(int, >> >> >> std::vector<int, >> >> >> std::allocator<int> >&, std::vector<void const*, std::allocator<void >> >> >> const*> >> >> >>>&, std::vector<void*, std::allocator<void*> >&) () >> >> >> from >> >> >> >> >> >> >> >> >> /home/kelly/src/gnuradio/build/gr-digital/lib/libgnuradio-digital-3.8git.so.0.0.0 >> >> >> #3 0x00007fe8f653809e in gr::block_executor::run_one_iteration() () >> >> >> from >> >> >> >> >> >> >> >> >> /home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0 >> >> >> #4 0x00007fe8f6573622 in >> >> >> gr::tpb_thread_body::tpb_thread_body(boost::shared_ptr<gr::block>, >> >> >> int) >> >> >> () >> >> >> from >> >> >> >> >> >> >> >> >> /home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0 >> >> >> #5 0x00007fe8f6565ea1 in >> >> >> >> >> >> >> >> >> boost::detail::function::void_function_obj_invoker0<gr::thread::thread_body_wrapper<gr::tpb_container>, >> >> >> void>::invoke(boost::detail::function::function_buffer&) () >> >> >> from >> >> >> >> >> >> >> >> >> /home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0 >> >> >> ---Type <return> to continue, or q <return> to quit--- >> >> >> #6 0x00007fe8f6526610 in >> >> >> boost::detail::thread_data<boost::function0<void> >> >> >>>::run() () >> >> >> from >> >> >> >> >> >> >> >> >> /home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0 >> >> >> #7 0x00007fe8f9adc94a in ?? () >> >> >> from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.53.0 >> >> >> #8 0x00007fe8fc8a3f6e in start_thread (arg=0x7fe8e2ffd700) >> >> >> at pthread_create.c:311 >> >> >> #9 0x00007fe8fc5ce9cd in clone () >> >> >> at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113 >> >> >> >> >> >> Of course, I had to recompile it with debugging info to glean >> >> >> anything >> >> >> useful from the stack trace. So, I did that and I traced the bug to >> >> >> this >> >> >> line: >> >> >> >> >> >> c0Val = _mm256_mul_ps(a0Val, b0Val); >> >> >> >> >> >> I can't dump the values in a0Val or b0Val, though, because they're >> >> >> intermediate values that are optimized away by the optimized kernel >> >> >> code. I >> >> >> tried stepping through the assembler instructions but I'm not >> >> >> familiar >> >> >> with >> >> >> the various sse and avx extensions. Heck, I'm not even familiar with >> >> >> the >> >> >> x86_64 instruction set. So I have a huge learning curve ahead of >> >> >> me, >> >> >> there. >> >> >> Is it possible to just dump the values in these __m256 data types to >> >> >> a >> >> >> file >> >> >> so I can debug it that way? If that's not easy to do, then I'm >> >> >> willing >> >> >> to >> >> >> learn what I have to about the instruction set so I can debug this >> >> >> thing. >> >> >> But I would sure appreciate some help if anyone has some advice to >> >> >> offer. >> >> >> >> >> >> Software version: >> >> >> I rebased to the latest version of the next branch last night before >> >> >> I >> >> >> went >> >> >> to bed at around 1:30 am CDT. >> >> >> >> >> >> Operating System: >> >> >> kelly@octs2:~/src/gnuradio/volk/kernels/volk$ uname -a >> >> >> Linux octs2 3.11.0-17-generic #31-Ubuntu SMP Mon Feb 3 21:52:43 UTC >> >> >> 2014 >> >> >> x86_64 x86_64 x86_64 GNU/Linux >> >> >> It's Ubuntu 13.10 >> >> >> >> >> >> Hardware: ASUS X750J >> >> >> Intel Quad Core i7 4700HQ 2.4GHz >> >> >> >> >> >> cpuinfo: >> >> >> processor : 7 >> >> >> vendor_id : GenuineIntel >> >> >> cpu family : 6 >> >> >> model : 60 >> >> >> model name : Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz >> >> >> stepping : 3 >> >> >> microcode : 0x8 >> >> >> cpu MHz : 2401.000 >> >> >> cache size : 6144 KB >> >> >> physical id : 0 >> >> >> siblings : 8 >> >> >> core id : 3 >> >> >> cpu cores : 4 >> >> >> apicid : 7 >> >> >> initial apicid : 7 >> >> >> fpu : yes >> >> >> fpu_exception : yes >> >> >> cpuid level : 13 >> >> >> wp : yes >> >> >> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge >> >> >> mca >> >> >> cmov >> >> >> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx >> >> >> pdpe1gb >> >> >> rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology >> >> >> nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl >> >> >> vmx >> >> >> est >> >> >> tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt >> >> >> tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat >> >> >> epb >> >> >> xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid >> >> >> fsgsbase >> >> >> tsc_adjust bmi1 avx2 smep bmi2 erms invpcid >> >> >> bogomips : 4789.27 >> >> >> clflush size : 64 >> >> >> cache_alignment : 64 >> >> >> address sizes : 39 bits physical, 48 bits virtual >> >> >> power management: >> >> >> >> >> > >> >> > >> >> > Hi Kelly, >> >> > >> >> > First, this is great debugging, thanks for getting so much info and >> >> > trying to go for a fix on your own. >> >> > >> >> > On to the good stuff. I was able to reproduce this on my i7-4700MQ. >> >> > Here's some additional info for the logs: >> >> > >> >> > * constellation_receiver is a hier block with a fir_filter_ccf inside >> >> > that is calling the volk avx dot product. >> >> > * The avx dot product proto-kernel passes VOLK QA >> >> > * The qa_fir_filter.py is testing a fir_filter_ccf that passes its >> >> > QA. >> >> > * Just for kicks, I forced VOLK to use the generic kernel and I still >> >> > see the segfault. >> >> > >> >> > A couple of things I'd like to try (and please feel free to give >> >> > these a >> >> > try): >> >> > * Go back to a commit just before fir_filter.cc started using >> >> > volk_malloc and volk_free. (or for bonus points go back to some >> >> > point >> >> > in time when this test always passes and do a git bisect) >> >> > * fiddle with parameters of the test, data length, number of taps in >> >> > filter, etc. >> >> > * Doubtful this would change, but test on different processors. It >> >> > would be pretty wild if there was something off in the 4700 line, but >> >> > the fact that the generic proto-kernel had the same result and nobody >> >> > else has reported this yet is suspicious. My guess is GCC is actually >> >> > emitting *very* similar code for the generic and avx dot product >> >> > proto-kernels. >> >> > >> >> > Nathan >> >> >> >> >> >> I was having similar issues this week with some AVX boxes. It looks >> >> like it's a problem using posix_memalign (which is called by >> >> volk_malloc if posix_memalign is available). Removing the use of >> >> posix_memalign solves my problem. I'll work with Nathan off-list to >> >> see about fixing this, possibly by removing the use of that version of >> >> malloc. >> >> >> >> Tom >> > >> > >> > _______________________________________________ >> > Discuss-gnuradio mailing list >> > Discuss-gnuradio@gnu.org >> > https://lists.gnu.org/mailman/listinfo/discuss-gnuradio >> > > > > > _______________________________________________ > Discuss-gnuradio mailing list > Discuss-gnuradio@gnu.org > https://lists.gnu.org/mailman/listinfo/discuss-gnuradio > _______________________________________________ Discuss-gnuradio mailing list Discuss-gnuradio@gnu.org https://lists.gnu.org/mailman/listinfo/discuss-gnuradio