On Sat, Mar 28, 2015 at 5:32 PM, Andy Walls <a...@silverblocksystems.net> wrote:
> When testing, I used 5 float streams running at over 150 Msps each, with
> 15 microsecond bursts of 50 MHz at about 10 microseconds apart. I used
> enough x points to see two bursts on the GUI. Normal trigger. (Free or auto
> trigger might be too taxing.)
>
> -Regards
> Andy
>

Andy, if you have a chance, can you check out this new branch:

https://github.com/trondeau/gnuradio/tree/qtgui/controlpanel

It adds the fixes that we talked about. I just want to verify that things
are still looking and behaving well for you.

The other trick of this branch is if you go into the QT GUI Time Sink
properties and turn "Control Panel" to Yes. I wouldn't mind a quick bit of
feedback there, either.

Tom

> On March 28, 2015 8:06:08 PM EDT, Tom Rondeau <t...@trondeau.com> wrote:
>>
>> On Sat, Mar 28, 2015 at 12:50 PM, Andy Walls <a...@silverblocksystems.net> wrote:
>>
>>> On Sat, 2015-03-28 at 14:45 -0400, Andy Walls wrote:
>>> > Hi Tom:
>>> >
>>> >
>>> > On Sat, 2015-03-28 at 11:12 -0700, Tom Rondeau wrote:
>>> > > On Sat, Mar 28, 2015 at 11:00 AM, Andy Walls
>>> > > <a...@silverblocksystems.net> wrote:
>>> >
>>> > > Can this memmove() be safely skipped
>>> > >
>>> > > https://github.com/gnuradio/gnuradio/blob/master/gr-qtgui/lib/time_sink_f_impl.cc#L627
>>> > [snip]
>>> > > The volk_32f_convert_64f_u_avx() call is unavoidable as Qwt wants
>>> > > doubles for plotting and not floats. But it might also be able to be
>>> > > deferred to the very end when the decision to plot is known for sure.
>>> > > (But that's more surgery than I care to take on at the moment.)
>>> >
>>>
>>> >
>>> > > But thinking about the volk convert function, that's both copying the
>>> > > data from the input buffer into the internal buffer as well as
>>> > > performing the conversion. We can't just hold data in the input since
>>> > > we don't want to back up the data until we're ready to plot both with
>>> > > timing and with a full enough buffer -- it's just sampling a section
>>> > > at a time and drops everything in between.
>>> >
>>> > Right.
>>> >
>>> > > That part could be converted into a memcpy instead of the volk
>>> > > convert. Then, when we're ready to plot, we call the volk convert that
>>> > > also does the move from d_start to 0, so it combines those two
>>> > > elements.
>>> >
>>> > Yeah, that's the surgery part. :) It would require adding a new set of
>>> > buffers to hold float objects, and then converting them when a
>>> > determination to plot was made.
>>> >
>>> > This also affects the memmove() of the tail for the trigger delay. It
>>> > would operate on the new set of float buffers (vs the buffers holding
>>> > doubles).
>>> >
>>> > > Thoughts on those proposals?
>>>
>>> Your proposal for implementing memcpy() and deferring volk_*() to do the
>>> conversion and "memmove" in one step is great! :)
>>>
>>> I just implemented it, and the time_sink_f thread has gone from 41.5%
>>> CPU down to 29.1% CPU in my tests. :) memcpy() now dominates the
>>> thread, but that's to be expected.
>>>
>>>
>>>
>>> With my initial hack:
>>>
>>> > CPU: Intel Sandy Bridge microarchitecture, speed 3.5e+06 MHz (estimated)
>>> > Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
>>> > samples  %        image name                             symbol name
>>> > 78158    39.0737  libvolk.so.0.0.0                       volk_32f_convert_64f_u_avx
>>> > 22777    11.3870  no-vmlinux                             /no-vmlinux
>>> > 13972    6.9851   libgnuradio-qtgui-3.7.7git.so.0.0.0    gr::qtgui::time_sink_f_impl::_test_trigger_slope(float const*) const
>>> > 7781     3.8900   libgnuradio-qtgui-3.7.7git.so.0.0.0    gr::qtgui::time_sink_f_impl::_test_trigger_norm(int, std::vector<void const*, std::allocator<void const*> >)
>>> > 7236     3.6175   libpthread-2.18.so                     pthread_mutex_lock
>>> > 6163     3.0811   libgnuradio-runtime-3.7.7git.so.0.0.0  boost::detail::sp_counted_base::release()
>>> > 5942     2.9706   libpthread-2.18.so                     pthread_mutex_unlock
>>> > 4947     2.4732   libgnuradio-runtime-3.7.7git.so.0.0.0  gr::block_executor::run_one_iteration()
>>> > 3826     1.9127   libgnuradio-runtime-3.7.7git.so.0.0.0  gr::block_detail::input(unsigned int)
>>> > 3555     1.7773   libstdc++.so.6.0.19                    /usr/lib64/libstdc++.so.6.0.19
>>> > 3206     1.6028   libc-2.18.so                           __memmove_ssse3_back
>>> > [...]
>>>
>>> With my implementation of your suggestion:
>>>
>>> CPU: Intel Sandy Bridge microarchitecture, speed 3.5e+06 MHz (estimated)
>>> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 90000
>>> samples  %        image name                             symbol name
>>> 27595    35.6051  libc-2.18.so                           __memcpy_sse2_unaligned
>>> 12225    15.7736  no-vmlinux                             /no-vmlinux
>>> 4051     5.2269   libpthread-2.18.so                     pthread_mutex_lock
>>> 3739     4.8243   libgnuradio-runtime-3.7.7git.so.0.0.0  boost::detail::sp_counted_base::release()
>>> 3362     4.3379   libpthread-2.18.so                     pthread_mutex_unlock
>>> 2876     3.7108   libgnuradio-runtime-3.7.7git.so.0.0.0  gr::block_executor::run_one_iteration()
>>> 2364     3.0502   libgnuradio-runtime-3.7.7git.so.0.0.0  gr::block_detail::input(unsigned int)
>>> 2091     2.6980   libstdc++.so.6.0.19                    /usr/lib64/libstdc++.so.6.0.19
>>> 1388     1.7909   libgnuradio-runtime-3.7.7git.so.0.0.0  gr::tpb_detail::notify_upstream(gr::block_detail*)
>>> 1138     1.4683   libc-2.18.so                           __memmove_ssse3_back
>>> [...]
>>> 2        0.0026   libvolk.so.0.0.0                       __volk_32f_convert_64f_d
>>> [...]
>>> 1        0.0013   libvolk.so.0.0.0                       volk_32f_convert_64f_a_avx
>>>
>>>
>>> Regards,
>>> Andy
>>>
>>
>>
>> Andy,
>>
>> Excellent!
>>
>> I've got a few other minor patches for some things; I'll put this in
>> there too and test on my end as well.
>>
>> Tom
>>
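
For readers following along, the memcpy-then-convert idea discussed in this
thread can be sketched roughly as below. This is only an illustration under
stated assumptions, not the code that went into time_sink_f_impl.cc: the
class name DeferredPlotBuffer and its members are hypothetical, and a plain
loop stands in for the VOLK float-to-double convert kernel so the example
builds without VOLK. The point is that the per-call path only copies floats,
while the conversion to doubles and the shift from the start offset down to
index 0 happen together, once, when a plot is actually going to be drawn.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one input stream's buffering in a time sink.
class DeferredPlotBuffer {
public:
    explicit DeferredPlotBuffer(std::size_t capacity)
        : staging_(capacity), plot_(capacity), end_(0) {}

    // Hot path, run on every call: a raw float copy, no conversion.
    // Returns how many samples fit into the staging buffer.
    std::size_t append(const float* in, std::size_t n) {
        const std::size_t room = staging_.size() - end_;
        const std::size_t take = (n < room) ? n : room;
        if (take > 0) {
            std::memcpy(staging_.data() + end_, in, take * sizeof(float));
        }
        end_ += take;
        return take;
    }

    // Plot path, run only once the decision to plot is made: convert
    // float -> double while moving samples [start, end_) down to index 0,
    // so no separate memmove() of doubles is needed.
    // Returns the number of converted samples.
    std::size_t convert_for_plot(std::size_t start) {
        if (start > end_) {
            start = end_;
        }
        const std::size_t n = end_ - start;
        // Assumption: in the real block this loop would be a single
        // VOLK convert call reading from staging_.data() + start.
        for (std::size_t i = 0; i < n; ++i) {
            plot_[i] = static_cast<double>(staging_[start + i]);
        }
        return n;
    }

    const double* plot_data() const { return plot_.data(); }

    // Drop everything staged so far, ready for the next capture.
    void reset() { end_ = 0; }

private:
    std::vector<float> staging_;  // floats copied straight from the input
    std::vector<double> plot_;    // doubles handed to the plotting widget
    std::size_t end_;             // number of valid samples in staging_
};
```

A caller would append() on every work call, and call convert_for_plot(start)
followed by reset() only when the trigger and update timing say a plot is due.
The cost is one extra set of float buffers per stream, which matches the
"surgery" described above.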
_______________________________________________ Discuss-gnuradio mailing list Discuss-gnuradio@gnu.org https://lists.gnu.org/mailman/listinfo/discuss-gnuradio