I forgot some important information. There are four things I am doing out of the ordinary:
1) I have a couple of data-structure maintenance threads that run once a second. They are created like this:

    static std::thread builder( gr::adsb::do_build );

Each of these threads takes a std::mutex lock around its respective data structure. I also include -pthread on the compile and link lines. I am unsure whether this is a contributor. (A minimal sketch of what I mean is below, just before the quoted message.)

2) I use OpenMP via -fopenmp (e.g., "#pragma omp parallel for"). Removing -fopenmp from the compile line has no impact on the overflows.

3) I am compiling with -std=c++11 against g++ 4.9, the stock compiler, and I am using some of C++11's keywords and constructs. I suspect this is part of the problem, but removing it will require work. I remember reading somewhere that "c++11 IS NOT supported," but nowhere did it say it won't work.

4) I am NOT using Boost; rather, I am using standard data structures (e.g., vector<tuple<FOO>>). I am also using higher-level math, such as std::log10() and std::pow().

If I remove most of the blocks from my graph so that it is reduced to:

    source --> dc block --> Preamble --> null

and put the statement

    return noutput_items;

at the beginning of general_work() in Preamble (also sketched below), I still get overflows, and gr-perf-monitorx shows a thick red line from:

    optimize_c0 -> hack_rf_source_c0 -> dc_blocker_cc0 -> Preamble

with dc_blocker_cc0 depicted as a large blue square.

My suspicion is that there is some low-level interaction in conflict with the scheduler/runtime, but I am presently struggling with how to debug/prove/disprove it.
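For concreteness, the maintenance-thread pattern from (1) is essentially the following. gr::adsb::do_build is real; the mutex name, the table, and the rebuild body are placeholders I am using only to show the shape:

    #include <chrono>
    #include <mutex>
    #include <thread>
    #include <vector>

    namespace gr {
    namespace adsb {

    // Placeholders: the real code has its own shared structures and mutexes.
    static std::mutex table_mutex;
    static std::vector<int> table;

    // Body of the maintenance thread: rebuild the shared structure once a
    // second, holding the mutex only while touching it. The blocks' work()
    // functions take the same mutex before reading the structure.
    void do_build()
    {
      for (;;) {
        {
          std::lock_guard<std::mutex> guard(table_mutex);
          // ... rebuild / prune `table` here ...
        }
        std::this_thread::sleep_for(std::chrono::seconds(1));
      }
    }

    } // namespace adsb
    } // namespace gr

    // Created once, never joined; built and linked with -pthread.
    static std::thread builder(gr::adsb::do_build);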
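And the "return noutput_items at the top of general_work()" experiment has this shape; the class below is a stand-in for my Preamble block, not the real thing. Note that the stub neither examines nor consumes any input:

    #include <gnuradio/block.h>
    #include <gnuradio/gr_complex.h>
    #include <gnuradio/io_signature.h>

    // Stand-in for the Preamble detector, reduced to the overflow experiment.
    class preamble_stub : public gr::block
    {
    public:
      preamble_stub()
        : gr::block("preamble_stub",
                    gr::io_signature::make(1, 1, sizeof(gr_complex)),
                    gr::io_signature::make(1, 1, sizeof(gr_complex)))
      {
      }

      int general_work(int noutput_items,
                       gr_vector_int &ninput_items,
                       gr_vector_const_void_star &input_items,
                       gr_vector_void_star &output_items) override
      {
        // Bail out immediately: claim noutput_items were produced and never
        // touch or consume the input. Overflows at the source still occur.
        return noutput_items;
      }
    };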
On Sun, 2015-07-12 at 14:06 -0700, Dennis Glatting wrote:
> (Resent with pix removed.)
>
> I am looking for pointers and papers on the overhead of the scheduler, its performance, and high(?) data rates.
>
> I enclosed a partial pix of my graph. The essence is:
>
>   HackRF -> DC Block -> My Preamble Detect
>
> There are other blocks in the graph but they do very little. BTW, the sample rate is 10 Msps.
>
> What is happening is overflow events from the HackRF code inside osmocom. Even if I modify my Preamble detector to return noutput_items at the beginning of general_work() I still get overflow events.
>
> I increased the buffer size between blocks from 32k to 128k (GR_FIXED_BUFFER_SIZE in flat_flowgraph.cc). No impact.
>
> I increased the priority of the DC blocker (dc_blocker_ff_impl.cc) and my Preamble detector in their constructors (below). No impact.
>
>   set_thread_priority( thread_priority() + 1 );
>
> According to gr-ctrlport-monitor, the average time in the DC Blocker is 1,600,000, which I believe is 1.6 ms, Preamble 400 us, and HackRF 40 us, against the clock tick rate of 1,000,000,000 (gr::high_res_timer_tps()).
>
> (I should mention I'm running on an 8-core, 5 GHz processor with 32GB of memory. You can't do much better than that.)
>
> The preamble detector has a lot of variance because a signal has to meet a list of criteria and is rejected after failing any one of them. Variance says 1,100,000,000, but the interesting thing is that the variance substantially decays over time, so I'm not sure that number is meaningful. Regardless, even if I put a return statement at the head of general_work() in Preamble I still get buffer overflows.
>
> (Over the ten minutes I wrote this, the Preamble variance decayed from 1.1e9 to 7.9e8. I've seen it substantially, albeit slowly, decay and I suspect the variance (block_detail.cc) has an initialization problem.)
>
> I spent a day inside the HackRF osmocom source (much coffee was involved) and substantially modified its innards. However, this problem persisted before I "operated."
>
> One of the source's problems was /inside/ general_work, where it waited for a minimum of three buffers from the device by executing a condition wait against a boost::condition_variable (below).
>
>   {
>     boost::mutex::scoped_lock lock( _buf_mutex );
>
>     while (_buf_used < 3 && running) // collect at least 3 buffers
>       _buf_cond.wait( lock );
>   }
>
> I suspect that code fragment (and others) is badness because it works outside the graph/scheduler framework. Also, three buffers means 3 x 128k of 8-bit I/Q samples, which seems like a ridiculous amount to wait for.
>
> Samples are converted into gr_complex and pumped down the stream at the stream's capacity, so nproduced is always near the stream size. Yet the average work time is pretty low.
>
> A curious set of variables is shown in ControlPort. The average nproduced is 15,773 but the average "output % full" is 0.52. How can that be? I read a comment somewhere that a buffer is split in half which, if true, means the output buffer is really 15,773/16,536 = 0.95 (95%) full; however, contrasted against GR_FIXED_BUFFER_SIZE (flat_flowgraph.cc), 131,072/sizeof(gr_complex) = 16,384. Consequently, I'm really confused about what those two numbers are telling me.
>
> (BTW, I also added a couple of perf RPC variables, notably overflow events and an average (work in progress), to hackrf_source_c.cc (osmocom) because I suspected the output of "O" (below) causes the scheduler to hiccup.)
>
>   int
>   hackrf_source_c::hackrf_rx_callback(u_char *buf, uint32_t len) {
>
>     ..
>     std::cerr << "O" << std::flush;
>     ..
>
> The interesting thing about that code fragment is that it is inside the device's callback, which I felt had unknown consequences; incrementing an event counter is a far better approach (i.e., it works within the framework as much as possible).
>
> I am considering adding code that flushes all of the dirty buffers on overflow, but I experimented with that and it had no impact. (Oh, and I modified the buffer management code from hard-coded management to standard containers, with an effort to minimize allocation/deallocation. It's a little cleaner.)
>
> (BTW, to anyone looking at HackRF: have you ever wondered why you can modify the number of buffers but not the buffer length (buflen)? It is because the buffer length is hard coded in libhackrf, and hackrf_source_c.cc simply mirrors it.)
>
> UPDATE:
> --------------------------------------------------------------
> I missed a very important debug step. I went back through my graph deleting blocks. As I deleted blocks, the rate of overruns slowed but DID NOT reach zero, even when the graph looked like this:
>
>   HackRF -> DC Block -> Null Sink
> --------------------------------------------------------------
>
> The result of all this nonsense is that I am wondering about the scheduler's management overhead, which IS NOT tracked in any way I have found. (Please correct me if you know different.) It could be that the scheduler's impact is zero and my code simply sucks -- like, that's never happened before! :)
>
> There are graphical output blocks in my graph, specifically two QT Time Sinks and one QT GUI Sink. From an average-work-time perspective I suspect these are non-issues, but I also suspect the drawing is done outside the running of the blocks.
>
> I am working on a Python equivalent of my graph, but I'm not a Python hacker so that will take some time. I am curious to compare that performance against GRC.
>
> At this point I am somewhat clueless as to why I am getting overflow events.
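PS: since I mention it above, the "count the event instead of printing" change amounts to something like this. It is a simplified, self-contained sketch; in my modified source the counter is a member of hackrf_source_c and is exported as a ControlPort RPC variable:

    #include <atomic>
    #include <cstdint>
    #include <iostream>

    // Simplified stand-in for the per-source overflow counter.
    static std::atomic<uint64_t> overflow_events(0);

    // Stand-in for the libhackrf RX callback (the real one is a member
    // function of hackrf_source_c). At the point where the stock source
    // writes "O" to stderr, bump the counter instead; no I/O from the
    // device callback.
    static int rx_callback(unsigned char * /*buf*/, uint32_t /*len*/)
    {
      overflow_events.fetch_add(1, std::memory_order_relaxed);
      return 0; // returning 0 lets libhackrf keep streaming
    }

    int main()
    {
      rx_callback(nullptr, 0);
      std::cout << "overflow events: " << overflow_events.load() << std::endl;
      return 0;
    }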
_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio