I forgot some important information. There are four things I am doing
that are out of the ordinary:

1) I have a couple of data structure maintenance threads that run once a
second. They are created like this:

    static std::thread builder( gr::adsb::do_build );

These threads have a std::mutex lock around their respective data
structures. I also include -pthread on the compile and link lines.

I am unsure whether this is a contributor.
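
A minimal sketch of that pattern, for discussion only. The class
maintenance_worker and its members are hypothetical names, not the actual
gr::adsb code, and the "rebuild" is a placeholder. One difference from the
static declaration above: a std::thread at namespace scope starts running
during static initialization, before main(), and a std::thread that is
still joinable when it is destroyed calls std::terminate(), so the sketch
starts and joins the thread explicitly.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for one maintenance thread and the data it guards.
class maintenance_worker {
public:
    explicit maintenance_worker(std::chrono::milliseconds period)
        : period_(period), running_(false), passes_(0)
    {
        assert(period.count() > 0);
    }

    ~maintenance_worker() { stop(); }

    // Start the background thread; does nothing if it is already running.
    void start()
    {
        if (running_.exchange(true))
            return;
        thread_ = std::thread(&maintenance_worker::run, this);
    }

    // Ask the thread to finish and wait for it (may take up to one period).
    void stop()
    {
        running_ = false;
        if (thread_.joinable())
            thread_.join();
    }

    // Called by the signal-processing side; holds the lock only briefly.
    void add(int value)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        data_.push_back(value);
    }

    std::size_t size()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_.size();
    }

    unsigned passes() const { return passes_.load(); }

private:
    void run()
    {
        while (running_) {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                data_.clear(); // placeholder for the real maintenance work
            }
            ++passes_;
            std::this_thread::sleep_for(period_);
        }
    }

    std::chrono::milliseconds period_;
    std::atomic<bool> running_;
    std::atomic<unsigned> passes_;
    std::mutex mutex_;
    std::vector<int> data_;
    std::thread thread_;
};
```

With a one-second period the lock is held only for the duration of each
pass, so any contention with work() would depend on how long the real
maintenance pass takes.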

2) I use OpenMP via -fopenmp (e.g., "#pragma omp parallel for").
Removing -fopenmp from the compile line has no impact on the overflows.

3) I am compiling with -std=c++11 using g++ 4.9, the stock compiler. I
am using some of C++11's keywords and constructs. I suspect this is part
of the problem; however, removing it will require work. I remember
reading somewhere "c++11 IS NOT supported" but nowhere did it say it
won't work.

4) I am NOT using Boost; rather, I am using standard containers (e.g.,
vector<tuple<FOO>>). I am also using higher-level math functions, such
as std::log10() and std::pow().

If I remove most of the blocks from my graph, leaving:

  source --> dc block --> Preamble --> null

with the statement:

      return noutput_items;

at the beginning of general_work() in Preamble, I have overflows and
gr-perf-monitorx shows a thick red line from:

 optimize_c0 -> hack_rf_source_c0 -> dc_blocker_cc0 --> Preamble

with dc_blocker_cc0 depicted as a large blue square.

My suspicion is that some low-level interaction conflicts with the
scheduler/runtime, but I am presently struggling with how to
debug/prove/disprove it.

On Sun, 2015-07-12 at 14:06 -0700, Dennis Glatting wrote:
> (Resent with pix removed.)
> I am looking for pointers and papers on the overhead of the scheduler,
> its performance, and high(?) data rates.
> I enclosed a partial pix of my graph. The essence is:
>   HackRF -> DC Block -> My Preamble Detect
> There are other blocks in the graph but they do very little. BTW, the
> sample rate is 10 Msps.
> What is happening is overflow events from the HackRF code inside
> osmocom. Even if I modify my Preamble detector to return noutput_items
> at the beginning of general_work() I still get overflow events.
> I increased the buffer size between blocks from 32k to 128k
> (GR_FIXED_BUFFER_SIZE in flat_flowgraph.cc). No impact.
> I increased the priority on the DC blocker (dc_blocker_ff_impl.cc) and
> my Preamble detector in their constructors (below). No impact.
>       set_thread_priority( thread_priority() + 1 );
> According to gr-ctrlport-monitor, the average time in the DC Blocker is
> 1,600,000 ticks, which I believe is 1.6 ms (Preamble 400 us, HackRF
> 40 us), against a tick rate of 1,000,000,000 per second
> (gr::high_res_timer_tps()).
> (I should mention I'm running on an 8-core, 5 GHz processor with 32 GB
> of memory. You can't do much better than that.)
> The preamble detector has a lot of variance because a signal has to meet
> a list of criteria and is rejected after failing any one of them. The
> reported variance is 1,100,000,000, but the interesting thing is that it
> decays substantially over time, so I'm not sure that number is
> meaningful. Regardless, even if I put a return statement at the head of
> general_work() in Preamble I still get buffer overflows.
> (Over the ten minutes I wrote this, the Preamble variance decayed from
> 1.1e9 to 7.9e8. I've seen it substantially, albeit slowly, decay and I
> suspect variance (block_detail.cc) has an initialization problem.)
> I spent a day inside the HackRF osmocom source (much coffee was
> involved) and substantially modified its innards. However, this problem
> was already present before I "operated." One of the source's problems was
> /inside/ general work, where it waited for a minimum of three buffers
> from the device by executing a condition wait on a
> boost::condition_variable (below).
>   {
>     boost::mutex::scoped_lock lock( _buf_mutex );
>     while (_buf_used < 3 && running) // collect at least 3 buffers
>       _buf_cond.wait( lock );
>   }
> I suspect that code fragment (and others) is badness because it worked
> outside the graph/scheduler framework. Also, three buffers means 3 x 128k
> 8-bit I/Q samples, which seems like a ridiculous amount to wait for.
> Samples are converted into gr_complex and pumped down the stream at the
> stream's capacity, so nproduced is always near the stream size. Yet the
> average work time is pretty low.
> A curious set of variables is shown in control port. The average
> nproduced is 15,773 but the average "output % full" is 0.52. How can
> that be? I read a comment somewhere that a buffer is split in half; if
> true, the output buffer is really 15,773/16,536=0.95 (95%) full.
> Contrast that with GR_FIXED_BUFFER_SIZE (flat_flowgraph.cc):
> 131,072/sizeof(gr_complex)=16,384. Consequently, I'm really confused
> about what those two numbers are telling me.
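
The item arithmetic can be reproduced with a short sketch. The helper
names are hypothetical; the typedef mirrors GNU Radio's definition of
gr_complex as std::complex<float>. This only checks the divisions. It says
nothing about how the scheduler actually computes "output % full".

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

typedef std::complex<float> gr_complex; // 8 bytes on common platforms

// Number of whole items that fit in a buffer of `buffer_bytes` bytes.
std::size_t items_in_buffer(std::size_t buffer_bytes, std::size_t item_bytes)
{
    assert(item_bytes > 0);
    return buffer_bytes / item_bytes;
}

// Fraction of the buffer occupied by `items` items.
double fill_fraction(double items, std::size_t capacity_items)
{
    assert(capacity_items > 0);
    return items / static_cast<double>(capacity_items);
}
```

With a 131,072-byte buffer, items_in_buffer(131072, sizeof(gr_complex))
gives 16,384 items, and fill_fraction(15773, 16384) is about 0.96.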
> (BTW, I also added a couple of perf rpc variables, notably overflow
> events and average (work in progress), to hackrf_source_c.cc (osmocom)
> because I suspected the output of "O" (below) causes the scheduler to
> hiccup.)
> int
> hackrf_source_c::hackrf_rx_callback(u_char *buf, uint32_t len) {
> ..
>    std::cerr << "O" << std::flush;
> ..
> The interesting thing about that code fragment is that it is inside the
> device's callback, which I felt had unknown consequences. Incrementing
> an event counter is a far better approach (i.e., working within the
> framework as much as possible).
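
A minimal sketch of the event-counter idea, assuming C++11 std::atomic.
The names overflow_counter and rx_callback_sketch are hypothetical and are
not part of osmocom or libhackrf. The point is only that the callback does
an atomic increment and no I/O, and another thread reads the count later.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical overflow counter shared between the device callback and
// whatever reports statistics (for example, a perf/RPC getter).
class overflow_counter {
public:
    overflow_counter() : count_(0) {}

    // Called from the callback: a single atomic increment, no printing.
    void record() { count_.fetch_add(1, std::memory_order_relaxed); }

    // Total events seen so far.
    std::uint64_t total() const
    {
        return count_.load(std::memory_order_relaxed);
    }

    // Return the events seen since the last call and reset to zero.
    std::uint64_t take()
    {
        return count_.exchange(0, std::memory_order_relaxed);
    }

private:
    std::atomic<std::uint64_t> count_;
};

// Hypothetical stand-in for the device callback: when no free buffer is
// available, count the event instead of writing "O" to std::cerr.
int rx_callback_sketch(overflow_counter &overflows, const unsigned char *buf,
                       std::uint32_t len, bool buffer_available)
{
    assert(buf != nullptr || len == 0);
    if (!buffer_available) {
        overflows.record();
        return 0;
    }
    // ... copy `len` bytes from `buf` into the free buffer here ...
    return 0;
}
```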
> I am considering adding code on overflow where all of the dirty buffers
> are flushed but I experimented and it had no impact. (Oh, and I modified
> the buffer management code from hard coded management to standard
> containers with an effort to minimize allocation/deallocation. It's a
> little cleaner.)
> (BTW, to anyone looking at HackRF, have you ever wondered why you can
> modify the number of buffers but not buffer lengths (buflen)? It is
> because the buffer length is hard coded in libhackrf and
> hackrf_source_c.cc simply mirrors it.)
> --------------------------------------------------------------
> I missed a very important debug step. I went back through my graph
> deleting blocks. As I deleted blocks, the rate of overruns slowed but DID
> NOT reach zero, even when the graph looked like this:
>   HackRF -> DC Block -> Null Sink
> --------------------------------------------------------------
> The result of all this nonsense is I am wondering about the scheduler's
> management overhead which IS NOT tracked in any way I found. (Please
> correct me if you know different.) It could be the scheduler's impact is
> zero and my code simply sucks -- like, that's never happened before! :)
> There are graphical output blocks in my graph, specifically two QT Time
> Sinks and one QT GUI Sink. From an average work time perspective I
> suspect these are non-issues but I also suspect the drawing is done
> outside the running of the blocks. 
> I am working on a Python equivalent of my graph but I'm not a Python
> hacker so that will take some time. I am curious to compare that
> performance against GRC.
> At this point I am somewhat clueless as to why I am getting overflow
> events.
