[Discuss-gnuradio] Diagnosing why a flowgraph occasionally stops processing samples

Joe K Tue, 19 Feb 2019 10:55:32 -0800

Hi everybody,

I have a very complex flowgraph that sometimes simply stops processing
receive samples.  It doesn't crash, and the transmit side of the flowgraph
is still fully operational.  No exceptions are thrown, and nothing
indicates that any threads are dying.


The flowgraph uses a lot of custom out-of-tree modules.  I've seen similar
stalls in situations where I've screwed up in my OOT modules.  For example,
if a sync block uses set_min_noutput_items with a large value and the
upstream block can only produce a small number of samples, it seems
possible to stall the flowgraph.
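
To make the suspected failure mode concrete, here is a toy model in plain
Python.  It is not how the GNU Radio scheduler works internally; the
function name, the single bounded buffer, and all parameters are
assumptions made up for illustration.  It only shows how a minimum-output
requirement larger than what the buffer can ever hold leaves both blocks
waiting forever.

```python
def run_toy_flowgraph(buffer_capacity, min_noutput, upstream_chunk,
                      max_iterations=1000):
    """Toy two-block model (hypothetical, not the real scheduler).

    An upstream block writes into one bounded buffer; a 1:1 sync block
    reads from it but refuses to run below `min_noutput` items.
    Returns (items_processed, stalled).
    """
    buffered = 0
    processed = 0
    for _ in range(max_iterations):
        progressed = False

        # Upstream: produce as much as fits, up to its chunk size.
        space = buffer_capacity - buffered
        produced = min(upstream_chunk, space)
        if produced > 0:
            buffered += produced
            progressed = True

        # Downstream sync block: only runs once its minimum is available.
        if buffered >= min_noutput:
            processed += buffered
            buffered = 0
            progressed = True

        # Neither block could do anything: the toy graph is stalled.
        if not progressed:
            return processed, True
    return processed, False


# Minimum larger than the buffer: upstream fills it, then nothing moves.
stalled_case = run_toy_flowgraph(buffer_capacity=4096, min_noutput=8192,
                                 upstream_chunk=512)
# Minimum fits in the buffer: samples keep flowing.
healthy_case = run_toy_flowgraph(buffer_capacity=16384, min_noutput=8192,
                                 upstream_chunk=512)
```

In the first case the buffer fills and neither block can make progress,
which looks from the outside like threads parked on waits with no error.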

The source is definitely not the cause of the stoppage--this happens even
in simulation where the input is just a noise source.  It may take hours of
running for the flowgraph to stall.  Or minutes.  It seems very random.

My question is:  are there any tools available to help me determine
what's causing the stall?  I've tried using GDB, but that is difficult with
~100 threads, and most of them just seem to be sitting at semaphore waits.
I don't know that I can deduce anything else from GDB.  I really do think
it's ultimately a logical issue like the one I described above, where I'm
mistreating the scheduler and giving it a situation it cannot cope with.

Basically, can I poke the scheduler and say "tell me what's going on"?  I'd
love to get some data on the last several rounds of forecast() calls and
details on the work function calls, e.g., "Block such-and-such was provided
X inputs and space for Y outputs; it consumed A and produced B."
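
As a rough sketch of the kind of record described above, the following
plain-Python helper keeps the last few work calls in a ring buffer.
Everything here (WorkTrace, WorkRecord, copy_work) is hypothetical and not
part of GNU Radio.  It assumes a work function shaped like a sync block's
work(input_items, output_items) that returns the number of items produced,
and it assumes consumed equals produced, which only holds for 1:1 sync
blocks.

```python
import functools
from collections import deque, namedtuple

# One entry per work call (hypothetical record layout).
WorkRecord = namedtuple("WorkRecord",
                        "block n_in n_out_space consumed produced")


class WorkTrace:
    """Ring buffer holding the most recent work-call records."""

    def __init__(self, depth=64):
        self.records = deque(maxlen=depth)  # oldest entries fall off

    def wrap(self, name, work_fn):
        """Return a traced version of work_fn(input_items, output_items)."""
        @functools.wraps(work_fn)
        def traced(input_items, output_items):
            n_in = len(input_items[0]) if input_items else 0
            n_out = len(output_items[0]) if output_items else 0
            produced = work_fn(input_items, output_items)
            # Assumption: 1:1 sync block, so consumed == produced.
            self.records.append(
                WorkRecord(name, n_in, n_out, produced, produced))
            return produced
        return traced

    def dump(self):
        """Format the retained records as human-readable lines."""
        return [
            f"{r.block}: given {r.n_in} inputs and space for "
            f"{r.n_out_space} outputs, consumed {r.consumed}, "
            f"produced {r.produced}"
            for r in self.records
        ]


def copy_work(input_items, output_items):
    """Stand-in work function: copy as many items as fit."""
    n = min(len(input_items[0]), len(output_items[0]))
    output_items[0][:n] = input_items[0][:n]
    return n
```

A trace like this only covers blocks that have been wrapped by hand, so it
would show what a suspect block was last offered, not why the scheduler
stopped calling it.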

Thanks!

Joe
