On 07/17/2014 10:04 PM, Nowlan, Sean wrote:
I don't see this requirement on ordered generation documented. In some
cases, it may be inconvenient to do this, e.g. when a block's analysis
discovers after-the-fact that something interesting can be associated
with a past sample. Similarly, a user might want a block to associate
a tag with a sample that has not yet arrived, to notify a downstream block
that will need to process the event.
I don't think that ordered generation is required per se, but certain blocks sort and
others don't. For instance, the tag_work function in usrp_sink_impl.cc "does"
sort, precisely because get_tags_in_range doesn't.
My point is really that, because the infrastructure doesn't sort, only
blocks that are aware of the problem have compensated for it. Other
blocks are dropping data. This could be solved in the infrastructure
with a stable sort in get_tags_in_range or add_item_tags. (If the
latter, then the infrastructure could also diagnose violations of the
offset-must-be-in-valid-range expectation, which might be helpful.)
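To make the stable-sort idea concrete, here is a minimal Python sketch (not
GNU Radio code; the tag layout is illustrative) showing how a stable sort by
offset preserves the order in which same-offset tags were generated, which is
the guarantee a stable sort inside get_tags_in_range would give:

```python
# Minimal model of a stream tag: (offset, key, value).

def sort_tags_stable(tags):
    # Python's sorted() is guaranteed stable, so tags that share an
    # offset retain the order in which they were originally appended.
    return sorted(tags, key=lambda tag: tag[0])

tags = [
    (7, "gain", 10),
    (3, "freq", 100e6),
    (7, "gain", 20),   # same offset and key as above; order must survive
    (3, "rate", 2e6),
]
```

After sorting, the two offset-3 tags and the two offset-7 tags each keep
their relative order, so no "duplicate" is silently reordered or lost.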
A simple solution for the infrastructure is to require that tags only be
generated from within work(), with offsets corresponding to samples
generated in that call to work(), and in non-decreasing offset order
(though this last requirement could be handled by add_item_tag()). The
developer must then handle the too-late/too-early tag associations
through some other mechanism, such as carrying the effective offset as
part of the tag value.
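The carry-the-effective-offset idea can be sketched as follows: the tag is
anchored at a legal offset inside the current work() window, while its value
records the sample it really refers to. This is a hypothetical convention,
not an existing GNU Radio API; the field names are invented for illustration:

```python
def make_deferred_tag(window_start, effective_offset, key, value):
    """Build a tag that is legal to emit in the current work() window
    but records the sample it actually refers to.

    The tag is anchored at window_start (always inside the valid
    range), and its value is a dict carrying the real target offset
    so a downstream block can re-associate it."""
    return (window_start, key, {"effective_offset": effective_offset,
                                "value": value})

# A burst actually started at sample 950, discovered after the fact;
# the current window begins at sample 1000.
tag = make_deferred_tag(window_start=1000, effective_offset=950,
                        key="burst_start", value=True)
```

A downstream block that understands this convention reads the effective
offset out of the value instead of trusting the tag's nominal offset.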
As far as I'm aware, adding tags from within work is the only safe way to add
tags to a stream. Also, it is required that offsets correspond to the valid
range spanning the buffer of input items passed to work. The scheduler prunes
any tags outside this range. It's also worth noting that although the history
mechanism allows viewing past samples (filters use this, for example),
attempting to add tags to samples in history will not work; those tags will be
pruned.
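The pruning behavior described above can be modeled simply: tags whose
offsets fall outside the [start, end) range of items handled by the current
call to work are discarded. A conceptual sketch, not the scheduler's actual
code:

```python
def prune_tags(tags, start, end):
    # Keep only tags whose offset lies inside [start, end); this mimics
    # the scheduler discarding tags added outside the valid window,
    # including tags attached to samples held in history.
    return [tag for tag in tags if start <= tag[0] < end]

# One tag in history, two in the window, one past the end:
tags = [(90, "late", 1), (100, "ok", 2), (150, "ok", 3), (200, "early", 4)]
```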
If tags need to be stored for future processing in subsequent calls to work,
it's up to the programmer to push them onto a stack/queue/whatever inside the
block. The scheduler won't handle this.
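A minimal sketch of that push-it-yourself approach: the block keeps its own
queue of pending tags and, on each call to work, emits only those whose
offsets have become legal. The class and method names are illustrative, not
GNU Radio API:

```python
from collections import deque

class TagDeferringBlock:
    """Toy model of a block that defers tags to future work() calls."""

    def __init__(self):
        self.pending = deque()   # tags waiting for their offset to arrive

    def queue_tag(self, offset, key, value):
        self.pending.append((offset, key, value))

    def work(self, window_start, noutput):
        """Return the tags that may legally be emitted in this window."""
        window_end = window_start + noutput
        emit = []
        still_pending = deque()
        while self.pending:
            tag = self.pending.popleft()
            if window_start <= tag[0] < window_end:
                emit.append(tag)           # would call add_item_tag here
            elif tag[0] >= window_end:
                still_pending.append(tag)  # keep for a later call
            # tags before window_start are too late; drop them
        self.pending = still_pending
        return emit
```

Each work() call drains only the tags that have become emittable; everything
later stays queued, exactly because the scheduler won't do this for you.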
Thanks; that confirms and is consistent with my expectations.
(4) The in-memory stream of tags can produce multiple settings of the
same key at the same offset. However, when stored to a file only the
last setting of the key is recorded.
I believe this last behavior is incorrect and that it's a mistake to use
a map instead of a multimap or simple list for the metadata record of
stream tags associated with a sample.
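The difference is easy to demonstrate. In Python terms, a dict keyed on the
tag key behaves like the map (last write wins), while a plain list of
(key, value) pairs behaves like a multimap and preserves every setting. A
sketch of the two storage choices for tags at one offset:

```python
def store_as_map(settings):
    # map-like storage: later settings of a key silently overwrite
    # earlier ones, so repeated settings at one offset are lost.
    record = {}
    for key, value in settings:
        record[key] = value
    return record

def store_as_multimap(settings):
    # multimap/list-like storage: every setting is kept, so a file
    # written from this record reproduces the original stream.
    return list(settings)

# Three settings of "freq" at the same sample offset:
settings = [("freq", 100e6), ("freq", 101e6), ("freq", 102e6)]
```

Only the multimap-style record lets playback reproduce all three changes.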
One argument is that it's critical that a stream archive of a processing
session faithfully record the contents of the stream so that re-running
the application using playback reproduces that stream and thus the
original behavior (absent non-determinism due to asynchrony). This
faithful reproduction is what would allow a maintainer to diagnose an
operational failure caused by a block that fails at runtime when the
same tag is processed twice at the same offset. This is true even if
the same key is set to the same value at the same sample offset multiple
times, which some might otherwise want to argue is redundant.
A corollary argument is that the sample number at which an event like a
tuner configuration change occurs usually can't be exactly associated
with a sample; the best estimate is likely to be the index of the first
sample generated by the next call to work. But depending on processing
speed, an application might change an attribute of a data source multiple
times before work is invoked. The effect of those intermediate changes
may be visible in the signal, and to lose the fact they occurred by
discarding all but the last change affects both reproducibility and
interpretation of the signal itself.
I agree this is a problem, but I don't see a workaround as the data plane
(work, streams, etc.) is asynchronous to the control logic. On the bright side,
I believe the USRP source block does associate tuner, sample rate, etc. changes
with an absolute sample in the stream, but this set of features doesn't
necessarily extend to other hardware data sources. As for other asynchronous
events generating stream tags, I think the user is stuck dealing with the
inevitable latency unless the data source can produce metadata that is tightly
coupled in time and pass that information along to GNU Radio.
Inaccuracy in identifying the associated sample is something we have to
live with, yes. My argument is that GNU Radio's stream tag
infrastructure (including storage as metadata) needs to accommodate this
by not dropping tags based solely on offset and key (and value), because
the "duplication" may actually carry information. So a per-offset map
keyed on the tag key alone is the wrong data structure for tag storage.
With fork and join flows the tag propagation policy might introduce
replications. A candidate workaround is a unique identifier, added
internally by gr::block::add_item_tag, which can be used to identify and
drop redundant tag instances as they're propagated. That identifier
must be unique across all blocks in the system, not just a
block-specific ordinal, since the tag srcid is optional. It need not be
preserved in archived metadata, though, since at that point we "know"
the tags are complete and unique; new identifiers would be added when
archived tags are replayed as a live stream.
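A sketch of that unique-identifier idea: if add_item_tag stamped each tag
with a globally unique serial number, a join point could drop replicated
instances by id rather than by (offset, key, value), which may legitimately
repeat. This is a hypothetical design, not existing GNU Radio behavior:

```python
import itertools

# Global counter standing in for a system-wide unique tag id that a
# hypothetical gr::block::add_item_tag could assign internally.
_next_tag_id = itertools.count()

def add_item_tag(offset, key, value):
    # Each tag carries a unique id in addition to its payload.
    return {"id": next(_next_tag_id), "offset": offset,
            "key": key, "value": value}

def merge_dedup(*streams):
    """Merge tag streams at a join, dropping replicated instances of
    the same tag (same id) while keeping distinct tags that merely
    share offset, key, and value."""
    seen = set()
    merged = []
    for stream in streams:
        for tag in stream:
            if tag["id"] not in seen:
                seen.add(tag["id"])
                merged.append(tag)
    return merged
```

Two tags with identical payloads but different ids both survive the merge,
while the same tag arriving over two fork branches is emitted only once.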
As background: I'm digging into this because I plan to update
gr-osmosdr's rtlsdr_source so I know the sample rate, frequency, gain,
and collection time of the signal, and (roughly) where they changed.
Mostly because I keep collecting files with captured and processed data
for analysis, and have no idea what parameters I used to generate them.
Preserving metadata with signal data in a single archive package is
really important to me.
Peter
_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio