On 07/17/2014 10:04 PM, Nowlan, Sean wrote:
I don't see this requirement on ordered generation documented. In some
cases, it may be inconvenient to do this, e.g. when a block's analysis
discovers after-the-fact that something interesting can be associated
with a past sample. Similarly, a user might want a block to associate
a tag with a sample that has not yet arrived, to notify a downstream block
that will need to process the event.
I don't think that ordered generation is required per se, but certain blocks sort and
others don't. For instance, the tag_work function in usrp_sink_impl.cc "does"
sort, precisely because get_tags_in_range doesn't.
My point is really that, because the infrastructure doesn't sort, only
blocks that are aware of the problem have compensated for it. Other
blocks are dropping data. This could be solved in the infrastructure
with a stable sort in get_tags_in_range or add_item_tags. (If the
latter, then the infrastructure could also diagnose violations of the
offset-must-be-in-valid-range expectation, which might be helpful.)
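To make the stable-sort idea concrete, here is a minimal Python sketch (not
GNU Radio code; the tag layout is illustrative) showing how a stable sort by
offset preserves the order in which same-offset tags were generated, which is
the guarantee a stable sort inside get_tags_in_range would give:

```python
# Minimal model of a stream tag: (offset, key, value).

def sort_tags_stable(tags):
    # Python's sorted() is guaranteed stable, so tags that share an
    # offset retain the order in which they were originally appended.
    return sorted(tags, key=lambda tag: tag[0])

tags = [
    (7, "gain", 10),
    (3, "freq", 100e6),
    (7, "gain", 20),   # same offset and key as above; order must survive
    (3, "rate", 2e6),
]
```

After sorting, the two offset-3 tags and the two offset-7 tags each keep
their relative order, so no "duplicate" is silently reordered or lost.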
A simple solution for the infrastructure is to require that tags only be
generated from within work(), with offsets corresponding to samples
generated in that call to work(), and in non-decreasing offset order
(though this last requirement could be handled by add_item_tag()). The
developer must then handle the too-late/too-early tag associations
through some other mechanism, such as carrying the effective offset as
part of the tag value.
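The carry-the-effective-offset idea can be sketched as follows: the tag is
anchored at a legal offset inside the current work() window, while its value
records the sample it really refers to. This is a hypothetical convention,
not an existing GNU Radio API; the field names are invented for illustration:

```python
def make_deferred_tag(window_start, effective_offset, key, value):
    """Build a tag that is legal to emit in the current work() window
    but records the sample it actually refers to.

    The tag is anchored at window_start (always inside the valid
    range), and its value is a dict carrying the real target offset
    so a downstream block can re-associate it."""
    return (window_start, key, {"effective_offset": effective_offset,
                                "value": value})

# A burst actually started at sample 950, discovered after the fact;
# the current window begins at sample 1000.
tag = make_deferred_tag(window_start=1000, effective_offset=950,
                        key="burst_start", value=True)
```

A downstream block that understands this convention reads the effective
offset out of the value instead of trusting the tag's nominal offset.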
As far as I'm aware, adding tags from within work is the only safe way to add
tags to a stream. Also, it is required that offsets correspond to the valid
range spanning the buffer of input items passed to work. The scheduler prunes
any tags outside this range. It's also worth noting that although the history
mechanism allows viewing past samples (filters use this, for example),
attempting to add tags to samples in history will not work; those tags will be
pruned.
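The pruning behavior described above can be modeled simply: tags whose
offsets fall outside the [start, end) range of items handled by the current
call to work are discarded. A conceptual sketch, not the scheduler's actual
code:

```python
def prune_tags(tags, start, end):
    # Keep only tags whose offset lies inside [start, end); this mimics
    # the scheduler discarding tags added outside the valid window,
    # including tags attached to samples held in history.
    return [tag for tag in tags if start <= tag[0] < end]

# One tag in history, two in the window, one past the end:
tags = [(90, "late", 1), (100, "ok", 2), (150, "ok", 3), (200, "early", 4)]
```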
If tags need to be stored for future processing in subsequent calls to work,
it's up to the programmer to push them onto a stack/queue/whatever inside the
block. The scheduler won't handle this.
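A minimal sketch of that push-it-yourself approach: the block keeps its own
queue of pending tags and, on each call to work, emits only those whose
offsets have become legal. The class and method names are illustrative, not
GNU Radio API:

```python
from collections import deque

class TagDeferringBlock:
    """Toy model of a block that defers tags to future work() calls."""

    def __init__(self):
        self.pending = deque()   # tags waiting for their offset to arrive

    def queue_tag(self, offset, key, value):
        self.pending.append((offset, key, value))

    def work(self, window_start, noutput):
        """Return the tags that may legally be emitted in this window."""
        window_end = window_start + noutput
        emit = []
        still_pending = deque()
        while self.pending:
            tag = self.pending.popleft()
            if window_start <= tag[0] < window_end:
                emit.append(tag)           # would call add_item_tag here
            elif tag[0] >= window_end:
                still_pending.append(tag)  # keep for a later call
            # tags before window_start are too late; drop them
        self.pending = still_pending
        return emit
```

Each work() call drains only the tags that have become emittable; everything
later stays queued, exactly because the scheduler won't do this for you.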
Thanks; that confirms and is consistent with my expectations.
(4) The in-memory stream of tags can produce multiple settings of the
same key at the same offset. However, when stored to a file only the
last setting of the key is recorded.
I believe this last behavior is incorrect and that it's a mistake to use
a map instead of a multimap or simple list for the metadata record of
stream tags associated with a sample.
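The difference is easy to demonstrate. In Python terms, a dict keyed on the
tag key behaves like the map (last write wins), while a plain list of
(key, value) pairs behaves like a multimap and preserves every setting. A
sketch of the two storage choices for tags at one offset:

```python
def store_as_map(settings):
    # map-like storage: later settings of a key silently overwrite
    # earlier ones, so repeated settings at one offset are lost.
    record = {}
    for key, value in settings:
        record[key] = value
    return record

def store_as_multimap(settings):
    # multimap/list-like storage: every setting is kept, so a file
    # written from this record reproduces the original stream.
    return list(settings)

# Three settings of "freq" at the same sample offset:
settings = [("freq", 100e6), ("freq", 101e6), ("freq", 102e6)]
```

Only the multimap-style record lets playback reproduce all three changes.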
One argument is that it's critical that a stream archive of a processing
session faithfully record the contents of the stream so that re-running
the application using playback reproduces that stream and thus the
original behavior (absent non-determinism due to asynchrony). This
faithful reproduction is what would allow a maintainer to diagnose an
operational failure caused by a block that fails at runtime when the
same tag is processed twice at the same offset. This is true even if
the same key is set to the same value at the same sample offset multiple
times, which some might otherwise want to argue is redundant.
A corollary argument is that the sample number at which an event like a
tuner configuration change occurs usually can't be exactly associated
with a sample; the best estimate is likely to be the index of the first
sample generated by the next call to work. But depending on processing
speed, an application might change an attribute of a data source multiple
times before work is invoked. The effect of those intermediate changes
may be visible in the signal, and to lose the fact they occurred by
discarding all but the last change affects both reproducibility and
interpretation of the signal itself.
I agree this is a problem, but I don't see a workaround as the data plane
(work, streams, etc.) is asynchronous to the control logic. On the bright side,
I believe the USRP source block does associate tuner, sample rate, etc. changes
with an absolute sample in the stream, but this set of features doesn't
necessarily extend to other hardware data sources. As for other asynchronous
events generating stream tags, I think the user is stuck dealing with the
inevitable latency unless the data source can produce metadata that is tightly
coupled in time and pass that information along to GNU Radio.
Inaccuracy in identifying the associated sample is something we have to
live with, yes. My argument is that GNU Radio's stream tag
infrastructure (including storage as metadata) needs to accommodate this
by not dropping tags based solely on offset and key (and value), because
the "duplication" may actually carry information. So a per-offset map
keyed on the tag key alone is the wrong data structure for tag storage.
With fork and join flows the tag propagation policy might introduce
replications. A candidate workaround is a unique identifier, added
internally by gr::block::add_item_tag, which can be used to identify and
drop redundant tag instances as they're propagated. That identifier
must be unique across all blocks in the system, not just a
block-specific ordinal, since the tag srcid is optional. It need not be
preserved in archived metadata, though, since at that point we "know"
the tags are complete and unique; new identifiers would be added when
archived tags are replayed as a live stream.
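A sketch of that unique-identifier idea: if add_item_tag stamped each tag
with a globally unique serial number, a join point could drop replicated
instances by id rather than by (offset, key, value), which may legitimately
repeat. This is a hypothetical design, not existing GNU Radio behavior:

```python
import itertools

# Global counter standing in for a system-wide unique tag id that a
# hypothetical gr::block::add_item_tag could assign internally.
_next_tag_id = itertools.count()

def add_item_tag(offset, key, value):
    # Each tag carries a unique id in addition to its payload.
    return {"id": next(_next_tag_id), "offset": offset,
            "key": key, "value": value}

def merge_dedup(*streams):
    """Merge tag streams at a join, dropping replicated instances of
    the same tag (same id) while keeping distinct tags that merely
    share offset, key, and value."""
    seen = set()
    merged = []
    for stream in streams:
        for tag in stream:
            if tag["id"] not in seen:
                seen.add(tag["id"])
                merged.append(tag)
    return merged
```

Two tags with identical payloads but different ids both survive the merge,
while the same tag arriving over two fork branches is emitted only once.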
As background: I'm digging into this because I plan to update
gr-osmosdr's rtlsdr_source so I know the sample rate, frequency, gain,
and collection time of the signal, and (roughly) where they changed.
Mostly because I keep collecting files with captured and processed data
for analysis, and have no idea what parameters I used to generate them.
Preserving metadata with signal data in a single archive package is
really important to me.
Peter
_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio