On Sun, Nov 24, 2019 at 8:28 PM Michael Niedermayer <mich...@niedermayer.cc> wrote:
> On Sun, Nov 24, 2019 at 09:08:00AM +0000, Andreas Rheinhardt wrote: > > Michael Niedermayer: > > > On Wed, Nov 06, 2019 at 03:49:02AM +0100, Andreas Rheinhardt wrote: > > >> Up until now, the TrackUID of a Matroska track which is supposed to be > > >> random was not random at all: It always coincided with the TrackNumber > > >> which is usually the 1-based index of the corresponding stream in the > > >> array of AVStreams. This has been changed: It is now set via an AVLFG > > >> if AVFMT_FLAG_BITEXACT is not set. Otherwise it is set like it is set > > >> now (the only change happens if an explicit track number has been > > >> choosen via dash_track_number, because the system used in the normal > > >> situation is now used, too). In particular, no FATE tests need to be > > >> updated. > > >> > > >> This also fixes a bug in case the dash_track_number option was used: > > >> In this case the TrackUID was set to the track number, but the tags > were > > >> written with a TagTrackUID simply based upon the index, so that the > tags > > >> didn't apply to the track they ought to apply to. > > >> > > >> Signed-off-by: Andreas Rheinhardt <andreas.rheinha...@gmail.com> > > >> --- > > >> mkv_get_uid() might be overkill, but I simply wanted to be sure that > > >> there are no collisions. > > >> > > >> libavformat/matroskaenc.c | 65 > ++++++++++++++++++++++++++++++++++----- > > >> 1 file changed, 57 insertions(+), 8 deletions(-) > > >> > > >> diff --git a/libavformat/matroskaenc.c b/libavformat/matroskaenc.c > > >> index de57e474be..b87d15b7ff 100644 > > >> --- a/libavformat/matroskaenc.c > > >> +++ b/libavformat/matroskaenc.c > > >> @@ -94,6 +94,7 @@ typedef struct mkv_cues { > > >> typedef struct mkv_track { > > >> int write_dts; > > >> int has_cue; > > >> + uint32_t uid; > > >> int sample_rate; > > >> int64_t sample_rate_offset; > > >> int64_t last_timestamp; > > >> @@ -1199,8 +1200,7 @@ static int mkv_write_track(AVFormatContext *s, > MatroskaMuxContext *mkv, > > >> track = start_ebml_master(pb, MATROSKA_ID_TRACKENTRY, 0); > > >> put_ebml_uint (pb, MATROSKA_ID_TRACKNUMBER, > > >> mkv->is_dash ? mkv->dash_track_number : i + 1); > > >> - put_ebml_uint (pb, MATROSKA_ID_TRACKUID, > > >> - mkv->is_dash ? mkv->dash_track_number : i + 1); > > >> + put_ebml_uint (pb, MATROSKA_ID_TRACKUID, mkv->tracks[i].uid); > > >> put_ebml_uint (pb, MATROSKA_ID_TRACKFLAGLACING , 0); // no > lacing (yet) > > >> > > >> if ((tag = av_dict_get(st->metadata, "title", NULL, 0))) > > >> @@ -1650,7 +1650,8 @@ static int mkv_write_tags(AVFormatContext *s) > > >> if (!mkv_check_tag(st->metadata, > MATROSKA_ID_TAGTARGETS_TRACKUID)) > > >> continue; > > >> > > >> - ret = mkv_write_tag(s, st->metadata, > MATROSKA_ID_TAGTARGETS_TRACKUID, i + 1); > > >> + ret = mkv_write_tag(s, st->metadata, > MATROSKA_ID_TAGTARGETS_TRACKUID, > > >> + mkv->tracks[i].uid); > > >> if (ret < 0) return ret; > > >> } > > >> > > >> @@ -1658,13 +1659,15 @@ static int mkv_write_tags(AVFormatContext *s) > > >> for (i = 0; i < s->nb_streams; i++) { > > >> AVIOContext *pb; > > >> AVStream *st = s->streams[i]; > > >> + mkv_track *track = &mkv->tracks[i]; > > >> ebml_master tag_target; > > >> ebml_master tag; > > >> > > >> if (st->codecpar->codec_type == AVMEDIA_TYPE_ATTACHMENT) > > >> continue; > > >> > > >> - mkv_write_tag_targets(s, > MATROSKA_ID_TAGTARGETS_TRACKUID, i + 1, &tag_target); > > >> + mkv_write_tag_targets(s, MATROSKA_ID_TAGTARGETS_TRACKUID, > > >> + track->uid, &tag_target); > > >> pb = mkv->tags_bc; > > >> > > >> tag = start_ebml_master(pb, MATROSKA_ID_SIMPLETAG, 0); > > >> @@ -1863,10 +1866,6 @@ static int mkv_write_header(AVFormatContext *s) > > >> version = 4; > > >> } > > >> > > >> - mkv->tracks = av_mallocz_array(s->nb_streams, > sizeof(*mkv->tracks)); > > >> - if (!mkv->tracks) { > > >> - return AVERROR(ENOMEM); > > >> - } > > >> ebml_header = start_ebml_master(pb, EBML_ID_HEADER, > MAX_EBML_HEADER_SIZE); > > >> put_ebml_uint (pb, EBML_ID_EBMLVERSION , 1); > > >> put_ebml_uint (pb, EBML_ID_EBMLREADVERSION , 1); > > >> @@ -2667,8 +2666,42 @@ static int webm_query_codec(enum AVCodecID > codec_id, int std_compliance) > > >> return 0; > > >> } > > >> > > >> +static uint32_t mkv_get_uid(const mkv_track *tracks, int i, AVLFG *c) > > >> +{ > > >> + uint32_t uid; > > >> + > > >> + for (int j = 0, k; j < 5; j++) { > > >> + uid = av_lfg_get(c); > > >> + if (!uid) > > >> + continue; > > >> + for (k = 0; k < i; k++) { > > >> + if (tracks[k].uid == uid) { > > >> + /* Was something wrong with our random seed? */ > > >> + av_lfg_init(c, av_get_random_seed()); > > >> + break; > > >> + } > > >> + } > > >> + if (k == i) > > >> + return uid; > > >> + } > > >> + > > >> + /* Test the numbers from 1 to i. */ > > >> + for (int j = 1, k; j < i + 1; j++) { > > >> + for (k = 0; k < i; k++) { > > >> + if (tracks[k].uid == j) > > >> + break; > > >> + } > > >> + if (k == i) > > >> + return j; > > >> + } > > >> + /* Return i + 1. It can't coincide with another uid. */ > > >> + return i + 1; > > >> +} > > >> + > > >> static int mkv_init(struct AVFormatContext *s) > > >> { > > >> + MatroskaMuxContext *mkv = s->priv_data; > > >> + AVLFG c; > > >> int i; > > >> > > >> if (s->nb_streams > MAX_TRACKS) { > > >> @@ -2697,7 +2730,23 @@ static int mkv_init(struct AVFormatContext *s) > > >> s->internal->avoid_negative_ts_use_pts = 1; > > >> } > > >> > > >> + mkv->tracks = av_mallocz_array(s->nb_streams, > sizeof(*mkv->tracks)); > > >> + if (!mkv->tracks) { > > >> + return AVERROR(ENOMEM); > > >> + } > > >> + > > > > > >> + if (!(s->flags & AVFMT_FLAG_BITEXACT)) > > >> + av_lfg_init(&c, av_get_random_seed()); > > > > > > would there be a disadvantage if the configuration of a stream / its > content > > > is used a seed ? > > > That way it would be more deterministic > > > > > I see several disadvantages: > > 1. If one duplicates a stream (i.e. muxes it twice in the same file), > > they would get the same uid (unless one takes additional measures). > > If you took the configuration of all streams as a single seed that would > not > happen > > > > 2. If the packet content is used, one can determine the uid only after > > having received a packet. This would mean that one has to postpone > > writing the Tracks element until one has a packet and the same goes > > for the tags given that the uid is used to convey what they apply to. > > One could solve this problem in the seekable, non-livestreaming case > > by reserving size to be overwritten later, but this is nontrivial and > > I'd like to avoid that. > > 3. If the configuration record is used, then you won't get anything > > remotely unique: Using the same encoder settings will likely produce > > files with identical uids. > > > > Furthermore, this approach means that I can reuse the code for both > > attachments as well as tracks. (See the next patch in this series.) > > fair enough, if its considered that avoiding some complexity is more > important than deterministic output > Just to be sure: The output will be deterministic if the bitexact flag is set. I don't see a reason for deterministic output if it is not set. > > Either way if you need X bits of entropy in the muxer there should be > X/32 calls to av_get_random_seed(). as it is there are 4 independant > places calling av_get_random_seed() after the patch each using 32bit > to initialize an independant LFG. > After the next patch, there will only be three; and usually only two of them are used (the one in mkv_get_uid is usually not used). > seed collisions are not at all impossible at 32bits > also av_get_random_seed() can be expensive on some platforms so > i think there should be a single place calling it as needed and not > throwing the obtained entropy away. > I didn't deem this important performance-wise, as this is only done a few times during init. But if you want to, I can reuse the AVLFG from mkv_init for the segment UID. > > Also, why 32bit ? arent these "uint" UIDs 64bit ? > > Most UIDs (except the Segment UIDs) are 64bit. The reason for 32bit is that this patchset was already pretty long, so I didn't include a patch to make these 64bit. (There are some issues with the chapter UIDs: The Matroska demuxer implicitly only uses the lower 32bits without making sure that these numbers are distinct; and in an edge case the written chapter ID can be zero despite this being against the spec: Namely if one of the chapter IDs is INT_MIN and another one is INT_MAX.) > and, this > + /* Test the numbers from 1 to i. */ > + for (int j = 1, k; j < i + 1; j++) { > + for (k = 0; k < i; k++) { > + if (tracks[k].uid == j) > + break; > + } > + if (k == i) > + return j; > + } > > just smells "wrong" > > This code would only be executed if using AVLFG would repeatedly lead to collisions; it is just there to guarantee that the UIDs are different, but actually it is not intended to be run. > if you want a simple non repeating UID, just count 0,1,2,3 for each > and put that in the unused upper 32bit for example > > or you can even ditch LFG and use something that findamentgally generates > non repeating output. > UID[i] = seed + i > comes to mind > or if theres a reason why that is not good then you can also do something > like > UID[i] = (seed + i*(seed2|1)) & 0xFFFFFFFFFFFFFFFF; > > but maybe theres a reason behind the more complex UID generation that > iam missing ? > > The reason I've chosen this is that this is what has already been used for the UIDs of attachments, so I simply wanted to reuse this, while also fixing its defects (namely that it does not ensure that the resulting UIDs are actually distinct). - Andreas _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".