> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of Michael Niedermayer
> Sent: Tuesday, April 8, 2025 21:45
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [RFC] AVDictionary2
>
> On Tue, Apr 08, 2025 at 06:36:55PM +0000, softworkz . wrote:
> > > -----Original Message-----
> > > From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of Michael Niedermayer
> > > Sent: Tuesday, April 8, 2025 20:16
> > > To: FFmpeg development discussions and patches <ffmpeg-de...@ffmpeg.org>
> > > Subject: Re: [FFmpeg-devel] [RFC] AVDictionary2
> > >
> > > Hi softworkz
> > >
> > > On Tue, Apr 08, 2025 at 04:56:36PM +0000, softworkz . wrote:
> > > [...]
> > > > Hi Michael,
> > > >
> > > > it's been a while, but as far as memory serves, wasn't a linear search
> > > > even more efficient than other methods as long as we're dealing with no
> > > > more than a few dozens of items?
> > >
> > > a dozen is 12, so a few dozen would minimally be 24
> > >
> > > at average to find an entry in a list of 24 you need 12 comparisons with a
> > > linear search and 24 in worst case
> > >
> > > an AVL tree with 24 entries i think needs 7 comparisons in the worst case
> > > So it's certainly faster in number of comparisons
> > >
> > > the cost of strcmp() and overhead then come into play but small sets
> > > aren't really what separates the 2 choices.
> > > The separation happens when there are many entries. dictionary is generic
> > > if you had a million entries a linear search will take about a million
> > > comparisons, the AVL tree should need less than ~30 in the worst case
> > > that's 5 orders of magnitude difference
> > >
> > > >
> > > > In turn, my question would be whether we even have use cases with
> > > > hundreds or thousands of dictionary entries?
> > >
> > > We use dictionary for metadata and options mainly.
> > > It would be possible to also use a linear list until the number of
> > > entries reaches a threshold
> >
> > LOL, sorry I really didn't want to make it even more complicated.
> >
> > Sticking on that side for a moment though, what you have skipped in the
> > comparison above is the insertion cost, because the insertion cost is what
> > buys you the 7 instead of 24 (worst) or x instead of 12 (average)
> > comparisons on lookup. One of my takeaways in that area was that there's
> > always a break-even point below which there's nothing to win.
> >
> > At the bottom line, I love optimizations and for dictionaries with larger
> > amounts, everything you said is perfectly valid of course. What I tried to
> > ask is just whether we actually have any case of dictionary use that would
> > benefit from that kind of optimization?
>
> I know that years ago there was some case in the command line option handling
> where some linear search resulted in some O(n^3) which was noticeable
> I don't remember if that was an AVDictionary
>
> also, if we use a linear search, what should we do with a file that
> contains 10k or 100k+ entries ?
> and then something checks for example for each of these entries if there's
> a corresponding one in the local language, so for 100k entries someone
> could do a linear lookup that fails thus 100k * 100k
> This is a constructed case but it sounds plausible to me with such a file
>
> If we do a linear search then everyone needs to be careful what they
> use AVDictionary for.
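
For reference, the worst-case comparison counts discussed above can be checked with a few lines of Python. This is only a rough sketch: it assumes one key comparison per visited tree node, uses the standard minimum-node recurrence N(h) = N(h-1) + N(h-2) + 1 for AVL trees, and ignores insertion cost and the cost of strcmp() itself. The function names are made up for illustration and are not FFmpeg API.

```python
def avl_min_nodes(height: int) -> int:
    """Fewest nodes an AVL tree with `height` levels can contain."""
    if height == 0:
        return 0
    prev, cur = 0, 1  # N(0) = 0, N(1) = 1
    for _ in range(height - 1):
        # N(h) = N(h-1) + N(h-2) + 1
        prev, cur = cur, prev + cur + 1
    return cur


def avl_worst_case_comparisons(entries: int) -> int:
    """Height of the tallest AVL tree that `entries` nodes can form.

    Assumption: a lookup costs one key comparison per level visited.
    """
    height = 0
    while avl_min_nodes(height + 1) <= entries:
        height += 1
    return height


def linear_worst_case_comparisons(entries: int) -> int:
    """A failed (or last-entry) lookup in a linear list touches every entry."""
    return entries


if __name__ == "__main__":
    for n in (24, 1_000, 1_000_000):
        print(n, linear_worst_case_comparisons(n), avl_worst_case_comparisons(n))
```

Under these assumptions the tree needs at most 6 comparisons for 24 entries and 28 for a million, against 24 and a million for the linear list, which is in line with the estimates quoted above.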

All granted, and surely the current implementation hardly deserves its name, because it's definitely not what you would expect from a dictionary.

I'm not responding arbitrarily to this topic. I had just recently spent some thought on it, because you quickly find out how it actually works, at least when working inside the FFmpeg source.

One of those cases is ffprobe output filtering, using -show_entries to select specific fields. There are two real hot paths: printing frame data and printing packet data (each including descendants). When -show_entries is specified, the desired fields are stored in an AVDictionary (one per section). The frame section has 34 fields at the moment (https://softworkz.github.io/ffmpeg_output_apis/ffprobe_schema.html), so the maximum reasonable number of entries is 33, while in the typical case the number is likely a lot smaller. Insert performance is irrelevant, as it happens only once at startup, but the lookup count can be excessive - e.g. 34k lookups for 1k frames.

In this context I was wondering whether a different dictionary implementation might provide any benefit, and I concluded that it probably won't - at least not for small numbers of selected fields.

I also made another experiment, trying to find the fastest way to print all video keyframe times using packet data only. The keyframe packet filter was hard-coded and I used -show_entries, specifying only pts_time for packets. The packet section has 11 fields at the moment, so that is 11k dictionary lookups for 1k packets. I compared that to a patch which only prints pts_time, completely skipping the regular printing code (no dictionary lookups), and gained only about 15% (3s instead of 3.5s).

So in the end, the cost of those 11k lookups (albeit in a single-entry dictionary) was a lot less than expected. To tell you the truth - at that point I was thinking: "Ah, clever! That's why the AVDictionary is done like that" 😊

So, this is the background of my previous replies - otherwise, of course, I have nothing against a better dictionary.

Best,
sw
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".