On 1/10/24 22:50, Pedro Falcato wrote:
> On Wed, Jan 10, 2024 at 1:45 PM Laszlo Ersek <ler...@redhat.com>
> wrote:
>>
>> (+ Andrew)
>>
>> On 1/10/24 14:09, Laszlo Ersek wrote:
>>
>>> I think that keeping the depex section read-only is valuable, so I'd
>>> rule out #2. I'd also not start with option #1 -- copying the depex
>>> to heap memory, just so this optimization can succeed. I'd simply
>>> start by removing the optimization, and measuring how much driver
>>> dispatch slows down in practice, on various platforms.
>>>
>>> Can you try this? (I have only build-tested and "uncrustified" it.)
>>>
>>> The patch removes the EFI_DEP_REPLACE_TRUE handling altogether, plus
>>> it CONST-ifies the Iterator pointer (which points into the DEPEX
>>> section), so that the compiler can catch, at *build time*, any
>>> accesses that would write to the write-protected DEPEX memory area.
>>
>> On a tangent: the optimization in question highlights a more general
>> problem, namely that the DXE (and possibly MM/SMM) protocol databases
>> are considered slow, for some access patterns.
>>
>> Edk2 implements those protocol databases with linked lists, where
>> lookup costs O(n) operations (average and worst cases). And protocol
>> lookups are quite frequent (for example, in depex evaluations, they
>> could be considered "particularly frequent").
>>
>> (1) The "Tasks" wiki page mentions something *similar* (but not the
>> same); see
>>
>> https://github.com/tianocore/tianocore.github.io/wiki/Tasks#datahub--gcd-scalability
>>
>> The description is: "The DXE core's DataHub and GCD (Global Coherency
>> Domain) layers don't scale well as the number of data items gets
>> large, since they are based on simple linked lists. Find better data
>> structures."
>
> How large do they usually get? What's the worst case?

No idea.

>>
>> The same might apply more or less to the protocol database
>> implementation.
>>
>> (2) More to the point, Andrew Fish reported earlier that at Apple,
>> they had rewritten the DXE protocol database, using the red-black
>> tree OrderedCollectionLib that I had contributed previously to edk2
>> -- and they saw significant performance improvements.
>>
>> So upstreaming that feature to edk2 could be very valuable.
>> (Red-black trees have O(log(n)) time cost (worst case) for lookup,
>> insertion and deletion, and O(n) cost for iterating through the whole
>> data structure.)
>
> FWIW, can we do better than an RB tree? They're notoriously cache
> unfriendly...

Sure, if someone contributes a different data structure that is
suitable for the task :)

RB trees may be cache unfriendly, but the functionality they provide
with O(log(n)) worst case performance is amazing. You can use them as
read-write associative arrays, sorted lists supporting forward and
backward traversal (in fact, tools for sorting), priority queues, etc.
When I contributed the code, edk2 didn't have any associative array
type, so something generic that would address the widest range of use
cases looked like a good idea. (And indeed the library has been well
applied in several of those use cases since, in various parts of edk2
-- for sorting, uniqueness-checking, async interface token tracking &
lookups.)

This is not an argument against a more suitable data structure, of
course. Just pointing out that the RB tree has worked well thus far.
E.g., under the BZ link below, Andrew mentions a diagnostic tool that
creates 3000 handles. Looking up an element in a 3000-element list
would cost 1500 iterations on average; using a balanced binary search
tree, it might cost ~11 iterations (log2(3000) ~= 11.6). Assuming that
linked lists and linked binary search trees are similarly
cache-unfriendly, that's a factor of ~136 improvement.
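Just to make the shape of such a change concrete, here is a rough,
build-only sketch of a handle registry on top of OrderedCollectionLib,
keyed by the raw EFI_HANDLE pointer value. (Untested; the HANDLE_ENTRY
wrapper and the function names are invented for the example -- only
the OrderedCollectionLib calls are the library's real API.)

#include <Uefi.h>
#include <Library/OrderedCollectionLib.h>

typedef struct {
  EFI_HANDLE    Handle;       // key: the raw pointer value
  LIST_ENTRY    Protocols;    // protocol interfaces on this handle
} HANDLE_ENTRY;

STATIC
INTN
EFIAPI
HandleKeyCompare (
  IN CONST VOID  *StandaloneKey,   // an EFI_HANDLE
  IN CONST VOID  *UserStruct       // a HANDLE_ENTRY
  )
{
  UINTN  Key1;
  UINTN  Key2;

  Key1 = (UINTN)StandaloneKey;
  Key2 = (UINTN)((CONST HANDLE_ENTRY *)UserStruct)->Handle;
  return (Key1 < Key2) ? -1 : ((Key1 > Key2) ? 1 : 0);
}

STATIC
INTN
EFIAPI
HandleUserCompare (
  IN CONST VOID  *UserStruct1,
  IN CONST VOID  *UserStruct2
  )
{
  return HandleKeyCompare (
           ((CONST HANDLE_ENTRY *)UserStruct1)->Handle,
           UserStruct2
           );
}

STATIC ORDERED_COLLECTION  *mHandleDatabase;

//
// O(log n) counterpart to the O(n) list scan in CoreValidateHandle():
// a handle is valid if and only if it is in the tree.
//
EFI_STATUS
ValidateHandle (
  IN EFI_HANDLE  UserHandle
  )
{
  if ((UserHandle == NULL) ||
      (OrderedCollectionFind (mHandleDatabase, UserHandle) == NULL)) {
    return EFI_INVALID_PARAMETER;
  }
  return EFI_SUCCESS;
}

mHandleDatabase would be set up once, via OrderedCollectionInit
(HandleUserCompare, HandleKeyCompare), and entries would be inserted /
deleted as handles are created and destroyed.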
>>
>> Let me see if I can find the bugzilla ticket...
>>
>> Ah, I got it. Apologies, I misremembered: Andrew's comment was not
>> about the protocol database, but about the handle database. Here it
>> is:
>>
>> https://bugzilla.tianocore.org/show_bug.cgi?id=988#c7
>>
>> (the BZ is still CONFIRMED btw...)
>>
>> Still, I think it must be related in a way. Namely, an EFI handle
>> exists if and only if at least one protocol interface is installed on
>> it. If you uninstall the last protocol interface from a handle, then
>> the handle is destroyed -- in fact that's the *only* way to destroy a
>> handle, to my understanding. See
>> EFI_BOOT_SERVICES.UninstallProtocolInterface() in the UEFI spec: "If
>> the last protocol interface is removed from a handle, the handle is
>> freed and is no longer valid". Furthermore, calling
>> InstallProtocolInterface() and InstallMultipleProtocolInterfaces() is
>> how one *creates* new handles.
>>
>> So given how handles and protocol interfaces are conceptually
>> interlinked, an rbtree-based protocol DB might have to maintain
>> multiple rbtrees internally (for the ability to search the database
>> quickly with different types of "keys"). I don't have a design ready
>> in my mind at all (I'm not that familiar with the *current*,
>> list-based implementation to begin with!). Upstreaming Apple's
>> (experimental?) code could prove very helpful.
>
> Ok, some thoughts:
>
> For the handle database, if we made EFI_HANDLE an integer instead, we
> could very easily use something similar to a radix tree (see Linux's
> xarray). This would provide O(log(n)) lookups and insertion with a
> much higher fan-out, without needing CoreValidateHandle's awful O(n)
> lookup every time it gets an EFI_HANDLE. You'd just look up the handle
> in the tree and if NULL, it's invalid. [1]

EFI_HANDLE is a typedef to (VOID*) per the UEFI spec -- section 2.3.1
"Data Types" says: "A collection of related interfaces. Type VOID *"
-- so EFI_HANDLE can't be redefined; it would break compatibility with
everything.

Internally, it could be cast to UINTN, and then those 32- or 64-bit
long bit strings could be considered as keys for a radix tree. But I'm
not sure if this would still be as useful as the two-level radix tree
(with a fanout of 64) that you describe.

(If we wanted to map the EFI_HANDLEs to the index space [0..4095],
then we'd just land back at the original problem: storing the
EFI_HANDLEs in an associative data structure, for example in a hash
map. We could find a good hash function, but because it would
mathematically reduce the key space from 64-bit pointers to 12-bit
indices, collisions would be theoretically inevitable, and the data
structure couldn't avoid *storing* the mapping. This is not a general
argument against using a hash table; it just means that we'd have to
provide a good initial table size, and then deal with the resizing
cost whenever necessary.)
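To illustrate the "cast to UINTN" route: a hypothetical bucket-index
computation could look like the sketch below (the function name and
the bucket count are made up for the example):

#define HANDLE_HASH_BUCKETS  1024    // must remain a power of two here

//
// Hypothetical: derive a bucket index from the pointer bits. Pool
// allocations are 8-byte aligned, so the low 3 bits of the pointer
// carry no information; shift them out, mix in some higher bits, then
// mask.
//
UINTN
HandleToBucket (
  IN EFI_HANDLE  UserHandle
  )
{
  UINTN  Key;

  Key  = (UINTN)UserHandle >> 3;
  Key ^= Key >> 12;
  return Key & (HANDLE_HASH_BUCKETS - 1);
}

And each bucket would still have to store the full EFI_HANDLE values,
exactly because of the collisions described above.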
> For the protocol database, you'd replace the linked list with a
> simple hashtable, hashed by protocol. Something as simple as
> LIST_ENTRY mProtocolHashtable[64]; would probably be enough to fan
> out most of the problems (I think? How many different protocols are
> there?)

I can answer this question reasonably well, I think. I have a script
that collects symbolic names of GUIDs from the edk2 tree (plus
hardcodes a number of well-known but non-edk2 GUIDs), and creates a
"sed" script out of them. Then another script uses this "sed" script
for filtering edk2 logs -- the idea being to replace the whole bunch
of logged GUIDs with their symbolic names. That makes logs much easier
to read.

The generator script is written in such a way that the generated "sed"
script only grows over time; put differently, this "dictionary" of
name<->GUID associations never forgets, it only picks up new GUIDs.

The "sed" script (= the dictionary file) consists of entries like

s:FFB19961-79CC-4684-84A8-C31B0A2BBE82:[EarlyFdt16550SerialPortHookLib]:ig
s:FFB56F09-65E3-4462-A799-2F0D1930D38C:[DxeContextBufferModuleConfigLib]:ig
s:FFE06BDD-6107-46A6-7BB2-5A9C7EC5275C:[EfiAcpiTableProtocol]:ig
s:FFF12B8D-7696-4C8B-A985-2747075B4F50:[EfiSystemNvDataFv]:ig

and it's sorted uniquely by GUID. Right now, it has 3074 entries.
(People like generating GUIDs! :) In PI/UEFI/edk2, *everything* is a
GUID, not just protocols!)

So using 64 buckets would hash ~50 different protocol GUIDs to any
given bucket (although, again, not all of those three thousand GUIDs
are *protocol* GUIDs). 1024 buckets might work better.
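Regarding the hash function: GUIDs are effectively random 128-bit
values, so even XOR-folding the four 32-bit words of the EFI_GUID
should spread the protocol GUIDs evenly over the buckets. A quick
sketch (untested; the function name and the 1024-bucket choice are
just my assumptions):

#define PROTOCOL_HASH_BUCKETS  1024    // power of two, per the above

STATIC LIST_ENTRY  mProtocolHashtable[PROTOCOL_HASH_BUCKETS];

//
// XOR-fold the GUID's four 32-bit words into a bucket index.
//
UINTN
ProtocolGuidToBucket (
  IN CONST EFI_GUID  *Protocol
  )
{
  CONST UINT32  *Words;

  Words = (CONST UINT32 *)Protocol;
  return (Words[0] ^ Words[1] ^ Words[2] ^ Words[3]) &
         (PROTOCOL_HASH_BUCKETS - 1);
}

With 3074 GUIDs, that would average ~3 entries per bucket.

Laszlo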