On Wed, Jun 30, 2021 at 11:14 PM David Rowley <dgrowle...@gmail.com> wrote:
> On Wed, 23 Jun 2021 at 12:17, Thomas Munro <thomas.mu...@gmail.com> wrote:
> Thanks for taking an interest in this. I started looking at your idea
> and I've now changed my mind from just not liking it to thinking that
> the whole idea is just completely horrible :-(
Hah. I accept that trying to make a thing that "wraps" these data
structures and provides a simple interface is probably really quite
horrible with preprocessor voodoo. I was mainly questioning how bad it
would be if we had a generic segmented array component (seems like a
great idea, which I'm sure would find other uses; I recall wanting to
write that myself before), and then combined that with the presence map
idea to make a dense object pool (ditto), but then, in each place where
we need something like this, just used a plain old hash table to point
directly to objects in it whenever we needed that, open-coding the
logic to keep the two in sync (I mean, just the way that people usually
use hash tables). That way, the object pool can give you very fast
scans over all objects in cache-friendly order (no linked lists), and
the hash table doesn't know/care about its existence. In other words,
small reusable components that each do one thing well and are not
coupled together. I think I understand now that you really, really
want small index numbers and not 64-bit pointers in the hash table.
Hmm.

> It gets really messy with all the nested pre-processor stuff around
> fetching the element from the segmented array inside simplehash. One
> problem is that simplehash needs the address of the segments despite
> simplehash not knowing anything about segments. I've tried to make
> that work by passing in the generic hash struct as simplehash's
> private_data. This ends up with deeply nested macros all defined in
> different files. I pity the future person debugging that.

Yeah, that sounds terrible.

> There are also a bunch of changes / API breakages that need to be done
> to make this work with simplehash.h.
>
> 1) Since I really need 8-byte buckets in the hash table to make this
> as fast as possible, I want to use the array index for the hash status
> and that means changing the simplehash API to allow that to work.
> This requires something like SH_IS_BUCKET_INUSE, SH_SET_BUCKET_INUSE,
> SH_SET_BUCKET_EMPTY.

+1 for doing customisable "is in use" checks one day anyway, as a
separate project. Not sure if any current users could shrink their
structs in practice because, at a glance, the same amount of space
might be used by padding anyway, but when a case like that shows up...

> > 2. A bitmapset that tracks unused elements in 1, making it easy to
> > find the lowest-index hole when looking for a place to put a new one
> > by linear search for a 1 bit, so that we tend towards maximum density
> > despite having random frees from time to time (seems good, the same
> > idea is used in kernels to allocate the lowest unused file descriptor
> > number).
>
> I didn't use Bitmapsets. I wanted the bitmaps to be allocated in the
> same chunk of memory as the segments of the array. Also, because
> bitmapset's nwords is variable, they can't really do any loop
> unrolling. Since in my implementation the number of bitmap words is
> known at compile time, the compiler has the flexibility to do loop
> unrolling. The bitmap manipulation is one of the biggest overheads in
> generichash.h. I'd prefer to keep that as fast as possible.

I think my hands autocompleted "bitmapset"; I really meant to write
just "bitmap", as I didn't think you were using the actual thing called
bitmapset, but point taken.
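Just to make sure we're picturing the same component, here's roughly
the shape I had in mind for the pool side (an untested sketch; the
sizes, the names and the bit-sense of the bitmap are all invented for
illustration): each segment carries its own presence bitmap whose word
count is a compile-time constant, so finding the lowest-index hole is a
short, unrollable scan over the words plus a find-first-set-bit:

#include "postgres.h"
#include "port/pg_bitutils.h"

#define POOL_SLOT_SIZE 64			/* stand-in for sizeof(the pooled object) */
#define POOL_SLOTS_PER_SEGMENT 256
#define POOL_BITMAP_WORDS (POOL_SLOTS_PER_SEGMENT / 64)

typedef struct PoolSegment
{
	/* a set bit means the slot is free; word count fixed at compile time */
	uint64		free_bitmap[POOL_BITMAP_WORDS];
	char		slots[POOL_SLOTS_PER_SEGMENT][POOL_SLOT_SIZE];
} PoolSegment;

/*
 * Return the lowest free slot number in a segment, or -1 if the segment
 * is full.  POOL_BITMAP_WORDS is a compile-time constant, so the
 * compiler is free to unroll the loop.
 */
static inline int
pool_segment_lowest_free(PoolSegment *seg)
{
	for (int w = 0; w < POOL_BITMAP_WORDS; w++)
	{
		if (seg->free_bitmap[w] != 0)
			return w * 64 + pg_rightmost_one_pos64(seg->free_bitmap[w]);
	}
	return -1;					/* segment full */
}

Keeping the bitmap in the same allocation as the slots, as you
describe, falls out naturally from that layout.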
> So, with all that, I really don't think it's a great idea to try and
> have this use simplehash.h code. I plan to pursue the idea I proposed
> with having separate hash table code that is coded properly to have
> stable pointers into the data rather than trying to contort
> simplehash's code into working that way.

Fair enough. It's not that I don't believe it's a good idea to be able
to perform cache-friendly iteration over densely packed objects... that
part sounds great... it's just that it's not obvious to me that it
should be a *hashtable's* job to provide that access path. Perhaps I
lack imagination and we'll have to agree to differ.
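In case it helps make the "not the hashtable's job" part concrete, this
is the division of labour I was trying to describe (again only a
sketch: Thing, ThingPool and the thing_pool_* calls are an invented
stand-in for the segmented-array-with-presence-bitmap component, while
the table is a garden-variety simplehash instantiation that knows
nothing about the pool; the caller open-codes the few lines needed to
keep the two in sync):

#include "postgres.h"
#include "common/hashfn.h"

typedef struct ThingPool ThingPool;		/* hypothetical pool component */
typedef struct Thing
{
	uint32		key;
} Thing;

/* hypothetical pool API */
extern uint32 thing_pool_alloc(ThingPool *pool);	/* lowest free slot */
extern Thing *thing_pool_get(ThingPool *pool, uint32 index);
extern void thing_pool_free(ThingPool *pool, uint32 index);

typedef struct ThingTableEntry
{
	uint32		key;			/* lookup key */
	uint32		pool_index;		/* small index into the dense pool */
	char		status;			/* simplehash bucket status */
} ThingTableEntry;

#define SH_PREFIX thingtable
#define SH_ELEMENT_TYPE ThingTableEntry
#define SH_KEY_TYPE uint32
#define SH_KEY key
#define SH_HASH_KEY(tb, key) murmurhash32(key)
#define SH_EQUAL(tb, a, b) ((a) == (b))
#define SH_SCOPE static inline
#define SH_DECLARE
#define SH_DEFINE
#include "lib/simplehash.h"

static Thing *
thing_create(ThingPool *pool, thingtable_hash *table, uint32 key)
{
	uint32		index = thing_pool_alloc(pool);
	Thing	   *thing = thing_pool_get(pool, index);
	bool		found;
	ThingTableEntry *entry = thingtable_insert(table, key, &found);

	Assert(!found);
	entry->pool_index = index;
	thing->key = key;
	return thing;
}

static void
thing_drop(ThingPool *pool, thingtable_hash *table, uint32 key)
{
	ThingTableEntry *entry = thingtable_lookup(table, key);

	if (entry != NULL)
	{
		thing_pool_free(pool, entry->pool_index);	/* marks the slot free */
		thingtable_delete(table, key);
	}
}

That also shows where your 8-byte bucket point bites: the separate
status byte pads the entry past 8 bytes, which is exactly what folding
the status into the index via SH_IS_BUCKET_INUSE-style hooks would
avoid.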