Hi,

On 2021-08-04 12:39:29 -0400, John Naylor wrote:
> CPU caches have multiple levels, so I had an idea to use that concept in
> the syscaches.
I do think we lose a good bit of performance to syscache inefficiency in
real workloads, so I think it's worth investing time into it. However:

> Imagine if catcache buckets were cacheline-sized. In that case, we would
> store the most recently accessed hash values and pointers to catctup, in
> addition to the dlist_head:
>
> typedef struct cc_bucket
> {
>     uint32 hashes[4];
>     catctup *ct[4];
>     dlist_head;
> };

I'm not convinced that the above is the right idea though. Even if the
hash matches, you're still going to need to fetch at least
catctup->keys[0] from a separate cacheline to be able to return the cache
entry.

If the hashes we use are decent and we're resizing at reasonable
thresholds, we shouldn't constantly look up values that are later in a
collision chain - particularly because we move cache hits to the head of
the bucket. So I don't think a scheme as you described would actually
result in a really better cache hit ratio?

ISTM that something like

struct cc_bucket_1
{
    uint32      hashes[3];      // 12
    // 4 bytes alignment padding
    Datum       key0s[3];       // 24
    catctup    *ct[3];          // 24
    // cacheline boundary
    dlist_head  conflicts;      // 16
};

would be better for 1-key values?

It's obviously annoying to need different bucket types for different key
counts, but given how much 3 unused key Datums waste, it seems worth
paying for?

One issue with stuff like this is that aset.c won't return cacheline
aligned values...

It's possible that it'd be better to put the catctup pointers onto a
neighboring cacheline (so you still get TLB benefits, as well as making it
easy for prefetchers) and increase the number of hashes/keys that fit on
one cacheline. If we stuffed four values into one bucket we could
potentially SIMD the hash and Datum comparisons ;)

Greetings,

Andres Freund
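PS: To make the probe shape of the 1-key bucket above a bit more concrete,
here's a rough, untested sketch. It uses the same shorthand type names as
the struct above; the function itself and the catctup member names are
just illustrative, not existing code:

static catctup *
cc_bucket_1_lookup(struct cc_bucket_1 *bucket, uint32 hash, Datum key0)
{
    /* fast path: hashes and first-key Datums live in the bucket itself */
    for (int i = 0; i < 3; i++)
    {
        if (bucket->ct[i] != NULL &&
            bucket->hashes[i] == hash &&
            bucket->key0s[i] == key0)   /* by-value key comparison only */
            return bucket->ct[i];
    }

    /* slow path: walk the overflow chain, touching the catctup lines */
    dlist_iter  iter;

    dlist_foreach(iter, &bucket->conflicts)
    {
        catctup    *ct = dlist_container(catctup, cache_elem, iter.cur);

        if (ct->hash_value == hash && ct->keys[0] == key0)
            return ct;
    }

    return NULL;
}

With key0 stored in the bucket, the fast path never has to touch a
catctup's cacheline before returning the pointer.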
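And just to illustrate the SIMD bit - SSE2 purely as an example, assuming
the four hashes are stored contiguously in the bucket; not proposing these
intrinsics verbatim:

#include <emmintrin.h>

static inline int
cc_bucket_hash_match_mask(const uint32 hashes[4], uint32 hash)
{
    __m128i probe  = _mm_set1_epi32((int) hash);            /* broadcast probe hash */
    __m128i stored = _mm_loadu_si128((const __m128i *) hashes);
    __m128i eq     = _mm_cmpeq_epi32(stored, probe);        /* per-slot 32-bit compare */

    /* nonzero iff at least one of the four slots has a matching hash */
    return _mm_movemask_epi8(eq);
}

The key0 Datums would need wider compares (two 16-byte loads for four
8-byte Datums), but the idea is the same.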