Hi,

On 2021-07-18 00:46:09 +0200, Tomas Vondra wrote:
> On 7/17/21 11:14 PM, Andres Freund wrote:
> > Hm. I wonder if we should just not populate the freelist eagerly, to
> > drive down the initialization cost. I.e. have a separate allocation
> > path for chunks that have never been allocated, by having a
> > SlabBlock->free_offset or such.
> >
> > Sure, it adds a branch to the allocation happy path, but it also makes
> > the first allocation for a chunk cheaper, because there's no need to
> > get the next element from the freelist (adding a likely cache miss).
> > And it should make the allocation of a new block faster by a lot.
>
> Not sure what you mean by 'not populate eagerly' so can't comment :-(

Instead of populating a linked list with all chunks upon creation of a
block - which requires touching a fair bit of memory - keep a per-block
pointer (or an offset) into the "unused" area of the block. When
allocating from the block and there's still "unused" memory left, use
that, instead of bothering with the freelist.

I tried that, and it nearly got slab up to the allocation/freeing
performance of aset.c (while winning after allocation, due to the higher
memory density).
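As a minimal sketch of what that could look like - the structures and
field names here (SlabBlock, SlabChunk, free_offset, and so on) are
simplified assumptions for illustration, not the actual slab.c
definitions:

#include <stddef.h>
#include <stdlib.h>

typedef struct SlabChunk
{
    struct SlabChunk *next;     /* freelist link, only valid while free */
} SlabChunk;

typedef struct SlabBlock
{
    SlabChunk  *freelist;       /* chunks allocated and then freed */
    size_t      free_offset;    /* start of the never-yet-used area */
    size_t      block_size;     /* usable bytes in data[] */
    size_t      chunk_size;     /* fixed chunk size, >= sizeof(SlabChunk) */
    char        data[];         /* chunk storage */
} SlabBlock;

/*
 * Creating a block doesn't touch the payload at all: rather than
 * linking every chunk into the freelist up front, just record that the
 * whole data[] area is unused.
 */
static SlabBlock *
slab_block_create(size_t block_size, size_t chunk_size)
{
    SlabBlock  *block = malloc(offsetof(SlabBlock, data) + block_size);

    if (block == NULL)
        return NULL;
    block->freelist = NULL;
    block->free_offset = 0;
    block->block_size = block_size;
    block->chunk_size = chunk_size;
    return block;
}

/*
 * Allocation gets one extra branch, but handing out a never-used chunk
 * is pure pointer arithmetic - no freelist pointer to chase, hence no
 * likely cache miss on the first allocation of each chunk.
 */
static void *
slab_alloc(SlabBlock *block)
{
    if (block->free_offset + block->chunk_size <= block->block_size)
    {
        void       *chunk = block->data + block->free_offset;

        block->free_offset += block->chunk_size;
        return chunk;
    }

    /* unused area exhausted, fall back to recycled chunks */
    if (block->freelist != NULL)
    {
        SlabChunk  *chunk = block->freelist;

        block->freelist = chunk->next;
        return chunk;
    }

    return NULL;                /* block is full */
}

/* freeing is unchanged: push the chunk onto the block's freelist */
static void
slab_free(SlabBlock *block, void *ptr)
{
    SlabChunk  *chunk = (SlabChunk *) ptr;

    chunk->next = block->freelist;
    block->freelist = chunk;
}

The tradeoff is exactly the one described above: every allocation pays
for one extra comparison, but block creation becomes O(1) in the number
of chunks, and the first pass over a block hands out sequential memory.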
> > > > 4) Less of a performance, and more of a usability issue: The
> > > > constant block size strikes me as problematic. Most users of an
> > > > allocator sometimes work with a small amount of data, and
> > > > sometimes with a large amount.
> > >
> > > I doubt this is worth the effort, really. The constant block size
> > > makes various places much simpler (both to code and reason about),
> > > so this should not make a huge difference in performance. And IMHO
> > > the block size is mostly an implementation detail, so I don't see
> > > that as a usability issue.
> >
> > Hm? It's something the user has to specify, so it's not really an
> > implementation detail. It needs to be specified without sufficient
> > information, as well, since externally one doesn't know how much
> > memory the block header and chunk headers + rounding up will use, so
> > computing a good block size isn't easy. I've wondered whether it
> > should just be a count...
>
> I think this is mixing two problems - how to specify the block size,
> and whether the block size is constant (as in slab) or grows over time
> (as in allocset).

That was in response to the "implementation detail" bit solely.

> But growing the block size seems problematic for long-lived contexts
> with workloads that change a lot over time - imagine e.g. decoding many
> small transactions, with one huge transaction mixed in. The one huge
> transaction will grow the block size, and we'll keep using it forever.
> But in that case we might just as well have allocated the large blocks
> from the start, I guess.

I was thinking of capping the growth fairly low. I don't think after a
16x growth or so you're likely to still see allocation performance gains
with slab (see the sketch at the end of this mail). And I don't think
that'd be too bad for decoding - we'd start with a small initial block
size, and in many workloads that will be enough; just the workloads
where that doesn't suffice will adapt, performance-wise.

And: Medium term I wouldn't expect reorderbuffer.c to stay the only
slab.c user...

> > Why do you not think it's relevant for performance? Either one causes
> > too much memory usage by choosing too large a block size, or one ends
> > up losing perf through frequent allocations?
>
> True. I simply would not expect this to make a huge difference - I may
> be wrong, and I'm sure there are workloads where it matters. But I
> still think it's easier to just use larger blocks than to make the slab
> code more complex.

IDK. I'm looking at using slab as part of a radix tree implementation
right now, which I'd hope to be usable in various different situations.
So it's hard to choose the right block size - and it does seem to matter
for performance.

Greetings,

Andres Freund
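As an aside on the capped-growth idea above, a minimal sketch of what
such a policy could look like - the doubling step and all names here
are assumptions for illustration; only the ~16x cap figure comes from
the discussion:

#include <stddef.h>

/* stop growing once blocks are 16x the initial size */
#define SLAB_GROWTH_CAP     16

typedef struct SlabBlockSizeState
{
    size_t      init_block_size;    /* block size the context started with */
    size_t      next_block_size;    /* size to use for the next block */
} SlabBlockSizeState;

/*
 * Pick the size for the next block, doubling the size for the one after
 * until the cap is hit.  Small workloads never leave the small initial
 * size; heavy workloads quickly stop paying for frequent block
 * allocations, and the "grown forever" downside is bounded by the cap.
 */
static size_t
slab_next_block_size(SlabBlockSizeState *state)
{
    size_t      size = state->next_block_size;

    if (state->next_block_size < state->init_block_size * SLAB_GROWTH_CAP)
        state->next_block_size *= 2;
    return size;
}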