On 21.11.2014 00:03, Andres Freund wrote:
> On 2014-11-17 21:03:07 +0100, Tomas Vondra wrote:
>> On 17.11.2014 19:46, Andres Freund wrote:
>>
>>> The MemoryContextData struct is embedded into AllocSetContext.
>>
>> Oh, right. That makes it slightly more complicated, though, because
>> AllocSetContext adds 6 x 8B fields plus an in-line array of
>> freelist pointers. Which is 11x8 bytes. So in total 56+56+88=200B,
>> without the additional field. There might be some difference
>> because of alignment, but I still don't see how that one
>> additional field might impact cachelines?
>
> It's actually 196 bytes:

Ummmm, I think the pahole output shows 192, not 196? Otherwise it
wouldn't be exactly 3 cachelines anyway.

But yeah - my math-foo was weak for a moment, because 6x8 != 56 (it's
48, so the total is 56+48+88=192). Which is the 8B difference :-/

> struct AllocSetContext {
>     MemoryContextData   header;           /*     0    56 */
>     AllocBlock          blocks;           /*    56     8 */
>     /* --- cacheline 1 boundary (64 bytes) --- */
>     AllocChunk          freelist[11];     /*    64    88 */
>     /* --- cacheline 2 boundary (128 bytes) was 24 bytes ago --- */
>     Size                initBlockSize;    /*   152     8 */
>     Size                maxBlockSize;     /*   160     8 */
>     Size                nextBlockSize;    /*   168     8 */
>     Size                allocChunkLimit;  /*   176     8 */
>     AllocBlock          keeper;           /*   184     8 */
>     /* --- cacheline 3 boundary (192 bytes) --- */
>
>     /* size: 192, cachelines: 3, members: 8 */
> };
>
> And thus one additional field tips it over the edge.
>
> "pahole" is a very useful utility.

Indeed.

>> But if we separated the freelist, that might actually make it
>> faster, at least for calls that don't touch the freelist at all,
>> no? Because most of the palloc() calls will be handled from the
>> current block.
>
> I seriously doubt it. The additional indirection + additional
> branches are likely to make it worse.

That's possible, although I tried it on "my version" of the accounting
patch, and it showed a slight improvement (lower overhead) on Robert's
reindex benchmark.

The question is how that would work with a regular workload, because
moving the freelist out of the structure makes it smaller (2 cachelines
instead of 3), and it can only impact workloads working with the
freelists (i.e. either by calling free, or realloc, or whatever).
Although palloc() checks the freelist too ...

Also, those pieces may be allocated together (next to each other),
which might keep locality.

But I haven't tested any of this, and my knowledge of this low-level
stuff is poor, so I might be completely wrong.
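
For anyone who wants to check the size arithmetic without pahole, here
is a minimal sketch. It is not the real PostgreSQL code: the struct
names (HeaderStandIn, InlineFreelist, InlineFreelistPlusField,
SeparateFreelist), the "extra" field and the cachelines_spanned()
helper are all made up for illustration, every pointer and Size is
replaced by a fixed 8-byte uint64_t to mimic a 64-bit build, and 64-byte
cachelines are assumed. It only counts how many cachelines the struct
size spans; it says nothing about where an allocation actually lands or
about the cost of the extra indirection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES 64      /* assumption: 64-byte cachelines */

/* Hypothetical stand-in for MemoryContextData: 56 bytes, as in the
 * pahole output quoted above. */
typedef struct { uint64_t opaque[7]; } HeaderStandIn;

/* Mirrors the quoted layout, with 8-byte stand-ins for pointers/Size. */
typedef struct {
    HeaderStandIn header;           /* offset   0, size 56 */
    uint64_t      blocks;           /* offset  56, size  8 */
    uint64_t      freelist[11];     /* offset  64, size 88 */
    uint64_t      initBlockSize;    /* offset 152 */
    uint64_t      maxBlockSize;     /* offset 160 */
    uint64_t      nextBlockSize;    /* offset 168 */
    uint64_t      allocChunkLimit;  /* offset 176 */
    uint64_t      keeper;           /* offset 184 */
} InlineFreelist;

/* Same layout plus one hypothetical 8-byte accounting field. */
typedef struct {
    InlineFreelist base;
    uint64_t       extra;           /* hypothetical additional field */
} InlineFreelistPlusField;

/* Hypothetical variant: freelist array moved out of line and replaced
 * by a single 8-byte pointer stand-in, extra field still present. */
typedef struct {
    HeaderStandIn header;
    uint64_t      blocks;
    uint64_t      freelist_ptr;     /* would point at the 11-entry array */
    uint64_t      initBlockSize;
    uint64_t      maxBlockSize;
    uint64_t      nextBlockSize;
    uint64_t      allocChunkLimit;
    uint64_t      keeper;
    uint64_t      extra;
} SeparateFreelist;

/* Number of cachelines a struct of the given size spans, assuming it
 * starts on a cacheline boundary. */
static size_t cachelines_spanned(size_t size)
{
    return (size + CACHELINE_BYTES - 1) / CACHELINE_BYTES;
}

/* Checks the sketch against the numbers discussed in this thread. */
static void check_layout(void)
{
    assert(sizeof(InlineFreelist) == 56 + 48 + 88);   /* 192 bytes */
    assert(cachelines_spanned(sizeof(InlineFreelist)) == 3);
    assert(cachelines_spanned(sizeof(InlineFreelistPlusField)) == 4);
    assert(cachelines_spanned(sizeof(SeparateFreelist)) == 2);
}
```

Calling check_layout() from a main() should pass silently: 192 bytes is
exactly 3 cachelines, one more 8-byte field makes it 200 bytes and 4
cachelines, and with the freelist out of line the struct is 120 bytes
and fits in 2 even with the extra field.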

regards
Tomas