On 09/11/2020 11:34, Kyotaro Horiguchi wrote:
At Fri, 6 Nov 2020 10:42:15 +0200, Heikki Linnakangas <hlinn...@iki.fi> wrote in
Do you need the "ntaccess == 2" test? You could always increment the
counter, and in the code that uses ntaccess to decide what to evict,
treat all values >= 2 the same.

Need to handle integer overflow somehow. Or maybe not: integer
overflow is so infrequent that even if a hot syscache entry gets
evicted prematurely because its ntaccess count wrapped around to 0, it
will happen so rarely that it won't make any difference in practice.

That relaxing simplifies the code significantly, but a significant
degradation by about 5% still exists.

(SearchCatCacheInternal())
  +     ct->naccess++;
!+     ct->lastaccess = catcacheclock;

If I removed the second line above, the degradation disappears
(-0.7%).

0.7% degradation is probably acceptable.

However, I don't find the corresponding numbers in the output
of perf. The sum of the numbers for the removed instructions is (0.02
+ 0.28 = 0.3%).  I don't think the degradation as the whole doesn't
always reflect to the instruction level profiling, but I'm stuck here,
anyway.

Hmm. Some kind of cache miss effect, perhaps? offsetof(CatCTup, tuple) is exactly 64 bytes currently, so any fields that you add after 'tuple' will go on a different cache line. Maybe it would help if you just move the new fields before 'tuple'.

Making CatCTup smaller might help. Some ideas/observations:

- The 'ct_magic' field is only used for assertion checks. Could remove it.

- 4 Datums (32 bytes) are allocated for the keys, even though most catcaches have fewer key columns.

- In the current syscaches, keys[2] and keys[3] are only used to store 32-bit oids or some other smaller fields. Allocating a full 64-bit Datum for them wastes memory.

- You could move the dead flag at the end of the struct or remove it altogether, with the change I mentioned earlier to not keep dead items in the buckets

- You could steal a few bit for dead/negative flags from some other field. Use special values for tuple.t_len for them or something.

With some of these tricks, you could shrink CatCTup so that the new lastaccess and naccess fields would fit in the same cacheline.

That said, I think this is good enough performance-wise as it is. So if we want to improve performance in general, that can be a separate patch.

- Heikki


Reply via email to