From: Tomas Vondra
> I don't think we need to remove the expired entries right away, if there
> are only very few of them. The cleanup requires walking the hash table,
> which means significant fixed cost. So if there are only few expired
> entries (say, less than 25% of the cache), we can just leave them around
> and clean them if we happen to stumble on them (although that may not be
> possible with dynahash, which has no concept of expiration) or before
> enlarging the hash table.
I agree that we don't need to evict cache entries as long as memory permits (within limits the DBA controls). But how does the concept of expiration fit the catcache? How would the user determine the expiration time, i.e. the setting of syscache_prune_min_age? If you set a small value to evict unnecessary entries faster, necessary entries will also be evicted. An access counter could keep frequently used entries longer, but an idle period (e.g. a lunch break) could still flush entries that you want to access right after it.

The idea of expiration applies to cases where we want possibly stale entries to vanish and newer data to be loaded upon the next access: for example, the TTL (time-to-live) of Memcached, Redis, DNS, and ARP. Is the catcache based on the same idea as those? No. What we want to do is to evict never- or infrequently-used cache entries. That's naturally the task of LRU, isn't it? Even the high-performance Memcached and Redis use LRU when the cache is full. As Bruce said, we don't have to worry about lock contention or the like, because we're talking about a backend-local cache. Are we worried about the overhead of manipulating the LRU chain? The current catcache already does that kind of work on every access: it calls dlist_move_head() to put the accessed entry at the front of its hash bucket. (A rough sketch of a backend-local LRU follows at the end of this mail.)

> So if we want to address this case too (and we probably want to), we may
> need to discard the old cache memory context somehow (e.g. rebuild the
> cache in a new one, and copy the non-expired entries). Which is a nice
> opportunity to do the "full" cleanup, of course.

The straightforward, natural, and familiar way is to limit the cache size, which I mentioned in a previous mail. We should give the DBA the ability to control memory usage, rather than thinking about what to do after we let the memory area grow unnecessarily large. That's what a typical "cache" is, isn't it? (A sketch of such a knob as a GUC is also appended below.)

https://en.wikipedia.org/wiki/Cache_(computing)
"To be cost-effective and to enable efficient use of data, caches must be relatively small."

Another relevant, if suboptimal, idea would be to give each catcache a separate memory context that is a child of CacheMemoryContext. This would allow a slight optimization by using the slab context (slab.c) for catcaches with fixed-size tuples (see the last sketch below). But that'd be a bit complex, I'm afraid, for PG 12.
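To make the LRU idea concrete, here is a minimal sketch (for catcache.c). It assumes a new "dlist_node lru_node" field in CatCTup, a global list and counter, and a catcache_max_entries GUC, none of which exist today; dlist_move_head(), dlist_tail_element() and CatCacheRemoveCTup() are the existing facilities in ilist.h and catcache.c.

/*
 * Hypothetical backend-local LRU over all catcache entries.
 * CatCacheLRUList must be dlist_init()ed once at backend start.
 */
static dlist_head CatCacheLRUList;            /* hypothetical; head = MRU end */
static long       CatCacheEntryCount = 0;     /* hypothetical */
int               catcache_max_entries = -1;  /* hypothetical GUC, -1 = no limit */

/*
 * Touch an entry on every cache hit.  This would sit next to the existing
 * dlist_move_head(bucket, &ct->cache_elem) call in SearchCatCacheInternal(),
 * so it adds one more list manipulation of the kind we already pay for.
 */
static inline void
CatCacheTouchEntry(CatCTup *ct)
{
    dlist_move_head(&CatCacheLRUList, &ct->lru_node);
}

/*
 * Evict from the cold end after a new entry is inserted.  For brevity this
 * sketch simply stops when it meets a pinned entry.
 */
static void
CatCacheEnforceLimit(void)
{
    while (catcache_max_entries >= 0 &&
           CatCacheEntryCount > catcache_max_entries &&
           !dlist_is_empty(&CatCacheLRUList))
    {
        CatCTup *ct = dlist_tail_element(CatCTup, lru_node, &CatCacheLRUList);

        if (ct->refcount > 0)
            break;                              /* still in use; stop here */
        dlist_delete(&ct->lru_node);
        CatCacheRemoveCTup(ct->my_cache, ct);   /* existing catcache.c routine */
        CatCacheEntryCount--;
    }
}

The point is that the per-hit cost is a single dlist_move_head(), the same kind of pointer shuffling the catcache already does for its hash buckets.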
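As for giving the DBA an explicit knob, the limit could be exposed like any other resource GUC. This only illustrates the shape such an entry would take in ConfigureNamesInt[] of utils/misc/guc.c; the name catcache_max_entries and its defaults are assumptions carried over from the sketch above.

/* hypothetical entry for ConfigureNamesInt[] in utils/misc/guc.c */
{
    {"catcache_max_entries", PGC_USERSET, RESOURCES_MEM,
        gettext_noop("Sets the maximum number of entries kept in each "
                     "backend's catalog caches."),
        gettext_noop("-1 keeps the current unlimited behavior.")
    },
    &catcache_max_entries,
    -1, -1, INT_MAX,
    NULL, NULL, NULL
},

A variant measured in kilobytes (GUC_UNIT_KB, with the eviction loop tracking allocated bytes instead of entry counts) would match how DBAs usually think about memory, at the cost of accounting each entry's size.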
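Finally, the per-catcache memory context idea needs only the existing allocator APIs from utils/memutils.h; the helper name and the fixed_entry_size parameter are made up for this sketch.

#include "utils/memutils.h"

/*
 * Hypothetical helper for InitCatCache(): give each catcache its own
 * context under CacheMemoryContext.  A slab context requires one fixed
 * chunk size, so fall back to an ordinary aset context otherwise.
 */
static MemoryContext
CreateCatCacheContext(const char *name, Size fixed_entry_size)
{
    if (fixed_entry_size > 0)
        return SlabContextCreate(CacheMemoryContext,
                                 name,
                                 SLAB_DEFAULT_BLOCK_SIZE,
                                 fixed_entry_size);
    return AllocSetContextCreate(CacheMemoryContext,
                                 name,
                                 ALLOCSET_SMALL_SIZES);
}

Besides the slab optimization, a per-cache context would let MemoryContextStats() show per-catcache usage, and discarding a whole cache becomes a single MemoryContextDelete(), which also fits your "rebuild the cache in a new context" suggestion above.

Regards
MauMau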