Thanks for having another look at this.

On Sun, Nov 22, 2020 at 9:21 PM Andy Fan <zhihui.fan1...@gmail.com> wrote:
> add 2 more comments.
>
> 1. I'd suggest adding Assert(false); in RC_END_OF_SCAN case to make the
> error clearer.
>
>     case RC_END_OF_SCAN:
>
>         /*
>          * We've already returned NULL for this scan, but just in case
>          * something call us again by mistake.
>          */
>         return NULL;
This just took some inspiration from nodeMaterial.c, where it says:

    /*
     * If necessary, try to fetch another row from the subplan.
     *
     * Note: the eof_underlying state variable exists to short-circuit further
     * subplan calls.  It's not optional, unfortunately, because some plan
     * node types are not robust about being called again when they've already
     * returned NULL.
     */

I'm not feeling a pressing need to put an Assert(false); in there as it's not what nodeMaterial.c does. nodeMaterial is nodeResultCache's sister node, which can also be seen below Nested Loops.

> 2. Currently we handle the (!cache_store_tuple(node, outerslot))) case by
> setting it to RC_CACHE_BYPASS_MODE. The only reason for the
> cache_store_tuple failure is we can't cache_reduce_memory. I guess if
> cache_reduce_memory failed once, it would not succeed later (no more
> tuples can be stored, nothing is changed). So I think we can record this
> state and avoid any new cache_reduce_memory call.
>
>     /*
>      * If we failed to create the entry or failed to store the
>      * tuple in the entry, then go into bypass mode.
>      */
>     if (unlikely(entry == NULL ||
>                  !cache_store_tuple(node, outerslot)))
>
> to
>
>     if (unlikely(entry == NULL || node->memory_cant_be_reduced ||
>                  !cache_store_tuple(node, outerslot)))

The reason for RC_CACHE_BYPASS_MODE is that a single set of parameters can have so many results that they, alone, don't fit in the cache. We call cache_reduce_memory() whenever we go over our memory budget. That function returns false if it was unable to free enough memory without removing the "specialkey", which in this case is the current cache entry that's being populated. Later, when we're caching some entry that isn't quite so large, we still want to be able to cache that. By then, we'll have removed the remnants of the overly large entry that didn't fit, to make way for newer and, hopefully, smaller entries. No problems. I'm not sure why there's a need for another flag here.

A bit more background: When caching a new entry, or finding an existing entry, we move that entry to the top of the MRU dlist. When adding entries, or adding tuples to existing entries, if we've gone over the memory budget, then we remove cache entries from the MRU list starting at the tail (least recently used). If we begin caching tuples for an entry and need to free some space, then since we've put the current entry at the top of the MRU list, it'll be the last one to be removed. However, it's still possible that we run through the entire MRU list and end up at the most recently used item. So the entry we're populating can also be removed if freeing everything else was still not enough to give us enough free memory. The code refers to this as a cache overflow. This causes the state machine to move into RC_CACHE_BYPASS_MODE. We'll just read tuples directly from the subnode in that case; there's no need to attempt to cache them. They're not going to fit.

We'll come out of RC_CACHE_BYPASS_MODE when doing the next rescan with a different set of parameters. This is our chance to try caching things again. The code does that. There might be far fewer tuples for the next parameter we're scanning for, or those tuples might be narrower, so it makes sense to give caching them another try. Perhaps there's some point where we should give up doing that, but given good statistics, it's unlikely the planner would have thought a result cache was worth the trouble; it would likely have picked some other way to execute the plan.
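To make that a bit more concrete, here's a rough sketch of the eviction behaviour I'm describing. This is illustrative only and not the code from the patch; the signature is simplified, and names such as lru_list, lru_node, mem_used, mem_limit and remove_cache_entry() are made up for the example:

    /*
     * Sketch only: evict least-recently-used entries until we're back under
     * the memory limit.  Returns false if we had to evict the entry that's
     * currently being populated (the "specialkey"); the caller then moves
     * the state machine into RC_CACHE_BYPASS_MODE (a "cache overflow").
     */
    static bool
    cache_reduce_memory(ResultCacheState *rcstate, ResultCacheEntry *specialkey)
    {
        while (rcstate->mem_used > rcstate->mem_limit &&
               !dlist_is_empty(&rcstate->lru_list))
        {
            /* evict from the least-recently-used end of the MRU list */
            ResultCacheEntry *entry = dlist_tail_element(ResultCacheEntry,
                                                         lru_node,
                                                         &rcstate->lru_list);
            bool        was_specialkey = (entry == specialkey);

            remove_cache_entry(rcstate, entry);

            /* couldn't free enough without removing the current entry */
            if (was_specialkey)
                return false;
        }

        return rcstate->mem_used <= rcstate->mem_limit;
    }

The point is just that evicting the specialkey is the overflow condition. Everything else freed along the way stays freed, which is why a later, smaller entry can still be cached afterwards without needing any extra flag.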
The planner does estimate the average size of a cache entry and calculates how many of those fit into hash_mem. If that number is too low then Result Caching the inner side won't be too appealing. Of course, calculating the average does not mean there are no outliers. We'll deal with the large side of the outliers with the bypass code. I currently don't really see what needs to be changed about that.
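Just to illustrate the arithmetic (rough sketch only, with made-up variable names and example values; this isn't the costing code from the patch):

    /* sketch: how many average-sized cache entries fit in hash_mem? */
    double      est_tuples_per_param = 10;      /* avg inner rows per parameter */
    double      est_tuple_width = 100;          /* avg tuple width in bytes */
    double      per_entry_overhead = 100;       /* hash entry bookkeeping */
    double      hash_mem_bytes = 4.0 * 1024 * 1024;

    double      est_entry_bytes = est_tuples_per_param * est_tuple_width +
                                  per_entry_overhead;
    double      est_cache_entries = hash_mem_bytes / est_entry_bytes;
    /* roughly 3800 entries with these numbers */

If that comes out very low, then caching the inner side isn't going to be too appealing, as above.

David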