mpetris commented on code in PR #1155: URL: https://github.com/apache/solr/pull/1155#discussion_r1045286686
########## solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java: ########## @@ -47,6 +49,14 @@ public class TransientSolrCoreCacheDefault extends TransientSolrCoreCache { */ protected final Cache<String, SolrCore> transientCores; + /** + * This explicitly models capacity overflow of transientCores with cores that are still in use. A + * cores cache which holds cores evicted from transientCores because of limited size but which + * shall remain cached because they are still referenced or because they are part of an ongoing + * operation (pending ops). + */ + protected final Cache<String, SolrCore> overflowCores; Review Comment: > The PR description says _"a separate thread will schedule a ping-pong behaviour in the eviction handler on a full cache with SolrCores still referenced"_. Could you be more specific? > Did you see this ping pong behavior, with a core being evicted, then put back, then evicted again, and so on? Since we don't use transient cores in our environment, I tested the behaviour via the tests in [TestLazyCores](https://github.com/apache/solr/blob/main/solr/core/src/test/org/apache/solr/core/TestLazyCores.java). I observed the ping-pong behaviour with the separation in read and write locks in place in SolrCores and still with the original behaviour in the TransientSolrCoreCacheDefault but with the difference that the onEvict() method uses the new write lock from SolrCores and the executor of the Caffeine cache is set to the ForkJoinPool.commonPool() (to avoid the deadlock when the Caffeine maintenance task and the eviction happens during a get on the cache holding a read lock already). With this scenario the [testCreateTransientFromAdmin](https://github.com/apache/solr/blob/37c34664bab9a5d6df9ba6f3544d09432845aa51/solr/core/src/test/org/apache/solr/core/TestLazyCores.java#L486) test shows, [after holding references to five cores](https://github.com/apache/solr/blob/37c34664bab9a5d6df9ba6f3544d09432845aa51/solr/core/src/test/org/apache/solr/core/TestLazyCores.java#L508) with a max size of four, that the ForkJoinPool.commonPool() based Caffeine maintenance task tries to evict cores that get put back in by `onEvict()` over and over again until the first core gets closed in the [TestThread](https://github.com/apache/solr/blob/37c34664bab9a5d6df9ba6f3544d09432845aa51/solr/core/src/test/org/apache/solr/core/TestLazyCores.java#L545). So, yes this behaviour is transient but it seem to be present as long as the max size is exceeded. Not sure how long this can be but it felt bad to let the maintenance task go crazy during that time. That's why we came up with the overflowCores structure. Note that it might be different when the executor is set to a custom pool. In Caffeine's BoundedLocalCache the scheduling of the maintenance task happens more frequently when the ForkJoinPool.commonPool() is used. But it seemed to make things even more complicated introducing a custom pool so we didn't explore that road. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org