mpetris commented on code in PR #1155:
URL: https://github.com/apache/solr/pull/1155#discussion_r1045286686


##########
solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java:
##########
@@ -47,6 +49,14 @@ public class TransientSolrCoreCacheDefault extends 
TransientSolrCoreCache {
    */
   protected final Cache<String, SolrCore> transientCores;
 
+  /**
+   * This explicitly models capacity overflow of transientCores with cores 
that are still in use. A
+   * cores cache which holds cores evicted from transientCores because of 
limited size but which
+   * shall remain cached because they are still referenced or because they are 
part of an ongoing
+   * operation (pending ops).
+   */
+  protected final Cache<String, SolrCore> overflowCores;

Review Comment:
   > The PR description says _"a separate thread will schedule a ping-pong 
behaviour in the eviction handler on a full cache with SolrCores still 
referenced"_. Could you be more specific?
   > Did you see this ping pong behavior, with a core being evicted, then put 
back, then evicted again, and so on?
   
   Since we don't use transient cores in our environment, I tested the 
behaviour via the tests in 
[TestLazyCores](https://github.com/apache/solr/blob/main/solr/core/src/test/org/apache/solr/core/TestLazyCores.java).
 I observed the ping-pong behaviour with the separation in read and write locks 
in place in SolrCores and still with the original behaviour in the 
TransientSolrCoreCacheDefault but with the difference that the onEvict() method 
uses the new write lock from SolrCores and the executor of the Caffeine cache 
is set to the ForkJoinPool.commonPool() (to avoid the deadlock when the 
Caffeine maintenance task and the eviction happens during a get on the cache 
holding a read lock already).
   
   With this scenario the 
[testCreateTransientFromAdmin](https://github.com/apache/solr/blob/37c34664bab9a5d6df9ba6f3544d09432845aa51/solr/core/src/test/org/apache/solr/core/TestLazyCores.java#L486)
 test  shows, [after holding references to five 
cores](https://github.com/apache/solr/blob/37c34664bab9a5d6df9ba6f3544d09432845aa51/solr/core/src/test/org/apache/solr/core/TestLazyCores.java#L508)
 with a max size of four, that the ForkJoinPool.commonPool() based Caffeine 
maintenance task  tries to evict cores that get put back in by `onEvict()` over 
and over again until the first core gets closed in the 
[TestThread](https://github.com/apache/solr/blob/37c34664bab9a5d6df9ba6f3544d09432845aa51/solr/core/src/test/org/apache/solr/core/TestLazyCores.java#L545).
 
   
   So, yes this behaviour is transient but it seem to be present as long as the 
max size is exceeded. Not sure how long this can be but it felt bad to let the 
maintenance task go crazy during that time. That's why we came up with the 
overflowCores structure. 
   
   Note that it might be different when the executor is set to a custom pool. 
In Caffeine's BoundedLocalCache the scheduling of the maintenance task happens 
more frequently when the ForkJoinPool.commonPool() is used. But it seemed to 
make things even more complicated introducing a custom pool so we didn't 
explore that road.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Reply via email to