murblanc commented on a change in pull request #580: URL: https://github.com/apache/solr/pull/580#discussion_r797700536
########## File path: solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java ########## @@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer coreContainer) { transientDescriptors = new LinkedHashMap<>(initialCapacity); } + private void onEvict(SolrCore core) { + if (coreContainer.hasPendingCoreOps(core.getName())) { + if (log.isInfoEnabled()) { + log.info("NOT evicting transient core [{}]; it's loading or something else", core.getName()); + } + transientCores.put(core.getName(), core); // put back Review comment: This `put` and the one below being called from a `removalListener`, they happen **after** the eviction from the cache. I assume that if another thread tries to access the core (and therefore tries to create a new one?) it will see the error this PR is fixing, but once the core is put back here, things should be ok. Not suggesting any change, just checking my understanding that there might be some race condition transient errors around this. ########## File path: solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java ########## @@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer coreContainer) { transientDescriptors = new LinkedHashMap<>(initialCapacity); } + private void onEvict(SolrCore core) { + if (coreContainer.hasPendingCoreOps(core.getName())) { + if (log.isInfoEnabled()) { + log.info("NOT evicting transient core [{}]; it's loading or something else", core.getName()); + } + transientCores.put(core.getName(), core); // put back + } else if (core.getOpenCount() > 1) { + if (log.isInfoEnabled()) { + log.info("NOT evicting transient core [{}]; it's still in use", core.getName()); + } + transientCores.put(core.getName(), core); // put back + } else { + if (log.isInfoEnabled()) { + log.info("Closing transient core [{}] evicted from the cache", core.getName()); + } + coreContainer.queueCoreToClose(core); Review comment: What happens if once the core was added here to the `SolrCores.pendingCloses` but before the close actually happens, some thread reopens and reloads the core? ########## File path: solr/core/src/test/org/apache/solr/core/TestLazyCores.java ########## @@ -212,43 +225,44 @@ public void testCachingLimit() throws Exception { checkCoresNotLoaded(cc, "collection3", "collection4", "collection6", "collection7", "collection8", "collection9"); // By putting these in non-alpha order, we're also checking that we're not just seeing an artifact. - SolrCore core1 = cc.getCore("collection1"); - SolrCore core3 = cc.getCore("collection3"); - SolrCore core4 = cc.getCore("collection4"); - SolrCore core2 = cc.getCore("collection2"); - SolrCore core5 = cc.getCore("collection5"); + getCoreAndPutBack(cc, "collection1"); + getCoreAndPutBack(cc, "collection3"); + getCoreAndPutBack(cc, "collection4"); + getCoreAndPutBack(cc, "collection2"); + getCoreAndPutBack(cc, "collection5"); checkLoadedCores(cc, "collection1", "collection2", "collection3", "collection4", "collection5"); checkCoresNotLoaded(cc, "collection6", "collection7", "collection8", "collection9"); // map should be full up, add one more and verify - SolrCore core6 = cc.getCore("collection6"); + getCoreAndPutBack(cc, "collection6"); checkLoadedCores(cc, "collection1", "collection2", "collection3", "collection4", "collection5", "collection6"); checkCoresNotLoaded(cc, "collection7", "collection8", "collection9"); - SolrCore core7 = cc.getCore("collection7"); + getCoreAndPutBack(cc, "collection7"); checkLoadedCores(cc, "collection1", "collection2", "collection3", "collection4", "collection5", "collection6", "collection7"); checkCoresNotLoaded(cc, "collection8", "collection9"); - SolrCore core8 = cc.getCore("collection8"); + getCoreAndPutBack(cc, "collection8"); checkLoadedCores(cc, "collection1", "collection4", "collection5", "collection8"); Review comment: Help me: why are we sure here that collection1, collection4 and collection5 cores are loaded whereas collection2 and collection3 for example might not? @bruno-roustant maybe you know? ########## File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java ########## @@ -2284,8 +2285,10 @@ public void run() { SolrCore core; while (!container.isShutDown() && (core = solrCores.getCoreToClose()) != null) { + assert core.getOpenCount() == 1; Review comment: assert is only enabled in tests usually, right? What happens if the core gets reopened while we close it here? Is it a case of the open count being higher than one in a legitimate way? ########## File path: solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java ########## @@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer coreContainer) { transientDescriptors = new LinkedHashMap<>(initialCapacity); } + private void onEvict(SolrCore core) { + if (coreContainer.hasPendingCoreOps(core.getName())) { Review comment: After offline discussion with David, this can happen when a core load takes too much time (registering in ZK etc) and we try to unload it before it has completed. Likely deserves a comment here. This means the cache can grow pretty big, as evictions will not be allowed for a while, or if we prevent additions to the cache when it's over max size, this can cause global slowdown on node startup. If the cache grows big, we need to make sure there is some process shrinking it back to reasonable size in reasonable time. If we only evict one core when we add one (I don't know if that's the case), the cache might never reduce in size. ########## File path: solr/core/src/test/org/apache/solr/core/TestLazyCores.java ########## @@ -261,23 +275,16 @@ public void testCachingLimit() throws Exception { assertNotNull(o); } - - // Note decrementing the count when the core is removed from the lazyCores list is appropriate, since the - // refcount is 1 when constructed. anyone _else_ who's opened up one has to close it. - core1.close(); - core2.close(); - core3.close(); - core4.close(); - core5.close(); - core6.close(); - core7.close(); - core8.close(); - core9.close(); } finally { cc.shutdown(); } } + private void getCoreAndPutBack(CoreContainer cc, String name) { Review comment: `getAndCloseCore` a better name maybe? And comment that core needs to be closed for it to be an eviction candidate. ########## File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java ########## @@ -2284,8 +2285,10 @@ public void run() { SolrCore core; while (!container.isShutDown() && (core = solrCores.getCoreToClose()) != null) { + assert core.getOpenCount() == 1; try { - core.close(); + MDCLoggingContext.setCore(core); Review comment: Nice. Wasn't obvious for me at first glance that it was for setting the right logging context for the `close()` call. ########## File path: solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java ########## @@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer coreContainer) { transientDescriptors = new LinkedHashMap<>(initialCapacity); } + private void onEvict(SolrCore core) { + if (coreContainer.hasPendingCoreOps(core.getName())) { + if (log.isInfoEnabled()) { + log.info("NOT evicting transient core [{}]; it's loading or something else", core.getName()); Review comment: Need to update comment line 72 above since the cache max size is not strictly enforced anymore. I don't think the cache can grow extremely big unless all the cores we try to evict happen to be still in use, which is unlikely. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org