murblanc commented on a change in pull request #580:
URL: https://github.com/apache/solr/pull/580#discussion_r797700536



##########
File path: 
solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer 
coreContainer) {
     transientDescriptors = new LinkedHashMap<>(initialCapacity);
   }
 
+  private void onEvict(SolrCore core) {
+    if (coreContainer.hasPendingCoreOps(core.getName())) {
+      if (log.isInfoEnabled()) {
+        log.info("NOT evicting transient core [{}]; it's loading or something 
else", core.getName());
+      }
+      transientCores.put(core.getName(), core); // put back

Review comment:
       This `put` and the one below being called from a `removalListener`, they 
happen **after** the eviction from the cache.
   I assume that if another thread tries to access the core (and therefore 
tries to create a new one?) it will see the error this PR is fixing, but once 
the core is put back here, things should be ok.
   Not suggesting any change, just checking my understanding that there might 
be some race condition transient errors around this.

##########
File path: 
solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer 
coreContainer) {
     transientDescriptors = new LinkedHashMap<>(initialCapacity);
   }
 
+  private void onEvict(SolrCore core) {
+    if (coreContainer.hasPendingCoreOps(core.getName())) {
+      if (log.isInfoEnabled()) {
+        log.info("NOT evicting transient core [{}]; it's loading or something 
else", core.getName());
+      }
+      transientCores.put(core.getName(), core); // put back
+    } else if (core.getOpenCount() > 1) {
+      if (log.isInfoEnabled()) {
+        log.info("NOT evicting transient core [{}]; it's still in use", 
core.getName());
+      }
+      transientCores.put(core.getName(), core); // put back
+    } else {
+      if (log.isInfoEnabled()) {
+        log.info("Closing transient core [{}] evicted from the cache", 
core.getName());
+      }
+      coreContainer.queueCoreToClose(core);

Review comment:
       What happens if once the core was added here to the 
`SolrCores.pendingCloses` but before the close actually happens, some thread 
reopens and reloads the core?

##########
File path: solr/core/src/test/org/apache/solr/core/TestLazyCores.java
##########
@@ -212,43 +225,44 @@ public void testCachingLimit() throws Exception {
       checkCoresNotLoaded(cc, "collection3", "collection4", "collection6", 
"collection7", "collection8", "collection9");
 
       // By putting these in non-alpha order, we're also checking that we're  
not just seeing an artifact.
-      SolrCore core1 = cc.getCore("collection1");
-      SolrCore core3 = cc.getCore("collection3");
-      SolrCore core4 = cc.getCore("collection4");
-      SolrCore core2 = cc.getCore("collection2");
-      SolrCore core5 = cc.getCore("collection5");
+      getCoreAndPutBack(cc, "collection1");
+      getCoreAndPutBack(cc, "collection3");
+      getCoreAndPutBack(cc, "collection4");
+      getCoreAndPutBack(cc, "collection2");
+      getCoreAndPutBack(cc, "collection5");
 
       checkLoadedCores(cc, "collection1", "collection2", "collection3", 
"collection4", "collection5");
       checkCoresNotLoaded(cc, "collection6", "collection7", "collection8", 
"collection9");
 
       // map should be full up, add one more and verify
-      SolrCore core6 = cc.getCore("collection6");
+      getCoreAndPutBack(cc, "collection6");
       checkLoadedCores(cc, "collection1", "collection2", "collection3", 
"collection4", "collection5",
           "collection6");
       checkCoresNotLoaded(cc, "collection7", "collection8", "collection9");
 
-      SolrCore core7 = cc.getCore("collection7");
+      getCoreAndPutBack(cc, "collection7");
       checkLoadedCores(cc, "collection1", "collection2", "collection3", 
"collection4", "collection5",
           "collection6", "collection7");
       checkCoresNotLoaded(cc, "collection8", "collection9");
 
-      SolrCore core8 = cc.getCore("collection8");
+      getCoreAndPutBack(cc, "collection8");
       checkLoadedCores(cc, "collection1", "collection4", "collection5", 
"collection8");

Review comment:
       Help me: why are we sure here that collection1, collection4 and 
collection5 cores are loaded whereas collection2 and collection3 for example 
might not? @bruno-roustant maybe you know?

##########
File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java
##########
@@ -2284,8 +2285,10 @@ public void run() {
 
       SolrCore core;
       while (!container.isShutDown() && (core = solrCores.getCoreToClose()) != 
null) {
+        assert core.getOpenCount() == 1;

Review comment:
       assert is only enabled in tests usually, right?
   
   What happens if the core gets reopened while we close it here? Is it a case 
of the open count being higher than one in a legitimate way?

##########
File path: 
solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer 
coreContainer) {
     transientDescriptors = new LinkedHashMap<>(initialCapacity);
   }
 
+  private void onEvict(SolrCore core) {
+    if (coreContainer.hasPendingCoreOps(core.getName())) {

Review comment:
       After offline discussion with David, this can happen when a core load 
takes too much time (registering in ZK etc) and we try to unload it before it 
has completed. Likely deserves a comment here.
   
   This means the cache can grow pretty big, as evictions will not be allowed 
for a while, or if we prevent additions to the cache when it's over max size, 
this can cause global slowdown on node startup.
   
   If the cache grows big, we need to make sure there is some process shrinking 
it back to reasonable size in reasonable time. If we only evict one core when 
we add one (I don't know if that's the case), the cache might never reduce in 
size.

##########
File path: solr/core/src/test/org/apache/solr/core/TestLazyCores.java
##########
@@ -261,23 +275,16 @@ public void testCachingLimit() throws Exception {
         assertNotNull(o);
       }
 
-
-      // Note decrementing the count when the core is removed from the 
lazyCores list is appropriate, since the
-      // refcount is 1 when constructed. anyone _else_ who's opened up one has 
to close it.
-      core1.close();
-      core2.close();
-      core3.close();
-      core4.close();
-      core5.close();
-      core6.close();
-      core7.close();
-      core8.close();
-      core9.close();
     } finally {
       cc.shutdown();
     }
   }
 
+  private void getCoreAndPutBack(CoreContainer cc, String name) {

Review comment:
       `getAndCloseCore` a better name maybe?
   And comment that core needs to be closed for it to be an eviction candidate.

##########
File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java
##########
@@ -2284,8 +2285,10 @@ public void run() {
 
       SolrCore core;
       while (!container.isShutDown() && (core = solrCores.getCoreToClose()) != 
null) {
+        assert core.getOpenCount() == 1;
         try {
-          core.close();
+          MDCLoggingContext.setCore(core);

Review comment:
       Nice. Wasn't obvious for me at first glance that it was for setting the 
right logging context for the `close()` call.

##########
File path: 
solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer 
coreContainer) {
     transientDescriptors = new LinkedHashMap<>(initialCapacity);
   }
 
+  private void onEvict(SolrCore core) {
+    if (coreContainer.hasPendingCoreOps(core.getName())) {
+      if (log.isInfoEnabled()) {
+        log.info("NOT evicting transient core [{}]; it's loading or something 
else", core.getName());

Review comment:
       Need to update comment line 72 above since the cache max size is not 
strictly enforced anymore. I don't think the cache can grow extremely big 
unless all the cores we try to evict happen to be still in use, which is 
unlikely.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Reply via email to