atris commented on a change in pull request #96:
URL: https://github.com/apache/solr/pull/96#discussion_r644474656



##########
File path: 
solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       Thanks for the input, @janhoy . This is elegant and a clean solution.
   
   RE: Not having a common manager, a common manager is needed so that SolrCore 
has to deal. with only one access point for all CBs. However, I agree that 
CircuitBreakerManager should not need to be explicitly aware of each CB 
implementation, will fix that for 9.0
   
   Also, I am confused about your comment of the common config class -- there 
is a common config class already for all circuit breakers, and the 
configuration is completely pluggable since it implements the PluginInfo 
pattern?
   
   
   
   

##########
File path: 
solr/core/src/java/org/apache/solr/util/circuitbreaker/LoadAverageCircuitBreaker.java
##########
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.util.circuitbreaker;
+
+import java.lang.invoke.MethodHandles;
+import java.lang.management.ManagementFactory;
+import java.lang.management.OperatingSystemMXBean;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * <p>
+ * Tracks current system load average and triggers if the specified threshold 
is breached.
+ *
+ * This circuit breaker gets the load average (length of the run queue) over 
the last
+ * minute and uses that data to take a decision. We depend on 
OperatingSystemMXBean which does
+ * not allow a configurable interval of collection of data.
+ * //TODO: Use Codahale Meter to calculate the value locally.
+ * </p>
+ *
+ * <p>
+ * The configuration to define which mode to use and the trigger threshold are 
defined in
+ * solrconfig.xml
+ * </p>
+ */
+public class LoadAverageCircuitBreaker extends CircuitBreaker {
+  private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+  private static final OperatingSystemMXBean operatingSystemMXBean = 
ManagementFactory.getOperatingSystemMXBean();
+
+  private final boolean enabled;
+  private final double loadAverageThreshold;
+
+  // Assumption -- the value of these parameters will be set correctly before 
invoking getDebugInfo()
+  private static final ThreadLocal<Double> seenLoadAverage = 
ThreadLocal.withInitial(() -> 0.0);
+
+  private static final ThreadLocal<Double> allowedLoadAverage = 
ThreadLocal.withInitial(() -> 0.0);
+
+  public LoadAverageCircuitBreaker(CircuitBreakerConfig config) {
+    super(config);
+
+    this.enabled = config.getLoadAverageCBEnabled();
+    this.loadAverageThreshold = config.getLoadAverageCBThreshold();
+  }
+
+  @Override
+  public boolean isTripped() {
+    if (!isEnabled()) {
+      return false;
+    }
+
+    if (!enabled) {
+      return false;
+    }
+
+    double localAllowedLoadAverage = getLoadAverageThreshold();
+    double localSeenLoadAverage = calculateLiveLoadAverage();
+
+    if (localSeenLoadAverage < 0) {
+      if (log.isWarnEnabled()) {

Review comment:
       @janhoy I am sorry, but the refactorings proposed do not really have 
anything to do with the actual circuit breakers themselves.
   
   The two circuit breakers (load average and proposed CPU CB) share enough 
method signatures to warrant common interfaces, and their tests are alike apart 
from class names. This is a bad code smell enough to be fixed, and I do not see 
why it cannot be done in this PR itself.

##########
File path: 
solr/core/src/java/org/apache/solr/util/circuitbreaker/CPUCircuitBreaker.java
##########
@@ -110,6 +113,27 @@ public double getCpuUsageThreshold() {
   }
 
   protected double calculateLiveCPUUsage() {
-    return operatingSystemMXBean.getSystemLoadAverage();
+    Metric metric = this.core
+        .getCoreContainer()
+        .getMetricManager()
+        .registry("solr.jvm")
+        .getMetrics()
+        .get("os.systemCpuLoad");

Review comment:
       Yes, so here is the idea:
   
   Typical Solr clusters are dedicated node models (Solr is not expected to 
share resources with any other process). To my knowledge, most IO bound tasks 
in Solr also have a CPU impact so the possibility of a Solr operation having 
such a massive IO wait that it triggers the CB *without* having any significant 
impact on CPU is low (unless you are running Solr on USB storage drives).
   
   Please remember that the circuit breaker is designed to *only* deal with 
typical Solr load patterns, hence is not targeted for generic situations. 
Unless misconfigured, Solr will use multiple CPUs if they are available, hence 
I do not understand your point.
   
   Even in the scenario you described, you can simply set the trigger threshold 
of the CB higher so that it will trigger only when CPU usage is also high. 
   
   Let's say you set the threshold to 200. You will need more than 150 IO 
waiting processes *and* moderate CPU usage to trigger the threshold.
   
   Also, load average is computed by contributions of three factors -- CPU 
usage, tasks waiting for CPU and (in some implementations) IO blocked tasks. If 
you set the triggering threshold high enough, there is a very slim possibility 
that the IO blocked processes metric can overwhelm the others and cause the CB 
to trigger.
   
   But we digress. Solr is an open source project. Circuit breakers are 
designed to be extendable. If you need a circuit breaker with a different 
metric, by all means, please implement the same.
   
   Let us now focus on your complaints against the existing circuit breaker:
   
   1. Documentation not reflecting that in the platforms that choose to include 
IO waiting tasks as a factor in load average, CPU CB will implicitly include 
the same in its triggering threshold. I agree this should be included and I 
thank you for fixing this.
   
   2. Existing CB does not work because it limits the threshold that can be set 
between 0 and 1 -- Where did that come from?
   
   Solr's circuit breaker documentation 
(https://solr.apache.org/guide/8_7/circuit-breakers.html) does not state 
anything of this sort for the CPU circuit breaker. Nor does the code implement 
anything of this sort. The triggering threshold is unbounded, because the load 
average metric is unbounded. At work, we use a threshold in multiples of 100 
and it works perfectly well for us.
   
   So, where is this claim coming from? Where exactly have you seen the limits 
being enforced?
   
   Your entire premise that the existing code does not work is based on this 
fulcrum of the limits, which do not exist in the code or documentation. You 
have been perpetually defaming the feature and have even gone to the extent of 
"warning" our users to immediately stop using it. My question is -- what is the 
basis of the claim?

##########
File path: 
solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       @wrunderwood I have highlighted this to yourself before months back when 
you asked the same question, and I will repeat it again -- Solr 8x has no way 
of distinguishing between internal and external requests hence there is no way 
for circuit breakers to know what kind of requests they are seeing. I 
implemented this functionality in SOLR-13528 and will be doing the request 
level disambiguation in 9.x but there is still no way to identify the same in 
8.x

##########
File path: 
solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreaker.java
##########
@@ -66,14 +66,19 @@ protected boolean isEnabled() {
     private final int memCBThreshold;
     private final boolean cpuCBEnabled;
     private final int cpuCBThreshold;
+    private final boolean loadAverageCBEnabled;
+    private final double loadAverageCBThreshold;
 
     public CircuitBreakerConfig(final boolean enabled, final boolean 
memCBEnabled, final int memCBThreshold,
-                                  final boolean cpuCBEnabled, final int 
cpuCBThreshold) {
+                                final boolean cpuCBEnabled, final int 
cpuCBThreshold,

Review comment:
       I was advocating including the name of the metric being used for the CPU 
enabled CB (and not call it cpuCBEnabled)

##########
File path: 
solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java
##########
@@ -40,12 +41,15 @@
  * solution. There will be a follow up with a SIP for a schema API design.
  */
 public class CircuitBreakerManager implements PluginInfoInitialized {
+  private final SolrCore core;

Review comment:
       There is now, post SOLR-13528, in 9x. There was none in 8x, hence I had 
to add that mechanism.

##########
File path: solr/core/src/test/org/apache/solr/util/TestCircuitBreaker.java
##########
@@ -199,7 +201,63 @@ public void testFakeCPUCircuitBreaker() {
       PluginInfo pluginInfo = 
h.getCore().getSolrConfig().getPluginInfo(CircuitBreakerManager.class.getName());
 
       CircuitBreaker.CircuitBreakerConfig circuitBreakerConfig = 
CircuitBreakerManager.buildCBConfig(pluginInfo);
-      CircuitBreaker circuitBreaker = new 
FakeCPUCircuitBreaker(circuitBreakerConfig);
+      CircuitBreaker circuitBreaker = new 
FakeCPUCircuitBreaker(circuitBreakerConfig, null);
+
+      h.getCore().getCircuitBreakerManager().register(circuitBreaker);
+
+      List<Future<?>> futures = new ArrayList<>();
+
+      for (int i = 0; i < 5; i++) {
+        Future<?> future = executor.submit(() -> {
+          try {
+            h.query(req("name:\"john smith\""));
+          } catch (SolrException e) {
+            assertThat(e.getMessage(), containsString("Circuit Breakers 
tripped"));
+            failureCount.incrementAndGet();
+          } catch (Exception e) {
+            throw new RuntimeException(e.getMessage());
+          }
+        });
+
+        futures.add(future);
+      }
+
+      for  (Future<?> future : futures) {
+        try {
+          future.get();
+        } catch (Exception e) {
+          throw new RuntimeException(e.getMessage());
+        }
+      }
+
+      executor.shutdown();
+      try {
+        executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
+      } catch (InterruptedException e) {
+        Thread.currentThread().interrupt();
+        throw new RuntimeException(e.getMessage());
+      }
+
+      assertEquals("Number of failed queries is not correct",5, 
failureCount.get());
+    } finally {
+      if (!executor.isShutdown()) {
+        executor.shutdown();
+      }
+    }
+  }
+
+  public void testFakeLoadAverageCircuitBreaker() {
+    AtomicInteger failureCount = new AtomicInteger();
+
+    ExecutorService executor = ExecutorUtil.newMDCAwareCachedThreadPool(
+        new SolrNamedThreadFactory("TestCircuitBreaker"));
+    try {
+      removeAllExistingCircuitBreakers();
+
+      PluginInfo pluginInfo = 
h.getCore().getSolrConfig().getPluginInfo(CircuitBreakerManager.class.getName());

Review comment:
       That does not answer my question -- they are still exact copies of each 
other. Lets abstract to a parent class?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Reply via email to