atris commented on a change in pull request #96: URL: https://github.com/apache/solr/pull/96#discussion_r644474656
########## File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java ########## @@ -40,12 +41,15 @@ * solution. There will be a follow up with a SIP for a schema API design. */ public class CircuitBreakerManager implements PluginInfoInitialized { + private final SolrCore core; Review comment: Thanks for the input, @janhoy . This is elegant and a clean solution. RE: Not having a common manager, a common manager is needed so that SolrCore has to deal. with only one access point for all CBs. However, I agree that CircuitBreakerManager should not need to be explicitly aware of each CB implementation, will fix that for 9.0 Also, I am confused about your comment of the common config class -- there is a common config class already for all circuit breakers, and the configuration is completely pluggable since it implements the PluginInfo pattern? ########## File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/LoadAverageCircuitBreaker.java ########## @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.util.circuitbreaker; + +import java.lang.invoke.MethodHandles; +import java.lang.management.ManagementFactory; +import java.lang.management.OperatingSystemMXBean; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * <p> + * Tracks current system load average and triggers if the specified threshold is breached. + * + * This circuit breaker gets the load average (length of the run queue) over the last + * minute and uses that data to take a decision. We depend on OperatingSystemMXBean which does + * not allow a configurable interval of collection of data. + * //TODO: Use Codahale Meter to calculate the value locally. + * </p> + * + * <p> + * The configuration to define which mode to use and the trigger threshold are defined in + * solrconfig.xml + * </p> + */ +public class LoadAverageCircuitBreaker extends CircuitBreaker { + private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); + private static final OperatingSystemMXBean operatingSystemMXBean = ManagementFactory.getOperatingSystemMXBean(); + + private final boolean enabled; + private final double loadAverageThreshold; + + // Assumption -- the value of these parameters will be set correctly before invoking getDebugInfo() + private static final ThreadLocal<Double> seenLoadAverage = ThreadLocal.withInitial(() -> 0.0); + + private static final ThreadLocal<Double> allowedLoadAverage = ThreadLocal.withInitial(() -> 0.0); + + public LoadAverageCircuitBreaker(CircuitBreakerConfig config) { + super(config); + + this.enabled = config.getLoadAverageCBEnabled(); + this.loadAverageThreshold = config.getLoadAverageCBThreshold(); + } + + @Override + public boolean isTripped() { + if (!isEnabled()) { + return false; + } + + if (!enabled) { + return false; + } + + double localAllowedLoadAverage = getLoadAverageThreshold(); + double localSeenLoadAverage = calculateLiveLoadAverage(); + + if (localSeenLoadAverage < 0) { + if (log.isWarnEnabled()) { Review comment: @janhoy I am sorry, but the refactorings proposed do not really have anything to do with the actual circuit breakers themselves. The two circuit breakers (load average and proposed CPU CB) share enough method signatures to warrant common interfaces, and their tests are alike apart from class names. This is a bad code smell enough to be fixed, and I do not see why it cannot be done in this PR itself. ########## File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CPUCircuitBreaker.java ########## @@ -110,6 +113,27 @@ public double getCpuUsageThreshold() { } protected double calculateLiveCPUUsage() { - return operatingSystemMXBean.getSystemLoadAverage(); + Metric metric = this.core + .getCoreContainer() + .getMetricManager() + .registry("solr.jvm") + .getMetrics() + .get("os.systemCpuLoad"); Review comment: Yes, so here is the idea: Typical Solr clusters are dedicated node models (Solr is not expected to share resources with any other process). To my knowledge, most IO bound tasks in Solr also have a CPU impact so the possibility of a Solr operation having such a massive IO wait that it triggers the CB *without* having any significant impact on CPU is low (unless you are running Solr on USB storage drives). Please remember that the circuit breaker is designed to *only* deal with typical Solr load patterns, hence is not targeted for generic situations. Unless misconfigured, Solr will use multiple CPUs if they are available, hence I do not understand your point. Even in the scenario you described, you can simply set the trigger threshold of the CB higher so that it will trigger only when CPU usage is also high. Let's say you set the threshold to 200. You will need more than 150 IO waiting processes *and* moderate CPU usage to trigger the threshold. Also, load average is computed by contributions of three factors -- CPU usage, tasks waiting for CPU and (in some implementations) IO blocked tasks. If you set the triggering threshold high enough, there is a very slim possibility that the IO blocked processes metric can overwhelm the others and cause the CB to trigger. But we digress. Solr is an open source project. Circuit breakers are designed to be extendable. If you need a circuit breaker with a different metric, by all means, please implement the same. Let us now focus on your complaints against the existing circuit breaker: 1. Documentation not reflecting that in the platforms that choose to include IO waiting tasks as a factor in load average, CPU CB will implicitly include the same in its triggering threshold. I agree this should be included and I thank you for fixing this. 2. Existing CB does not work because it limits the threshold that can be set between 0 and 1 -- Where did that come from? Solr's circuit breaker documentation (https://solr.apache.org/guide/8_7/circuit-breakers.html) does not state anything of this sort for the CPU circuit breaker. Nor does the code implement anything of this sort. The triggering threshold is unbounded, because the load average metric is unbounded. At work, we use a threshold in multiples of 100 and it works perfectly well for us. So, where is this claim coming from? Where exactly have you seen the limits being enforced? Your entire premise that the existing code does not work is based on this fulcrum of the limits, which do not exist in the code or documentation. You have been perpetually defaming the feature and have even gone to the extent of "warning" our users to immediately stop using it. My question is -- what is the basis of the claim? ########## File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java ########## @@ -40,12 +41,15 @@ * solution. There will be a follow up with a SIP for a schema API design. */ public class CircuitBreakerManager implements PluginInfoInitialized { + private final SolrCore core; Review comment: @wrunderwood I have highlighted this to yourself before months back when you asked the same question, and I will repeat it again -- Solr 8x has no way of distinguishing between internal and external requests hence there is no way for circuit breakers to know what kind of requests they are seeing. I implemented this functionality in SOLR-13528 and will be doing the request level disambiguation in 9.x but there is still no way to identify the same in 8.x ########## File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreaker.java ########## @@ -66,14 +66,19 @@ protected boolean isEnabled() { private final int memCBThreshold; private final boolean cpuCBEnabled; private final int cpuCBThreshold; + private final boolean loadAverageCBEnabled; + private final double loadAverageCBThreshold; public CircuitBreakerConfig(final boolean enabled, final boolean memCBEnabled, final int memCBThreshold, - final boolean cpuCBEnabled, final int cpuCBThreshold) { + final boolean cpuCBEnabled, final int cpuCBThreshold, Review comment: I was advocating including the name of the metric being used for the CPU enabled CB (and not call it cpuCBEnabled) ########## File path: solr/core/src/java/org/apache/solr/util/circuitbreaker/CircuitBreakerManager.java ########## @@ -40,12 +41,15 @@ * solution. There will be a follow up with a SIP for a schema API design. */ public class CircuitBreakerManager implements PluginInfoInitialized { + private final SolrCore core; Review comment: There is now, post SOLR-13528, in 9x. There was none in 8x, hence I had to add that mechanism. ########## File path: solr/core/src/test/org/apache/solr/util/TestCircuitBreaker.java ########## @@ -199,7 +201,63 @@ public void testFakeCPUCircuitBreaker() { PluginInfo pluginInfo = h.getCore().getSolrConfig().getPluginInfo(CircuitBreakerManager.class.getName()); CircuitBreaker.CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerManager.buildCBConfig(pluginInfo); - CircuitBreaker circuitBreaker = new FakeCPUCircuitBreaker(circuitBreakerConfig); + CircuitBreaker circuitBreaker = new FakeCPUCircuitBreaker(circuitBreakerConfig, null); + + h.getCore().getCircuitBreakerManager().register(circuitBreaker); + + List<Future<?>> futures = new ArrayList<>(); + + for (int i = 0; i < 5; i++) { + Future<?> future = executor.submit(() -> { + try { + h.query(req("name:\"john smith\"")); + } catch (SolrException e) { + assertThat(e.getMessage(), containsString("Circuit Breakers tripped")); + failureCount.incrementAndGet(); + } catch (Exception e) { + throw new RuntimeException(e.getMessage()); + } + }); + + futures.add(future); + } + + for (Future<?> future : futures) { + try { + future.get(); + } catch (Exception e) { + throw new RuntimeException(e.getMessage()); + } + } + + executor.shutdown(); + try { + executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + throw new RuntimeException(e.getMessage()); + } + + assertEquals("Number of failed queries is not correct",5, failureCount.get()); + } finally { + if (!executor.isShutdown()) { + executor.shutdown(); + } + } + } + + public void testFakeLoadAverageCircuitBreaker() { + AtomicInteger failureCount = new AtomicInteger(); + + ExecutorService executor = ExecutorUtil.newMDCAwareCachedThreadPool( + new SolrNamedThreadFactory("TestCircuitBreaker")); + try { + removeAllExistingCircuitBreakers(); + + PluginInfo pluginInfo = h.getCore().getSolrConfig().getPluginInfo(CircuitBreakerManager.class.getName()); Review comment: That does not answer my question -- they are still exact copies of each other. Lets abstract to a parent class? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org