Re: [PR] SOLR-17432: Allow OTel Java Agent [solr]

2024-09-01 Thread via GitHub


dsmiley commented on code in PR #2687:
URL: https://github.com/apache/solr/pull/2687#discussion_r1740338583


##
solr/core/src/java/org/apache/solr/core/TracerConfigurator.java:
##
@@ -38,33 +41,53 @@
 public abstract class TracerConfigurator implements NamedListInitializedPlugin 
{
 
   public static final boolean TRACE_ID_GEN_ENABLED =
-  Boolean.parseBoolean(EnvUtils.getProperty("solr.alwaysOnTraceId", 
"true"));
+  EnvUtils.getPropertyAsBool("solr.alwaysOnTraceId", true);
 
   private static final String DEFAULT_CLASS_NAME =
   EnvUtils.getProperty(
   "solr.otelDefaultConfigurator", 
"org.apache.solr.opentelemetry.OtelTracerConfigurator");
 
   private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
 
-  public static Tracer loadTracer(SolrResourceLoader loader, PluginInfo info) {
-if (info != null && info.isEnabled()) {
-  TracerConfigurator configurator =
-  loader.newInstance(info.className, TracerConfigurator.class);
-  configurator.init(info.initArgs);
-  ExecutorUtil.addThreadLocalProvider(new ContextThreadLocalProvider());
-  return configurator.getTracer();
-}
-if (shouldAutoConfigOTEL()) {
-  return autoConfigOTEL(loader);
+  /** Initializes {@link io.opentelemetry.api.GlobalOpenTelemetry} and returns 
a Tracer. */
+  public static synchronized Tracer loadTracer(SolrResourceLoader loader, 
PluginInfo info) {
+// synchronized to avoid races in tests starting Solr concurrently setting 
GlobalOpenTelemetry
+if (TraceUtils.OTEL_AGENT_PRESENT) {
+  log.info("OpenTelemetry Java agent was already installed; skipping OTEL 
initialization");
+} else {
+  OpenTelemetry otel = null;
+  if (info != null && info.isEnabled()) {
+var configurator = loader.newInstance(info.className, 
TracerConfigurator.class);
+configurator.init(info.initArgs);
+otel = configurator.createOtel();
+  } else if (shouldAutoConfigOTEL()) {
+otel = autoConfigOTEL(loader); // could return null if failed to load
+if (otel != null) {
+  log.info("OpenTelemetry loaded via auto configuration.");
+}
+  }
+  if (otel == null && TRACE_ID_GEN_ENABLED) {
+otel = 
OpenTelemetry.propagating(ContextPropagators.create(SimplePropagator.getInstance()));
+log.info("OpenTelemetry loaded with simple propagation only.");
+  }
+  if (otel == null) {
+otel = OpenTelemetry.noop();
+  }
+
+  try {
+GlobalOpenTelemetry.set(otel); // throws IllegalStateException if 
already set

Review Comment:
   this is the one spot where we call this.  Previously it was in multiple 
places which was confusing and hard to reason about.



##
solr/core/src/java/org/apache/solr/core/TracerConfigurator.java:
##
@@ -38,33 +41,53 @@
 public abstract class TracerConfigurator implements NamedListInitializedPlugin 
{
 
   public static final boolean TRACE_ID_GEN_ENABLED =
-  Boolean.parseBoolean(EnvUtils.getProperty("solr.alwaysOnTraceId", 
"true"));
+  EnvUtils.getPropertyAsBool("solr.alwaysOnTraceId", true);
 
   private static final String DEFAULT_CLASS_NAME =
   EnvUtils.getProperty(
   "solr.otelDefaultConfigurator", 
"org.apache.solr.opentelemetry.OtelTracerConfigurator");
 
   private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
 
-  public static Tracer loadTracer(SolrResourceLoader loader, PluginInfo info) {
-if (info != null && info.isEnabled()) {
-  TracerConfigurator configurator =
-  loader.newInstance(info.className, TracerConfigurator.class);
-  configurator.init(info.initArgs);
-  ExecutorUtil.addThreadLocalProvider(new ContextThreadLocalProvider());
-  return configurator.getTracer();
-}
-if (shouldAutoConfigOTEL()) {
-  return autoConfigOTEL(loader);
+  /** Initializes {@link io.opentelemetry.api.GlobalOpenTelemetry} and returns 
a Tracer. */
+  public static synchronized Tracer loadTracer(SolrResourceLoader loader, 
PluginInfo info) {
+// synchronized to avoid races in tests starting Solr concurrently setting 
GlobalOpenTelemetry
+if (TraceUtils.OTEL_AGENT_PRESENT) {
+  log.info("OpenTelemetry Java agent was already installed; skipping OTEL 
initialization");
+} else {
+  OpenTelemetry otel = null;
+  if (info != null && info.isEnabled()) {
+var configurator = loader.newInstance(info.className, 
TracerConfigurator.class);
+configurator.init(info.initArgs);
+otel = configurator.createOtel();
+  } else if (shouldAutoConfigOTEL()) {
+otel = autoConfigOTEL(loader); // could return null if failed to load
+if (otel != null) {
+  log.info("OpenTelemetry loaded via auto configuration.");
+}
+  }
+  if (otel == null && TRACE_ID_GEN_ENABLED) {
+otel = 
OpenTelemetry.propagating(ContextPropagators.

[jira] [Updated] (SOLR-17432) Enable use of OTEL Agent

2024-09-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SOLR-17432:
--
Labels: pull-request-available  (was: )

> Enable use of OTEL Agent
> 
>
> Key: SOLR-17432
> URL: https://issues.apache.org/jira/browse/SOLR-17432
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: tracing
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The [OpenTelemetry Java Agent 
> |https://opentelemetry.io/docs/zero-code/java/agent/]is really powerful, 
> supporting {{WithSpan}} annotations and auto-instrumentation of many 
> libraries like the AWS SDK.  It also isolates its transitive dependencies in 
> another classloader so as not to conflict with Solr's choices.  Solr 
> currently only supports OTEL via Solr itself calling into OTEL to initialize. 
>  This ticket proposes _also_ supporting recognizing that the OTEL agent is 
> loaded, and if so then using that without any change to solr.xml.
> Without this, someone can write a trivial TracerConfigurator and configure it 
> in solr.xml but ideally Solr should detect the situation.  If you want to run 
> tests with tracing (an overlooked use of tracing!), it's annoying to go touch 
> the pertinent solr.xml.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-16036 Race condition in SolrJmxReporterTest [solr]

2024-09-01 Thread via GitHub


iamsanjay commented on PR #673:
URL: https://github.com/apache/solr/pull/673#issuecomment-2323360445

   Race Condition!
   ```
org.apache.solr.metrics.reporters.SolrJmxReporterTest.testClosedCore 
(:solr:core)
   Test history: 
https://ge.apache.org/scans/tests?search.rootProjectNames=solr-root&tests.container=org.apache.solr.metrics.reporters.SolrJmxReporterTest&tests.test=testClosedCore
 
http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.metrics.reporters.SolrJmxReporterTest.testClosedCore
   Test output: 
/home/jenkins/jenkins-slave/workspace/Solr/Solr-check-9.7/solr/core/build/test-results/test/outputs/OUTPUT-org.apache.solr.metrics.reporters.SolrJmxReporterTest.txt
   Reproduce with: ./gradlew :solr:core:test --tests 
"org.apache.solr.metrics.reporters.SolrJmxReporterTest.testClosedCore" 
-Ptests.jvms=4 -Ptests.haltonfailure=false 
"-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC 
-XX:ActiveProcessorCount=1 -XX:ReservedCodeCacheSize=120m" 
-Ptests.seed=D864F32CF202EEE8 -Ptests.multiplier=2 -Ptests.badapples=false 
-Ptests.file.encoding=UTF-8
   ```
   
   
   https://github.com/user-attachments/assets/35b7fc2d-3425-4b58-83ae-c87e58130943";>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-17343) Live-updating Global Circuit Breakers via ClusterProperties

2024-09-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SOLR-17343:
--
Labels: pull-request-available  (was: )

> Live-updating Global Circuit Breakers via ClusterProperties
> ---
>
> Key: SOLR-17343
> URL: https://issues.apache.org/jira/browse/SOLR-17343
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 9.6.1
>Reporter: Nick Ginther
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Global rate limiting was added via environment variables and system 
> properties in SOLR-16974. This was implemented via environment variables that 
> are evaluated at Solr startup. The implementation results in global circuit 
> breakers that are static until the Solr process restarts. It would be useful 
> to be able to dynamically update circuit breakers while Solr is running for 
> two reasons.
>  # Cluster load may change in such a way that it is useful to raise or lower 
> circuit breaker thresholds, or add/remove circuit breakers
>  # Experimenting with circuit breaker thresholds and types is much easier
> For these reasons, this issue proposes the following changes:
>  # Add a ClusterProps field that describes cluster-level global circuit 
> breakers
>  ## One global circuit breaker per-class, with a query and update threshold
>  ## The ability to override these values at the node level
>  # Add a ClusterProps command `set-circuit-breakers` to set this ClusterProps 
> field via the v2 API
>  # Implement a global circuit breaker manager that handles ClusterProps 
> changes by updating / adding / removing circuit breakers based on changes in 
> configuration
>  # ClusterProps-based circuit breakers override any environment-variable 
> based global circuit breakers for ease of circuit breaker management



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[jira] [Updated] (SOLR-16036) Race condition in SolrJmxReporterTest.testClosedCore

2024-09-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-16036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SOLR-16036:
--
Labels: pull-request-available  (was: )

> Race condition in SolrJmxReporterTest.testClosedCore
> 
>
> Key: SOLR-16036
> URL: https://issues.apache.org/jira/browse/SOLR-16036
> Project: Solr
>  Issue Type: Test
>  Components: JMX, Tests
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
>  Labels: pull-request-available
> Fix For: 9.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [https://jenkins.thetaphi.de/view/Solr/job/Solr-main-MacOSX/666/testReport/org.apache.solr.metrics.reporters/SolrJmxReporterTest/testClosedCore/]
>  
> There's a race condition in our test where the core can be unloaded, causing 
> the MBean to deregister before the loop has been instructed to stop checking. 
> It doesn't reproduce for me at all locally, so I can't verify this, but 
> that's my best guess from visual inspection.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



[PR] SOLR-17432: Allow OTel Java Agent [solr]

2024-09-01 Thread via GitHub


dsmiley opened a new pull request, #2687:
URL: https://github.com/apache/solr/pull/2687

   https://issues.apache.org/jira/browse/SOLR-17432
   
   Admittedly there's a lot going on here; the PR is more of a refactoring / 
minor improvements that also happens to support the Otel agent too, as it's a 
small aspect.  I can separate this PR.  A number of things were gnawing at me, 
which I started to deal with.
   
   API Change:  TracerConfigurator subclasses now create an OpenTelemetry not a 
Tracer.  Since we depend on GlobalOpenTelemetry, if a configurator were to 
return some custom Tracer thing (independent of an OpenTelemetry), it's not 
clear how we would install it into a GlobalOpenTelemetry.  Maybe possible (?) 
but doesn't seem a good direction.  We could stop our use of 
GlobalOpenTelemetry easily as it's only used in one spot that could actually 
get the Tracer from elsewhere.  Maybe we should do that anyway, but nonetheless 
installing it is something we should continue to do.
   
   Refactorings / changes:
   * removed all tracing config from MiniSolrCloudCluster
   * SolrTestCase resets tracing now
   * Only TracerConfigurator should call GlobalOpenTelemetry.set (removed from 
SimplePropagator & OtelTracerConfigurator) and it does so in exactly one spot.  
   * Tests examining spans now use OpenTelemetryRule. Removed 
CustomTestOtelTracerConfigurator.java
   
   todo: test Custom TracerConfigurator ?
   Do we even have a test actually using OtelTracerConfigurator.java?  
   Admittedly there's no test here for the Agent approach.  Would need a bats 
test to do right, and it'd need to go download a 20MB jar.  I tested it 
manually when running a test (to debug a test) and I have used it in Solr 9.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] Produce test report for integration tests [solr]

2024-09-01 Thread via GitHub


dsmiley commented on PR #2600:
URL: https://github.com/apache/solr/pull/2600#issuecomment-2323552894

   @gerlowskija what Jenkins job runs the integrationTests ?  I went looking 
right now and none of the jobs appear to do so -- 
https://ci-builds.apache.org/job/Solr/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17343: Live-updating Global Circuit Breakers via ClusterProperties [solr]

2024-09-01 Thread via GitHub


github-actions[bot] commented on PR #2529:
URL: https://github.com/apache/solr/pull/2529#issuecomment-2323550265

   This PR has had no activity for 60 days and is now labeled as stale.  Any 
new activity or converting it to draft will remove the stale label.  To attract 
more reviewers, please tag people who might be familiar with the code area 
and/or notify the d...@solr.apache.org mailing list.  Thank you for your 
contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org