[
https://issues.apache.org/jira/browse/SOLR-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035377#comment-18035377
]
Chris M. Hostetter commented on SOLR-17458:
-------------------------------------------
I don't know when/where exactly it happened, but at some point in the week
ending with 2025-10-13, multiple DirectoryFactoryTest methods started failing
very frequently with similar underlying causes that appear to be related to the
OTEL switch...
[http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.core.DirectoryFactoryTest.testGetDataHomeByteBuffersDirectory]
{noformat}
Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=10865, name=Thread-1625, state=RUNNABLE, group=TGRP-DirectoryFactoryTest]
    at __randomizedtesting.SeedInfo.seed([C079B069F7DA505D:BCB24E1EBC9794AC]:0)
Caused by: java.lang.IllegalStateException: Recording can only be started once.
    at __randomizedtesting.SeedInfo.seed([C079B069F7DA505D]:0)
    at jdk.jfr/jdk.jfr.internal.PlatformRecording.start(PlatformRecording.java:120)
    at jdk.jfr/jdk.jfr.consumer.RecordingStream.start(RecordingStream.java:356)
    at io.opentelemetry.instrumentation.runtimemetrics.java17.RuntimeMetrics$JfrRuntimeMetrics.lambda$new$2(RuntimeMetrics.java:104)
    at java.base/java.lang.Thread.run(Thread.java:1575)
{noformat}
...the failure rate of both tests is currently ~21%, suggesting some static
test randomization is a key factor, but the seeds don't seem to reproduce for
me locally – making me wonder if some OS/jvm level settings are a contributing
factor (especially since the failure seems to be related to JFR)
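
For anyone unfamiliar with that exception message: below is a minimal sketch, using only the public JDK JFR API, of one way to provoke it outside of Solr. The class and method names (JfrDoubleStartSketch, secondStartMessage) are hypothetical, and this is not a claim about what the OTEL RuntimeMetrics code or DirectoryFactoryTest actually does; it only illustrates that a RecordingStream rejects a start request once it has left its initial state.

```java
import jdk.jfr.consumer.RecordingStream;

// Hypothetical illustration only; not taken from Solr or OpenTelemetry.
public class JfrDoubleStartSketch {

    /**
     * Starts the same RecordingStream twice and returns the message of the
     * IllegalStateException raised by the second start, or null if the
     * second start was (unexpectedly) accepted.
     */
    static String secondStartMessage() {
        try (RecordingStream stream = new RecordingStream()) {
            // First start: moves the underlying recording out of its NEW state.
            stream.startAsync();
            try {
                // Second start on the same stream: the recording is no longer NEW.
                stream.startAsync();
                return null;
            } catch (IllegalStateException e) {
                return e.getMessage();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(secondStartMessage());
    }
}
```

Whether the test failures come from a double start, or from some other ordering problem between the thread that starts the stream and the code that shuts it down, is not something this sketch can establish.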
> Metrics: switch from DropWizard to OpenTelemetry
> ------------------------------------------------
>
> Key: SOLR-17458
> URL: https://issues.apache.org/jira/browse/SOLR-17458
> Project: Solr
> Issue Type: Improvement
> Reporter: Matthew Biscocho
> Assignee: Matthew Biscocho
> Priority: Major
> Labels: pull-request-available
> Fix For: 10.0
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Solr currently captures metrics with Dropwizard 4. Dropwizard has some
> limitations, the biggest being that its metrics have no tags/attributes, which
> makes aggregation difficult and requires the Prometheus Exporter to work with
> Grafana.
> Creating this to track and explore integrating OpenTelemetry into Solr and
> possibly replacing Dropwizard, giving Solr wider exposure to observability tools.