[ 
https://issues.apache.org/jira/browse/SOLR-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035377#comment-18035377
 ] 

Chris M. Hostetter commented on SOLR-17458:
-------------------------------------------

I don't know when/where exactly it happened, but at some point in the week 
ending with 2025-10-13, multiple DirectoryFactoryTest methods started failing 
very frequently with similar underlying causes that appear to be related to the 
OTEL switch...

[http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.core.DirectoryFactoryTest.testGetDataHomeByteBuffersDirectory]

 
{noformat}
Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=10865, name=Thread-1625, state=RUNNABLE, group=TGRP-DirectoryFactoryTest]
        at __randomizedtesting.SeedInfo.seed([C079B069F7DA505D:BCB24E1EBC9794AC]:0)
Caused by: java.lang.IllegalStateException: Recording can only be started once.
        at __randomizedtesting.SeedInfo.seed([C079B069F7DA505D]:0)
        at jdk.jfr/jdk.jfr.internal.PlatformRecording.start(PlatformRecording.java:120)
        at jdk.jfr/jdk.jfr.consumer.RecordingStream.start(RecordingStream.java:356)
        at io.opentelemetry.instrumentation.runtimemetrics.java17.RuntimeMetrics$JfrRuntimeMetrics.lambda$new$2(RuntimeMetrics.java:104)
        at java.base/java.lang.Thread.run(Thread.java:1575)
{noformat}
...the failure rate of both tests is currently ~21%, suggesting some static 
test randomization is a key factor, but the seeds don't seem to reproduce for 
me locally, making me wonder if some OS/JVM-level settings are a contributing 
factor (especially since the failure seems to be related to JFR).
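
For reference, the exception message in the trace corresponds to a JFR precondition: a recording that has already left its initial state cannot be started again. The following is only a minimal sketch of that precondition using the public jdk.jfr.Recording API; the class name JfrDoubleStart and the method startTwice are hypothetical, and nothing here is taken from the Solr or OpenTelemetry code or claims to explain why the test fails.

```java
import jdk.jfr.Recording;

public class JfrDoubleStart {

    // Hypothetical helper: starts one Recording twice and returns the message
    // of the IllegalStateException raised by the second start(), or null if
    // the second call unexpectedly succeeds.
    static String startTwice() {
        try (Recording recording = new Recording()) {
            recording.start();      // first start: the recording begins running
            try {
                recording.start();  // second start on the same recording
                return null;
            } catch (IllegalStateException e) {
                return e.getMessage();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(startTwice());
    }
}
```

Whether the failing tests hit this path through a second start or through a start after close is not something the trace alone establishes.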

> Metrics: switch from DropWizard to OpenTelemetry
> ------------------------------------------------
>
>                 Key: SOLR-17458
>                 URL: https://issues.apache.org/jira/browse/SOLR-17458
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Matthew Biscocho
>            Assignee: Matthew Biscocho
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 10.0
>
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Solr currently captures metrics with Dropwizard 4. There were some limitations 
> to Dropwizard, the biggest one being that metrics lack tags/attributes, which 
> makes aggregation difficult and requires the Prometheus Exporter to work with 
> Grafana.
> Creating this to track and explore integrating OpenTelemetry into Solr and 
> possibly replacing Dropwizard, giving wider exposure to observability tools.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
