mlbiscoc commented on code in PR #3811:
URL: https://github.com/apache/solr/pull/3811#discussion_r2463838449
##########
solr/solr-ref-guide/modules/deployment-guide/pages/monitoring-with-prometheus-and-grafana.adoc:
##########
Review Comment:
With the Prometheus exporter gone, I deleted this page and moved some of the
info into metrics-reporting. There is an OTEL collector section now that
basically replaces this anyway. I would still love to create a Grafana
dashboard to ship upstream with Solr metrics, similar to how we did before.
##########
solr/core/src/java/org/apache/solr/metrics/otel/MetricExporterFactory.java:
##########
@@ -30,5 +30,12 @@ public interface MetricExporterFactory {
   public static final int OTLP_EXPORTER_INTERVAL =
       Integer.parseInt(EnvUtils.getProperty("solr.metrics.otlpExporterInterval", "60000"));
+  public static final String OTLP_EXPORTER_GRPC_ENDPOINT =
+      EnvUtils.getProperty("solr.metrics.otlpGrpcExporterEndpoint", "http://localhost:4317");
+
+  public static final String OTLP_EXPORTER_HTTP_ENDPOINT =
+      EnvUtils.getProperty(
+          "solr.metrics.otlpHttpExporterEndpoint", "http://localhost:4318/v1/metrics");
Review Comment:
We are using the `default` OTLP exporter, but I never added these two options
for configuring the endpoint to push to, so it was always defaulting to `4317`
via gRPC or `4318` via HTTP. Added them here.
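
For readers following along, here is a minimal sketch of how the two new properties could be resolved per protocol, and how their names map to environment variables under the rules listed in the metrics-reporting diff below. It is an illustration only: `OtlpEndpointSketch`, `resolveEndpoint`, and `toEnvName` are hypothetical names, and plain `System.getProperty` stands in for Solr's `EnvUtils.getProperty`, so this is not the code in the PR.

```java
import java.util.Locale;

/** Sketch only: plain JDK lookups stand in for Solr's EnvUtils (an assumption). */
final class OtlpEndpointSketch {

  // Default endpoints copied from the diff above.
  static final String DEFAULT_GRPC_ENDPOINT = "http://localhost:4317";
  static final String DEFAULT_HTTP_ENDPOINT = "http://localhost:4318/v1/metrics";

  /** Hypothetical helper: converts a property name to its environment variable form. */
  static String toEnvName(String property) {
    return property
        .replaceAll("([a-z0-9])([A-Z])", "$1_$2") // split camelCase words with '_'
        .replace('.', '_')                        // replace '.' with '_'
        .toUpperCase(Locale.ROOT);                // make all letters uppercase
  }

  /** Hypothetical helper: picks the push endpoint for "http", otherwise falls back to gRPC. */
  static String resolveEndpoint(String protocol) {
    if ("http".equals(protocol)) {
      return System.getProperty("solr.metrics.otlpHttpExporterEndpoint", DEFAULT_HTTP_ENDPOINT);
    }
    return System.getProperty("solr.metrics.otlpGrpcExporterEndpoint", DEFAULT_GRPC_ENDPOINT);
  }

  public static void main(String[] args) {
    System.out.println(toEnvName("solr.metrics.otlpGrpcExporterEndpoint"));
    System.out.println(resolveEndpoint("grpc"));
    System.out.println(resolveEndpoint("http"));
  }
}
```

Run with `-Dsolr.metrics.otlpHttpExporterEndpoint=...` to see the HTTP default overridden; without it, the defaults from the diff are returned.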
##########
solr/solr-ref-guide/modules/deployment-guide/pages/performance-statistics-reference.adoc:
##########
@@ -19,55 +19,55 @@
This page explains some of the statistics that Solr exposes.
There are two approaches to retrieving metrics.
-First, you can use the xref:metrics-reporting.adoc#metrics-api[Metrics API],
or you can enable JMX and get metrics from the
xref:mbean-request-handler.adoc[] or via an external tool such as JConsole.
-The below descriptions focus on retrieving the metrics using the Metrics API,
but the metric names are the same if using the MBean Request Handler or an
external tool.
+First, you can use the xref:metrics-reporting.adoc#metrics-api[Metrics API] or
push metrics with OTLP to your monitoring backend.
+The descriptions below focus on retrieving metrics using the Metrics API and
Prometheus, but the metric names are the same with OTLP.
-These statistics are per core.
-When you are running in SolrCloud mode these statistics would co-relate to the
performance of an individual replica.
+These statistics are per core. When you are running in SolrCloud mode these
statistics would co-relate to the performance of an individual replica.
+
+*Note: Solr metrics provide raw data that must be aggregated and calculated by
monitoring backends (Prometheus, Grafana, etc.). Counters can be used to
calculate rates and averages over time windows. Histograms provide raw bucket
data that backends use to calculate percentiles (p50, p75, p95, p99, p999),
averages, and other statistical measures. Solr delegates these calculations to
your monitoring system for better flexibility and reduced load on Solr.*
Review Comment:
I think this is a very important note. I debated also putting it in the
metrics-reporting page, but it felt redundant to say it twice. Users may be
used to getting rates directly from Solr in 9 and earlier, but now they need
to calculate them in their backend.
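
To make the note concrete, here is a rough sketch of the arithmetic a backend performs to turn two samples of a monotonically increasing counter into a per-second rate. `CounterRateSketch` and `ratePerSecond` are hypothetical names, and the reset handling is a deliberate simplification (it treats a drop in value as a restart from zero); real backends such as Prometheus apply more careful logic over many samples.

```java
/** Sketch only: per-second rate from two counter samples, as a backend might compute it. */
final class CounterRateSketch {

  /**
   * Returns the average per-second increase between two samples.
   * Assumption: if the later value is smaller, the counter was reset to zero in between,
   * so the later value itself is taken as the increase.
   */
  static double ratePerSecond(
      double earlierValue, long earlierEpochSec, double laterValue, long laterEpochSec) {
    long elapsedSec = laterEpochSec - earlierEpochSec;
    if (elapsedSec <= 0) {
      throw new IllegalArgumentException("later sample must be newer than earlier sample");
    }
    double increase = laterValue >= earlierValue ? laterValue - earlierValue : laterValue;
    return increase / elapsedSec;
  }

  public static void main(String[] args) {
    // Counter went from 100 to 160 over 60 seconds.
    System.out.println(ratePerSecond(100, 0, 160, 60));
    // Counter reset between samples: 100 at t=0, 30 at t=60.
    System.out.println(ratePerSecond(100, 0, 30, 60));
  }
}
```

In the first call the counter grows by 60 over 60 seconds, which is one request per second; Solr itself only exposes the raw counter values.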
##########
solr/solr-ref-guide/modules/deployment-guide/pages/metrics-reporting.adoc:
##########
@@ -107,771 +93,287 @@ RequestHandlers can be configured to roll up core level
metrics to the node leve
</requestHandler>
```
-=== Jetty Registry
-
-This registry is returned at `solr.jetty` and includes the following
information.
-When making requests with the <<Metrics API>>, you can specify `&group=jetty`
to limit to only these metrics.
+=== JVM Registry
-* threads and pools,
-* connection and request timers,
-* meters for responses by HTTP class (1xx, 2xx, etc.)
+The `JVM Registry` gathers metrics from the JVM using the OpenTelemetry
instrumentation library with JFR and JMX. See the
https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/runtime-telemetry/runtime-telemetry-java17/library[runtime-telemetry-java17]
documentation for more information on available JVM metrics.
-== Metrics Configuration
+JVM metrics are enabled by default but can be disabled by setting either the
system property `-Dsolr.metrics.jvm.enabled=false` or the environment variable
`SOLR_METRICS_JVM_ENABLED=false`.
-The metrics available in your system can be customized by modifying the
`<metrics>` element in `solr.xml`.
+=== Overseer Registry
-TIP: See also the section xref:configuration-guide:configuring-solr-xml.adoc[]
for more information about the `solr.xml` file, where to find it, and how to
edit it.
+The `Overseer Registry` is initialized when running in SolrCloud mode and
includes the following information:
-=== Disabling the Metrics Collection
-The `<metrics>` element in `solr.xml` supports one attribute `enabled`, which
takes a boolean value,
-for example `<metrics enabled="true">`.
+* Size of the Overseer queues (collection work queue and cluster state update
queue)
-The default value of this attribute is `true`, meaning that metrics are being
collected, processed and
-reported by Solr according to the configured metric reporters.
-They are also available from the
-metrics APIs.
+== Core Level Metrics
-The `false` value of this attribute (`<metrics enabled="false">`) turns off
metrics collection and processing.
-Internally, all metrics suppliers are replaced by singleton no-op
-implementations, which effectively removes nearly all overheads related to
metrics collection.
-All reporter configurations are skipped, and the metrics APIs stop reporting
any metrics and only return an `<error>`
-element in their responses.
+=== Index Merge Metrics
-=== The <metrics> <hiddenSysProps> Element
+These metrics are collected under the `INDEX` category and track flush
operations (documents being written to disk) and merge operations (segments on
disk being merged).
-This section of `solr.xml` allows you to define the system properties which
are considered system-sensitive and should not be exposed via the Metrics API.
+For merge metrics, metrics are tracked with the distinction of "minor" and
"major" merges (as merges with fewer documents will be typically more frequent).
+This is indicated by the `merge_type` label for the metric. The threshold for
when a merge becomes large enough to be considered major is configurable, but
+defaults to 524k documents.
-If this section is not defined, the following default configuration is used
which hides password and authentication information:
+Metrics collection for index merges can be configured in the `<metrics>`
section of `solrconfig.xml` as shown below:
[source,xml]
----
-<metrics>
- <hiddenSysProps>
- <str>javax.net.ssl.keyStorePassword</str>
- <str>javax.net.ssl.trustStorePassword</str>
- <str>solr.security.auth.basicauth.credentials</str>
- <str>zkDigestPassword</str>
- <str>zkDigestReadonlyPassword</str>
- </hiddenSysProps>
-</metrics>
+<config>
+ ...
+ <indexConfig>
+ <metrics>
+ <long name="majorMergeDocs">524288</long>
+ </metrics>
+ ...
+ </indexConfig>
+...
+</config>
----
-[#the-metrics-reporters-element]
-=== The <metrics> <reporters> Element
-
-Reporters consume the metrics data generated by Solr.
-See the section <<Reporters>> below for more details on how to configure
custom reporters.
-
-=== The <metrics> <suppliers> Element
+== Metrics API
-Suppliers help Solr generate metrics data.
-The `<metrics><suppliers>` section of `solr.xml` allows you to define your own
implementations of metrics and configure parameters for them.
+The `/admin/metrics` endpoint natively provides access to all metrics in
Prometheus format by default. You can also specify `wt=prometheus` as a
parameter for Prometheus format or `wt=openmetrics` for OpenMetrics format.
More information on the data models is provided in the sections below.
-Implementation of a custom metrics supplier is beyond the scope of this guide,
but there are other customizations possible with the default implementation,
via the elements described below.
+=== Prometheus
-`<counter>`::
-This element defines the implementation and configuration of a `Counter`
supplier.
-The default implementation does not support any configuration.
+See https://prometheus.io/docs/concepts/data_model/[Prometheus Data Model]
documentation for more information on its data model.
-`<meter>`::
-This element defines the implementation of a `Meter` supplier.
-The default implementation supports an additional parameter:
+This endpoint can be used to pull/scrape metrics to a Prometheus server or any
Prometheus-compatible backend directly from Solr.
-`<str name="clock">`:::
-+
-[%autowidth,frame=none]
-|===
-|Optional |Default: `user`
-|===
-+
-The type of clock to use for calculating EWMA rates.
-The supported values are:
-* `user`, which uses `System.nanoTime()`
-* `cpu`, which uses the current thread's CPU time
+==== Prometheus Setup
-`<histogram>`::
-This element defines the implementation of a `Histogram` supplier.
-This element also supports the `clock` parameter shown above with the `meter`
element, and also:
+The `prometheus-config.yml` file needs to be configured for a Prometheus
server to scrape and collect metrics. A basic configuration for SolrCloud mode
is as follows:
-`<str name="reservoir">`:::
-+
-[%autowidth,frame=none]
-|===
-|Optional |Default: `com.codahale.metrics.ExponentiallyDecayingReservoir`
-|===
-+
-The fully-qualified class name of the `Reservoir` implementation to use.
-The default is `com.codahale.metrics.ExponentiallyDecayingReservoir` but there
are other options available with the http://metrics.dropwizard.io/[Codahale
Metrics library] that Solr uses.
-
-`<int name="size">`:::
-+
-[%autowidth,frame=none]
-|===
-|Optional |Default: `1028`
-|===
-+
-The reservoir size.
-
-`<double name="alpha">`:::
-+
-[%autowidth,frame=none]
-|===
-|Optional |Default: `0.015`
-|===
-+
-The decay parameter.
-This is only valid for the `ExponentiallyDecayingReservoir`.
-
-`<long name="window">`:::
-+
-[%autowidth,frame=none]
-|===
-|Optional |Default: `300` seconds
-|===
-+
-The window size, in seconds, and only valid for the
`SlidingTimeWindowReservoir`.
-
-`<timer>`::
-This element defines an implementation of a `Timer` supplier.
-The default implementation supports the `clock` and `reservoir` parameters
described above.
-
-As an example of a section of `solr.xml` that defines some of these custom
parameters, the following defines the default `Meter` supplier with a
non-default `clock` and the default `Timer` is used with a non-default
reservoir:
-
-[source,xml]
+[source,plain]
----
-<metrics>
- <suppliers>
- <meter>
- <str name="clock">cpu</str>
- </meter>
- <timer>
- <str
name="reservoir">com.codahale.metrics.SlidingTimeWindowReservoir</str>
- <long name="window">600</long>
- </timer>
- </suppliers>
-</metrics>
+scrape_configs:
+ - job_name: 'solr'
+ metrics_path: "/solr/admin/metrics"
+ static_configs:
+ - targets: ['localhost:8983', 'localhost:7574']
----
-=== The <metrics> <missingValues> Element
-Long-lived metrics values are still reported when the underlying value is
unavailable (e.g., "INDEX.sizeInBytes" when IndexReader is closed).
-Short-lived transient metrics (such as cache entries) that are properties of
complex gauges (internally represented as `MetricsMap`) are simply skipped when
not available, and neither their names nor values appear in registries (or in
`/admin/metrics` reports).
+=== OpenMetrics
-When a missing value is encountered by default it's reported as null value,
regardless of the metrics type.
-This can be configured in the `solr.xml:/solr/metrics/missingValues` element,
which recognizes the following child elements (for string elements a JSON
payload is supported):
+OpenMetrics format is available from the `/admin/metrics` endpoint by
providing the `wt=openmetrics` parameter or by passing the Accept header
`application/openmetrics-text;version=1.0.0`. OpenMetrics is an extension of
the Prometheus format that adds additional metadata and exemplars.
-`nullNumber`::
-+
-[%autowidth,frame=none]
-|===
-|Optional |Default: none
-|===
-+
-The value to use when a missing (null) numeric metrics value is encountered.
+See https://prometheus.io/docs/specs/om/open_metrics_spec/[OpenMetrics Spec]
documentation for more information.
-`notANumber`::
-+
-[%autowidth,frame=none]
-|===
-|Optional |Default: none
-|===
-+
-The value to use when an invalid numeric value is encountered.
+OpenMetrics can be used to pull/scrape metrics to a Prometheus server or any
OpenMetrics-compatible backend directly from Solr.
-`nullString`::
-+
-[%autowidth,frame=none]
-|===
-|Optional |Default: none
-|===
-+
-The value to use when a missing (null) string metrics is encountered.
+==== Prometheus setup with exemplars
-`nullObject`::
-+
-[%autowidth,frame=none]
-|===
-|Optional |Default: none
-|===
-+
-The value to use when a missing (null) complex object is encountered.
+OpenMetrics includes `exemplars` that provide additional information and allow
users to leverage Solr's
xref:deployment-guide:distributed-tracing.adoc#distributed-tracing[OpenTelemetry
distributed tracing module] and metrics in a cohesive view for correlating
traces and metrics.
-Example configuration that returns null for missing numbers, `-1` for
-invalid numeric values, empty string for missing strings, and a Map for missing
-complex objects:
+Distributed tracing must be enabled to see exemplars. Exemplars will never
appear in OpenMetrics format otherwise. You can then scrape OpenMetrics format
to a Prometheus server or OpenMetrics-compatible backend.
-[source,xml]
-----
-<metrics>
- <missingValues>
- <null name="nullNumber"/>
- <int name="notANumber">-1</int>
- <str name="nullString"></str>
- <str name="nullObject">{"value":"missing"}</str>
- </missingValues>
-</metrics>
-----
+A basic `prometheus-config.yml` configuration for a Prometheus server in
SolrCloud mode that collects exemplars is as follows:
-=== Caching Threads Metrics ===
-The threads metrics in the JVM group can be expensive to compute, as it
requires traversing all threads.
-This can be avoided for every call to the metrics API (group=jvm) by setting a
high caching expiration interval
-(in seconds). For example, to cache the thread metrics for 5 seconds:
-
-[source,xml]
+[source,plain]
----
-<solr>
- <metrics>
- <caching>
- <int name="threadsIntervalSeconds">5</int>
- </caching>
- ...
- </metrics>
-...
-</solr>
+scrape_configs:
+ - job_name: 'solr'
+ metrics_path: "/solr/admin/metrics"
+ static_configs:
+ - targets: ['localhost:8983', 'localhost:7574']
+ params:
+ wt: ['openmetrics']
+ scrape_protocols:
+ - OpenMetricsText1.0.0
----
-== Reporters
+The Prometheus server must also be started with the command-line parameter
`--enable-feature=exemplar-storage` to collect exemplars from OpenMetrics.
-Reporter configurations are specified in `solr.xml` file in
`<metrics><reporter>` sections, for example:
+If you are using Grafana, follow the
https://grafana.com/docs/grafana/latest/fundamentals/exemplars/[Introduction to
exemplars] guide to connect your Prometheus data source and see exemplars on
Grafana panels.
-[source,xml]
-----
-<solr>
- <metrics>
- <reporter name="graphite" group="node, jvm"
class="org.apache.solr.metrics.reporters.SolrGraphiteReporter">
- <str name="host">graphite-server</str>
- <int name="port">9999</int>
- <int name="period">60</int>
- </reporter>
- <reporter name="log_metrics" group="core"
class="org.apache.solr.metrics.reporters.SolrSlf4jReporter">
- <int name="period">60</int>
- <str name="filter">QUERY./select.requestTimes</str>
- <str name="filter">QUERY./get.requestTimes</str>
- <str name="filter">UPDATE./update.requestTimes</str>
- <str name="filter">UPDATE./update.clientErrors</str>
- <str name="filter">UPDATE./update.errors</str>
- <str name="filter">SEARCHER.new.time</str>
- <str name="filter">SEARCHER.new.warmup</str>
- <str
name="logger">org.apache.solr.metrics.reporters.SolrSlf4jReporter</str>
- </reporter>
- </metrics>
-...
-</solr>
-----
+=== API Filtering
-This example configures two reporters: <<Graphite Reporter,Graphite>> and
<<SLF4J Reporter,SLF4J>>.
-See below for more details on how to configure reporters.
+A fixed set of parameters is available to filter metrics by either metric name
or base core labels. You can combine these parameters to filter only the
specific metrics you need:
-=== Reporter Arguments
-
-Reporter plugins use the following arguments:
+*NOTE: All parameters can be specified with more than one value in a request;
multiple values should be separated by a comma.*
`name`::
+
[%autowidth,frame=none]
|===
-s|Required |Default: none
-|===
-+
-The unique name of the reporter plugin.
-
-`class`::
-+
-[%autowidth,frame=none]
-|===
-s|Required |Default: none
-|===
-+
-The fully-qualified implementation class of the plugin, which must extend
`SolrMetricReporter`.
-
-`group`::
-+
-[%autowidth,frame=none]
-|===
-|Optional |Default: none
-|===
-+
-One or more of the predefined groups (see above).
-
-`registry`::
-+
-[%autowidth,frame=none]
-|===
|Optional |Default: none
|===
+
-One or more of valid fully-qualified registry names.
+The metric name to filter on.
-If both `group` and `registry` attributes are specified only the `group`
attribute is considered.
-If neither attribute is specified then the plugin will be used for all groups
and registries.
-Multiple group or registry names can be specified, separated by comma and/or
space.
-
-Additionally, several implementation-specific initialization arguments can be
specified in nested elements.
-There are some arguments that are common to SLF4J, Ganglia and Graphite
reporters:
-
-`period`::
-+
-[%autowidth,frame=none]
-|===
-|Optional |Default: `60` seconds
-|===
-+
-The period in seconds between reports.
-
-`prefix`::
-+
-[%autowidth,frame=none]
-|===
-|Optional |Default: _empty string_
-|===
-+
-A prefix to be added to metric names, which may be helpful in logical grouping
of related Solr instances, e.g., machine name or cluster name.
-Default is empty string, i.e., just the registry name and metric name will be
used to form a fully-qualified metric name.
-
-`filter`::
+`category`::
+
[%autowidth,frame=none]
|===
|Optional |Default: none
|===
+
-If not empty then only metric names that start with this value will be
reported.
-Default is no filtering, i.e., all metrics from the selected registry will be
reported.
-
-Reporters are instantiated for every group and registry that they were
configured for, at the time when the respective components are initialized
(e.g., on JVM startup or SolrCore load).
-
-When reporters are created their configuration is validated (and e.g.,
necessary connections are established).
-Uncaught errors at this initialization stage cause the reporter to be
discarded from the running configuration.
-
-Reporters are closed when the corresponding component is being closed (e.g.,
on SolrCore close, or JVM shutdown) but metrics that they reported are still
maintained in respective registries, as explained in the previous section.
-
-The following sections provide information on implementation-specific
arguments.
-All implementation classes provided with Solr can be found under
`org.apache.solr.metrics.reporters`.
+The category label to filter on.
-=== JMX Reporter
-
-The JMX Reporter uses the `org.apache.solr.metrics.reporters.SolrJmxReporter`
class.
-
-It takes the following arguments:
-
-`domain`::
+`core`::
+
[%autowidth,frame=none]
|===
|Optional |Default: none
|===
+
-The JMX domain name.
-If not specified then the registry name will be used.
+The core name to filter on.
+More than one core can be specified in a request; multiple cores should be
separated by a comma.
-`serviceUrl`::
+`collection`::
+
[%autowidth,frame=none]
|===
|Optional |Default: none
|===
+
-The service URL for a JMX server.
-If not specified, Solr will attempt to discover if the JVM has an MBean server
and will use that address.
-See below for additional information on this.
+The collection name to filter on. This attribute is only filterable in
SolrCloud mode.
-`agentId`::
+`shard`::
+
[%autowidth,frame=none]
|===
|Optional |Default: none
|===
+
-The agent ID for a JMX server.
-Note either `serviceUrl` or `agentId` can be specified but not both.
-If both are specified then the default MBean server will be used.
-
-Object names created by this reporter are hierarchical, dot-separated but also
properly structured to form corresponding hierarchies in e.g., JConsole.
-This hierarchy consists of the following elements in the top-down order:
-
-* registry name (e.g., `solr.core.collection1.shard1.replica1`).
-Dot-separated registry names are also split into ObjectName hierarchy levels,
so that metrics for this registry will be shown under
`/solr/core/collection1/shard1/replica1` in JConsole, with each domain part
being assigned to `dom1, dom2, ... domN` property.
-* reporter name (the value of reporter's `name` attribute)
-* category, scope and name for request handlers
-* or additional `name1, name2, ... nameN` elements for metrics from other
components.
-
-The JMX Reporter replaces the JMX functionality available in Solr versions
before 7.0.
-If you have upgraded from an earlier version and have an MBean Server running
when Solr starts, Solr will automatically discover the location of the local
MBean server and use a default configuration for the SolrJmxReporter.
-
-You can start a local MBean server with a system property at startup by adding
`-Dcom.sun.management.jmxremote` to your start command.
-This will not add the reporter configuration to `solr.xml`, so if you enable
it with a system property, you must always start Solr with the system property
or JMX will not be enabled in subsequent starts.
+The shard name to filter on. This attribute is only filterable in SolrCloud
mode.
-=== SLF4J Reporter
-
-The SLF4J Reporter uses the
`org.apache.solr.metrics.reporters.SolrSlf4jReporter` class.
-
-It takes the following arguments, in addition to common arguments described
<<Reporter Arguments,above>>.
-
-`logger`::
+`replica_type`::
+
[%autowidth,frame=none]
|===
|Optional |Default: none
|===
+
-The name of the logger to use.
-Default is empty, in which case the group (or the initial part of the registry
name that identifies a metrics group) will be used if specified in the plugin
configuration.
-
-Users can specify logger name (and the corresponding logger configuration in
e.g., Log4j configuration) to output metrics-related logging to separate
file(s), which can then be processed by external applications.
-Here is an example for configuring the default `log4j2.xml` which ships in
Solr.
-This can be used in conjunction with the `solr.xml` example provided earlier
in this page to configure the SolrSlf4jReporter:
-
-[source,xml]
-----
-<Configuration>
- <Appenders>
- ...
- <RollingFile
- name="MetricsFile"
- fileName="${sys:solr.logs.dir}/solr_metrics.log"
- filePattern="${sys:solr.logs.dir}/solr_metrics.log.%i" >
- <PatternLayout>
- <Pattern>
- %d{yyyy-MM-dd HH:mm:ss.SSS} %-5p (%t) [%X{node_name} %X{collection}
%X{shard} %X{replica} %X{core} %X{trace_id}] %m%n
- </Pattern>
- </PatternLayout>
- <Policies>
- <OnStartupTriggeringPolicy />
- <SizeBasedTriggeringPolicy size="32 MB"/>
- </Policies>
- <DefaultRolloverStrategy max="10"/>
- </RollingFile>
- ...
- </Appenders>
+The replica type to filter on. Valid values are NRT, TLOG, or PULL. This
attribute is only filterable in SolrCloud mode.
- <Loggers>
- ...
- <Logger name="org.apache.solr.metrics.reporters.SolrSlf4jReporter"
level="info" additivity="false">
- <AppenderRef ref="MetricsFile"/>
- </Logger>
- ...
- </Loggers>
-</Configuration>
-----
+[[metrics_examples]]
+=== Examples
-Each log line produced by this reporter consists of configuration-specific
fields, and a message that follows this format:
+Request only metrics from the `foobar` collection:
[source,text]
-----
-type=COUNTER, name={}, count={}
-
-type=GAUGE, name={}, value={}
-
-type=TIMER, name={}, count={}, min={}, max={}, mean={}, stddev={}, median={},
p75={}, p95={}, p98={}, p99={}, p999={}, mean_rate={}, m1={}, m5={}, m15={},
rate_unit={}, duration_unit={}
-
-type=METER, name={}, count={}, mean_rate={}, m1={}, m5={}, m15={}, rate_unit={}
+http://localhost:8983/solr/admin/metrics?collection=foobar
-type=HISTOGRAM, name={}, count={}, min={}, max={}, mean={}, stddev={},
median={}, p75={}, p95={}, p98={}, p99={}, p999={}
-----
-
-(curly braces added here only as placeholders for actual values).
-
-Additionally, the following MDC context properties are passed to the logger
and can be used in log formats:
-
-`node_name`::
-Solr node name (for SolrCloud deployments, otherwise null), prefixed with `n:`.
-
-`registry`::
-Metric registry name, prefixed with `m:`.
-
-For reporters that are specific to a SolrCore also the following properties
are available:
-
-`collection`::
-Collection name, prefixed with `c:`.
-
-`shard`::
-Shard name, prefixed with `s:`.
-
-`replica`::
-Replica name (core node name), prefixed with `r:`.
-
-`core`::
-SolrCore name, prefixed with `x:`.
-
-`tag`::
-Reporter instance tag, prefixed with `t:`.
-
-=== Graphite Reporter
-
-The http://graphiteapp.org[Graphite] Reporter uses the
`org.apache.solr.metrics.reporters.SolrGraphiteReporter`) class.
-
-It takes the following attributes, in addition to the common attributes
<<Reporter Arguments,above>>.
-
-`host`::
-+
-[%autowidth,frame=none]
-|===
-s|Required |Default: none
-|===
-+
-The host name where Graphite server is running.
-
-`port`::
-+
-[%autowidth,frame=none]
-|===
-s|Required |Default: none
-|===
-+
-The port number for the server.
-
-`pickled`::
-+
-[%autowidth,frame=none]
-|===
-s|Required |Default: `false`
-|===
-+
-If `true`, use "pickled" Graphite protocol which may be more efficient.
-
-When plain-text protocol is used (`pickled==false`) it's possible to use this
reporter to integrate with systems other than Graphite, if they can accept
space-separated and line-oriented input over network in the following format:
+Request only the metrics with a category label of QUERY or UPDATE:
[source,text]
-----
-dot.separated.metric.name[.and.attribute] value epochTimestamp
-----
+http://localhost:8983/solr/admin/metrics?category=QUERY,UPDATE
-For example:
+Request only `solr_core_requests_total` metrics from the
`foobar_shard1_replica_n1` core:
-[source,plain]
-----
-example.solr.node.cores.loaded 1 1482932097
-example.solr.jetty.org.eclipse.jetty.server.handler.DefaultHandler.2xx-responses.count
21 1482932097
-example.solr.jetty.org.eclipse.jetty.server.handler.DefaultHandler.2xx-responses.m1_rate
2.5474287707930614 1482932097
-example.solr.jetty.org.eclipse.jetty.server.handler.DefaultHandler.2xx-responses.m5_rate
3.8003171557510305 1482932097
-example.solr.jetty.org.eclipse.jetty.server.handler.DefaultHandler.2xx-responses.m15_rate
4.0623076220244245 1482932097
-example.solr.jetty.org.eclipse.jetty.server.handler.DefaultHandler.2xx-responses.mean_rate
0.5698031798408144 1482932097
-----
-
-== Core Level Metrics
-
-These metrics are available only on a per-core basis.
-Metrics can be aggregated across cores using Shard and Cluster reporters.
+[source,text]
+http://localhost:8983/solr/admin/metrics?name=solr_core_requests_total&core=foobar_shard1_replica_n1
-=== Index Merge Metrics
+Request only the core index size `solr_core_index_size_bytes` metrics from
collections labeled `foo` and `bar`:
-These metrics are collected under the `INDEX` category and track flush
operations (documents being written to disk) and merge operations (segments on
disk being merged).
+[source,text]
+http://localhost:8983/solr/admin/metrics?name=solr_core_index_size_bytes&collection=foo,bar
-For merge metrics, metrics are tracked with the distinction of "minor" and
"major" merges (as merges with fewer documents will be typically more frequent).
-This is indicated by the `merge_type` label for the metric. The threshold for
when a merge becomes large enough to be considered major is configurable, but
-defaults to 524k documents.
+== OTLP
-Metrics collection for index merges can be configured in the `<metrics>`
section of `solrconfig.xml` as shown below:
+For users who do not use or support pulling metrics in Prometheus format with
the `/admin/metrics` API, Solr also supports pushing metrics natively with
https://opentelemetry.io/docs/specs/otlp/[OTLP], which is a vendor-agnostic
protocol for pushing metrics via gRPC or HTTP.
-[source,xml]
-----
-<config>
- ...
- <indexConfig>
- <metrics>
- <long name="majorMergeDocs">524288</long>
- </metrics>
- ...
- </indexConfig>
-...
-</config>
-----
+OTLP is widely supported by many tools, vendors, and pipelines. See the
OpenTelemetry https://opentelemetry.io/ecosystem/vendors/[vendors list] for
more details on available and compatible options.
+=== OTLP properties
-== Metrics API
+Solr's internal OTLP exporter is disabled by default and is packaged with the
OpenTelemetry module.
-The `admin/metrics` endpoint provides access to all the metrics for all metric
groups.
+The module can be enabled with either the system property
`-Dsolr.modules=opentelemetry` or the environment variable
`SOLR_MODULES=opentelemetry`, similar to distributed tracing.
-A few query parameters are available to limit your request to only certain
metrics:
+The OTLP exporter can be configured with the supported system properties
below. These can also be set as environment variables by following these
mapping rules:
-`group`::
-+
-[%autowidth,frame=none]
-|===
-|Optional |Default: `all`
-|===
-+
-The metric group to retrieve.
-The value `all` retrieves all metrics for all groups.
-Other possible values are: `jvm`, `jetty`, `node`, and `core`.
-More than one group can be specified in a request; multiple group names should
be separated by a comma.
+- Replace `.` with `_`
+- Convert camelCase to UPPER_SNAKE_CASE
+- Make all letters uppercase
-`type`::
+`solr.metrics.otlpExporterEnabled`::
+
[%autowidth,frame=none]
|===
-|Optional |Default: `all`
+|Optional |Default: false
|===
+
-The type of metric to retrieve.
-The value `all` retrieves all metric types.
-Other possible values are `counter`, `gauge`, `histogram`, `meter`, and
`timer`.
-More than one type can be specified in a request; multiple types should be
separated by a comma.
+Boolean value to enable or disable the OTLP metrics exporter.
-`prefix`::
+`solr.metrics.otlpExporterProtocol`::
+
[%autowidth,frame=none]
|===
-|Optional |Default: none
+|Optional |Default: grpc
|===
+
-The first characters of metric name that will filter the metrics returned to
those starting with the provided string.
-It can be combined with `group` and/or `type` parameters.
-More than one prefix can be specified in a request; multiple prefixes should
be separated by a comma.
-Prefix matching is also case-sensitive.
+OTLP protocol to use for pushing metrics. Available options are `grpc`,
`http`, or `none` (disabled).
-`regex`::
+`solr.metrics.otlpExporterInterval`::
+
[%autowidth,frame=none]
|===
-|Optional |Default: none
+|Optional |Default: 60000
|===
+
-A regular expression matching metric names.
-Note: dot separators in metric names must be escaped, e.g.,
-`QUERY\./select\..*` is a valid regex that matches all metrics with the
`QUERY./select.` prefix.
+The interval in milliseconds for how frequently metrics are pushed via OTLP.
-`property`::
+`solr.metrics.otlpGrpcExporterEndpoint`::
+
[%autowidth,frame=none]
|===
-|Optional |Default: none
+|Optional |Default: http://localhost:4317
|===
+
-Allows requesting only this metric from any compound metric.
-Multiple `property` parameters can be combined to act as an OR request.
-For example, to only get the 99th and 999th percentile values from all metric
types and groups, you can add `&property=p99_ms&property=p999_ms` to your
request.
-This can be combined with `group`, `type`, and `prefix` as necessary.
+Endpoint to send OTLP metrics to using the gRPC protocol.
-`key`::
+`solr.metrics.otlpHttpExporterEndpoint`::
+
[%autowidth,frame=none]
|===
-|Optional |Default: none
+|Optional |Default: http://localhost:4318/v1/metrics
|===
+
-The fully-qualified metric name, which specifies one concrete metric instance
(parameter can be specified multiple times to retrieve multiple concrete
metrics).
-+
-Fully-qualified name consists of registry name, colon and metric name, with
optional colon and metric property.
-Colons in names can be escaped using backslash (`\`) character.
-Examples:
+Endpoint to send OTLP metrics to using the HTTP protocol.
-* `key=solr.node:CONTAINER.fs.totalSpace`
-* `key=solr.core.collection1:QUERY./select.requestTimes:max_ms`
-* `key=solr.jvm:system.properties:user.name`
-+
-*NOTE: when this parameter is used, any other selection methods are ignored.*
+=== OpenTelemetry Collector setup
-`expr`::
-+
-[%autowidth,frame=none]
-|===
-|Optional |Default: none
-|===
-+
-Extended notation of the `key` selection criteria, which supports regular
expressions for each of the parts supported by the `key` selector.
-This parameter can be specified multiple times to retrieve metrics that match
-any expression.
-The API guarantees that the output will consist only of unique metric names
even if multiple expressions match the same metric name.
-Note: the order of multiple `expr` parameters matters here - only the first
value of the first matching expression will be recorded, subsequent values for
the same metric name produced by matching other expressions will be skipped.
-+
-Fully-qualified expression consists of at least two and at most three regex
patterns separated by colons: a registry pattern, colon, a metric pattern, and
then an optional colon and metric property pattern.
-Colons and other regex meta-characters in names and in regular expressions
MUST be escaped using backslash (`\`) character.
-+
-Examples:
+The https://opentelemetry.io/docs/collector/[OpenTelemetry Collector] is a
powerful process that allows users to decouple their metrics pipeline and route
to their preferred backend. It natively supports metrics being pushed to it via
OTLP and/or scraping the `/admin/metrics` Prometheus endpoint supported by
Solr. You can push both metrics and traces to the collector via OTLP as a
single pipeline.
-* `expr=solr\.core\..*:QUERY\..*\.requestTimes:max_ms`
-* `expr=solr\.jvm:system\.properties:user\..*`
-
-+
-*NOTE: when this parameter is used, any other selection methods are ignored.*
+A simple setup to route metrics from Solr -> OpenTelemetry Collector ->
Prometheus can be configured with the following OpenTelemetry Collector
configuration file:
-`compact`::
-+
-[%autowidth,frame=none]
-|===
-|Optional |Default: `true`
-|===
-+
-When `false`, a more verbose format of the response will be returned.
-Instead of a response like this:
-+
-[source,json]
-----
-{"metrics": [
- "solr.core.gettingstarted",
- {
- "CORE.aliases": {
- "value": ["gettingstarted"]
- },
- "CORE.coreName": {
- "value": "gettingstarted"
- },
- "CORE.indexDir": {
- "value": "/solr/example/schemaless/solr/gettingstarted/data/index/"
- },
- "CORE.instanceDir": {
- "value": "/solr/example/schemaless/solr/gettingstarted"
- },
- "CORE.refCount": {
- "value": 1
- },
- "CORE.startTime": {
- "value": "2017-03-14T11:43:23.822Z"
- }
- }
- ]}
-----
-+
-The response will look like this:
-+
-[source,json]
+[source,plain]
----
-{"metrics": [
- "solr.core.gettingstarted",
- {
- "CORE.aliases": [
- "gettingstarted"
- ],
- "CORE.coreName": "gettingstarted",
- "CORE.indexDir":
"/solr/example/schemaless/solr/gettingstarted/data/index/",
- "CORE.instanceDir": "/solr/example/schemaless/solr/gettingstarted",
- "CORE.refCount": 1,
- "CORE.startTime": "2017-03-14T11:43:23.822Z"
- }
- ]}
+receivers:
+ otlp:
+ protocols:
+ grpc:
+ endpoint: 0.0.0.0:4317
+ http:
+ endpoint: 0.0.0.0:4318
+
+exporters:
+ prometheus:
+ endpoint: 0.0.0.0:9464
+ send_timestamps: true
+ enable_open_metrics: true
+
+service:
+ pipelines:
+ metrics:
+ receivers: [otlp]
+ exporters: [prometheus]
----
Review Comment:
I added a very basic OTel Collector config that users can start from when
pushing metrics via OTLP.
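
To complete the Solr -> OpenTelemetry Collector -> Prometheus route, the Prometheus side could look roughly like the sketch below. It is not part of the PR. It assumes Prometheus runs on the same host as the Collector and scrapes the `prometheus` exporter on port 9464 from the Collector config in the diff. The job name `solr-otel-collector` and the 60s scrape interval (picked to line up with the default `solr.metrics.otlpExporterInterval` of 60000 ms) are illustrative choices only.

```yaml
# prometheus.yml (sketch; job name, host, and interval are assumptions)
scrape_configs:
  - job_name: solr-otel-collector   # hypothetical name, pick your own
    scrape_interval: 60s            # assumed to match Solr's default 60000 ms push interval
    static_configs:
      # The Collector's prometheus exporter listens on 0.0.0.0:9464
      # in the config above; adjust the host if Prometheus runs elsewhere.
      - targets: ["localhost:9464"]
```

On the Solr side, going by the defaults listed in the diff, setting `solr.metrics.otlpExporterEnabled` to true should be enough when the Collector runs locally, since the default `grpc` protocol and `http://localhost:4317` endpoint match the Collector's OTLP gRPC receiver.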