[jira] [Commented] (SOLR-11779) Basic long-term collection of aggregated metrics

Andrzej Bialecki (JIRA) Wed, 23 May 2018 03:17:24 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-11779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487032#comment-16487032
 ]


Andrzej Bialecki  commented on SOLR-11779:
------------------------------------------

bq. Maybe it doesn't make sense in 7x to have enable=true by default?
With the defaults as they are now, it only collects aggregated metrics 
history (one history DB per collection, plus one for aggregated nodes and one 
for aggregated JVMs). Considering the small memory footprint of each DB (~30kB) 
and the small CPU impact (metrics are polled every 60 sec), I'd say it's benign. 
But I've been wrong before ... ;)
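To put those numbers in perspective, here is a back-of-the-envelope sketch that multiplies them out. It counts one DB per collection plus the two aggregated DBs at the ~30kB per DB quoted above, and the number of polls per day at a 60 sec interval. The class and method names are made up for illustration. This is not Solr code, and the per-DB size is taken at face value rather than measured.

```java
// Hypothetical estimate helper; KB_PER_DB and POLL_SECONDS are the figures
// quoted in the comment, not values read from Solr.
class HistoryFootprint {
    static final int KB_PER_DB = 30;     // ~30kB per history DB
    static final int POLL_SECONDS = 60;  // metrics polled every 60 sec

    // One DB per collection, plus one for aggregated nodes and one for aggregated JVMs.
    static int numDbs(int numCollections) {
        return numCollections + 2;
    }

    // Approximate total memory in kB for all default history DBs.
    static int totalKb(int numCollections) {
        return numDbs(numCollections) * KB_PER_DB;
    }

    // How many polls happen per day at the default interval.
    static int samplesPerDay() {
        return 86_400 / POLL_SECONDS;
    }

    public static void main(String[] args) {
        // e.g. 3 collections -> 5 DBs -> ~150 kB, polled 1440 times a day
        System.out.println(totalKb(3) + " kB, " + samplesPerDay() + " polls/day");
    }
}
```

Even with a few hundred collections, the total stays in the low megabytes under these assumptions.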

[~janhoy] Definitely, the format of the graphs is suitable for copy/pasting 
the data directly into an {{<img src="data:image/png;base64, .... "/>}} 
element.
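A minimal sketch of that embedding step follows. It assumes you already have the base64-encoded PNG string, or the raw PNG bytes. How the string is pulled out of the history API response is not shown, and `DataUriImg` is a hypothetical helper rather than part of Solr. The string only needs the standard base64 alphabet (not the URL-safe one), wrapped in a `data:image/png;base64,` URI.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: wraps base64 PNG data in an <img> element via a data URI.
class DataUriImg {
    // Use when the graph is already base64-encoded.
    static String imgTag(String base64Png) {
        return "<img src=\"data:image/png;base64," + base64Png + "\"/>";
    }

    // Use with raw PNG bytes; encodes with the standard (non-URL-safe) alphabet.
    static String imgTag(byte[] pngBytes) {
        return imgTag(Base64.getEncoder().encodeToString(pngBytes));
    }

    public static void main(String[] args) {
        // Placeholder bytes, not a real PNG; just demonstrates the output shape.
        byte[] fake = "not-a-real-png".getBytes(StandardCharsets.US_ASCII);
        System.out.println(imgTag(fake));
    }
}
```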

Tracking the history of ephemeral resources such as individual replicas and 
nodes is somewhat complicated due to their relatively shorter life-cycle (I 
know, it may sound weird if you run 3 nodes with 3 collections, but there are 
users running very large clusters that experience high churn). There's a config 
option to collect selected per-node metrics so it's possible to do so (see the 
patch description above). However, there's no mechanism in place yet to 
automatically clean up these DBs when nodes and replicas go permanently away 
(though we could add it as a scheduled maintenance task, there's already a 
predefined trigger for this). There's an API for doing this manually.
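If such a scheduled maintenance task were added, its core would be a set difference between the history DBs that exist and the resources that are still live. The following sketch works under clearly hypothetical assumptions: per-node DBs are named `"node." + nodeName`, `StaleHistoryDbs` is a made-up class, and actually deleting the DBs would go through the existing manual API, which isn't shown here.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical cleanup helper; the "node." naming scheme is an assumption.
class StaleHistoryDbs {
    static final String NODE_PREFIX = "node.";

    // Returns per-node DB names whose node is not in the current live set.
    // Non-node DBs (e.g. per-collection ones) are ignored here.
    static Set<String> staleNodeDbs(Set<String> existingDbs, Set<String> liveNodes) {
        Set<String> stale = new TreeSet<>();
        for (String db : existingDbs) {
            if (db.startsWith(NODE_PREFIX)
                    && !liveNodes.contains(db.substring(NODE_PREFIX.length()))) {
                stale.add(db);
            }
        }
        return stale;
    }
}
```

A node that is only temporarily down shouldn't lose its history, so a real task would also need some notion of "permanently gone", such as a grace period across several runs. That part is left out of this sketch.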

The list of metrics that are currently collected is as follows:
* CORE and COLLECTION level metrics
** QUERY./select.requests
** UPDATE./update.requests
** INDEX.sizeInBytes
** numShards (active)
** numReplicas (active)
* NODE level metrics
** CONTAINER.fs.coreRoot.usableSpace
** numNodes
* JVM level metrics
** memory.heap.used
** os.processCpuLoad
** os.systemLoadAverage

Currently one DB is created for each of these groups. However, RRD4j doesn't 
allow adding new datasources once the DB is created, so this list is not 
configurable on the fly (yet - there are ways to work around it that I'm exploring).
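The consequence of that RRD4j limitation can be illustrated with a plain-Java model in which each group's datasource list is frozen when its DB is created. This is not RRD4j's API. `FixedSchemaDb` and the group keys are hypothetical, and in the real setup the collection-level schema is instantiated once per collection rather than once overall. The metric names are the ones listed above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a DB whose datasources are fixed at creation time.
class FixedSchemaDb {
    private final String name;
    private final List<String> datasources; // immutable copy, frozen at creation

    FixedSchemaDb(String name, List<String> datasources) {
        this.name = name;
        this.datasources = List.copyOf(datasources);
    }

    List<String> datasources() {
        return datasources;
    }

    // Models the limitation: a new metric can't be added to an existing DB;
    // the only option is to create a new DB with the extended list.
    void addDatasource(String ds) {
        throw new UnsupportedOperationException(
            "cannot add '" + ds + "' to existing DB '" + name + "'; recreate it instead");
    }

    // One DB per group, using the metric lists from the comment.
    static Map<String, FixedSchemaDb> defaultDbs() {
        Map<String, FixedSchemaDb> dbs = new LinkedHashMap<>();
        dbs.put("collection", new FixedSchemaDb("collection", List.of(
            "QUERY./select.requests", "UPDATE./update.requests",
            "INDEX.sizeInBytes", "numShards", "numReplicas")));
        dbs.put("node", new FixedSchemaDb("node", List.of(
            "CONTAINER.fs.coreRoot.usableSpace", "numNodes")));
        dbs.put("jvm", new FixedSchemaDb("jvm", List.of(
            "memory.heap.used", "os.processCpuLoad", "os.systemLoadAverage")));
        return dbs;
    }
}
```

One obvious workaround in this model is to migrate: create a new DB with the extended list and copy the old series into it. Whether that matches the approach being explored here is not stated in the comment.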

> Basic long-term collection of aggregated metrics
> ------------------------------------------------
>
>                 Key: SOLR-11779
>                 URL: https://issues.apache.org/jira/browse/SOLR-11779
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: metrics
>    Affects Versions: 7.3, master (8.0)
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>            Priority: Major
>         Attachments: SOLR-11779.patch, SOLR-11779.patch, SOLR-11779.patch, 
> c1.png, c2.png, core.json, d1.png, d2.png, d3.png, jvm-list.json, 
> jvm-string.json, jvm.json, o1.png, u1.png
>
>
> Tracking the key metrics over time is very helpful in understanding the 
> cluster and user behavior.
> Currently even basic metrics tracking requires setting up an external system 
> and either polling {{/admin/metrics}} or using {{SolrMetricReporter}}-s. The 
> advantage of this setup is that these external tools usually provide a lot of 
> sophisticated functionality. The downside is that they don't ship out of the 
> box with Solr and require additional admin effort to set up.
> Solr could collect some of the key metrics and keep their historical values 
> in a round-robin database (eg. using RRD4j) to keep the size of the historic 
> data constant (eg. ~64kB per metric), while at the same time providing useful 
> out-of-the-box insights into the basic system behavior over time. This data 
> could be persisted to the {{.system}} collection as blobs, and it could also be 
> presented in the Admin UI as graphs.
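The property that keeps the historic data at a constant size is the round-robin layout. Storage is allocated up front, and once it is full each new sample overwrites the oldest one. The following is a deliberately simplified ring buffer that shows only that property. Real RRD4j archives also consolidate samples (e.g. averaging) at several resolutions, which this skips, and `RoundRobinArchive` is a hypothetical name.

```java
// Hypothetical, simplified round-robin archive: fixed capacity, oldest sample overwritten.
class RoundRobinArchive {
    private final double[] values; // allocated once, never grows
    private int next = 0;          // slot the next sample is written to
    private int count = 0;         // number of valid samples (<= capacity)

    RoundRobinArchive(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be > 0");
        values = new double[capacity];
    }

    void add(double v) {
        values[next] = v;                  // overwrites the oldest slot once full
        next = (next + 1) % values.length;
        if (count < values.length) count++;
    }

    int capacity() { return values.length; }

    int size() { return count; }

    // Returns the stored samples ordered from oldest to newest.
    double[] snapshot() {
        double[] out = new double[count];
        int start = (next - count + values.length) % values.length;
        for (int i = 0; i < count; i++) {
            out[i] = values[(start + i) % values.length];
        }
        return out;
    }
}
```

For example, at a 60 sec step a capacity of 1440 holds one day of samples at full resolution. Memory use stays the same whether Solr has been running for a day or a year.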



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
