dsmiley commented on code in PR #3671:
URL: https://github.com/apache/solr/pull/3671#discussion_r2369926163
##########
solr/core/src/test/org/apache/solr/core/TestSolrConfigHandler.java:
##########
@@ -1045,20 +1046,19 @@ public void testSetPropertyEnableAndDisableCache()
throws Exception {
String payload = "{'set-property' : { 'query.documentCache.enabled': true}
}";
runConfigCommand(restTestHarness, "/config", payload);
restTestHarness.setServerProvider(() -> getBaseUrl());
- MapWriter confMap = getRespMap("/admin/metrics", restTestHarness);
- assertNotNull(
- confMap._get(
- asList("metrics", "solr.core.collection1",
"CACHE.searcher.documentCache"), null));
+
+ String prometheusMetrics =
restTestHarness.query("/admin/metrics?wt=prometheus");
+ assertTrue(prometheusMetrics.contains("cache_name=\"documentCache\""));
// Disabling Cache
payload = "{ 'set-property' : { 'query.documentCache.enabled': false } }";
restTestHarness.setServerProvider(oldProvider);
+
runConfigCommand(restTestHarness, "/config", payload);
restTestHarness.setServerProvider(() -> getBaseUrl());
- confMap = getRespMap("/admin/metrics", restTestHarness);
- assertNull(
- confMap._get(
- asList("metrics", "solr.core.collection1",
"CACHE.searcher.documentCache"), null));
+
+ prometheusMetrics = restTestHarness.query("/admin/metrics?wt=prometheus");
Review Comment:
it's fine; just wanted to confirm. The explicitness has value!
##########
solr/core/src/java/org/apache/solr/core/CoreContainer.java:
##########
@@ -874,9 +874,7 @@ private void loadInternal() {
String cacheName = e.getKey();
c.initializeMetrics(
solrMetricsContext,
- Attributes.builder()
- .put(AttributeKey.stringKey("cache_name"), "nodeLevelCache/" +
cacheName)
- .build(),
+ Attributes.builder().put(NAME_ATTR, "nodeLevelCache/" +
cacheName).build(),
Review Comment:
IMO "nodeLevelCache" can simply be "node". The metric name itself includes
the word "cache"; we know it's a cache.
I like that you were thinking of this cache level differentiation. It feels
like a small hack to embed this in the cache name. I suppose ideally it'd be
in the metric name but maybe cache name is fine too.
##########
solr/core/src/java/org/apache/solr/search/CaffeineCache.java:
##########
@@ -471,47 +477,90 @@ public SolrMetricsContext getSolrMetricsContext() {
@Override
public String toString() {
- return name() + (cacheMap != null ? cacheMap.getValue().toString() : "");
+ return name();
}
- // TODO SOLR-17458: Migrate to Otel
@Override
public void initializeMetrics(
SolrMetricsContext parentContext, Attributes attributes, String scope) {
+ Attributes cacheAttributes =
+ attributes.toBuilder().put(CATEGORY_ATTR,
getCategory().toString()).build();
solrMetricsContext = parentContext.getChildContext(this);
- cacheMap =
- new MetricsMap(
- map -> {
+
+ ObservableLongMeasurement cacheOperation =
+ solrMetricsContext.longCounterMeasurement(
+ "solr_searcher_cache_ops",
+ "Number of cache operations (hits, lookups, inserts and
evictions)");
+
+ ObservableLongMeasurement hitRatioMetric =
+
solrMetricsContext.longGaugeMeasurement("solr_searcher_cache_hit_ratio", "Cache
hit ratio");
+
+ ObservableLongMeasurement sizeMetric =
+ solrMetricsContext.longGaugeMeasurement(
+ "solr_searcher_cache_size", "Current number cache entries");
+
+ ObservableLongMeasurement ramBytesUsedMetric =
+ solrMetricsContext.longGaugeMeasurement(
+ "solr_searcher_cache_ram_used", "RAM bytes used by cache",
OtelUnit.BYTES);
+
+ ObservableLongMeasurement warmupTimeMetric =
+ solrMetricsContext.longGaugeMeasurement(
+ "solr_searcher_cache_warmup_time", "Cache warmup time",
OtelUnit.MILLISECONDS);
+
+ ObservableLongMeasurement cumulativeCacheOperation =
+ solrMetricsContext.longGaugeMeasurement(
+ "solr_cache_cumulative_ops",
+ "Number of cumulative cache operations (hits, lookups, inserts and
evictions)");
+
+ this.toClose =
+ solrMetricsContext.batchCallback(
+ () -> {
if (cache != null) {
CacheStats stats = cache.stats();
long hitCount = stats.hitCount() + hits.sum();
long insertCount = inserts.sum();
long lookupCount = stats.requestCount() + lookups.sum();
- map.put(LOOKUPS_PARAM, lookupCount);
- map.put(HITS_PARAM, hitCount);
- map.put(HIT_RATIO_PARAM, hitRate(hitCount, lookupCount));
- map.put(INSERTS_PARAM, insertCount);
- map.put(EVICTIONS_PARAM, stats.evictionCount());
- map.put(SIZE_PARAM, cache.asMap().size());
- map.put("warmupTime", warmupTime);
- map.put(RAM_BYTES_USED_PARAM, ramBytesUsed());
- map.put(MAX_RAM_MB_PARAM, getMaxRamMB());
+ cacheOperation.record(
+ hitCount, cacheAttributes.toBuilder().put(OPERATION_ATTR,
"hits").build());
Review Comment:
You addressed my feedback, thanks.
I see that the lookup metric with the absence of `result` is the sum of
hit/miss. I asked Gemini AI about the possible redundancy of this, and it
pointed out the trend of doing aggregations at query time and only having
granular data at the spot where we export. This makes sense based on how easy
it showed me to sum the lookups of a given cache name across all cores. But
it's not quite so simple/easy with this redundancy; since then you have to
specifically match where there is a value in result or exclude where result is
present (there are different ways). It's not a big deal but... I dunno.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]