[
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16337190#comment-16337190
]
Andrzej Bialecki commented on SOLR-11882:
------------------------------------------
{quote}[~ab] kindly provided this suggestion, I applied it
{quote}
My suggestion was to replace the Gauge metrics in the registry inside
{{SolrCoreMetricManager.close()}} with their last primitive values. Most of
these Gauges are created as lambdas that keep referencing SolrCore, whereas the
values they produce don't reference the core. This way we would stop
referencing SolrCore but still preserve a snapshot of the gauge values.
Something like this:
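{code}
// Dropwizard's getGauges() returns a copy, so remove/register inside the loop is safe
metricRegistry.getGauges().forEach((k, v) -> {
  Object val = v.getValue();    // last value; it does not reference SolrCore
  metricRegistry.remove(k);     // drop the lambda gauge that captures the core
  metricRegistry.register(k, (Gauge<Object>) () -> val); // constant replacement
});
{code}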
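To make the idea concrete without depending on Dropwizard Metrics, here is a minimal stdlib-only sketch. {{GaugeSnapshotSketch}}, {{FakeCore}} and {{snapshotGauges()}} are hypothetical names, and the registry is modeled as a plain map of {{Supplier}}s rather than the real {{MetricRegistry}}. It shows that once a gauge is replaced by a constant supplier of its last value, it no longer reads through the core object.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for a metric registry (name -> gauge supplier);
// this is NOT the Dropwizard MetricRegistry API.
final class GaugeSnapshotSketch {

  // Hypothetical stand-in for SolrCore: the object gauges should stop referencing.
  static final class FakeCore {
    volatile int numDocs;
  }

  final Map<String, Supplier<Object>> gauges = new ConcurrentHashMap<>();

  // Registers a gauge as a lambda that captures the core, mirroring how
  // most SolrCore gauges are described in the comment above.
  void registerCoreGauge(String name, FakeCore core) {
    gauges.put(name, () -> core.numDocs);
  }

  // Replaces each gauge with a constant supplier of its last value, so the
  // registry no longer reaches the core through the original lambda.
  void snapshotGauges() {
    gauges.replaceAll((name, gauge) -> {
      Object last = gauge.get();
      return () -> last;
    });
  }
}
```

After {{snapshotGauges()}}, changes to the core are no longer visible through the registry, which is the observable side of the registry having dropped its reference to the core.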
I'm surprised your patch works, because {{SolrCoreMetricManager.close()}} is
called from inside {{SolrCore.close()}}, and calling {{SolrCore.close()}} there
again should, IMHO, lead to a "Too many closes" exception...
> SolrMetric registries retain references to SolrCores when closed
> ----------------------------------------------------------------
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Server
> Affects Versions: 7.1
> Reporter: Eros Taborelli
> Assignee: Erick Erickson
> Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, create-cores.zip,
> solr-dump-full_Leak_Suspects.zip, solr.config.zip
>
>
> *Description:*
> Our setup involves using a lot of small cores (possibly hundreds of
> thousands), but working only on a few of them at any given time.
> We already followed all recommendations in this guide:
> [https://wiki.apache.org/solr/LotsOfCores]
> We noticed that after creating/loading around 1000-2000 empty cores, with no
> documents inside, the heap consumption went through the roof despite having
> set transientCacheSize to only 64 (heap size set to 12G).
> All cores are correctly set to loadOnStartup=false and transient=true, and we
> have verified via logs that the cores in excess are actually being closed.
> However, a reference remains in
> org.apache.solr.metrics.SolrMetricManager#registries that is never removed
> until a core is fully unloaded.
> Restarting the JVM loads all cores in the admin UI, but doesn't populate the
> ConcurrentHashMap until a core is actually fully loaded.
> I reproduced the issue on a smaller scale (transientCacheSize = 5, heap size
> = 512m) and made a report (attached) using Eclipse MAT.
> *Desired outcome:*
> When a transient core is closed, the references in the SolrMetricManager
> should be removed, in the same fashion the reporters for the core are also
> closed and removed.
> Alternatively, an unloadOnClose=true|false flag could be implemented to fully
> unload a transient core when it is closed due to the cache size.
> *Note:*
> The documentation states throughout that unused cores will be unloaded,
> but this is misleading, as the cores are never fully unloaded.
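The reported growth and the desired outcome can be modeled with a small LRU cache. This is a hypothetical sketch: {{TransientCacheSketch}}, the {{removeRegistryOnClose}} switch and the {{"solr.core." + name}} registry naming are illustrative assumptions, and the transient cache is assumed (not asserted) to behave like an access-ordered {{LinkedHashMap}} that evicts its eldest entry. With the switch off, evicted cores leave their registries behind, matching the report; with it on, the registries map stays bounded by the cache size.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a transient-core LRU cache plus a registries map.
// Names and structure are illustrative only, not Solr's classes.
final class TransientCacheSketch {

  // Stand-in for SolrMetricManager's registries: registry name -> metrics.
  final Map<String, Map<String, Object>> registries = new ConcurrentHashMap<>();

  // Open transient cores (core name -> registry name), kept in LRU order.
  final Map<String, String> openCores;

  TransientCacheSketch(int transientCacheSize, boolean removeRegistryOnClose) {
    // accessOrder = true: iteration runs least-recently-used first,
    // so the "eldest" entry is the LRU core.
    openCores = new LinkedHashMap<String, String>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        if (size() <= transientCacheSize) {
          return false;
        }
        // Eviction models closing the core. Dropping its registry is the
        // desired outcome; keeping it reproduces the reported heap growth.
        if (removeRegistryOnClose) {
          registries.remove(eldest.getValue());
        }
        return true;
      }
    };
  }

  // Loads (or re-touches) a core and ensures its registry exists.
  void load(String coreName) {
    String registryName = "solr.core." + coreName; // hypothetical naming
    registries.computeIfAbsent(registryName, k -> new ConcurrentHashMap<>());
    openCores.put(coreName, registryName);
  }
}
```

For example, loading 100 cores with a cache size of 5 leaves 5 open cores in both modes, but 100 registries when they are retained versus 5 when they are removed on close.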
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)