[jira] [Commented] (SOLR-13579) Create resource management API

Hoss Man (JIRA) Fri, 26 Jul 2019 16:23:22 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894184#comment-16894184
 ]


Hoss Man commented on SOLR-13579:
---------------------------------

Honestly, i'm still very lost.

Part of my struggle is i'm trying to wade into the patch, and review the APIs 
and functionality it contains, while knowing – as you mentioned – that not 
all the details are here, and it's not fully fleshed out w/everything you 
intend as far as configuration and customization and having more concrete 
implementations beyond just the {{CacheManagerPlugin}}.

I know that in your mind there is more that can/should be done, and that some 
of this code is just "placeholder" for later, but i don't have enough 
familiarity with the "long term" plan to really understand what in the current 
patch is placeholder or stub APIs, vs what is "real" and exists because of long 
term visions for how all of these pieces can be used together in a more 
generalized system – ie: what classes might have surface APIs that look more 
complex than needed given what's currently implemented in the patch, because of 
how you envision those classes being used in the future?

Just to pick one example: my question about the "ResourceManagerPool" vs 
"ResourceManagerPlugin" – in your reply you said...
{quote}The code in ResourceManagerPool is independent of the type of 
resource(s) that a pool can manage. ...
{quote}
...but the code in {{ResourceManagerPlugin}} is _also_ independent of any 
specific type of resource(s) that a pool can manage – those specifics only 
exist in the concrete subclasses. Hence the crux of my question is why these 
two very generalized pieces of abstract functionality/data collection couldn't 
just be a single abstract base class for all (concrete) ResourceManagerPlugin 
subclasses to extend?

Your followup gives a clue...
{quote}...perhaps at some point we could allow a single pool to manage several 
aspects of a component, in which case a pool could have several plugins.
{quote}
but w/o some "concrete hypothetical" examples of what that might look like, 
it's hard to evaluate if the current APIs are the "best" approach, or if maybe 
there is something better/simpler.
{quote}Also, there can be different pools of the same type, each used for a 
different group of components that support the same management aspect. For 
example, for searcher caches we may want to eventually create separate pools 
for filterCache, queryResultCache and fieldValueCache. All of these pools would 
use the same plugin implementation CacheManagerPlugin but configured with 
different params and limits.
{quote}
But even in this situation, there could be multiple *instances* of a 
{{CacheManagerPlugin}}, one for each pool, each with different params and 
limits, w/o needing distinction between the {{ResourceManagerPlugin}} 
concept/instances and the {{ResourceManagerPool}} concept/instances.
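As a "concrete hypothetical" of the single-base-class alternative, here is a minimal sketch. Nothing in it comes from the patch: {{AbstractPooledPlugin}}, {{CachePoolPlugin}}, {{FakeCache}} and all of their methods are invented for illustration, assuming a pool only needs a name, a map of limits, and a list of registered components.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: generic "pool" bookkeeping and generic "plugin" behavior
// folded into one abstract base class. This is NOT the API in the patch.
abstract class AbstractPooledPlugin<T> {
  private final String name;
  private final Map<String, Object> poolLimits;
  private final List<T> components = new ArrayList<>();

  AbstractPooledPlugin(String name, Map<String, Object> poolLimits) {
    this.name = name;
    this.poolLimits = new HashMap<>(poolLimits);
  }

  String getName() { return name; }
  Map<String, Object> getPoolLimits() { return Collections.unmodifiableMap(poolLimits); }
  List<T> getComponents() { return Collections.unmodifiableList(components); }
  void register(T component) { components.add(component); }

  // Resource-specific logic exists only in concrete subclasses.
  abstract void manage();
}

// Hypothetical stand-in for a cache that reports its RAM usage.
class FakeCache {
  final long ramMBUsed;
  FakeCache(long ramMBUsed) { this.ramMBUsed = ramMBUsed; }
}

// One concrete subclass; each instance is its own "pool" with its own limits.
class CachePoolPlugin extends AbstractPooledPlugin<FakeCache> {
  boolean overLimit;

  CachePoolPlugin(String name, long maxRamMB) {
    super(name, Map.of("maxRamMB", maxRamMB));
  }

  @Override
  void manage() {
    long used = 0;
    for (FakeCache cache : getComponents()) {
      used += cache.ramMBUsed;
    }
    long limit = ((Number) getPoolLimits().get("maxRamMB")).longValue();
    overLimit = used > limit;
  }
}
```

Under these assumptions a "filterCache pool" and a "queryResultCache pool" would simply be two {{CachePoolPlugin}} instances constructed with different limits. Whether that covers the "several plugins per pool" case is exactly the open question.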

(To be clear, i'm not trying to harp on the specific design/separation/linkage 
of {{ResourceManagerPlugin}} vs {{ResourceManagerPool}} – these are just some 
of the first classes i looked at and had questions about. I'm just using them 
as examples of where/how it's hard to ask questions or form opinions about the 
current API/code w/o having a better grasp of some "concrete specifics" (or even 
"hypothetical specifics") of when/how/where/why each of these APIs is expected 
to be used and interact w/each other.)

Another example of where i got lost as to the specific motivation behind some 
of these APIs in the long term view is in the "loose coupling" that currently 
exists in the patch between the {{ManagedComponent}} API and 
{{ResourceManagerPlugin}}:
 As i understand it:
 * An object in Solr supports being managed by a particular subclass of 
{{ResourceManagerPlugin}} if and only if it extends {{ManagedComponent}} and 
implements {{ManagedComponent.getManagedResourceTypes()}} such that the 
resulting {{Collection<String>}} contains a String matching the return value of 
{{ResourceManagerPlugin.getType()}} for that particular 
{{ResourceManagerPlugin}}
 ** ie: {{SolrCache}} extends the {{ManagedComponent}} interface, and all 
classes implementing {{SolrCache}} should/must implement 
{{getManagedResourceTypes()}} by returning a java {{Collection}} containing 
{{CacheManagerPlugin.TYPE}}
 * once some {{ManagedComponent}} instances are "registered in a pool" and 
managed by a specific {{ResourceManagerPlugin}} instance then that plugin 
expects to be able to call {{ManagedComponent.setResourceLimits(Map<String, 
Object> limits)}} and {{ManagedComponent.getResourceLimits()}} on all of those 
{{ManagedComponent}} instances, and that both Maps should contain/support a set 
of {{String}} keys specific to that {{ResourceManagerPlugin}} subclass according 
to {{ResourceManagerPlugin.getControlledParams()}}
 ** ie: {{CacheManagerPlugin.getControlledParams()}} returns a java 
{{Collection}} containing {{SolrCache.MAX_RAM_MB_PARAM}} and 
{{SolrCache.MAX_SIZE_PARAM}} which are also used in a "switch style" if/else 
block in {{LRUCache.setResourceLimit(...)}} to adjust the corresponding 
(private, strongly typed) internal variables.
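To make my reading of that contract concrete, here is a minimal sketch. The interface is a simplified stand-in reconstructed from the description above, not copied from the patch. {{SketchCache}}, the String values of the constants, and the choice to throw on an unknown key are all my own assumptions.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the interface as described above (not the patch code).
interface ManagedComponent {
  Collection<String> getManagedResourceTypes();
  void setResourceLimits(Map<String, Object> limits);
  Map<String, Object> getResourceLimits();
}

// Hypothetical cache showing the String-keyed, "switch style" coupling.
class SketchCache implements ManagedComponent {
  static final String TYPE = "cache";                 // assumed value
  static final String MAX_RAM_MB_PARAM = "maxRamMB";  // assumed value
  static final String MAX_SIZE_PARAM = "maxSize";     // assumed value

  private long maxRamMB = 1024;
  private int maxSize = 512;

  @Override
  public Collection<String> getManagedResourceTypes() {
    return List.of(TYPE);
  }

  @Override
  public void setResourceLimits(Map<String, Object> limits) {
    for (Map.Entry<String, Object> e : limits.entrySet()) {
      // Every implementation has to repeat this key matching and casting.
      if (MAX_RAM_MB_PARAM.equals(e.getKey())) {
        maxRamMB = ((Number) e.getValue()).longValue();
      } else if (MAX_SIZE_PARAM.equals(e.getKey())) {
        maxSize = ((Number) e.getValue()).intValue();
      } else {
        // Assumption: unknown keys are rejected; a typo is only caught at run time.
        throw new IllegalArgumentException("Unsupported limit: " + e.getKey());
      }
    }
  }

  @Override
  public Map<String, Object> getResourceLimits() {
    Map<String, Object> limits = new HashMap<>();
    limits.put(MAX_RAM_MB_PARAM, maxRamMB);
    limits.put(MAX_SIZE_PARAM, maxSize);
    return limits;
  }
}

final class LooseRegistration {
  // Registration boils down to comparing Strings.
  static boolean supports(ManagedComponent component, String pluginType) {
    return component.getManagedResourceTypes().contains(pluginType);
  }
}
```

In this sketch a misspelled key such as "maxRamMb" compiles fine and only fails (or is silently ignored, depending on the implementation) once the plugin runs.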

...but why is this coupling so loose? why is each implementation of 
{{ManagedComponent}} left to fend for itself in terms of building a "switch" 
statement to process the {{Map<String,Object>}} inputs it might receive, instead 
of having more type specific sub-classes like {{ManagedCacheComponent extends 
ManagedComponent}} that define all the get/set methods a 
{{CacheManagerPlugin}} expects to be able to call on any 
{{ManagedCacheComponent}} registered with it – at which point even the 
_registration_ could be statically typed, w/o the need for indirectly comparing 
Strings to {{ResourceManagerPlugin.getType()}} ?
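A rough sketch of that statically typed alternative, to show what i mean. {{ManagedCacheComponent}} is the hypothetical sub-interface named above. Its method names, {{TypedCacheManagerPlugin}}, {{TypedSketchCache}} and {{capMaxRamMB}} are invented here and exist nowhere in the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in; in this sketch it is only a marker interface.
interface ManagedComponent {}

// Hypothetical typed sub-interface: exactly the limits a cache plugin may adjust.
interface ManagedCacheComponent extends ManagedComponent {
  long getMaxRamMB();
  void setMaxRamMB(long maxRamMB);
  int getMaxSize();
  void setMaxSize(int maxSize);
}

// Hypothetical plugin whose registration is checked by the compiler,
// so no String comparison against a plugin type is needed.
class TypedCacheManagerPlugin {
  private final List<ManagedCacheComponent> caches = new ArrayList<>();

  void register(ManagedCacheComponent cache) {
    caches.add(cache);
  }

  // Lowers any per-cache limit that exceeds the given cap.
  void capMaxRamMB(long capMB) {
    for (ManagedCacheComponent cache : caches) {
      if (cache.getMaxRamMB() > capMB) {
        cache.setMaxRamMB(capMB);
      }
    }
  }
}

// Hypothetical cache: plain typed fields, no Map parsing or casting.
class TypedSketchCache implements ManagedCacheComponent {
  private long maxRamMB;
  private int maxSize;

  TypedSketchCache(long maxRamMB, int maxSize) {
    this.maxRamMB = maxRamMB;
    this.maxSize = maxSize;
  }

  @Override public long getMaxRamMB() { return maxRamMB; }
  @Override public void setMaxRamMB(long maxRamMB) { this.maxRamMB = maxRamMB; }
  @Override public int getMaxSize() { return maxSize; }
  @Override public void setMaxSize(int maxSize) { this.maxSize = maxSize; }
}
```

Here, trying to register something that is not a {{ManagedCacheComponent}} is a compile error rather than a run time mismatch. The cost is one more interface per managed resource type, which may or may not fit the long term plan.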

(Again: I'm not saying this loose coupling is inherently bad and should be 
replaced, I'm saying that I don't have enough grasp on the specifics of the 
hypothetical functionality you want to add later that might take advantage of 
this loose coupling to understand if it's strictly necessary; so i'm not even 
sure i understand what the right questions to ask are – these are just the 
questions that occur to me while diving into the code at random. My gut says 
using more marker interfaces and statically typed methods will help reduce 
"human error" bugs in the code.)
----
Ignoring the internal API/design, I'm also still confused a little bit about 
how exactly we want/expect a system like this to work and "do stuff" with 
something like a "cache memory" pool – and was hoping for more specifics on 
what you envisioned for the "Cluster Administrator's" UX as pools are 
created/configured/used in story form.

In your Story #1, and subsequent reply to another comment, you mentioned...
{quote}In order to do this we need a control mechanism that is able to adjust 
individual cache sizes per core, based on the total hard limit and the actual 
current "need" of a core, defined as a combination of hit ratio, QPS, and other 
arbitrary quality factors / SLA. This control mechanism also needs to be able 
to forcibly reduce excessive usage ...
 ...
 * the plugin is executed periodically to check the current resource usage of 
all registered caches, using eg. the aggregated value of ramBytesUsed.
 * as a result of this action some of the cache content will be evicted sooner 
and more aggressively than initially configured, thus freeing more RAM.
 * when the memory pressure decreases the CacheManagerPlugin re-adjusts the 
maxRamMB settings of each cache to the initially configured values. ...

...
  
{quote}does that imply that once SolrCache(s) are part of a "pool" they no 
longer have their own max size(s)?
{quote}
They still do - but it's used as the starting point for proportional 
adjustments.
{quote}
You seem to be saying that the {{CacheManagerPlugin}} would only ever reduce 
the {{maxRamMB}} setting of some caches at run time, if/when the sum of 
{{ramBytesUsed}} for all caches exceeds the pool's {{maxRamMB}} ... but it seems 
like in order for that to happen, there's one of two unstated implications; 
either:
 * users who want to use these pools need to change the individual cache's 
configured {{maxRamMB}} to be much higher than they are today. (potentially to 
the same value as the {{maxRamMB}} of the pool?)
 * OR: that we only expect the plugin to kick in and affect caches if/when the 
number of _cores_ increases. (as a result of collection creation or autoscaling)

If for example "Bob" has a single core w/3 caches each configured with 
{{maxRamMB=4GB}}, and Bob sets up a pool for all caches on his system 
configured with {{maxRamMB=12GB}} – then in theory, unless more caches are 
added to the pool (by more cores being created on this node) that pool/plugin 
is never going to adjust the sizes of those caches because the aggregated sum 
of the {{ramBytesUsed}} should never exceed 12GB.

Since I assume the idea is that these pools & {{ResourceManagerPlugins}} will 
be useful under varying _query_ load, and not just for taking action when 
provisioning new collections/cores, I gather the expectation is that when Bob 
decides to enable a {{CacheManagerPlugin}} pool, Bob should configure the 
individual caches in that pool with (relatively) "high" {{maxRamMB}} sizes, so 
that they could individually use more/less RAM than each other and later be 
"reined in" by the pool's {{CacheManagerPlugin}} ... so perhaps in Bob's 
solrconfig.xml all 3 caches have {{maxRamMB=8GB}}, while the pool has 
{{maxRamMB=12GB}}. If/when the {{CacheManagerPlugin}} detects that Bob's users 
have filled cacheXX to 5GB, cacheYY to 6GB, and cacheZZ to 2GB, it sees the 
total = 13GB > 12GB and adjusts the {{maxRamMB}} of the individual caches down 
proportionately so that the new {{maxRamMB}} total == 12GB.
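For the arithmetic in that scenario, here is a minimal sketch of what i *assume* "proportionately" means: each new limit is the cache's current usage scaled by poolMax / totalUsed. {{PoolScalingSketch}} and {{adjust}} are invented names, and the real plugin may well weigh things differently (hit ratio, QPS, etc).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the patch.
final class PoolScalingSketch {

  // Returns the per-cache limits (in GB) a pool might apply.
  // Assumption: if total usage is within the pool limit, the configured
  // limits are kept; otherwise each cache gets usage * (poolMax / totalUsed).
  static Map<String, Double> adjust(Map<String, Double> configuredMaxGB,
                                    Map<String, Double> usedGB,
                                    double poolMaxGB) {
    double totalUsed = 0;
    for (double used : usedGB.values()) {
      totalUsed += used;
    }
    if (totalUsed <= poolMaxGB) {
      return new LinkedHashMap<>(configuredMaxGB);
    }
    double factor = poolMaxGB / totalUsed;
    Map<String, Double> adjusted = new LinkedHashMap<>();
    for (Map.Entry<String, Double> e : usedGB.entrySet()) {
      adjusted.put(e.getKey(), e.getValue() * factor);
    }
    return adjusted;
  }
}
```

With Bob's numbers (5 + 6 + 2 = 13GB used, 12GB pool) the factor is 12/13, giving roughly 4.62GB, 5.54GB and 1.85GB. With the earlier 3 x 4GB configuration, usage can never exceed 12GB, so this code would never take the scaling branch.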

But if that's the case, then (based on the typical usage patterns of 
SolrCaches) I don't understand your comment about "when the memory pressure 
decreases" ... how/when can/should a {{CacheManagerPlugin}} assume/recognize 
that the memory pressure has decreased? Because whatever specific new 
{{maxRamMB}} values those caches get configured with, they are going to keep 
using roughly that much RAM (with some small variation in Query/DocSet/DocList 
size) as new requests come in – evicting objects only to replace them with 
(similarly sized) ... there's really no reason to assume/expect the 
{{ramBytesUsed}} of a cache to _decrease_ in a meaningful way once it's full.

(ie: collections/cores that contain mostly small documents don't tend to 
suddenly get a lot of large documents added to them that significantly impact 
the size of the documentCache. collections/cores that have big filterCaches 
don't tend to suddenly stop getting requests with {{fq}} params and no longer 
have a need for a big filterCache, etc...)
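The "full caches stay full" behavior i'm describing can be seen with any bounded LRU. This sketch uses a plain {{java.util.LinkedHashMap}} in access order with {{removeEldestEntry}}; {{BoundedLru}} is a made-up class bounded by entry count, not Solr's {{LRUCache}} and not RAM based.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical entry-count-bounded LRU, for illustration only.
final class BoundedLru<K, V> extends LinkedHashMap<K, V> {
  private final int maxEntries;

  BoundedLru(int maxEntries) {
    super(16, 0.75f, true); // true = iterate in access order (LRU)
    this.maxEntries = maxEntries;
  }

  // Called by LinkedHashMap after each insertion: evict the eldest entry
  // once the bound is exceeded.
  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > maxEntries;
  }
}

final class BoundedLruDemo {
  public static void main(String[] args) {
    BoundedLru<Integer, String> cache = new BoundedLru<>(100);
    for (int i = 0; i < 10_000; i++) {
      cache.put(i, "value-" + i);
    }
    // Once the cache has filled up, every insert evicts exactly one entry,
    // so the size stays pinned at the bound instead of shrinking.
    System.out.println(cache.size());
  }
}
```

Nothing in steady traffic makes the size drop below the bound again, which is why i don't see what signal a plugin would use to detect "decreased pressure".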

So when/how/why exactly would a {{CacheManagerPlugin}} ever "re-adjusts the 
maxRamMB settings of each cache to the initially configured values." ?
----
{quote}consistently use the name "component" instead of the confusing "resource"
{quote}
Hmmm, did you mean to upload a different patch? The latest i see (#12975831) 
still contains lots of new class names referring to "Resource" instead of 
"Component" ...
{noformat}
$ ls solr/core/src/java/org/apache/solr/managed/*Resource*
solr/core/src/java/org/apache/solr/managed/AbstractResourceManagerPlugin.java
solr/core/src/java/org/apache/solr/managed/DefaultResourceManager.java
solr/core/src/java/org/apache/solr/managed/DefaultResourceManagerPluginFactory.java
solr/core/src/java/org/apache/solr/managed/DefaultResourceManagerPool.java
solr/core/src/java/org/apache/solr/managed/NoOpResourceManager.java
solr/core/src/java/org/apache/solr/managed/ResourceManager.java
solr/core/src/java/org/apache/solr/managed/ResourceManagerPluginFactory.java
solr/core/src/java/org/apache/solr/managed/ResourceManagerPlugin.java
solr/core/src/java/org/apache/solr/managed/ResourceManagerPool.java
{noformat}

> Create resource management API
> ------------------------------
>
>                 Key: SOLR-13579
>                 URL: https://issues.apache.org/jira/browse/SOLR-13579
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>            Priority: Major
>         Attachments: SOLR-13579.patch, SOLR-13579.patch, SOLR-13579.patch, 
> SOLR-13579.patch, SOLR-13579.patch, SOLR-13579.patch
>
>
> Resource management framework API supporting the goals outlined in SOLR-13578.


