Re: Approach for a new Autoscaling framework

Varun Thacker Sat, 25 Jul 2020 19:13:11 -0700

Hi Ilan,

I like where we're going with
https://github.com/apache/lucene-solr/pull/1684 . Correct me if I am wrong,
but my understanding of this PR is that we're defining the interfaces for
creating policies.


What's not clear to me is how existing collection APIs like
create-collections/add-replica etc. will make use of it. Is that something
that has been discussed somewhere that I could read up on?



On Sat, Jul 25, 2020 at 2:03 PM Ilan Ginzburg <ilans...@gmail.com> wrote:

> Thanks Gus!
> This makes a lot of sense but significantly increases IMO the scope and
> effort to define an "Autoscaling" framework interface.
>
> I'd be happy to try to see what concepts could be shared and how a generic
> plugin facade could be defined.
>
> What are the other types of plugins that would share such a unified
> approach? Do they already exist under another form or are just projects at
> this stage, like Autoscaling plugins?
>
> But... Assuming this is the first "facade" layer to be defined between
> Solr and external code, it might be hard to make it generic and get it
> right. There's value in starting simple, understanding the tradeoffs and
> generalizing later.
>
> Also I'd like to make sure we're not paying a performance "genericity tax"
> in Autoscaling for unneeded features.
>
> Ilan
>
> On Sat, Jul 25, 2020 at 4:02 PM Gus Heck <gus.h...@gmail.com> wrote:
>
>> Scanned through the PR and read some of this thread. I have likely missed
>> much other discussion, so forgive me if I'm dredging up some things that are
>> already discussed elsewhere.
>>
>> The idea of designing the interfaces defining what information is
>> available seems good here, but I worry that it's too auto-scaling focused.
>> In my imagination, I would see solr having a standard informational
>> interface that is useful to any plugin of any sort. Autoscaling should be
>> leveraging that and we should be enhancing that to enable autoscaling. The
>> current state of the system is one key type of information, but another
>> type of information that should exist within solr and be exposed to plugins
>> (including autoscaling) is events. When a new node joins there should be an
>> event for example so that plugins can listen for that rather than
>> incessantly polling and comparing the list of 100 nodes to a cached list of
>> 100 nodes.
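
To make the polling-versus-events point concrete, here is a minimal sketch of a listener-style interface. All names (ClusterEvents, onNodeJoined, fireNodeJoined) are hypothetical and only illustrate the idea; they are not existing Solr classes or part of the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical event hub (assumption: not an existing Solr class).
// Plugins register once and are called back, instead of polling node lists.
final class ClusterEvents {
    private final List<Consumer<String>> nodeJoinedListeners = new ArrayList<>();

    // A plugin subscribes to "node joined" events.
    void onNodeJoined(Consumer<String> listener) {
        nodeJoinedListeners.add(listener);
    }

    // Would be called by the core when a node joins the cluster.
    void fireNodeJoined(String nodeName) {
        for (Consumer<String> listener : nodeJoinedListeners) {
            listener.accept(nodeName);
        }
    }
}

final class EventDemo {
    // Simulates a few nodes joining and returns what a subscribed plugin saw.
    static List<String> collectJoins(String... joiningNodes) {
        ClusterEvents events = new ClusterEvents();
        List<String> seen = new ArrayList<>();
        events.onNodeJoined(seen::add);
        for (String node : joiningNodes) {
            events.fireNodeJoined(node);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(collectJoins("node101:8983_solr"));
    }
}
```

The plugin does no work between events, which is the contrast with repeatedly diffing two 100-node lists.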
>>
>> In the PR I see a bunch of classes all off in a separate package, which
>> looks like an autoscaling fiefdom which will be tempted if not forced to
>> duplicate lots of stuff relative to other plugins and/or core.
>>
>> As a side note I would think the metrics system could be a plugin that
>> leverages the same set of informational interfaces....
>>
>> So there should be 3 parts to this as I imagine it.
>>
>> 1) Enhancements to the **plugin system** that make information about the
>> cluster available to ALL plugins.
>> 2) Enhancements to the **plugin system** APIs provided to ALL plugins
>> that allow them to mutate solr safely.
>> 3) A plugin that we intend to support for our users currently using
>> autoscaling, which utilizes the enhanced information to provide a similar
>> level of functionality as is *promised* by our current documentation of
>> autoscaling. There might be some gaps or differences, but we should be
>> discussing what they are and providing recommended workarounds for users
>> that relied on those promises. Even if there were cases where we failed to
>> deliver, if there were at least some conditions under which we could
>> deliver the promised functionality those should be supported. Only if we
>> never were able to deliver and it never worked under any circumstance
>> should we rip stuff out entirely.
>>
>> Implicit in the above is the concept that there should be a facade
>> between plugins and the core of solr.
>>
>> WRT #1 which will necessarily involve information collected from remote
>> nodes, we need to be designing that thinking about what informational
>> guarantees it provides. Latency, consistency, delivery, etc. We also need
>> to think about what is exposed in a read-only fashion vs what plugins might
>> write back to solr. Certainly there will be a lot of information that most
>> plugins ignore, and we might consider having groupings of information and
>> interfaces or annotations that indicate what info is provided, but the
>> simplest default state is to just give plugins a reference to a class that
>> they can use to drill into information about the cluster as needed.
>> (SolrInformationBooth? ... or less tongue in cheek... enhance
>> SolrInfoBean?)
>>
>> Finally a fourth thing that occurs to me as I write is we need to
>> consider what information one plugin might make available to the rest of
>> the solr plugins. This might come later, and is hard because it's very hard
>> to anticipate what info might be generated by unknown plugins in the future.
>>
>> So some humorous, not seriously suggested but hopefully memorable class
>> names encapsulating the concepts:
>>
>> SolrInformationBooth (place to query)
>> SolrLoudspeaker (event announcements)
>> SolrControlLevers (mutate solr cluster)
>> SolrPluginFacebookPage (info published by the plugin that others can
>> watch)
>>
>> The "facade" provided to plugins by the plugin system should grow and
>> expand such that more and more plugins can rely on it. This effort should
>> grow it enough to move autoscaling onto it without dropping (much)
>> functionality that we've previously published.
>>
>> -Gus
>>
>> On Fri, Jul 24, 2020 at 4:40 PM Jan Høydahl <jan....@cominvent.com>
>> wrote:
>>
>>> Not clear to me what type of "alternative proposal" you're thinking of
>>> Jan
>>>
>>>
>>> That would be the responsibility of Noble and others who have concerns
>>> to detail - and try convince other peers.
>>> It’s hard for me as a spectator to know whether to agree with Noble
>>> without a clear picture of what the alternative API or approach would look
>>> like.
>>> I’m often a fan of loosely typed APIs since they tend to cause less
>>> boilerplate code, but strong typing may indeed be a sound choice in this
>>> API.
>>>
>>> Jan Høydahl
>>>
>>> On Jul 24, 2020, at 01:44, Ilan Ginzburg <ilans...@gmail.com> wrote:
>>>
>>> 
>>> In my opinion we have to (and therefore will) ship at least a basic
>>> production-ready implementation on top of the API that does simple things
>>> (not sure about rack, but for example balance cores and disk size without
>>> co-locating replicas of the same shard on the same node).
>>> Without such an implementation, I suspect adoption will be low.
>>> Moreover, it's always a lot more friendly to start coding from a working
>>> example than from scratch.
>>>
>>> Not clear to me what type of "alternative proposal" you're thinking of
>>> Jan. Alternative API proposal? Alternative approach to replace Autoscaling?
>>>
>>> Ilan
>>>
>>> On Fri, Jul 24, 2020 at 12:11 AM Jan Høydahl <jan....@cominvent.com>
>>> wrote:
>>>
>>>> Important discussion indeed.
>>>>
>>>> I don’t have time to dive deep into the PR or make up my mind whether
>>>> there is a simpler and more future proof way of designing these APIs. But I
>>>> understand that autoscaling is a complex beast and it is important we get
>>>> it right.
>>>>
>>>> One question regarding having to write code vs config. Is the plan to
>>>> ship some very simple lightweight default placement rules out of the box
>>>> that give 80% of users what they need with simple config, or would every
>>>> user need to write code to e.g. spread replicas across hosts/racks? I’d be
>>>> interested in seeing an alternative proposal laid out, perhaps not in code
>>>> but with a design that can be compared and discussed.
>>>>
>>>> Jan Høydahl
>>>>
>>>> On Jul 23, 2020, at 17:53, Houston Putman <houstonput...@gmail.com> wrote:
>>>>
>>>> 
>>>> I think this is a valid thing to discuss on the dev list, since this
>>>> isn't just about code comments.
>>>> It seems to me that Ilan wants to discuss the philosophy around how to
>>>> design plugins and the interfaces in Solr which the plugins will talk to.
>>>> This is broad and affects much more than just the Autoscaling
>>>> framework.
>>>>
>>>> As a community & product, we have so far agreed that Solr should be
>>>> lighter weight and additional features should live in plugins that are
>>>> managed separately from Solr itself.
>>>> At that point we need to think about the lifetime and support of these
>>>> plugins. People love to refactor stuff in the solr core, which before
>>>> plugins wasn't a large issue.
>>>> However if we are now intending for many customers to rely on plugins,
>>>> then we need to come up with standards and guarantees so that these plugins
>>>> don't:
>>>>
>>>>    - Stall people from upgrading Solr (minor or major versions)
>>>>    - Hinder the development of Solr Core
>>>>    - Cause us more headaches trying to keep multiple repos of plugins
>>>>    up to date with recent versions of Solr
>>>>
>>>>
>>>> I am not completely sure where I stand right now, but this is
>>>> definitely something that we should be thinking about when migrating all of
>>>> this functionality to plugins.
>>>>
>>>> - Houston
>>>>
>>>> On Thu, Jul 23, 2020 at 9:27 AM Ishan Chattopadhyaya <is...@apache.org>
>>>> wrote:
>>>>
>>>>> I think we should move the discussion back to the PR because it has
>>>>> more context and inline comments are possible. Having this discussion in 4
>>>>> places (jira, pr, slack and dev list) is very hard to keep track of.
>>>>>
>>>>> On Thu, 23 Jul, 2020, 5:57 pm Ilan Ginzburg, <ilans...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> [I’m moving a discussion from the PR
>>>>>> <https://github.com/apache/lucene-solr/pull/1684> for SOLR-14613
>>>>>> <https://issues.apache.org/jira/browse/SOLR-14613> to the dev list
>>>>>> for a wider audience. This is about replacing the now (in master) gone
>>>>>> Autoscaling framework with a way for clients to write their customized
>>>>>> placement code]
>>>>>>
>>>>>> It took me a long time to write this mail and it's quite long, sorry.
>>>>>> Please anybody interested in the future of Autoscaling (not only
>>>>>> those I cc'ed) do read it and provide feedback. Very impacting decisions
>>>>>> have to be made now.
>>>>>>
>>>>>> Thanks Noble for your feedback.
>>>>>> I believe it is important that we are aligned on what we build here,
>>>>>> esp. at the early defining stages (now).
>>>>>>
>>>>>> Let me try to elaborate on your concerns and provide in general the
>>>>>> rationale behind the approach.
>>>>>>
>>>>>> *> Anyone who wishes to implement this should not require to learn a
>>>>>> lot before even getting started*
>>>>>> For somebody who knows Solr (what is a Node, Collection, Shard,
>>>>>> Replica) and basic notions related to Autoscaling (getting variables
>>>>>> representing current state to make decisions), there’s not much to learn.
>>>>>> The framework uses the same concepts, often with the same names.
>>>>>>
>>>>>> *> I don't believe we should have a set of interfaces that duplicate
>>>>>> existing classes just for this functionality.*
>>>>>> Where appropriate we can have existing classes be the implementations
>>>>>> for these interfaces and be passed to the plugins; that would be
>>>>>> perfectly ok. The proposal doesn’t include implementations at this stage,
>>>>>> therefore there’s no duplication, or not yet... (we must get the
>>>>>> interfaces right and agreed upon before implementation). If some
>>>>>> interface methods in the proposal have a different name from equivalent
>>>>>> methods in internal classes we plan to use, of course let's rename one or
>>>>>> the other.
>>>>>>
>>>>>> Existing internal abstractions are most of the time concrete classes
>>>>>> and not interfaces (Replica, Slice, DocCollection, ClusterState).
>>>>>> Making these visible to contrib code living elsewhere makes future
>>>>>> refactoring hard, and contrib code will most likely end up reaching for
>>>>>> methods it shouldn’t be using. If we define a clean set of interfaces for
>>>>>> plugins, I wouldn’t hesitate to break external plugins that reach out to
>>>>>> other internal Solr classes, but I will do everything possible to keep
>>>>>> the API backward compatible so existing plugins can be recompiled without
>>>>>> change.
>>>>>>
>>>>>> *> 24 interfaces to do this is definitely over engineering*
>>>>>> I don’t consider the number of classes or interfaces a metric of
>>>>>> complexity or of engineering quality. There are sample
>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-ddbe185b5e7922b91b90dfabfc50df4c>
>>>>>> plugin implementations to serve as a base for plugin writers (and for us
>>>>>> defining this framework) and I believe the process is relatively simple.
>>>>>> Trying to do the same things with existing Solr classes might prove a lot
>>>>>> harder (but might be worth the effort for comparison purposes, to make
>>>>>> sure we agree on the approach?). For example, getting sister replicas of a
>>>>>> given replica in the proposed API is: replica.getShard()
>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-a2d49bd52fddde54bb7fd2e96238507eR27>
>>>>>> .getReplicas()
>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-9633f5e169fa3095062451599daac213R31>.
>>>>>> Doing so with the internal classes likely involves getting the
>>>>>> DocCollection and Slice name from the Replica, then getting the
>>>>>> DocCollection from the cluster state, there getting the Slice based on
>>>>>> its name and finally calling getReplicas() on the Slice. I consider the
>>>>>> role of this new framework to be making life as easy as possible for
>>>>>> writing placement code and the like, making it easy for us to maintain,
>>>>>> making it easy to write a simulation engine (should be at least an order
>>>>>> of magnitude simpler than the previous one), etc.
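
The sister-replica navigation described above can be sketched as follows. This is a minimal illustration only: the Replica and Shard interfaces here are simplified stand-ins written for this example (assumed shapes, not copied from the PR), and SimpleShard/SimpleReplica are hypothetical in-memory implementations.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the plugin-facing interfaces (assumption: the real
// PR interfaces carry more methods; only the navigation is shown here).
interface Shard {
    String getName();
    List<Replica> getReplicas();
}

interface Replica {
    String getName();
    Shard getShard();
}

// Hypothetical in-memory implementations so the example can run.
final class SimpleShard implements Shard {
    private final String name;
    private final List<Replica> replicas = new ArrayList<>();

    SimpleShard(String name) { this.name = name; }

    public String getName() { return name; }

    public List<Replica> getReplicas() { return new ArrayList<>(replicas); }

    Replica addReplica(String replicaName) {
        Replica replica = new SimpleReplica(replicaName, this);
        replicas.add(replica);
        return replica;
    }
}

final class SimpleReplica implements Replica {
    private final String name;
    private final Shard shard;

    SimpleReplica(String name, Shard shard) {
        this.name = name;
        this.shard = shard;
    }

    public String getName() { return name; }

    public Shard getShard() { return shard; }
}

final class SisterReplicas {
    // Names of the other replicas of the same shard: one navigation step each way.
    static List<String> sisterNames(Replica replica) {
        List<String> names = new ArrayList<>();
        for (Replica other : replica.getShard().getReplicas()) {
            if (other != replica) {
                names.add(other.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        SimpleShard shard = new SimpleShard("shard1");
        Replica first = shard.addReplica("core_node1");
        shard.addReplica("core_node2");
        System.out.println(sisterNames(first));
    }
}
```

The plugin code never touches cluster state or looks anything up by name; the object graph itself carries the relationships.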
>>>>>>
>>>>>> An example regarding readability and number of interfaces: rather
>>>>>> than defining an enum with runtime annotation for building its instances
>>>>>> (Variable.Type
>>>>>> <https://github.com/apache/lucene-solr/blob/branch_8_6/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Variable.java#L98>)
>>>>>> and then very generic access methods, the proposal defines a specific
>>>>>> interface for each “variable type” (called properties
>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4c0fa84354f93cb00e6643aefd00fd3c>).
>>>>>> Rather than concatenating strings to specify the data to return from a
>>>>>> remote node (based on snitches
>>>>>> <https://github.com/apache/lucene-solr/blame/branch_8_6/solr/core/src/java/org/apache/solr/cloud/rule/ImplicitSnitch.java#L60>,
>>>>>> see doc
>>>>>> <https://lucene.apache.org/solr/guide/8_1/solrcloud-autoscaling-policy-preferences.html#node-selector>),
>>>>>> the proposal is explicit and strongly typed (here
>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4ec32958f54ec8e1f7e2d5ce8de331bb>
>>>>>> is an example that gets a specific system property from a node). This
>>>>>> definitely does increase the number of interfaces, but IMO reduces the
>>>>>> effort to code to these abstractions and provides a lot more compile time
>>>>>> and IDE assistance.
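
The contrast between string-keyed access and typed accessors can be sketched like this. Everything here is hypothetical and written for illustration: LooseNodeValues, NodeProperties and MapBackedNodeProperties are not classes from Solr or the PR, and the key strings are just example keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.OptionalLong;

// Loosely typed style: the caller builds a string key and casts the result.
final class LooseNodeValues {
    private final Map<String, Object> values = new HashMap<>();

    void put(String key, Object value) { values.put(key, value); }

    Object get(String key) { return values.get(key); }
}

// Strongly typed style (hypothetical interface): one explicit accessor per
// kind of property, so the compiler and IDE know the result type.
interface NodeProperties {
    Optional<String> systemProperty(String name);

    OptionalLong freeDiskGB();
}

// The string handling and casting live in one implementation instead of in
// every plugin.
final class MapBackedNodeProperties implements NodeProperties {
    private final LooseNodeValues raw;

    MapBackedNodeProperties(LooseNodeValues raw) { this.raw = raw; }

    public Optional<String> systemProperty(String name) {
        Object value = raw.get("sysprop." + name); // illustrative key format
        return value instanceof String ? Optional.of((String) value) : Optional.empty();
    }

    public OptionalLong freeDiskGB() {
        Object value = raw.get("freedisk"); // illustrative key
        return value instanceof Number
                ? OptionalLong.of(((Number) value).longValue())
                : OptionalLong.empty();
    }
}

final class TypedPropertiesDemo {
    public static void main(String[] args) {
        LooseNodeValues raw = new LooseNodeValues();
        raw.put("sysprop.zone", "eu-west");
        raw.put("freedisk", 120L);
        NodeProperties props = new MapBackedNodeProperties(raw);
        System.out.println(props.systemProperty("zone").orElse("unknown"));
        System.out.println(props.freeDiskGB().orElse(-1L));
    }
}
```

A mistyped key or a wrong cast in the loose style fails at runtime in each plugin, while the typed accessors confine that risk to the single backing implementation.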
>>>>>>
>>>>>> The goal is to hide all the boilerplate code and machinery (and to a
>>>>>> point - complexity) in the implementations of these interfaces rather
>>>>>> than have each plugin writer deal with the same problems.
>>>>>>
>>>>>> We’re moving from something that was complex and hard to read and
>>>>>> debug yet functionally extremely rich, to something simpler for us, more
>>>>>> demanding for users (write code rather than policy config if there's a
>>>>>> need for new behavior) but that should not be less "expressive" in any
>>>>>> significant way. One could even imagine reimplementing the former
>>>>>> Autoscaling config Domain Specific Language on top of these APIs (maybe
>>>>>> as a summer internship project :)
>>>>>>
>>>>>> *> This is a common mistake that we all do. When we design a feature
>>>>>> we think that is the most important thing.*
>>>>>> If by *"most important thing"* you mean investing the best
>>>>>> reasonable effort to do things right then yes.
>>>>>> If you mean trying to make a minor feature look more important and
>>>>>> inflated than it is, I disagree.
>>>>>> As a personal note, replica placement is not the aspect of SolrCloud
>>>>>> I'm most interested in, but it is the first bottleneck we hit when pushing
>>>>>> the scale of SolrCloud. I approach this with a "let's do it right and get
>>>>>> it out of the way" state of mind, to move to topics I really want to work
>>>>>> on (around distribution in SolrCloud and the role of Overseer).
>>>>>> Implementing Autoscaling in a way that simplifies future refactoring (or
>>>>>> that does not make it harder than it already is) is therefore *very high*
>>>>>> on my priority list, to support modest changes (Slice to Shard renaming)
>>>>>> and more ambitious ones (replacing Zookeeper, removing Overseer, you name
>>>>>> it).
>>>>>>
>>>>>> Thanks for reading, again sorry for the long email, but I hope this
>>>>>> helps (at least helps the discussion),
>>>>>> Ilan
>>>>>>
>>>>>>
>>>>>> On Thu 23 Jul 2020 at 08:16, Noble Paul <notificati...@github.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I don't believe we should have a set of interfaces that duplicate
>>>>>>> existing classes just for this functionality. This is a common mistake
>>>>>>> that we all do. When we design a feature we think that is the most
>>>>>>> important thing. We end up over designing and over engineering things.
>>>>>>> This feature will remain a tiny part of Solr. Anyone who wishes to
>>>>>>> implement this should not require to learn a lot before even getting
>>>>>>> started. Let's try to have a minimal set of interfaces so that people who
>>>>>>> try to implement them do not have a huge learning curve.
>>>>>>>
>>>>>>> Let's try to understand the requirement
>>>>>>>
>>>>>>>    - Solr wants a set of positions to place a few replicas
>>>>>>>    - The implementation wants to know what is the current state of
>>>>>>>    the cluster so that it can make those decisions
>>>>>>>
>>>>>>> 24 interfaces to do this is definitely over engineering
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>
