Re: Approach for a new Autoscaling framework

Ishan Chattopadhyaya Thu, 23 Jul 2020 06:27:36 -0700

I think we should move the discussion back to the PR because it has more
context and inline comments are possible. Having this discussion in 4
places (jira, pr, slack and dev list is very hard to keep track of).


On Thu, 23 Jul, 2020, 5:57 pm Ilan Ginzburg, <ilans...@gmail.com> wrote:

> [I’m moving a discussion from the PR
> <https://github.com/apache/lucene-solr/pull/1684> for SOLR-14613
> <https://issues.apache.org/jira/browse/SOLR-14613> to the dev list for a
> wider audience. This is about replacing the now (in master) gone
> Autoscaling framework with a way for clients to write their customized
> placement code]
>
> It took me a long time to write this mail and it's quite long, sorry.
> Please anybody interested in the future of Autoscaling (not only those I
> cc'ed) do read it and provide feedback. Very impacting decisions have to be
> made now.
>
> Thanks Noble for your feedback.
> I believe it is important that we are aligned on what we build here, esp.
> at the early defining stages (now).
>
> Let me try to elaborate on your concerns and provide in general the
> rationale behind the approach.
>
> *> Anyone who wishes to implement this should not require to learn a lot
> before even getting started*
> For somebody who knows Solr (what is a Node, Collection, Shard, Replica)
> and basic notions related to Autoscaling (getting variables representing
> current state to make decisions), there’s not much to learn. The framework
> uses the same concepts, often with the same names.
>
> *> I don't believe we should have a set of interfaces that duplicate
> existing classes just for this functionality.*
> Where appropriate we can have existing classes be the implementations for
> these interfaces and be passed to the plugins, that would be perfectly ok.
> The proposal doesn’t include implementations at this stage, therefore
> there’s no duplication, or not yet... (we must get the interfaces right and
> agreed upon before implementation). If some interface methods in the
> proposal have a different name from equivalent methods in internal classes
> we plan to use, of course let's rename one or the other.
>
> Existing internal abstractions are most of the time concrete classes and
> not interfaces (Replica, Slice, DocCollection, ClusterState). Making
> these visible to contrib code living elsewhere is making future refactoring
> hard and contrib code will most likely end up reaching to methods it
> shouldn’t be using. If we define a clean set of interfaces for plugins, I
> wouldn’t hesitate to break external plugins that reach out to other
> internal Solr classes, but will make everything possible to keep the API
> backward compatible so existing plugins can be recompiled without change.
>
> *> 24 interfaces to do this is definitely over engineering*
> I don’t consider the number of classes or interfaces a metric of
> complexity or of engineering quality. There are sample
> <https://github.com/apache/lucene-solr/pull/1684/files#diff-ddbe185b5e7922b91b90dfabfc50df4c>
> plugin implementations to serve as a base for plugin writers (and for us
> defining this framework) and I believe the process is relatively simple.
> Trying to do the same things with existing Solr classes might prove a lot
> harder (but might be worth the effort for comparison purposes to make sure
> we agree on the approach? For example, getting sister replicas of a given
> replica in the proposed API is: replica.getShard()
> <https://github.com/apache/lucene-solr/pull/1684/files#diff-a2d49bd52fddde54bb7fd2e96238507eR27>
> .getReplicas()
> <https://github.com/apache/lucene-solr/pull/1684/files#diff-9633f5e169fa3095062451599daac213R31>.
> Doing so with the internal classes likely involves getting the
> DocCollection and Slice name from the Replica, then get the DocCollection
> from the cluster state, there get the Slice based on its name and finally
> getReplicas() from the Slice). I consider the role of this new framework
> is to make life as easy as possible for writing placement code and the
> like, make life easy for us to maintain it, make it easy to write a
> simulation engine (should be at least an order of magnitude simpler than
> the previous one), etc.
>
> An example regarding readability and number of interfaces: rather than
> defining an enum with runtime annotation for building its instances (
> Variable.Type
> <https://github.com/apache/lucene-solr/blob/branch_8_6/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Variable.java#L98>)
> and then very generic access methods, the proposal defines a specific
> interface for each “variable type” (called properties
> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4c0fa84354f93cb00e6643aefd00fd3c>).
> Rather than concatenating strings to specify the data to return from a
> remote node (based on snitches
> <https://github.com/apache/lucene-solr/blame/branch_8_6/solr/core/src/java/org/apache/solr/cloud/rule/ImplicitSnitch.java#L60>,
> see doc
> <https://lucene.apache.org/solr/guide/8_1/solrcloud-autoscaling-policy-preferences.html#node-selector>),
> the proposal is explicit and strongly typed (here
> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4ec32958f54ec8e1f7e2d5ce8de331bb>
>  example
> to get a specific system property from a node). This definitely does
> increase the number of interfaces, but reduces IMO the effort to code to
> these abstractions and provides a lot more compile time and IDE assistance.
>
> Goal is to hide all the boilerplate code and machinery (and to a point -
> complexity) in the implementations of these interfaces rather than have
> each plugin writer deal with the same problems.
>
> We’re moving from something that was complex and hard to read and debug
> yet functionally extremely rich, to something simpler for us, more
> demanding for users (write code rather than policy config if there's a need
> for new behavior) but that should not be less "expressive" in any
> significant way. One could even imagine reimplementing the former
> Autoscaling config Domain Specific Language on top of these API (maybe as a
> summer internship project :)
>
> *> This is a common mistake that we all do. When we design a feature we
> think that is the most important thing.*
> If by *"most important thing"* you mean investing the best reasonable
> effort to do things right then yes.
> If you mean trying to make a minor feature look more important and
> inflated than it is, I disagree.
> As a personal note, replica placement is not the aspect of SolrCloud I'm
> most interested in, but the first bottleneck we hit when pushing the scale
> of SolrCloud. I approach this with a state of mind "let's do it right and
> get it out of the way" to move to topics I really want to work on (around
> distribution in SolrCloud and the role of Overseer). Implementing
> Autoscaling in a way that simplifies future refactoring (or that does not
> make them harder than they already are) is therefore *very high* on my
> priority list, to support modest changes (Slice to Shard renaming) and
> more ambitious ones (replacing Zookeeper, removing Overseer, you name it).
>
> Thanks for reading, again sorry for the long email, but I hope this helps
> (at least helps the discussion),
> Ilan
>
>
> On Thu 23 Jul 2020 at 08:16, Noble Paul <notificati...@github.com> wrote:
>
>> I don't believe we should have a set of interfaces that duplicate
>> existing classes just for this functionality. This is a common mistake that
>> we all do. When we design a feature we think that is the most important
>> thing. We endup over designing and over engineering things. This feature
>> will remain a tiny part of Solr. Anyone who wishes to implement this should
>> not require to learn a lot before even getting started. Let's try to have a
>> minimal set of interfaces so that people who try to implement them do not
>> have a huge learning cure.
>>
>> Let's try to understand the requirement
>>
>>    - Solr wants a set of positions to place a few replicas
>>    - The implementation wants to know what is the current state of the
>>    cluster so that it can make those decisions
>>
>> 24 interfaces to do this is definitely over engineering
>>
>> —
>> You are receiving this because you authored the thread.
>> Reply to this email directly, view it on GitHub
>> <https://github.com/apache/lucene-solr/pull/1684#issuecomment-662837142>,
>> or unsubscribe
>> <https://github.com/notifications/unsubscribe-auth/AKIOMCFT5GU2II347GZ4HTTR47IVTANCNFSM4PC3HDKQ>
>> .
>>
>
>

Re: Approach for a new Autoscaling framework

Reply via email to