I think we should move the discussion back to the PR because it has more context and inline comments are possible. Having this discussion in 4 places (jira, pr, slack and dev list is very hard to keep track of).
On Thu, 23 Jul, 2020, 5:57 pm Ilan Ginzburg, <ilans...@gmail.com> wrote: > [I’m moving a discussion from the PR > <https://github.com/apache/lucene-solr/pull/1684> for SOLR-14613 > <https://issues.apache.org/jira/browse/SOLR-14613> to the dev list for a > wider audience. This is about replacing the now (in master) gone > Autoscaling framework with a way for clients to write their customized > placement code] > > It took me a long time to write this mail and it's quite long, sorry. > Please anybody interested in the future of Autoscaling (not only those I > cc'ed) do read it and provide feedback. Very impacting decisions have to be > made now. > > Thanks Noble for your feedback. > I believe it is important that we are aligned on what we build here, esp. > at the early defining stages (now). > > Let me try to elaborate on your concerns and provide in general the > rationale behind the approach. > > *> Anyone who wishes to implement this should not require to learn a lot > before even getting started* > For somebody who knows Solr (what is a Node, Collection, Shard, Replica) > and basic notions related to Autoscaling (getting variables representing > current state to make decisions), there’s not much to learn. The framework > uses the same concepts, often with the same names. > > *> I don't believe we should have a set of interfaces that duplicate > existing classes just for this functionality.* > Where appropriate we can have existing classes be the implementations for > these interfaces and be passed to the plugins, that would be perfectly ok. > The proposal doesn’t include implementations at this stage, therefore > there’s no duplication, or not yet... (we must get the interfaces right and > agreed upon before implementation). If some interface methods in the > proposal have a different name from equivalent methods in internal classes > we plan to use, of course let's rename one or the other. > > Existing internal abstractions are most of the time concrete classes and > not interfaces (Replica, Slice, DocCollection, ClusterState). Making > these visible to contrib code living elsewhere is making future refactoring > hard and contrib code will most likely end up reaching to methods it > shouldn’t be using. If we define a clean set of interfaces for plugins, I > wouldn’t hesitate to break external plugins that reach out to other > internal Solr classes, but will make everything possible to keep the API > backward compatible so existing plugins can be recompiled without change. > > *> 24 interfaces to do this is definitely over engineering* > I don’t consider the number of classes or interfaces a metric of > complexity or of engineering quality. There are sample > <https://github.com/apache/lucene-solr/pull/1684/files#diff-ddbe185b5e7922b91b90dfabfc50df4c> > plugin implementations to serve as a base for plugin writers (and for us > defining this framework) and I believe the process is relatively simple. > Trying to do the same things with existing Solr classes might prove a lot > harder (but might be worth the effort for comparison purposes to make sure > we agree on the approach? For example, getting sister replicas of a given > replica in the proposed API is: replica.getShard() > <https://github.com/apache/lucene-solr/pull/1684/files#diff-a2d49bd52fddde54bb7fd2e96238507eR27> > .getReplicas() > <https://github.com/apache/lucene-solr/pull/1684/files#diff-9633f5e169fa3095062451599daac213R31>. > Doing so with the internal classes likely involves getting the > DocCollection and Slice name from the Replica, then get the DocCollection > from the cluster state, there get the Slice based on its name and finally > getReplicas() from the Slice). I consider the role of this new framework > is to make life as easy as possible for writing placement code and the > like, make life easy for us to maintain it, make it easy to write a > simulation engine (should be at least an order of magnitude simpler than > the previous one), etc. > > An example regarding readability and number of interfaces: rather than > defining an enum with runtime annotation for building its instances ( > Variable.Type > <https://github.com/apache/lucene-solr/blob/branch_8_6/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Variable.java#L98>) > and then very generic access methods, the proposal defines a specific > interface for each “variable type” (called properties > <https://github.com/apache/lucene-solr/pull/1684/files#diff-4c0fa84354f93cb00e6643aefd00fd3c>). > Rather than concatenating strings to specify the data to return from a > remote node (based on snitches > <https://github.com/apache/lucene-solr/blame/branch_8_6/solr/core/src/java/org/apache/solr/cloud/rule/ImplicitSnitch.java#L60>, > see doc > <https://lucene.apache.org/solr/guide/8_1/solrcloud-autoscaling-policy-preferences.html#node-selector>), > the proposal is explicit and strongly typed (here > <https://github.com/apache/lucene-solr/pull/1684/files#diff-4ec32958f54ec8e1f7e2d5ce8de331bb> > example > to get a specific system property from a node). This definitely does > increase the number of interfaces, but reduces IMO the effort to code to > these abstractions and provides a lot more compile time and IDE assistance. > > Goal is to hide all the boilerplate code and machinery (and to a point - > complexity) in the implementations of these interfaces rather than have > each plugin writer deal with the same problems. > > We’re moving from something that was complex and hard to read and debug > yet functionally extremely rich, to something simpler for us, more > demanding for users (write code rather than policy config if there's a need > for new behavior) but that should not be less "expressive" in any > significant way. One could even imagine reimplementing the former > Autoscaling config Domain Specific Language on top of these API (maybe as a > summer internship project :) > > *> This is a common mistake that we all do. When we design a feature we > think that is the most important thing.* > If by *"most important thing"* you mean investing the best reasonable > effort to do things right then yes. > If you mean trying to make a minor feature look more important and > inflated than it is, I disagree. > As a personal note, replica placement is not the aspect of SolrCloud I'm > most interested in, but the first bottleneck we hit when pushing the scale > of SolrCloud. I approach this with a state of mind "let's do it right and > get it out of the way" to move to topics I really want to work on (around > distribution in SolrCloud and the role of Overseer). Implementing > Autoscaling in a way that simplifies future refactoring (or that does not > make them harder than they already are) is therefore *very high* on my > priority list, to support modest changes (Slice to Shard renaming) and > more ambitious ones (replacing Zookeeper, removing Overseer, you name it). > > Thanks for reading, again sorry for the long email, but I hope this helps > (at least helps the discussion), > Ilan > > > On Thu 23 Jul 2020 at 08:16, Noble Paul <notificati...@github.com> wrote: > >> I don't believe we should have a set of interfaces that duplicate >> existing classes just for this functionality. This is a common mistake that >> we all do. When we design a feature we think that is the most important >> thing. We endup over designing and over engineering things. This feature >> will remain a tiny part of Solr. Anyone who wishes to implement this should >> not require to learn a lot before even getting started. Let's try to have a >> minimal set of interfaces so that people who try to implement them do not >> have a huge learning cure. >> >> Let's try to understand the requirement >> >> - Solr wants a set of positions to place a few replicas >> - The implementation wants to know what is the current state of the >> cluster so that it can make those decisions >> >> 24 interfaces to do this is definitely over engineering >> >> — >> You are receiving this because you authored the thread. >> Reply to this email directly, view it on GitHub >> <https://github.com/apache/lucene-solr/pull/1684#issuecomment-662837142>, >> or unsubscribe >> <https://github.com/notifications/unsubscribe-auth/AKIOMCFT5GU2II347GZ4HTTR47IVTANCNFSM4PC3HDKQ> >> . >> > >