Hi Ilan,

I like where we're going with https://github.com/apache/lucene-solr/pull/1684 . Correct me if I am wrong, but my understanding is that this PR defines the interfaces for creating policies.
What's not clear to me is how existing collection APIs like create-collections/add-replica etc. will make use of it. Has that been discussed somewhere that I could read up on?

On Sat, Jul 25, 2020 at 2:03 PM Ilan Ginzburg <ilans...@gmail.com> wrote:

> Thanks Gus!
> This makes a lot of sense but significantly increases IMO the scope and effort to define an "Autoscaling" framework interface.
>
> I'd be happy to try to see what concepts could be shared and how a generic plugin facade could be defined.
>
> What are the other types of plugins that would share such a unified approach? Do they already exist under another form or are just projects at this stage, like Autoscaling plugins?
>
> But... Assuming this is the first "facade" layer to be defined between Solr and external code, it might be hard to make it generic and get it right. There's value in starting simple, understanding the tradeoffs and generalizing later.
>
> Also I'd like to make sure we're not paying a performance "genericity tax" in Autoscaling for unneeded features.
>
> Ilan
>
> On Sat, Jul 25, 2020 at 16:02, Gus Heck <gus.h...@gmail.com> wrote:
>
>> Scanned through the PR and read some of this thread. I likely have missed much other discussion, so forgive me if I'm dredging up some things that are already discussed elsewhere.
>>
>> The idea of designing the interfaces defining what information is available seems good here, but I worry that it's too auto-scaling focused. In my imagination, I would see solr having a standard informational interface that is useful to any plugin of any sort. Autoscaling should be leveraging that and we should be enhancing that to enable autoscaling. The current state of the system is one key type of information, but another type of information that should exist within solr and be exposed to plugins (including autoscaling) is events.
>> When a new node joins there should be an event, for example, so that plugins can listen for that rather than incessantly polling and comparing the list of 100 nodes to a cached list of 100 nodes.
>>
>> In the PR I see a bunch of classes all off in a separate package, which looks like an autoscaling fiefdom which will be tempted if not forced to duplicate lots of stuff relative to other plugins and/or core.
>>
>> As a side note I would think the metrics system could be a plugin that leverages the same set of informational interfaces....
>>
>> So there should be 3 parts to this as I imagine it.
>>
>> 1) Enhancements to the **plugin system** that make information about the solr cluster available to ALL plugins
>> 2) Enhancements to the **plugin system** APIs provided to ALL plugins that allow them to mutate solr safely.
>> 3) A plugin that we intend to support for our users currently using autoscaling, which utilizes the enhanced information to provide a similar level of functionality as is *promised* by our current documentation of autoscaling. There might be some gaps or differences but we should be discussing what they are and providing recommended workarounds for users that relied on those promises. Even if there were cases where we failed to deliver, if there were at least some conditions under which we could deliver the promised functionality those should be supported. Only if we never were able to deliver and it never worked under any circumstance should we rip stuff out entirely.
>>
>> Implicit in the above is the concept that there should be a facade between plugins and the core of solr.
>>
>> WRT #1 which will necessarily involve information collected from remote nodes, we need to be designing that thinking about what informational guarantees it provides. Latency, consistency, delivery, etc.
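To make the event idea from Gus's message concrete (a plugin is notified when a node joins instead of polling and diffing node lists), here is a minimal sketch in Java. Every name in it (`NodeEventListener`, `NodeEventBus`, `publishNodeJoined`) is hypothetical and invented for illustration; none of them exist in Solr or in the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical listener interface; not an existing Solr API.
interface NodeEventListener {
    void nodeJoined(String nodeName);
}

// Hypothetical event source that a plugin facade might expose.
class NodeEventBus {
    private final List<NodeEventListener> listeners = new CopyOnWriteArrayList<>();

    void register(NodeEventListener listener) {
        listeners.add(listener);
    }

    // Would be invoked by the core when it detects a new live node.
    void publishNodeJoined(String nodeName) {
        for (NodeEventListener listener : listeners) {
            listener.nodeJoined(nodeName);
        }
    }
}

class NodeEventDemo {
    // Registers one listener, simulates two joins, returns what the listener saw.
    static List<String> run() {
        List<String> seen = new ArrayList<>();
        NodeEventBus bus = new NodeEventBus();
        bus.register(nodeName -> seen.add(nodeName));
        bus.publishNodeJoined("node101");
        bus.publishNodeJoined("node102");
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The sketch deliberately says nothing about the guarantees raised above (latency, consistency, delivery); an in-process list of listeners sidesteps all of them, which a real cross-node design could not.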
>> We also need to think about what is exposed in a read-only fashion vs what plugins might write back to solr. Certainly there will be a lot of information that most plugins ignore, and we might consider having groupings of information and interfaces or annotations that indicate what info is provided, but the simplest default state is to just give plugins a reference to a class that they can use to drill into information about the cluster as needed. (SolrInformationBooth? ... or less tongue in cheek... enhance SolrInfoBean? )
>>
>> Finally a fourth thing that occurs to me as I write is we need to consider what information one plugin might make available to the rest of the solr plugins. This might come later, and is hard because it's very hard to anticipate what info might be generated by unknown plugins in the future.
>>
>> So some humorous, not seriously suggested but hopefully memorable class names encapsulating the concepts:
>>
>> SolrInformationBooth (place to query)
>> SolrLoudspeaker (event announcements)
>> SolrControlLevers (mutate solr cluster)
>> SolrPluginFacebookPage (info published by the plugin that others can watch)
>>
>> The "facade" provided to plugins by the plugin system should grow and expand such that more and more plugins can rely on it. This effort should grow it enough to move autoscaling onto it without dropping (much) functionality that we've previously published.
>>
>> -Gus
>>
>> On Fri, Jul 24, 2020 at 4:40 PM Jan Høydahl <jan....@cominvent.com> wrote:
>>
>>> Not clear to me what type of "alternative proposal" you're thinking of Jan
>>>
>>> That would be the responsibility of Noble and others who have concerns to detail - and try convince other peers.
>>> It’s hard for me as a spectator to know whether to agree with Noble without a clear picture of what the alternative API or approach would look like.
>>> I’m often a fan of loosely typed APIs since they tend to cause less boilerplate code, but strong typing may indeed be a sound choice in this API.
>>>
>>> Jan Høydahl
>>>
>>> On 24 Jul 2020 at 01:44, Ilan Ginzburg <ilans...@gmail.com> wrote:
>>>
>>> In my opinion we have to (and therefore will) ship at least a basic prod ready implementation on top of the API that does simple things (not sure about rack, but for example balance cores and disk size without co-locating replicas of same shard on same node).
>>> Without such an implementation, I suspect adoption will be low.
>>> Moreover, it's always a lot more friendly to start coding from a working example than from scratch.
>>>
>>> Not clear to me what type of "alternative proposal" you're thinking of Jan. Alternative API proposal? Alternative approach to replace Autoscaling?
>>>
>>> Ilan
>>>
>>> On Fri, Jul 24, 2020 at 12:11 AM Jan Høydahl <jan....@cominvent.com> wrote:
>>>
>>>> Important discussion indeed.
>>>>
>>>> I don’t have time to dive deep into the PR or make up my mind whether there is a simpler and more future proof way of designing these APIs. But I understand that autoscaling is a complex beast and it is important we get it right.
>>>>
>>>> One question regarding having to write code vs config. Is the plan to ship some very simple lightweight default placement rules ootb that give 80% of users what they need with simple config, or would every user need to write code to e.g. spread replicas across hosts/racks? I’d be interested in seeing an alternative proposal laid out, perhaps not in code but with a design that can be compared and discussed.
>>>>
>>>> Jan Høydahl
>>>>
>>>> On 23 Jul 2020 at 17:53, Houston Putman <houstonput...@gmail.com> wrote:
>>>>
>>>> I think this is a valid thing to discuss on the dev list, since this isn't just about code comments.
>>>> It seems to me that Ilan wants to discuss the philosophy around how to design plugins and the interfaces in Solr which the plugins will talk to.
>>>> This is broad and affects much more than just the Autoscaling framework.
>>>>
>>>> As a community & product, we have so far agreed that Solr should be lighter weight and additional features should live in plugins that are managed separately from Solr itself.
>>>> At that point we need to think about the lifetime and support of these plugins. People love to refactor stuff in the solr core, which before plugins wasn't a large issue.
>>>> However if we are now intending for many customers to rely on plugins, then we need to come up with standards and guarantees so that these plugins don't:
>>>>
>>>> - Stall people from upgrading Solr (minor or major versions)
>>>> - Hinder the development of Solr Core
>>>> - Cause us more headaches trying to keep multiple repos of plugins up to date with recent versions of Solr
>>>>
>>>> I am not completely sure where I stand right now, but this is definitely something that we should be thinking about when migrating all of this functionality to plugins.
>>>>
>>>> - Houston
>>>>
>>>> On Thu, Jul 23, 2020 at 9:27 AM Ishan Chattopadhyaya <is...@apache.org> wrote:
>>>>
>>>>> I think we should move the discussion back to the PR because it has more context and inline comments are possible. Having this discussion in 4 places (jira, pr, slack and dev list) is very hard to keep track of.
>>>>>
>>>>> On Thu, 23 Jul, 2020, 5:57 pm Ilan Ginzburg, <ilans...@gmail.com> wrote:
>>>>>
>>>>>> [I’m moving a discussion from the PR <https://github.com/apache/lucene-solr/pull/1684> for SOLR-14613 <https://issues.apache.org/jira/browse/SOLR-14613> to the dev list for a wider audience.
>>>>>> This is about replacing the now (in master) gone Autoscaling framework with a way for clients to write their customized placement code]
>>>>>>
>>>>>> It took me a long time to write this mail and it's quite long, sorry.
>>>>>> Please anybody interested in the future of Autoscaling (not only those I cc'ed) do read it and provide feedback. Very impacting decisions have to be made now.
>>>>>>
>>>>>> Thanks Noble for your feedback.
>>>>>> I believe it is important that we are aligned on what we build here, esp. at the early defining stages (now).
>>>>>>
>>>>>> Let me try to elaborate on your concerns and provide in general the rationale behind the approach.
>>>>>>
>>>>>> *> Anyone who wishes to implement this should not require to learn a lot before even getting started*
>>>>>> For somebody who knows Solr (what is a Node, Collection, Shard, Replica) and basic notions related to Autoscaling (getting variables representing current state to make decisions), there’s not much to learn. The framework uses the same concepts, often with the same names.
>>>>>>
>>>>>> *> I don't believe we should have a set of interfaces that duplicate existing classes just for this functionality.*
>>>>>> Where appropriate we can have existing classes be the implementations for these interfaces and be passed to the plugins, that would be perfectly ok. The proposal doesn’t include implementations at this stage, therefore there’s no duplication, or not yet... (we must get the interfaces right and agreed upon before implementation). If some interface methods in the proposal have a different name from equivalent methods in internal classes we plan to use, of course let's rename one or the other.
>>>>>>
>>>>>> Existing internal abstractions are most of the time concrete classes and not interfaces (Replica, Slice, DocCollection, ClusterState).
>>>>>> Making these visible to contrib code living elsewhere is making future refactoring hard and contrib code will most likely end up reaching to methods it shouldn’t be using. If we define a clean set of interfaces for plugins, I wouldn’t hesitate to break external plugins that reach out to other internal Solr classes, but will do everything possible to keep the API backward compatible so existing plugins can be recompiled without change.
>>>>>>
>>>>>> *> 24 interfaces to do this is definitely over engineering*
>>>>>> I don’t consider the number of classes or interfaces a metric of complexity or of engineering quality. There are sample <https://github.com/apache/lucene-solr/pull/1684/files#diff-ddbe185b5e7922b91b90dfabfc50df4c> plugin implementations to serve as a base for plugin writers (and for us defining this framework) and I believe the process is relatively simple. Trying to do the same things with existing Solr classes might prove a lot harder (but might be worth the effort for comparison purposes to make sure we agree on the approach?). For example, getting sister replicas of a given replica in the proposed API is: replica.getShard() <https://github.com/apache/lucene-solr/pull/1684/files#diff-a2d49bd52fddde54bb7fd2e96238507eR27> .getReplicas() <https://github.com/apache/lucene-solr/pull/1684/files#diff-9633f5e169fa3095062451599daac213R31>. Doing so with the internal classes likely involves getting the DocCollection and Slice name from the Replica, then getting the DocCollection from the cluster state, there getting the Slice based on its name and finally calling getReplicas() on the Slice.
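The sister-replica comparison above can be sketched roughly as follows, in Java. Only the method names `getShard()` and `getReplicas()` come from the quoted mail; the classes `SketchShard`, `SketchReplica` and `NameBasedState` are simplified, hypothetical stand-ins and do not reproduce the real signatures of the PR's interfaces or of Solr's `Replica`, `Slice`, `DocCollection` and `ClusterState`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Object-graph style: a replica holds a reference to its shard.
// Class bodies are illustrative only.
class SketchShard {
    private final List<SketchReplica> replicas = new ArrayList<>();

    SketchReplica addReplica(String name) {
        SketchReplica replica = new SketchReplica(name, this);
        replicas.add(replica);
        return replica;
    }

    List<SketchReplica> getReplicas() { return replicas; }
}

class SketchReplica {
    private final String name;
    private final SketchShard shard;

    SketchReplica(String name, SketchShard shard) {
        this.name = name;
        this.shard = shard;
    }

    String getName() { return name; }
    SketchShard getShard() { return shard; }
}

// Name-based style: callers only have names, so each hop is a lookup.
// Simplified stand-in, NOT the real cluster state classes.
class NameBasedState {
    // collection name -> slice name -> replica names
    final Map<String, Map<String, List<String>>> collections = new HashMap<>();

    List<String> sisterReplicas(String collectionName, String sliceName) {
        Map<String, List<String>> slices = collections.get(collectionName);
        return slices.get(sliceName);
    }
}

class SisterReplicasDemo {
    static List<String> viaObjectGraph() {
        SketchShard shard = new SketchShard();
        SketchReplica first = shard.addReplica("core_node1");
        shard.addReplica("core_node2");
        List<String> names = new ArrayList<>();
        // One navigation step from a replica to all replicas of its shard.
        for (SketchReplica replica : first.getShard().getReplicas()) {
            names.add(replica.getName());
        }
        return names;
    }

    static List<String> viaNameLookups() {
        NameBasedState state = new NameBasedState();
        Map<String, List<String>> slices = new HashMap<>();
        slices.put("shard1", Arrays.asList("core_node1", "core_node2"));
        state.collections.put("products", slices);
        // The caller must carry both names around to find the siblings.
        return state.sisterReplicas("products", "shard1");
    }
}
```

Both paths return the same replica names; the difference is how much the caller has to know and look up along the way.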
>>>>>> I consider the role of this new framework to be to make life as easy as possible for writing placement code and the like, make life easy for us to maintain it, make it easy to write a simulation engine (should be at least an order of magnitude simpler than the previous one), etc.
>>>>>>
>>>>>> An example regarding readability and number of interfaces: rather than defining an enum with runtime annotation for building its instances (Variable.Type <https://github.com/apache/lucene-solr/blob/branch_8_6/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Variable.java#L98>) and then very generic access methods, the proposal defines a specific interface for each “variable type” (called properties <https://github.com/apache/lucene-solr/pull/1684/files#diff-4c0fa84354f93cb00e6643aefd00fd3c>). Rather than concatenating strings to specify the data to return from a remote node (based on snitches <https://github.com/apache/lucene-solr/blame/branch_8_6/solr/core/src/java/org/apache/solr/cloud/rule/ImplicitSnitch.java#L60>, see doc <https://lucene.apache.org/solr/guide/8_1/solrcloud-autoscaling-policy-preferences.html#node-selector>), the proposal is explicit and strongly typed (here <https://github.com/apache/lucene-solr/pull/1684/files#diff-4ec32958f54ec8e1f7e2d5ce8de331bb> example to get a specific system property from a node). This definitely does increase the number of interfaces, but reduces IMO the effort to code to these abstractions and provides a lot more compile time and IDE assistance.
>>>>>>
>>>>>> Goal is to hide all the boilerplate code and machinery (and to a point - complexity) in the implementations of these interfaces rather than have each plugin writer deal with the same problems.
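The contrast between string-keyed access and one typed interface per property might look roughly like this in Java. All names here (`LooseNodeValues`, `SystemPropertyValue`, `TypedNode`, the `"sysprop."` key prefix as used below) are assumptions made for illustration; they are not the PR's interfaces nor the snitch implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// String-keyed style: the caller concatenates a key and casts the result.
class LooseNodeValues {
    private final Map<String, Object> values = new HashMap<>();

    void put(String key, Object value) { values.put(key, value); }
    Object get(String key) { return values.get(key); }
}

// Typed style: one small interface per kind of property (hypothetical name).
interface SystemPropertyValue {
    Optional<String> getValue();
}

class TypedNode {
    private final Map<String, String> systemProperties = new HashMap<>();

    void setSystemProperty(String name, String value) {
        systemProperties.put(name, value);
    }

    SystemPropertyValue systemProperty(String name) {
        return () -> Optional.ofNullable(systemProperties.get(name));
    }
}

class PropertyAccessDemo {
    static String loose() {
        LooseNodeValues values = new LooseNodeValues();
        values.put("sysprop.rack", "rack-a");
        // A typo in the prefix or a wrong cast only shows up at run time.
        return (String) values.get("sysprop." + "rack");
    }

    static String typed() {
        TypedNode node = new TypedNode();
        node.setSystemProperty("rack", "rack-a");
        // The compiler checks the method name and the return type.
        return node.systemProperty("rack").getValue().orElse("unknown");
    }
}
```

The typed variant costs an extra interface per property kind, which is the trade-off discussed above: more types, in exchange for compile-time and IDE help.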
>>>>>>
>>>>>> We’re moving from something that was complex and hard to read and debug yet functionally extremely rich, to something simpler for us, more demanding for users (write code rather than policy config if there's a need for new behavior) but that should not be less "expressive" in any significant way. One could even imagine reimplementing the former Autoscaling config Domain Specific Language on top of these APIs (maybe as a summer internship project :)
>>>>>>
>>>>>> *> This is a common mistake that we all do. When we design a feature we think that is the most important thing.*
>>>>>> If by *"most important thing"* you mean investing the best reasonable effort to do things right then yes.
>>>>>> If you mean trying to make a minor feature look more important and inflated than it is, I disagree.
>>>>>> As a personal note, replica placement is not the aspect of SolrCloud I'm most interested in, but the first bottleneck we hit when pushing the scale of SolrCloud. I approach this with a state of mind "let's do it right and get it out of the way" to move to topics I really want to work on (around distribution in SolrCloud and the role of Overseer). Implementing Autoscaling in a way that simplifies future refactoring (or that does not make it harder than it already is) is therefore *very high* on my priority list, to support modest changes (Slice to Shard renaming) and more ambitious ones (replacing Zookeeper, removing Overseer, you name it).
>>>>>>
>>>>>> Thanks for reading, again sorry for the long email, but I hope this helps (at least helps the discussion),
>>>>>> Ilan
>>>>>>
>>>>>> On Thu 23 Jul 2020 at 08:16, Noble Paul <notificati...@github.com> wrote:
>>>>>>
>>>>>>> I don't believe we should have a set of interfaces that duplicate existing classes just for this functionality. This is a common mistake that we all do. When we design a feature we think that is the most important thing. We end up over designing and over engineering things. This feature will remain a tiny part of Solr. Anyone who wishes to implement this should not require to learn a lot before even getting started. Let's try to have a minimal set of interfaces so that people who try to implement them do not have a huge learning curve.
>>>>>>>
>>>>>>> Let's try to understand the requirement
>>>>>>>
>>>>>>> - Solr wants a set of positions to place a few replicas
>>>>>>> - The implementation wants to know what is the current state of the cluster so that it can make those decisions
>>>>>>>
>>>>>>> 24 interfaces to do this is definitely over engineering
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)