Hi Nick,

At first glance, this looks all great and like an exemplary PR that is easy to follow. And bonus props for the nice docs. I'll have more time for a thorough review over the weekend.

Cheers
Jan
—

> On 18. Mar 2019, at 19:42, Nick Vatamaniuc <vatam...@gmail.com> wrote:
>
> Hello everyone,
>
> Thank you all (Joan, Jan, Mike, Ilya, Adam) who contributed to the API discussion. There is now a PR open:
> https://github.com/apache/couchdb/pull/1972
> If you get a chance, I would appreciate any reviews, feedback or comments.
>
> The PR message explains how the commits are organized and references the RFC. Basically it starts with preparatory work, ensuring all the existing components know how to deal with split shards. Then, some lower level bits are implemented, like bulk copy, internal replicator updates, etc., followed by the individual job implementation and the job manager which stitches everything together. At the end comes the HTTP API implementation, along with a suite of unit and Elixir integration tests.
>
> There is also a README_reshard.md file in src/mem3 that tries to provide a more in-depth technical description of how everything fits together:
> https://github.com/apache/couchdb/pull/1972/files#diff-5ac7b51ec4e03e068bf271f34ecf88df
> (note this URL might change after a rebase).
>
> Also special thanks to Paul (job module implementation, get_ring function, a lot of architectural and implementation advice), Eric (finding many bugs, fixes for the bugs, and writing bulk copy and change feed tests), and Jay (testing and a thorough code review).
>
> Cheers,
> -Nick
>
>> On Sun, Feb 17, 2019 at 2:32 AM Jan Lehnardt <m...@jan.io> wrote:
>>
>> Heya Nick,
>>
>> Nicely done. I think even though the majority of the discussion had already happened here, the RFC nicely pulled together the various discussion threads into a coherent whole.
>>
>> I would imagine the discussion on GH would be similarly fruitful.
>>
>> I gave it my +1, and as I said at the outset: I'm very excited about this feature!
>>
>> Best
>> Jan
>> —
>>
>>> On 15. Feb 2019, at 23:45, Nick Vatamaniuc <vatam...@gmail.com> wrote:
>>>
>>> Decided to kick the tires on the new RFC proposal issue type and created one for shard splitting:
>>>
>>> https://github.com/apache/couchdb/issues/1920
>>>
>>> Let's see how it goes. Since it's the first one, let me know if I missed anything obvious.
>>>
>>> Also I'd like to thank everyone who contributed to the discussion. The API is looking more solid and is much improved from where it started.
>>>
>>> Cheers,
>>> -Nick
>>>
>>>> On Wed, Feb 13, 2019 at 12:03 PM Nick Vatamaniuc <vatam...@gmail.com> wrote:
>>>>
>>>>> On Wed, Feb 13, 2019 at 11:52 AM Jan Lehnardt <j...@apache.org> wrote:
>>>>>
>>>>>> On 13. Feb 2019, at 17:12, Nick Vatamaniuc <vatam...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Jan,
>>>>>>
>>>>>> Thanks for taking a look!
>>>>>>
>>>>>>> On Wed, Feb 13, 2019 at 6:28 AM Jan Lehnardt <j...@apache.org> wrote:
>>>>>>>
>>>>>>> Nick, this is great, I have a few tiny nits left, apologies I only now got to it.
>>>>>>>
>>>>>>>> On 12. Feb 2019, at 18:08, Nick Vatamaniuc <vatam...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Shard Splitting API Proposal
>>>>>>>>
>>>>>>>> I'd like to thank everyone who contributed to the API discussion. As a result we have a much better and more consistent API than what we started with.
>>>>>>>>
>>>>>>>> Before continuing I wanted to summarize to see what we ended up with.
>>>>>>>> The main changes since the initial proposal were switching to using /_reshard as the main endpoint and having a detailed state transition history for jobs.
>>>>>>>>
>>>>>>>> * GET /_reshard
>>>>>>>>
>>>>>>>> Top level summary. Besides the new _reshard endpoint, there is now a `reason` and the stats are more detailed.
>>>>>>>>
>>>>>>>> Returns
>>>>>>>>
>>>>>>>> {
>>>>>>>>     "completed": 3,
>>>>>>>>     "failed": 4,
>>>>>>>>     "running": 0,
>>>>>>>>     "state": "stopped",
>>>>>>>>     "state_reason": "Manual rebalancing",
>>>>>>>>     "stopped": 0,
>>>>>>>>     "total": 7
>>>>>>>> }
>>>>>>>>
>>>>>>>> * PUT /_reshard/state
>>>>>>>>
>>>>>>>> Start or stop global rebalancing.
>>>>>>>>
>>>>>>>> Body
>>>>>>>>
>>>>>>>> {
>>>>>>>>     "state": "stopped",
>>>>>>>>     "reason": "Manual rebalancing"
>>>>>>>> }
>>>>>>>>
>>>>>>>> Returns
>>>>>>>>
>>>>>>>> {
>>>>>>>>     "ok": true
>>>>>>>> }
>>>>>>>>
>>>>>>>> * GET /_reshard/state
>>>>>>>>
>>>>>>>> Return global resharding state and reason.
>>>>>>>>
>>>>>>>> {
>>>>>>>>     "reason": "Manual rebalancing",
>>>>>>>>     "state": "stopped"
>>>>>>>> }
>>>>>>>
>>>>>>> More a note than a change request, but `state` is a very generic term that often confuses folks when they are new to something. If the set of possible states is `started` and `stopped`, how about making this endpoint a boolean?
>>>>>>>
>>>>>>> /_reshard/enabled
>>>>>>>
>>>>>>> {
>>>>>>>     "enabled": true|false,
>>>>>>>     "reason": "Manual rebalancing"
>>>>>>> }
>>>>>>>
>>>>>> I thought of that as well. However _reshard/state seemed consistent with _reshard/jobs/$jobid/state. Setting "state": "stopped" on _reshard/state will lead to all individual running jobs' states becoming "stopped" as well. And "running" will make jobs that are not individually stopped also become "running". In other words, since it directly toggles jobs' states (with a job being able to override the stopped state), I like that it has the same arguments
>>>>>
>>>>> Got it, makes perfect sense.
>>>>>
>>>>>> and": true|false
>>>>>>
>>>>>>>
>>>>>>>> * GET /_reshard/jobs
>>>>>>>>
>>>>>>>> Get the state of all the resharding jobs on the cluster. Now we have a detailed state transition history which looks similar to what _scheduler/jobs has.
>>>>>>>>
>>>>>>>> {
>>>>>>>>     "jobs": [
>>>>>>>>         {
>>>>>>>>             "history": [
>>>>>>>>                 {
>>>>>>>>                     "detail": null,
>>>>>>>>                     "timestamp": "2019-02-06T22:28:06Z",
>>>>>>>>                     "type": "new"
>>>>>>>>                 },
>>>>>>>>                 ...
>>>>>>>>                 {
>>>>>>>>                     "detail": null,
>>>>>>>>                     "timestamp": "2019-02-06T22:28:10Z",
>>>>>>>>                     "type": "completed"
>>>>>>>>                 }
>>>>>>>>             ],
>>>>>>>>             "id": "001-0a308ef9f7bd24bd4887d6e619682a6d3bb3d0fd94625866c5216ec1167b4e23",
>>>>>>>>             "job_state": "completed",
>>>>>>>>             "node": "node1@127.0.0.1",
>>>>>>>>             "source": "shards/00000000-ffffffff/db1.1549492084",
>>>>>>>>             "split_state": "completed",
>>>>>>>>             "start_time": "2019-02-06T22:28:06Z",
>>>>>>>>             "state_info": {},
>>>>>>>>             "targets": [
>>>>>>>>                 "shards/00000000-7fffffff/db1.1549492084",
>>>>>>>>                 "shards/80000000-ffffffff/db1.1549492084"
>>>>>>>>             ],
>>>>>>>
>>>>>>> Since we went from /_split to /_reshard to prepare for merging shards, we should reconsider source (singular) and targets (plural).
>>>>>>> Either a merge job (in the future) uses sources (plural) and target (singular) and the job schema is intentionally different, or we unify things to, maybe, singular source/target, which would have the nice property of being analogous to our replication job schema. The type definition then is source:String and target:Array(2) for split jobs and source:Array(2), target:String for (future) merge jobs.
>>>>>>>
>>>>>> Joan suggested adding a "type" field to both the job creation POST body and also returning it when we inspect the job(s) state. So "type": "split" would toggle the schema. It could be "merge" in the future, or even something like "rebalance" where it would merge some and split others perhaps :-) and since we have a type it would be easier to differentiate between the merge and split jobs. But if there is a consensus from others about switching targets to target, that's easy to do as well.
>>>>>
>>>>> Ah, I'm less concerned here about not being able to tell whether it's a split or a merge, and more about having an indiscriminate plural form (sourceS/targetS) depending on the type. It's just an easy thing to get wrong.
>>>>>
>>>>> In addition, we already have source/target in CouchDB replication, which people already use successfully, so making a similar thing that behaves slightly differently doesn't sit quite right with me.
>>>>>
>>>>> I understand that I'm arguing to remove an 's' for very nitpicky reasons, but these are the kind of nitpick discussions we've done a lot of in the early days, which resulted in a by and large decent API that has served us well, and it's something I'd like to see taken forward. Apologies if this all sounds very strict ;)
>>>>>
>>>> Thanks for the longer explanation. I understand now and agree, let's make it target. No worries about sounding nitpicky, we should be nitpicky about APIs!
>>>>
>>>>>>> And just another question, sorry if I missed this elsewhere: would we ever consider adding a split/merge ratio different from 1:2, say 1:4, or will folks have to run 1:2, 1:2, 1:2 to get to the same result? I'm fine with either, and if 1:2 fixed makes things simpler, I'm all for it ;)
>>>>>>>
>>>>>> Good point. Actually it's already implemented that way :-) Right below the API surface it has a split=2 parameter and it just creates the targets based on that. It could be 2, 3, 4, ... 10, etc. However, I was thinking of keeping it hard-coded at 2 at first to keep the behavior simpler, and opening that parameter up to users in a later release based on user feedback.
>>>>>
>>>>> Ace, again, fully on board with shipping 1:2 first and maybe offering other options later.
>>>>>
>>>>> Best
>>>>> Jan
>>>>> —
>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> -Nick
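
P.S. For anyone skimming the thread, putting the agreed pieces together (a "type" field in the job body, singular target naming, and the split factor fixed at 1:2 for now), I would expect creating a split job to look roughly like the request below. The POST body fields are not spelled out above, so I am guessing them from the GET /_reshard/jobs response; the PR and README_reshard.md are the authoritative reference, and the two target shards are presumably derived from the split factor rather than passed in.

POST /_reshard/jobs

{
    "type": "split",
    "node": "node1@127.0.0.1",
    "source": "shards/00000000-ffffffff/db1.1549492084"
}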