Hi Nick,

On first glance, this looks all great and like an exemplary PR that is easy to 
follow. And bonus props for the nice docs. I'll have more time for a through 
review over the weekend.

Cheers
Jan
—

> On 18. Mar 2019, at 19:42, Nick Vatamaniuc <vatam...@gmail.com> wrote:
> 
> Hello everyone,
> 
> Thank you all (Joan, Jan, Mike, Ilya, Adam) who contributed to the API
> discussion. There is now a PR open
> https://github.com/apache/couchdb/pull/1972 . If you get a chance, I would
> appreciate any reviews, feedback or comments.
> 
> The PR message explains how the commits are organized and references the
> RFC. Basically it starts with preparatory work, ensuring all the existing
> components know how to deal with split shards. Then, some lower level bits
> are implemented, like bulk copy, internal replicator updates, etc.,
> followed by the individual job implementation and the job manager which
> stitches everything together. In the end is the HTTP API implementation
> along with a suite of unit and Elixir integration tests.
> 
> There is also a README_reshard.md file in src/mem3 that tries to provide a
> more in-depth technical description of how everything fits together.
> https://github.com/apache/couchdb/pull/1972/files#diff-5ac7b51ec4e03e068bf271f34ecf88df
> (notice
> this URL might changer after a rebase).
> 
> Also special thanks to Paul (job module implementation, get_ring function,
> a lot of architectural and implementation advice), Eric (finding many bugs,
> fixes for the bugs, and writing bulk copy and change feed tests), and Jay
> (testing and a thorough code review).
> 
> Cheers,
> -Nick
> 
>> On Sun, Feb 17, 2019 at 2:32 AM Jan Lehnardt <m...@jan.io> wrote:
>> 
>> Heya Nick,
>> 
>> Nicely done. I think even though the majority of the discussion had
>> already happened here, the RFC nicely pulled together the various
>> discussion threads into a coherent whole.
>> 
>> I would imagine the discussion on GH would be similarly fruitful.
>> 
>> I gave it my +1, and as I said on the outset: I'm very excited about this
>> feature!
>> 
>> Best
>> Jan
>> —
>> 
>>> On 15. Feb 2019, at 23:45, Nick Vatamaniuc <vatam...@gmail.com> wrote:
>>> 
>>> Decided to kick the tires on the new RFC proposal issue type and created
>>> one for shard splitting:
>>> 
>>> https://github.com/apache/couchdb/issues/1920
>>> 
>>> Let's see how it goes. Being it's the first one let me know if I missed
>>> anything obvious.
>>> 
>>> Also I'd like to thank everyone who contributed to the discussion. The
>> API
>>> is looking more solid and is much improved from where it started.
>>> 
>>> Cheers,
>>> -Nick
>>> 
>>> 
>>> 
>>>> On Wed, Feb 13, 2019 at 12:03 PM Nick Vatamaniuc <vatam...@gmail.com>
>> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Wed, Feb 13, 2019 at 11:52 AM Jan Lehnardt <j...@apache.org> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 13. Feb 2019, at 17:12, Nick Vatamaniuc <vatam...@gmail.com>
>> wrote:
>>>>>> 
>>>>>> Hi Jan,
>>>>>> 
>>>>>> Thanks for taking a look!
>>>>>> 
>>>>>>> On Wed, Feb 13, 2019 at 6:28 AM Jan Lehnardt <j...@apache.org> wrote:
>>>>>>> 
>>>>>>> Nick, this is great, I have a few tiny nits left, apologies I only
>> now
>>>>> got
>>>>>>> to it.
>>>>>>> 
>>>>>>>> On 12. Feb 2019, at 18:08, Nick Vatamaniuc <vatam...@gmail.com>
>>>>> wrote:
>>>>>>>> 
>>>>>>>> Shard Splitting API Proposal
>>>>>>>> 
>>>>>>>> I'd like thank everyone who contributed to the API discussion. As a
>>>>>>> result
>>>>>>>> we have a much better and consistent API that what we started with.
>>>>>>>> 
>>>>>>>> Before continuing I wanted to summarize to see what we ended up
>> with.
>>>>> The
>>>>>>>> main changes since the initial proposal were switching to using
>>>>> /_reshard
>>>>>>>> as the main endpoint and having a detailed state transition history
>>>>> for
>>>>>>>> jobs.
>>>>>>>> 
>>>>>>>> * GET /_reshard
>>>>>>>> 
>>>>>>>> Top level summary. Besides the new _reshard endpoint, there `reason`
>>>>> and
>>>>>>>> the stats are more detailed.
>>>>>>>> 
>>>>>>>> Returns
>>>>>>>> 
>>>>>>>> {
>>>>>>>> "completed": 3,
>>>>>>>> "failed": 4,
>>>>>>>> "running": 0,
>>>>>>>> "state": "stopped",
>>>>>>>> "state_reason": "Manual rebalancing",
>>>>>>>> "stopped": 0,
>>>>>>>> "total": 7
>>>>>>>> }
>>>>>>>> 
>>>>>>>> * PUT /_reshard/state
>>>>>>>> 
>>>>>>>> Start or stop global rebalacing.
>>>>>>>> 
>>>>>>>> Body
>>>>>>>> 
>>>>>>>> {
>>>>>>>> "state": "stopped",
>>>>>>>> "reason": "Manual rebalancing"
>>>>>>>> }
>>>>>>>> 
>>>>>>>> Returns
>>>>>>>> 
>>>>>>>> {
>>>>>>>> "ok": true
>>>>>>>> }
>>>>>>>> 
>>>>>>>> * GET /_reshard/state
>>>>>>>> 
>>>>>>>> Return global resharding state and reason.
>>>>>>>> 
>>>>>>>> {
>>>>>>>> "reason": "Manual rebalancing",
>>>>>>>> "state": “stopped”
>>>>>>>> }
>>>>>>> 
>>>>>>> More a note than a change request, but `state` is a very generic term
>>>>> that
>>>>>>> often confuses folks when they are new to something. If the set of
>>>>> possible
>>>>>>> states is `started` and `stopped`, how about making this endpoint a
>>>>> boolean?
>>>>>>> 
>>>>>>> /_reshard/enabled
>>>>>>> 
>>>>>>> {
>>>>>>> "enabled": true|false,
>>>>>>> "reason": "Manual rebalancing"
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>> I thought of that as well. However _reshard/state seemed consistent
>> with
>>>>>> _reshard/jobs/$jobid/state. Setting "state":"stopped" _reshard/state
>>>>> will
>>>>>> lead to all individual running job state to become "stopped" as well.
>>>>> And
>>>>>> "running" will make jobs that are not individually stopped also become
>>>>>> "running". In other words since it directly toggle job's state (with a
>>>>> job
>>>>>> being to override stopped state) I like that it had the same arguments
>>>>> 
>>>>> Got it, makes perfect sense.
>>>>> 
>>>>>> and": true|false
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>>> * GET /_reshard/jobs
>>>>>>>> 
>>>>>>>> Get the state of all the resharding jobs on the cluster. Now we
>> have a
>>>>>>>> detailed
>>>>>>>> state transition history which looks similar what _scheduler/jobs
>>>>> have.
>>>>>>>> 
>>>>>>>> {
>>>>>>>> "jobs": [
>>>>>>>>     {
>>>>>>>>         "history": [
>>>>>>>>             {
>>>>>>>>                 "detail": null,
>>>>>>>>                 "timestamp": "2019-02-06T22:28:06Z",
>>>>>>>>                 "type": "new"
>>>>>>>>             },
>>>>>>>>             ...
>>>>>>>>             {
>>>>>>>>                 "detail": null,
>>>>>>>>                 "timestamp": "2019-02-06T22:28:10Z",
>>>>>>>>                 "type": "completed"
>>>>>>>>             }
>>>>>>>>         ],
>>>>>>>>         "id":
>>>>>>>> 
>>>>> "001-0a308ef9f7bd24bd4887d6e619682a6d3bb3d0fd94625866c5216ec1167b4e23",
>>>>>>>>         "job_state": "completed",
>>>>>>>>         "node": "node1@127.0.0.1",
>>>>>>>>         "source": "shards/00000000-ffffffff/db1.1549492084",
>>>>>>>>         "split_state": "completed",
>>>>>>>>         "start_time": "2019-02-06T22:28:06Z",
>>>>>>>>         "state_info": {},
>>>>>>>>         "targets": [
>>>>>>>>             "shards/00000000-7fffffff/db1.1549492084",
>>>>>>>>             "shards/80000000-ffffffff/db1.1549492084"
>>>>>>>>         ],
>>>>>>> 
>>>>>>> Since we went from /_split to /_reshard to prepare for merging
>> shards,
>>>>> we
>>>>>>> should reconsider source (singular) and targets (plural). Either a
>>>>> merge
>>>>>>> job (in the future) uses sources (plural) and target (singular) and
>>>>> the job
>>>>>>> schema is intentionally different, or we unify things to, maybe
>>>>> singular:
>>>>>>> source/target which would have the nice property of being analogous
>> to
>>>>> our
>>>>>>> replication job schema. The type definition then is source:String and
>>>>>>> target:Array(2) for split jobs and source:Array(2) target:String for
>>>>>>> (future) merge jobs.
>>>>>>> 
>>>>>>> 
>>>>>> Joan suggested adding a "type" field to both job creation POST body
>> and
>>>>>> also returning it when we inspect the job(s) state. So the
>>>>> "type":"split"
>>>>>> would toggle the schema. It could be "merge" in the future, or even
>>>>>> something like "rebalance" where it would merge some and split others
>>>>>> perhaps :-) and since we have a type it would be easier to
>> differentiate
>>>>>> between the merge and split jobs. But if there is a consensus from
>>>>> others
>>>>>> about switching targets to target that's easily as well.
>>>>> 
>>>>> Ah, I’m less concerned here about not being able to tell whether it’s a
>>>>> split or a merge, and more about that having an indiscriminate plural
>>>>> form (sourceS/targetS) depending on the type. It’s just an easy thing
>> to
>>>>> get wrong.
>>>>> 
>>>>> In addition, we already have source/target in CouchDB replication,
>>>>> which people already use successfully, so making a similar thing that
>>>>> behaves slightly differently doesn’t sit quite right with me.
>>>>> 
>>>>> I understand that I’m arguing to remove an ’s’ for very nitpicky
>>>>> but these are the kind of nitpick discussions we’ve done a lot in
>>>>> the early days which resulted in a by and large decent API that
>>>>> has served as well, and it’s something I’d like to see taken forward.
>>>>> Apologies if this all sounds very strict ;)
>>>>> 
>>>>> 
>>>> Thanks for the longer explanation. I understand now and agree, let's
>> make
>>>> it target. No worries about sounding nitpicky we should be nitpicky
>> about
>>>> APIs!
>>>> 
>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> And just another question, sorry if I missed this elsewhere, would we
>>>>> ever
>>>>>>> consider adding to split/merge ratio different from 1:2, say 1:4, or
>>>>> will
>>>>>>> folks have to run 1:2, 1:2, 1:2 to get to the same result? I’m fine
>>>>> with
>>>>>>> either and if 1:2 fixed makes things simpler, I’m all for it ;)
>>>>>>> 
>>>>>>> 
>>>>>> Good point. Actually it's already implemented that way already :-)
>> Right
>>>>>> below the API surface it has a split=2 parameter and it just creates
>> the
>>>>>> targets based on that. It could be 2, 3, 4, ... 10 etc. However I was
>>>>>> thinking of keeping it hard coded at 2 at first to keep the behavior
>>>>>> simpler at first and open that parameter to be user facing in a later
>>>>>> release based on user feedback.
>>>>> 
>>>>> Ace, again, fully on board with shipping 1:2 first and maybe offering
>>>>> other
>>>>> options later.
>>>>> 
>>>>> Best
>>>>> Jan
>>>>> —
>>>>> 
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> -Nick
>>>>> 
>>>>> 
>> 
>> 

Reply via email to