Hi Nick,

At first glance, this looks all great and like an exemplary PR that is easy to follow. And bonus props for the nice docs. I'll have more time for a thorough review over the weekend.

Cheers
Jan
—

> On 18. Mar 2019, at 19:42, Nick Vatamaniuc <vatam...@gmail.com> wrote:
>
> Hello everyone,
>
> Thank you all (Joan, Jan, Mike, Ilya, Adam) who contributed to the API discussion. There is now a PR open:
> https://github.com/apache/couchdb/pull/1972
> If you get a chance, I would appreciate any reviews, feedback or comments.
>
> The PR message explains how the commits are organized and references the RFC. Basically it starts with preparatory work, ensuring all the existing components know how to deal with split shards. Then, some lower level bits are implemented, like bulk copy, internal replicator updates, etc., followed by the individual job implementation and the job manager which stitches everything together. At the end comes the HTTP API implementation, along with a suite of unit and Elixir integration tests.
>
> There is also a README_reshard.md file in src/mem3 that tries to provide a more in-depth technical description of how everything fits together:
> https://github.com/apache/couchdb/pull/1972/files#diff-5ac7b51ec4e03e068bf271f34ecf88df
> (note this URL might change after a rebase).
>
> Also special thanks to Paul (job module implementation, get_ring function, a lot of architectural and implementation advice), Eric (finding many bugs, fixes for the bugs, and writing bulk copy and change feed tests), and Jay (testing and a thorough code review).
>
> Cheers,
> -Nick
>
>> On Sun, Feb 17, 2019 at 2:32 AM Jan Lehnardt <m...@jan.io> wrote:
>>
>> Heya Nick,
>>
>> Nicely done. I think even though the majority of the discussion had already happened here, the RFC nicely pulled together the various discussion threads into a coherent whole.
>>
>> I would imagine the discussion on GH would be similarly fruitful.
>>
>> I gave it my +1, and as I said at the outset: I'm very excited about this feature!
>>
>> Best
>> Jan
>> —
>>
>>> On 15. Feb 2019, at 23:45, Nick Vatamaniuc <vatam...@gmail.com> wrote:
>>>
>>> Decided to kick the tires on the new RFC proposal issue type and created one for shard splitting:
>>>
>>> https://github.com/apache/couchdb/issues/1920
>>>
>>> Let's see how it goes. Since it's the first one, let me know if I missed anything obvious.
>>>
>>> Also I'd like to thank everyone who contributed to the discussion. The API is looking more solid and is much improved from where it started.
>>>
>>> Cheers,
>>> -Nick
>>>
>>>> On Wed, Feb 13, 2019 at 12:03 PM Nick Vatamaniuc <vatam...@gmail.com> wrote:
>>>>
>>>>> On Wed, Feb 13, 2019 at 11:52 AM Jan Lehnardt <j...@apache.org> wrote:
>>>>>
>>>>>> On 13. Feb 2019, at 17:12, Nick Vatamaniuc <vatam...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Jan,
>>>>>>
>>>>>> Thanks for taking a look!
>>>>>>
>>>>>>> On Wed, Feb 13, 2019 at 6:28 AM Jan Lehnardt <j...@apache.org> wrote:
>>>>>>>
>>>>>>> Nick, this is great, I have a few tiny nits left, apologies I only now got to it.
>>>>>>>
>>>>>>>> On 12. Feb 2019, at 18:08, Nick Vatamaniuc <vatam...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Shard Splitting API Proposal
>>>>>>>>
>>>>>>>> I'd like to thank everyone who contributed to the API discussion. As a result we have a much better and more consistent API than what we started with.
>>>>>>>>
>>>>>>>> Before continuing I wanted to summarize to see what we ended up with.
>>>>>>>> The main changes since the initial proposal were switching to using /_reshard as the main endpoint and having a detailed state transition history for jobs.
>>>>>>>>
>>>>>>>> * GET /_reshard
>>>>>>>>
>>>>>>>> Top level summary. Besides the new _reshard endpoint, there is now a `reason` and the stats are more detailed.
>>>>>>>>
>>>>>>>> Returns
>>>>>>>>
>>>>>>>> {
>>>>>>>>     "completed": 3,
>>>>>>>>     "failed": 4,
>>>>>>>>     "running": 0,
>>>>>>>>     "state": "stopped",
>>>>>>>>     "state_reason": "Manual rebalancing",
>>>>>>>>     "stopped": 0,
>>>>>>>>     "total": 7
>>>>>>>> }
>>>>>>>>
>>>>>>>> * PUT /_reshard/state
>>>>>>>>
>>>>>>>> Start or stop global rebalancing.
>>>>>>>>
>>>>>>>> Body
>>>>>>>>
>>>>>>>> {
>>>>>>>>     "state": "stopped",
>>>>>>>>     "reason": "Manual rebalancing"
>>>>>>>> }
>>>>>>>>
>>>>>>>> Returns
>>>>>>>>
>>>>>>>> {
>>>>>>>>     "ok": true
>>>>>>>> }
>>>>>>>>
>>>>>>>> * GET /_reshard/state
>>>>>>>>
>>>>>>>> Return global resharding state and reason.
>>>>>>>>
>>>>>>>> {
>>>>>>>>     "reason": "Manual rebalancing",
>>>>>>>>     "state": "stopped"
>>>>>>>> }
>>>>>>>
>>>>>>> More a note than a change request, but `state` is a very generic term that often confuses folks when they are new to something. If the set of possible states is `started` and `stopped`, how about making this endpoint a boolean?
>>>>>>>
>>>>>>> /_reshard/enabled
>>>>>>>
>>>>>>> {
>>>>>>>     "enabled": true|false,
>>>>>>>     "reason": "Manual rebalancing"
>>>>>>> }
>>>>>>>
>>>>>> I thought of that as well. However _reshard/state seemed consistent with _reshard/jobs/$jobid/state. Setting "state": "stopped" on _reshard/state will lead to all individual running jobs' states becoming "stopped" as well. And "running" will make jobs that are not individually stopped also become "running". In other words, since it directly toggles jobs' states (with a job being able to override the stopped state), I like that it has the same arguments
>>>>>
>>>>> Got it, makes perfect sense.
>>>>>
>>>>>> and": true|false
>>>>>>
>>>>>>>
>>>>>>>> * GET /_reshard/jobs
>>>>>>>>
>>>>>>>> Get the state of all the resharding jobs on the cluster. Now we have a detailed state transition history which looks similar to what _scheduler/jobs has.
>>>>>>>>
>>>>>>>> {
>>>>>>>>     "jobs": [
>>>>>>>>         {
>>>>>>>>             "history": [
>>>>>>>>                 {
>>>>>>>>                     "detail": null,
>>>>>>>>                     "timestamp": "2019-02-06T22:28:06Z",
>>>>>>>>                     "type": "new"
>>>>>>>>                 },
>>>>>>>>                 ...
>>>>>>>>                 {
>>>>>>>>                     "detail": null,
>>>>>>>>                     "timestamp": "2019-02-06T22:28:10Z",
>>>>>>>>                     "type": "completed"
>>>>>>>>                 }
>>>>>>>>             ],
>>>>>>>>             "id": "001-0a308ef9f7bd24bd4887d6e619682a6d3bb3d0fd94625866c5216ec1167b4e23",
>>>>>>>>             "job_state": "completed",
>>>>>>>>             "node": "node1@127.0.0.1",
>>>>>>>>             "source": "shards/00000000-ffffffff/db1.1549492084",
>>>>>>>>             "split_state": "completed",
>>>>>>>>             "start_time": "2019-02-06T22:28:06Z",
>>>>>>>>             "state_info": {},
>>>>>>>>             "targets": [
>>>>>>>>                 "shards/00000000-7fffffff/db1.1549492084",
>>>>>>>>                 "shards/80000000-ffffffff/db1.1549492084"
>>>>>>>>             ],
>>>>>>>
>>>>>>> Since we went from /_split to /_reshard to prepare for merging shards, we should reconsider source (singular) and targets (plural).
>>>>>>> Either a merge job (in the future) uses sources (plural) and target (singular) and the job schema is intentionally different, or we unify things to, maybe, singular source/target, which would have the nice property of being analogous to our replication job schema. The type definition then is source:String and target:Array(2) for split jobs and source:Array(2), target:String for (future) merge jobs.
>>>>>>>
>>>>>> Joan suggested adding a "type" field to both the job creation POST body and also returning it when we inspect the job(s) state. So "type": "split" would toggle the schema. It could be "merge" in the future, or even something like "rebalance" where it would merge some and split others perhaps :-) and since we have a type it would be easier to differentiate between the merge and split jobs. But if there is a consensus from others about switching targets to target, that's easy to do as well.
>>>>>
>>>>> Ah, I'm less concerned here about not being able to tell whether it's a split or a merge, and more about having an indiscriminate plural form (sourceS/targetS) depending on the type. It's just an easy thing to get wrong.
>>>>>
>>>>> In addition, we already have source/target in CouchDB replication, which people already use successfully, so making a similar thing that behaves slightly differently doesn't sit quite right with me.
>>>>>
>>>>> I understand that I'm arguing to remove an 's' for very nitpicky reasons, but these are the kind of nitpick discussions we've done a lot of in the early days, which resulted in a by and large decent API that has served us well, and it's something I'd like to see taken forward. Apologies if this all sounds very strict ;)
>>>>>
>>>> Thanks for the longer explanation. I understand now and agree, let's make it target. No worries about sounding nitpicky, we should be nitpicky about APIs!
>>>>
>>>>>>> And just another question, sorry if I missed this elsewhere: would we ever consider adding a split/merge ratio different from 1:2, say 1:4, or will folks have to run 1:2, 1:2, 1:2 to get to the same result? I'm fine with either, and if 1:2 fixed makes things simpler, I'm all for it ;)
>>>>>>>
>>>>>> Good point. Actually it's already implemented that way :-) Right below the API surface it has a split=2 parameter and it just creates the targets based on that. It could be 2, 3, 4, ... 10, etc. However, I was thinking of keeping it hard-coded at 2 at first to keep the behavior simpler, and opening that parameter up to users in a later release based on user feedback.
>>>>>
>>>>> Ace, again, fully on board with shipping 1:2 first and maybe offering other options later.
>>>>>
>>>>> Best
>>>>> Jan
>>>>> —
>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> -Nick
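
P.S. For anyone skimming the thread, putting the agreed pieces together (a "type" field in the job body, singular target naming, and the split factor fixed at 1:2 for now), I would expect creating a split job to look roughly like the request below. The POST body fields are not spelled out above, so I am guessing them from the GET /_reshard/jobs response; the PR and README_reshard.md are the authoritative reference, and the two target shards are presumably derived from the split factor rather than passed in.

POST /_reshard/jobs

{
    "type": "split",
    "node": "node1@127.0.0.1",
    "source": "shards/00000000-ffffffff/db1.1549492084"
}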