Hi.

We have made some progress on the topic.

The JRaft fork has been merged into Ignite 3 master and is now integrated
with the other components that are ready.

The design of the transactional protocol for the first iteration is
published in master [1].

[1] https://github.com/apache/ignite-3/tree/main/modules/transactions


Sat, Mar 20, 2021 at 21:00, Alexei Scherbakov <alexey.scherbak...@gmail.com>:

> Folks,
>
> I want to share some information about progress in implementing the raft
> protocol in Ignite 3, which is a prerequisite for metastorage.
>
> The implementation will consist of client and server modules. The client
> is responsible for interoperability between a raft server node and any
> other remote or local Java process.
>
> I have recently finished the raft client API. The public API part is
> available here [1] for review. The entry point is the RaftGroupService
> interface. The service implementation has not been finished yet and can be
> skipped for now.
>
> As for the server part, we are currently investigating two options. The
> first is the etcd [2] implementation ported to Java. The drawback here is
> the amount of work required to make it work. The second option is adopting
> the jraft [3] implementation. It is a full-featured implementation already
> written in Java, but the code is not quite clean in my opinion and will
> require some refactoring.
>
> The next step is to make the raft client work with the server
> implementations. At least one is required for the next alpha. We plan to
> have the same client for both server implementations. As soon as both are
> ready, we will compare them by running consistency tests and benchmarks
> and drop the worse one. I will give the next update when we have a working
> client and at least one server implementation ready.
>
> [1] https://github.com/apache/ignite-3/pull/59/files
> [2] https://github.com/etcd-io/etcd/tree/master/raft
> [3] https://github.com/sofastack/sofa-jraft
>
> Fri, Nov 27, 2020 at 20:26, Alexey Goncharuk <alexey.goncha...@gmail.com>:
>
>> Folks, thanks to everyone who joined the call. Summary:
>>
>>    - We agree that it may be beneficial to separate the metastorage and
>>    group membership services; however, the abstractions should be clean
>>    enough that we could implement group membership via metastorage
>>    - Production cluster setup will involve an administrator 'init' command
>>    that will initialize the metastorage raft group. Once the metastorage
>>    is initialized, all nodes may be restarted arbitrarily
>>    - An HA cluster must contain at least 3 nodes. A 2-node cluster will
>>    stop making progress when one of the nodes fails (due to metastorage
>>    requirements)
>>    - We will provide a 'developer' cluster mode which will allow a 1-node
>>    setup and auto-initialization without the 'init' command
>>    - We are targeting centralized affinity calculation, the result of
>>    which will be stored in the metastorage. Metastorage downtime does not
>>    necessarily mean cluster unavailability (subject to the partition
>>    replication protocol choice). It would be good to hide the partition
>>    object as much as possible so that we could support range partitioning
>>    in the future
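A note on the 3-node requirement in the summary above: it follows from
majority-quorum arithmetic, assuming the metastorage raft group needs a strict
majority of its members to make progress. The sketch below only illustrates
that arithmetic; QuorumMath and its method names are hypothetical helpers, not
part of Ignite or any raft library.

```java
/** Majority-quorum arithmetic for a raft-style group (hypothetical helper, not Ignite API). */
final class QuorumMath {
    private QuorumMath() {
    }

    /** Smallest number of nodes that forms a strict majority of clusterSize. */
    static int majority(int clusterSize) {
        if (clusterSize < 1) {
            throw new IllegalArgumentException("clusterSize must be >= 1");
        }
        return clusterSize / 2 + 1;
    }

    /** Number of nodes that may fail while a majority can still be formed. */
    static int tolerableFailures(int clusterSize) {
        return clusterSize - majority(clusterSize);
    }

    public static void main(String[] args) {
        // Print the quorum size and failure tolerance for small groups.
        for (int n = 1; n <= 5; n++) {
            System.out.printf("nodes=%d majority=%d tolerableFailures=%d%n",
                    n, majority(n), tolerableFailures(n));
        }
    }
}
```

With 2 nodes the majority is 2, so a single failure already blocks progress;
3 nodes is the smallest group that tolerates one failure.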
>>
>> To discuss at the next meeting (do not hesitate to send questions here
>> before the meeting):
>>
>>    - Raft implementation details (API model, porting, etc)
>>    - Transactions interaction with replication protocol
>>    - Weaker consistency options
>>
>> Please add more if I forgot something and let's choose a time for the next
>> meeting.
>>
>> --AG
>>
>> Thu, Nov 26, 2020 at 16:12, Kseniya Romanova <romanova.ks....@gmail.com>:
>>
>> > Done
>> >
>> > Thu, Nov 26, 2020 at 13:18, Ivan Daschinsky <ivanda...@gmail.com>:
>> >
>> > > Alexey, is it possible to arrange the call at 16:00 MSK?
>> > >
>> > > Thu, Nov 26, 2020 at 12:30, Alexey Goncharuk <alexey.goncha...@gmail.com>:
>> > >
>> > > > Hi Ivan,
>> > > >
>> > > > Unfortunately, the earliest window available for us is 12:00 MSK
>> > > > (1 hour slot), or after 14:30 MSK. Let me know what time works best
>> > > > for you.
>> > > >
>> > > > Wed, Nov 25, 2020 at 21:38, Ivan Daschinsky <ivanda...@gmail.com>:
>> > > >
>> > > > > Alexey, I kindly ask you to move the meeting a little bit earlier,
>> > > > > ideally to the morning.
>> > > > >
>> > > > > Wed, Nov 25, 2020 at 20:10, Alexey Goncharuk <alexey.goncha...@gmail.com>:
>> > > > >
>> > > > > > Folks, let's have the call on Friday, Nov 27th at 18:00 MSK? We
>> > > > > > can use the following waiting room link:
>> > > > > >
>> > > > > > https://zoom.us/j/99450012496?pwd=RWZmOGhCNWlRK0ZpamdOOTZsYTJ0dz09
>> > > > > >
>> > > > > > Let me know if this time works for everybody.
>> > > > > >
>> > > > > > Wed, Nov 25, 2020 at 16:42, Alexey Goncharuk <alexey.goncha...@gmail.com>:
>> > > > > >
>> > > > > > > Folks,
>> > > > > > >
>> > > > > > > I've made some edits in IEP-61 [1] regarding the group membership
>> > > > > > > service and transaction protocol interaction with the replication
>> > > > > > > infrastructure, please take a look before our Friday call.
>> > > > > > >
>> > > > > > > [1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-61%3A+Common+Replication+Infrastructure
>> > > > > > >
>> > > > > > > Mon, Nov 23, 2020 at 13:28, Alexey Goncharuk <alexey.goncha...@gmail.com>:
>> > > > > > >
>> > > > > > >> Thanks, Ivan,
>> > > > > > >>
>> > > > > > >> Another protocol for group membership worth checking out is RAPID
>> > > > > > >> [1] (a recent one). Not sure though if there are any available
>> > > > > > >> implementations for it already.
>> > > > > > >>
>> > > > > > >> [1] https://www.usenix.org/system/files/conference/atc18/atc18-suresh.pdf
>> > > > > > >>
>> > > > > > >> Mon, Nov 23, 2020 at 10:46, Ivan Daschinsky <ivanda...@gmail.com>:
>> > > > > > >>
>> > > > > > >>> Also, here is some interesting reading about gossip, SWIM etc.
>> > > > > > >>>
>> > > > > > >>> 1 -- http://www.cs.cornell.edu/Info/Projects/Spinglass/public_pdfs/SWIM.pdf
>> > > > > > >>> 2 -- http://www.antonkharenko.com/2015/09/swim-distributed-group-membership.html
>> > > > > > >>> 3 -- https://github.com/hashicorp/memberlist (foundation library of
>> > > > > > >>> HashiCorp Serf)
>> > > > > > >>> 4 -- https://github.com/scalecube/scalecube-cluster (Java
>> > > > > > >>> implementation of SWIM)
>> > > > > > >>>
>> > > > > > >>> Thu, Nov 19, 2020 at 16:35, Ivan Daschinsky <ivanda...@gmail.com>:
>> > > > > > >>>
>> > > > > > >>> > >> Friday, Nov 27th work for you? If ok, let's have an open call then.
>> > > > > > >>> > Yes, great
>> > > > > > >>> > >> As for the protocol port - we will not be dealing with the
>> > > > > > >>> > >> concurrency...
>> > > > > > >>> > >> Judging by the Rust port, it seems fairly straightforward.
>> > > > > > >>> > Yes, they chose to split transport and logic. But the original Go
>> > > > > > >>> > package from etcd (see raft/node.go) contains a heartbeat
>> > > > > > >>> > mechanism etc.
>> > > > > > >>> > I agree with you, this does not seem to be a huge deal to port.
>> > > > > > >>> >
>> > > > > > >>> > Thu, Nov 19, 2020 at 16:13, Alexey Goncharuk <alexey.goncha...@gmail.com>:
>> > > > > > >>> >
>> > > > > > >>> >> Ivan,
>> > > > > > >>> >>
>> > > > > > >>> >> Agree, let's have a call to discuss the IEP. I have some more
>> > > > > > >>> >> thoughts regarding how the replication infrastructure works with
>> > > > > > >>> >> atomic/transactional caches, will put this info into the IEP. Does
>> > > > > > >>> >> next Friday, Nov 27th work for you? If ok, let's have an open call
>> > > > > > >>> >> then.
>> > > > > > >>> >>
>> > > > > > >>> >> As for the protocol port - we will not be dealing with the
>> > > > > > >>> >> concurrency model if we choose this way, this is what I like about
>> > > > > > >>> >> their code structure. Essentially, the raft module is a
>> > > > > > >>> >> single-threaded automaton which has a callback to process a
>> > > > > > >>> >> message and a callback to process a tick (timeout), and which
>> > > > > > >>> >> produces messages that should be sent and log entries that should
>> > > > > > >>> >> be persisted. Judging by the Rust port, it seems fairly
>> > > > > > >>> >> straightforward. Will be happy to discuss this and other
>> > > > > > >>> >> alternatives on the call as well.
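To make the "single-threaded automaton" shape described above concrete, here
is a toy sketch. It is not Raft and not the etcd API: ToyRaftCore, Message,
Ready, and the message type strings are hypothetical names chosen for
illustration. The point is only the structure: the caller feeds in messages
and ticks, and drains messages to send and entries to persist. The core itself
does no I/O and starts no threads.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy core: driven by one thread, no I/O, no timers of its own. */
final class ToyRaftCore {
    /** A message; the caller owns the transport. */
    static final class Message {
        final String type;
        final int from;
        final int to;
        final String payload;

        Message(String type, int from, int to, String payload) {
            this.type = type;
            this.from = from;
            this.to = to;
            this.payload = payload;
        }
    }

    /** Everything the caller must act on after driving the core. */
    static final class Ready {
        final List<Message> toSend;
        final List<String> toPersist;

        Ready(List<Message> toSend, List<String> toPersist) {
            this.toSend = toSend;
            this.toPersist = toPersist;
        }
    }

    private final int selfId;
    private final List<Integer> peers;
    private final int electionTimeoutTicks;
    private int elapsedTicks;
    private final List<Message> outbox = new ArrayList<>();
    private final List<String> unstableEntries = new ArrayList<>();

    ToyRaftCore(int selfId, List<Integer> peers, int electionTimeoutTicks) {
        this.selfId = selfId;
        this.peers = new ArrayList<>(peers);
        this.electionTimeoutTicks = electionTimeoutTicks;
    }

    /** Logical clock: the caller invokes this at a fixed interval. */
    void tick() {
        elapsedTicks++;
        if (elapsedTicks >= electionTimeoutTicks) {
            elapsedTicks = 0;
            // On timeout, queue a vote request for every peer.
            for (int peer : peers) {
                outbox.add(new Message("VOTE_REQUEST", selfId, peer, ""));
            }
        }
    }

    /** Feed one inbound message into the automaton. */
    void step(Message m) {
        if ("HEARTBEAT".equals(m.type)) {
            elapsedTicks = 0; // a live leader resets the timeout
        } else if ("PROPOSE".equals(m.type)) {
            unstableEntries.add(m.payload); // must be persisted by the caller
        }
    }

    /** Drain the pending output; the caller sends and persists it. */
    Ready ready() {
        Ready r = new Ready(new ArrayList<>(outbox), new ArrayList<>(unstableEntries));
        outbox.clear();
        unstableEntries.clear();
        return r;
    }
}
```

A driver loop would call tick() from a timer, step() for each received
message, and then ready() to obtain what to send and persist, which keeps the
concurrency model entirely outside the protocol logic.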
>> > > > > > >>> >>
>> > > > > > >>> >> Thu, Nov 19, 2020 at 14:41, Ivan Daschinsky <ivanda...@gmail.com>:
>> > > > > > >>> >>
>> > > > > > >>> >> > > Any existing library that can be used to avoid re-implementing
>> > > > > > >>> >> > > the protocol ourselves? Perhaps, porting the existing
>> > > > > > >>> >> > > implementation to Java
>> > > > > > >>> >> > Personally, I like this idea. Go libraries (either the raft
>> > > > > > >>> >> > module of etcd or serf by HashiCorp) are famous for clean code,
>> > > > > > >>> >> > good design, stability, and modest size.
>> > > > > > >>> >> > But, on the other hand, Go has a different concurrency model, and
>> > > > > > >>> >> > porting will probably not be so straightforward.
>> > > > > > >>> >> >
>> > > > > > >>> >> >
>> > > > > > >>> >> >
>> > > > > > >>> >> > Thu, Nov 19, 2020 at 13:48, Ivan Daschinsky <ivanda...@gmail.com>:
>> > > > > > >>> >> >
>> > > > > > >>> >> > > I'd suggest discussing this IEP and its technical details in an
>> > > > > > >>> >> > > open Zoom meeting.
>> > > > > > >>> >> > >
>> > > > > > >>> >> > > Thu, Nov 19, 2020 at 13:47, Ivan Daschinsky <ivanda...@gmail.com>:
>> > > > > > >>> >> > >
>> > > > > > >>> >> > >>
>> > > > > > >>> >> > >>
>> > > > > > >>> >> > >> ---------- Forwarded message ---------
>> > > > > > >>> >> > >> From: Ivan Daschinsky <ivanda...@gmail.com>
>> > > > > > >>> >> > >> Date: Thu, Nov 19, 2020 at 13:02
>> > > > > > >>> >> > >> Subject: Re: IEP-61 Technical discussion
>> > > > > > >>> >> > >> To: Alexey Goncharuk <alexey.goncha...@gmail.com>
>> > > > > > >>> >> > >>
>> > > > > > >>> >> > >>
>> > > > > > >>> >> > >> Alexey, let's raise another question. Specifically, how nodes
>> > > > > > >>> >> > >> initially find each other (discovery) and how they detect
>> > > > > > >>> >> > >> failures.
>> > > > > > >>> >> > >>
>> > > > > > >>> >> > >> I suppose that a gossip protocol is an ideal candidate. For
>> > > > > > >>> >> > >> example, consul [1] uses this approach, using the serf [2]
>> > > > > > >>> >> > >> library to discover members of the cluster.
>> > > > > > >>> >> > >> Then consul forms a raft ensemble (server nodes), and clients
>> > > > > > >>> >> > >> use the raft ensemble only as a lock service.
>> > > > > > >>> >> > >>
>> > > > > > >>> >> > >> PacificA suggests an internal heartbeat mechanism for failure
>> > > > > > >>> >> > >> detection within a replicated group, but it says nothing about
>> > > > > > >>> >> > >> initial discovery of nodes.
>> > > > > > >>> >> > >>
>> > > > > > >>> >> > >> WDYT?
>> > > > > > >>> >> > >>
>> > > > > > >>> >> > >> [1] -- https://www.consul.io/docs/architecture/gossip
>> > > > > > >>> >> > >> [2] -- https://www.serf.io/
>> > > > > > >>> >> > >>
>> > > > > > >>> >> > >> Thu, Nov 19, 2020 at 12:46, Alexey Goncharuk <alexey.goncha...@gmail.com>:
>> > > > > > >>> >> > >>
>> > > > > > >>> >> > >>> Following up the Ignite 3.0 scope/development approach threads,
>> > > > > > >>> >> > >>> this is a separate thread to discuss technical aspects of the
>> > > > > > >>> >> > >>> IEP.
>> > > > > > >>> >> > >>>
>> > > > > > >>> >> > >>> Let's go over the questions raised by Ivan one more time and
>> > > > > > >>> >> > >>> also see if there are any other thoughts on the IEP:
>> > > > > > >>> >> > >>>
>> > > > > > >>> >> > >>>    - *Whether to deploy metastorage on a separate subset of the
>> > > > > > >>> >> > >>>    nodes or allow Ignite to choose these nodes automatically.*
>> > > > > > >>> >> > >>>    I think it is feasible to maintain both modes: by default,
>> > > > > > >>> >> > >>>    Ignite will choose metastorage nodes automatically, which
>> > > > > > >>> >> > >>>    essentially will provide the same seamless user experience
>> > > > > > >>> >> > >>>    as TCP discovery SPI - no separate roles, simple deployment.
>> > > > > > >>> >> > >>>    For deployments where people want to have more fine-grained
>> > > > > > >>> >> > >>>    control over the nodes' assignments, we will provide a
>> > > > > > >>> >> > >>>    runtime configuration which will allow pinning the
>> > > > > > >>> >> > >>>    metastorage group to certain nodes, thus eliminating the
>> > > > > > >>> >> > >>>    latency concerns.
>> > > > > > >>> >> > >>>    - *Whether there are any TLA+ specs for the PacificA
>> > > > > > >>> >> > >>>    protocol.* Not to my knowledge, but it is known to be used
>> > > > > > >>> >> > >>>    in production by Microsoft and other projects, e.g. [1]
>> > > > > > >>> >> > >>>
>> > > > > > >>> >> > >>> I would like to collect general feedback on the IEP, as well as
>> > > > > > >>> >> > >>> feedback on specific parts of it, such as:
>> > > > > > >>> >> > >>>
>> > > > > > >>> >> > >>>    - Metastorage API
>> > > > > > >>> >> > >>>    - Any existing library that can be used to avoid
>> > > > > > >>> >> > >>>    re-implementing the protocol ourselves? Perhaps porting an
>> > > > > > >>> >> > >>>    existing implementation to Java (the way TiKV did with
>> > > > > > >>> >> > >>>    etcd-raft [2] [3])? In my opinion this is a very neat way,
>> > > > > > >>> >> > >>>    because I like the finite automata-like approach of the
>> > > > > > >>> >> > >>>    replication module and, additionally, we could sync bug
>> > > > > > >>> >> > >>>    fixes and improvements from the upstream project
>> > > > > > >>> >> > >>>
>> > > > > > >>> >> > >>>
>> > > > > > >>> >> > >>> Thanks,
>> > > > > > >>> >> > >>> --AG
>> > > > > > >>> >> > >>>
>> > > > > > >>> >> > >>> [1] https://cwiki.apache.org/confluence/display/INCUBATOR/PegasusProposal
>> > > > > > >>> >> > >>> [2] https://github.com/etcd-io/etcd/tree/master/raft
>> > > > > > >>> >> > >>> [3] https://github.com/tikv/raft-rs
>> > > > > > >>> >> > >>>
>> > > > > > >>> >> > >>
>> > > > > > >>> >> > >>
>> > > > > > >>> >> > >> --
>> > > > > > >>> >> > >> Sincerely yours, Ivan Daschinskiy
>> > > > > > >>> >> > >>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>


-- 

Best regards,
Alexei Scherbakov
