Folks, let's have the call on Friday, Nov 27th at 18:00 MSK? We can use the following waiting room link: https://zoom.us/j/99450012496?pwd=RWZmOGhCNWlRK0ZpamdOOTZsYTJ0dz09
Let me know if this time works for everybody.

On Wed, Nov 25, 2020 at 16:42, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:

> Folks,
>
> I've made some edits in IEP-61 [1] regarding the group membership service
> and transaction protocol interaction with the replication infrastructure,
> please take a look before our Friday call.
>
> [1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-61%3A+Common+Replication+Infrastructure
>
> On Mon, Nov 23, 2020 at 13:28, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
>
>> Thanks, Ivan,
>>
>> Another protocol for group membership worth checking out is RAPID [1]
>> (a recent one). Not sure though if there are any available
>> implementations for it already.
>>
>> [1] https://www.usenix.org/system/files/conference/atc18/atc18-suresh.pdf
>>
>> On Mon, Nov 23, 2020 at 10:46, Ivan Daschinsky <ivanda...@gmail.com> wrote:
>>
>>> Also, here is some interesting reading about gossip, SWIM etc.
>>>
>>> 1 -- http://www.cs.cornell.edu/Info/Projects/Spinglass/public_pdfs/SWIM.pdf
>>> 2 -- http://www.antonkharenko.com/2015/09/swim-distributed-group-membership.html
>>> 3 -- https://github.com/hashicorp/memberlist (foundation library of
>>> HashiCorp serf)
>>> 4 -- https://github.com/scalecube/scalecube-cluster (Java
>>> implementation of SWIM)
>>>
>>> On Thu, Nov 19, 2020 at 16:35, Ivan Daschinsky <ivanda...@gmail.com> wrote:
>>>
>>> > >> Friday, Nov 27th work for you? If ok, let's have an open call then.
>>> > Yes, great.
>>> > >> As for the protocol port - we will not be dealing with the
>>> > >> concurrency...
>>> > >> Judging by the Rust port, it seems fairly straightforward.
>>> > Yes, they chose to split transport and logic. But the original Go
>>> > package from etcd (see raft/node.go) contains some heartbeat
>>> > mechanism etc.
>>> > I agree with you, this does not seem to be a huge deal to port.
>>> >
>>> > On Thu, Nov 19, 2020 at 16:13, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
>>> >
>>> >> Ivan,
>>> >>
>>> >> Agree, let's have a call to discuss the IEP. I have some more
>>> >> thoughts regarding how the replication infrastructure works with
>>> >> atomic/transactional caches, will put this info into the IEP. Does
>>> >> next Friday, Nov 27th work for you? If ok, let's have an open call
>>> >> then.
>>> >>
>>> >> As for the protocol port - we will not be dealing with the
>>> >> concurrency model if we choose this way, this is what I like about
>>> >> their code structure. Essentially, the raft module is a
>>> >> single-threaded automaton which has a callback to process a message
>>> >> and a callback to process a tick (timeout), and which produces
>>> >> messages that should be sent and log entries that should be
>>> >> persisted. Judging by the Rust port, it seems fairly
>>> >> straightforward. Will be happy to discuss this and other
>>> >> alternatives on the call as well.
>>> >>
>>> >> On Thu, Nov 19, 2020 at 14:41, Ivan Daschinsky <ivanda...@gmail.com> wrote:
>>> >>
>>> >> > > Any existing library that can be used to avoid re-implementing
>>> >> > > the protocol ourselves? Perhaps, porting the existing
>>> >> > > implementation to Java
>>> >> > Personally, I like this idea. Go libraries (either the raft module
>>> >> > of etcd or serf by HashiCorp) are famous for clean code, good
>>> >> > design, stability, and modest size.
>>> >> > But, on the other side, Go has a different model for concurrency
>>> >> > and porting probably will not be so straightforward.
>>> >> >
>>> >> > On Thu, Nov 19, 2020 at 13:48, Ivan Daschinsky <ivanda...@gmail.com> wrote:
>>> >> >
>>> >> > > I'd suggest discussing this IEP and the technical details in an
>>> >> > > open Zoom meeting.
>>> >> > >
>>> >> > > On Thu, Nov 19, 2020 at 13:47, Ivan Daschinsky <ivanda...@gmail.com> wrote:
>>> >> > >
>>> >> > >> ---------- Forwarded message ---------
>>> >> > >> From: Ivan Daschinsky <ivanda...@gmail.com>
>>> >> > >> Date: Thu, Nov 19, 2020 at 13:02
>>> >> > >> Subject: Re: IEP-61 Technical discussion
>>> >> > >> To: Alexey Goncharuk <alexey.goncha...@gmail.com>
>>> >> > >>
>>> >> > >> Alexey, let's raise another question. Specifically, how nodes
>>> >> > >> initially find each other (discovery) and how they detect
>>> >> > >> failures.
>>> >> > >>
>>> >> > >> I suppose that a gossip protocol is an ideal candidate. For
>>> >> > >> example, consul [1] uses this approach, using the serf [2]
>>> >> > >> library to discover members of the cluster.
>>> >> > >> Then consul forms a raft ensemble (server nodes) and clients
>>> >> > >> use the raft ensemble only as a lock service.
>>> >> > >>
>>> >> > >> PacificA suggests an internal heartbeat mechanism for failure
>>> >> > >> detection within a replicated group, but it says nothing about
>>> >> > >> initial discovery of nodes.
>>> >> > >>
>>> >> > >> WDYT?
>>> >> > >>
>>> >> > >> [1] -- https://www.consul.io/docs/architecture/gossip
>>> >> > >> [2] -- https://www.serf.io/
>>> >> > >>
>>> >> > >> On Thu, Nov 19, 2020 at 12:46, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
>>> >> > >>
>>> >> > >>> Following up the Ignite 3.0 scope/development approach
>>> >> > >>> threads, this is a separate thread to discuss technical
>>> >> > >>> aspects of the IEP.
>>> >> > >>>
>>> >> > >>> Let's reiterate one more time on the questions raised by Ivan
>>> >> > >>> and also see if there are any other thoughts on the IEP:
>>> >> > >>>
>>> >> > >>> - *Whether to deploy metastorage on a separate subset of the
>>> >> > >>>   nodes or allow Ignite to choose these nodes automatically.*
>>> >> > >>>   I think it is feasible to maintain both modes: by default,
>>> >> > >>>   Ignite will choose metastorage nodes automatically, which
>>> >> > >>>   essentially will provide the same seamless user experience
>>> >> > >>>   as TCP discovery SPI - no separate roles, simple deployment.
>>> >> > >>>   For deployments where people want to have more fine-grained
>>> >> > >>>   control over the nodes' assignments, we will provide a
>>> >> > >>>   runtime configuration which will allow pinning the
>>> >> > >>>   metastorage group to certain nodes, thus eliminating the
>>> >> > >>>   latency concerns.
>>> >> > >>> - *Whether there are any TLA+ specs for the PacificA
>>> >> > >>>   protocol.* Not to my knowledge, but it is known to be used
>>> >> > >>>   in production by Microsoft and other projects, e.g. [1]
>>> >> > >>>
>>> >> > >>> I would like to collect general feedback on the IEP, as well
>>> >> > >>> as feedback on specific parts of it, such as:
>>> >> > >>>
>>> >> > >>> - Metastorage API
>>> >> > >>> - Any existing library that can be used to avoid
>>> >> > >>>   re-implementing the protocol ourselves? Perhaps, porting the
>>> >> > >>>   existing implementation to Java (the way TiKV did with
>>> >> > >>>   etcd-raft [2] [3])? This is a very neat way btw in my
>>> >> > >>>   opinion because I like the finite automata-like approach of
>>> >> > >>>   the replication module, and, additionally, we could sync bug
>>> >> > >>>   fixes and improvements from the upstream project.
>>> >> > >>>
>>> >> > >>> Thanks,
>>> >> > >>> --AG
>>> >> > >>>
>>> >> > >>> [1] https://cwiki.apache.org/confluence/display/INCUBATOR/PegasusProposal
>>> >> > >>> [2] https://github.com/etcd-io/etcd/tree/master/raft
>>> >> > >>> [3] https://github.com/tikv/raft-rs
>>> >> > >>>
>>> >> > >>
>>> >> > >> --
>>> >> > >> Sincerely yours, Ivan Daschinskiy
>>> >> > >>
>>> >> > >
>>> >> > > --
>>> >> > > Sincerely yours, Ivan Daschinskiy
>>> >> > >
>>> >> >
>>> >> > --
>>> >> > Sincerely yours, Ivan Daschinskiy
>>> >> >
>>> >>
>>> >
>>> > --
>>> > Sincerely yours, Ivan Daschinskiy
>>> >
>>>
>>> --
>>> Sincerely yours, Ivan Daschinskiy
>>>
>>
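
For readers following the thread, the code structure Alexey describes for the raft module (a single-threaded automaton with one callback for incoming messages, one for ticks, and outputs consisting of messages to send and log entries to persist) might be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the names TickDrivenNode, Ready, step, tick and ready are hypothetical and are not the etcd-raft or raft-rs API, the string message format is made up, and there is no real Raft logic here (no terms, voting, or log matching).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a tick-driven, single-threaded replication automaton.
 * It owns no threads, timers, sockets or disks: the caller feeds it events and
 * carries out whatever it asks for.
 */
final class TickDrivenNode {
    /** Work the caller must perform: persist the entries, then send the messages. */
    static final class Ready {
        final List<String> entriesToPersist;
        final List<String> messagesToSend;

        Ready(List<String> entriesToPersist, List<String> messagesToSend) {
            this.entriesToPersist = entriesToPersist;
            this.messagesToSend = messagesToSend;
        }
    }

    private final int timeoutTicks;
    private int ticksSinceLastMessage;
    private final List<String> pendingEntries = new ArrayList<>();
    private final List<String> pendingMessages = new ArrayList<>();

    TickDrivenNode(int timeoutTicks) {
        this.timeoutTicks = timeoutTicks;
    }

    /** Logical clock callback; the caller invokes it at a fixed interval. */
    void tick() {
        ticksSinceLastMessage++;
        if (ticksSinceLastMessage >= timeoutTicks) {
            ticksSinceLastMessage = 0;
            // Placeholder for "start an election"; real Raft does much more here.
            pendingMessages.add("VOTE_REQUEST");
        }
    }

    /** Message callback; the caller invokes it for every message received from a peer. */
    void step(String message) {
        ticksSinceLastMessage = 0;
        // Made-up wire format: "APPEND:<payload>" asks this node to append a log entry.
        if (message.startsWith("APPEND:")) {
            pendingEntries.add(message.substring("APPEND:".length()));
            pendingMessages.add("APPEND_ACK");
        }
    }

    /** Returns the accumulated output and clears it. */
    Ready ready() {
        Ready result = new Ready(new ArrayList<>(pendingEntries), new ArrayList<>(pendingMessages));
        pendingEntries.clear();
        pendingMessages.clear();
        return result;
    }
}
```

With this shape, the embedding application would run one loop that calls tick() on a timer, step() for each received message, and then ready() to learn what to write to storage and what to put on the network. Threading, transport and persistence stay entirely outside the automaton, which seems to be the property that makes a port independent of Go's concurrency model.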