Folks, thanks to everyone who joined the call.

Summary:
- We agree that it may be beneficial to separate the metastorage and group membership services; however, the abstractions should be clean enough that we could implement group membership via the metastorage.
- Production cluster setup will involve an administrator 'init' command that will initialize the metastorage raft group. Once the metastorage is initialized, all nodes may be restarted arbitrarily.
- An HA cluster must contain at least 3 nodes. A 2-node cluster will stop making progress when one of the nodes fails (due to metastorage requirements).
- We will provide a 'developer' cluster mode, which will allow a 1-node setup and auto-initialization without the 'init' command.
- We are targeting centralized affinity calculation, with the result stored in the metastorage. Metastorage downtime does not necessarily mean cluster unavailability (subject to the choice of partition replication protocol). It would be good to hide the partition object as much as possible so that we could support range partitioning in the future.
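The 3-node minimum in the summary follows from majority quorums. Assuming the metastorage is a Raft group in which every member votes, a group of n voters needs floor(n/2) + 1 acknowledgements to commit, so it survives n - (floor(n/2) + 1) failures. A minimal Java sketch of that arithmetic (the class and method names are illustrative, not an Ignite API):

```java
/** Majority-quorum arithmetic for a Raft-style group (illustrative helper, not an Ignite API). */
final class QuorumMath {
    private QuorumMath() {
    }

    /** Votes needed for a strict majority among {@code n} voting members. */
    static int majority(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("group size must be positive: " + n);
        }
        return n / 2 + 1;
    }

    /** How many members may fail while the remaining ones can still form a majority. */
    static int toleratedFailures(int n) {
        return n - majority(n);
    }

    /** Whether a group of {@code n} voters can still commit with only {@code alive} members up. */
    static boolean canMakeProgress(int n, int alive) {
        return alive >= majority(n);
    }
}
```

With these definitions, toleratedFailures(1) and toleratedFailures(2) are both 0, while toleratedFailures(3) is 1: a 2-node cluster gains no fault tolerance over a single node, which is why HA starts at 3. Likewise toleratedFailures(4) equals toleratedFailures(3), so odd group sizes are the usual choice.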
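The first summary point (separate services, but abstractions clean enough to implement group membership via the metastorage) can be illustrated with a small Java sketch. Everything here is hypothetical: GroupMembership, MetaStorage, the "membership/" key layout, and the in-memory store standing in for a replicated metastorage are assumptions for illustration, not the IEP-61 API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Consumer;

/** Hypothetical membership API: callers never see how membership is stored or detected. */
interface GroupMembership {
    void join(String nodeId);
    void leave(String nodeId);
    Set<String> members();
    /** The listener receives the full member set after every change. */
    void onChange(Consumer<Set<String>> listener);
}

/** Hypothetical minimal metastorage contract: key/value operations plus prefix watches. */
interface MetaStorage {
    void put(String key, String value);
    void remove(String key);
    Map<String, String> range(String prefix);
    void watchPrefix(String prefix, Runnable onUpdate);
}

/** Single-process stand-in for a replicated metastorage (illustration only). */
final class InMemoryMetaStorage implements MetaStorage {
    private final TreeMap<String, String> data = new TreeMap<>();
    private final List<Map.Entry<String, Runnable>> watches = new ArrayList<>();

    @Override public synchronized void put(String key, String value) {
        data.put(key, value);
        fire(key);
    }

    @Override public synchronized void remove(String key) {
        if (data.remove(key) != null) fire(key);
    }

    @Override public synchronized Map<String, String> range(String prefix) {
        Map<String, String> out = new TreeMap<>();
        // Keys that share a prefix are contiguous in a sorted map, so stop at the first mismatch.
        for (Map.Entry<String, String> e : data.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break;
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    @Override public synchronized void watchPrefix(String prefix, Runnable onUpdate) {
        watches.add(Map.entry(prefix, onUpdate));
    }

    private void fire(String key) {
        for (Map.Entry<String, Runnable> w : watches) {
            if (key.startsWith(w.getKey())) w.getValue().run();
        }
    }
}

/** Group membership implemented purely on top of the metastorage abstraction. */
final class MetaStorageGroupMembership implements GroupMembership {
    private static final String PREFIX = "membership/"; // hypothetical key layout
    private final MetaStorage storage;

    MetaStorageGroupMembership(MetaStorage storage) { this.storage = storage; }

    @Override public void join(String nodeId) { storage.put(PREFIX + nodeId, "member"); }

    @Override public void leave(String nodeId) { storage.remove(PREFIX + nodeId); }

    @Override public Set<String> members() {
        Set<String> ids = new TreeSet<>();
        for (String key : storage.range(PREFIX).keySet()) ids.add(key.substring(PREFIX.length()));
        return ids;
    }

    @Override public void onChange(Consumer<Set<String>> listener) {
        storage.watchPrefix(PREFIX, () -> listener.accept(members()));
    }
}
```

Swapping InMemoryMetaStorage for a Raft-backed store, or MetaStorageGroupMembership for a gossip-based implementation, would not change any caller of GroupMembership, which is the property the summary asks for.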
To discuss at the next meeting (do not hesitate to send questions here before the meeting):
- Raft implementation details (API model, porting, etc.)
- Interaction of transactions with the replication protocol
- Weaker consistency options

Please add more if I forgot something, and let's choose a time for the next meeting.

--AG

On Thu, Nov 26, 2020 at 16:12, Kseniya Romanova <romanova.ks....@gmail.com> wrote:
> Done
>
> On Thu, Nov 26, 2020 at 13:18, Ivan Daschinsky <ivanda...@gmail.com> wrote:
> > Alexey, is it possible to schedule the call at 16:00 MSK?
> >
> > On Thu, Nov 26, 2020 at 12:30, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
> > > Hi Ivan,
> > >
> > > Unfortunately, the earliest window available for us is 12:00 MSK (a 1-hour slot), or after 14:30 MSK. Let me know what time works best for you.
> > >
> > > On Wed, Nov 25, 2020 at 21:38, Ivan Daschinsky <ivanda...@gmail.com> wrote:
> > > > Alexey, I kindly ask you to move the meeting a little earlier, ideally to the morning.
> > > >
> > > > On Wed, Nov 25, 2020 at 20:10, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
> > > > > Folks, how about having the call on Friday, Nov 27th at 18:00 MSK? We can use the following waiting room link:
> > > > > https://zoom.us/j/99450012496?pwd=RWZmOGhCNWlRK0ZpamdOOTZsYTJ0dz09
> > > > >
> > > > > Let me know if this time works for everybody.
> > > > >
> > > > > On Wed, Nov 25, 2020 at 16:42, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
> > > > > > Folks,
> > > > > >
> > > > > > I've made some edits in IEP-61 [1] regarding the group membership service and the interaction of the transaction protocol with the replication infrastructure; please take a look before our Friday call.
> > > > > >
> > > > > > [1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-61%3A+Common+Replication+Infrastructure
> > > > > >
> > > > > > On Mon, Nov 23, 2020 at 13:28, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
> > > > > >> Thanks, Ivan,
> > > > > >>
> > > > > >> Another protocol for group membership worth checking out is RAPID [1] (a recent one). Not sure, though, whether any implementations are available for it yet.
> > > > > >>
> > > > > >> [1] https://www.usenix.org/system/files/conference/atc18/atc18-suresh.pdf
> > > > > >>
> > > > > >> On Mon, Nov 23, 2020 at 10:46, Ivan Daschinsky <ivanda...@gmail.com> wrote:
> > > > > >>> Also, here is some interesting reading about gossip, SWIM, etc.
> > > > > >>>
> > > > > >>> 1 -- http://www.cs.cornell.edu/Info/Projects/Spinglass/public_pdfs/SWIM.pdf
> > > > > >>> 2 -- http://www.antonkharenko.com/2015/09/swim-distributed-group-membership.html
> > > > > >>> 3 -- https://github.com/hashicorp/memberlist (the foundation library of HashiCorp Serf)
> > > > > >>> 4 -- https://github.com/scalecube/scalecube-cluster (a Java implementation of SWIM)
> > > > > >>>
> > > > > >>> On Thu, Nov 19, 2020 at 16:35, Ivan Daschinsky <ivanda...@gmail.com> wrote:
> > > > > >>> > >> Friday, Nov 27th work for you? If ok, let's have an open call then.
> > > > > >>> > Yes, great.
> > > > > >>> > >> As for the protocol port - we will not be dealing with the concurrency...
> > > > > >>> > >> Judging by the Rust port, it seems fairly straightforward.
> > > > > >>> > Yes, they chose to split transport and logic.
> > > > > >>> > But the original Go package from etcd (see raft/node.go) contains a heartbeat mechanism, etc.
> > > > > >>> > I agree with you, this does not seem to be a huge deal to port.
> > > > > >>> >
> > > > > >>> > On Thu, Nov 19, 2020 at 16:13, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
> > > > > >>> >> Ivan,
> > > > > >>> >>
> > > > > >>> >> Agreed, let's have a call to discuss the IEP. I have some more thoughts regarding how the replication infrastructure works with atomic/transactional caches; I will add this info to the IEP. Does next Friday, Nov 27th work for you? If ok, let's have an open call then.
> > > > > >>> >>
> > > > > >>> >> As for the protocol port - we will not be dealing with the concurrency model if we choose this way; this is what I like about their code structure. Essentially, the raft module is a single-threaded automaton that has callbacks to process a message and to process a tick (timeout), and that produces messages that should be sent and log entries that should be persisted. Judging by the Rust port, it seems fairly straightforward. I will be happy to discuss this and other alternatives on the call as well.
> > > > > >>> >>
> > > > > >>> >> On Thu, Nov 19, 2020 at 14:41, Ivan Daschinsky <ivanda...@gmail.com> wrote:
> > > > > >>> >> > > Any existing library that can be used to avoid re-implementing the protocol ourselves? Perhaps, porting the existing implementation to Java
> > > > > >>> >> > Personally, I like this idea.
> > > > > >>> >> > Go libraries (either the raft module of etcd or Serf by HashiCorp) are known for clean code, good design, stability, and modest size.
> > > > > >>> >> > On the other hand, Go has a different concurrency model, so porting will probably not be so straightforward.
> > > > > >>> >> >
> > > > > >>> >> > On Thu, Nov 19, 2020 at 13:48, Ivan Daschinsky <ivanda...@gmail.com> wrote:
> > > > > >>> >> > > I'd suggest discussing this IEP and its technical details in an open Zoom meeting.
> > > > > >>> >> > >
> > > > > >>> >> > > On Thu, Nov 19, 2020 at 13:47, Ivan Daschinsky <ivanda...@gmail.com> wrote:
> > > > > >>> >> > >> ---------- Forwarded message ---------
> > > > > >>> >> > >> From: Ivan Daschinsky <ivanda...@gmail.com>
> > > > > >>> >> > >> Date: Thu, Nov 19, 2020 at 13:02
> > > > > >>> >> > >> Subject: Re: IEP-61 Technical discussion
> > > > > >>> >> > >> To: Alexey Goncharuk <alexey.goncha...@gmail.com>
> > > > > >>> >> > >>
> > > > > >>> >> > >> Alexey, let's raise another question: specifically, how nodes initially find each other (discovery) and how they detect failures.
> > > > > >>> >> > >>
> > > > > >>> >> > >> I suppose that a gossip protocol is an ideal candidate. For example, Consul [1] uses this approach, relying on the Serf [2] library to discover the members of a cluster.
> > > > > >>> >> > >> Consul then forms a raft ensemble (the server nodes), and clients use the raft ensemble only as a lock service.
> > > > > >>> >> > >>
> > > > > >>> >> > >> PacificA suggests an internal heartbeat mechanism for failure detection within a replication group, but it says nothing about the initial discovery of nodes.
> > > > > >>> >> > >>
> > > > > >>> >> > >> WDYT?
> > > > > >>> >> > >>
> > > > > >>> >> > >> [1] -- https://www.consul.io/docs/architecture/gossip
> > > > > >>> >> > >> [2] -- https://www.serf.io/
> > > > > >>> >> > >>
> > > > > >>> >> > >> On Thu, Nov 19, 2020 at 12:46, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
> > > > > >>> >> > >>> Following up on the Ignite 3.0 scope/development approach threads, this is a separate thread to discuss the technical aspects of the IEP.
> > > > > >>> >> > >>>
> > > > > >>> >> > >>> Let's go over the questions raised by Ivan once more and also see if there are any other thoughts on the IEP:
> > > > > >>> >> > >>>
> > > > > >>> >> > >>> - *Whether to deploy metastorage on a separate subset of the nodes or allow Ignite to choose these nodes automatically.* I think it is feasible to maintain both modes: by default, Ignite will choose metastorage nodes automatically, which will essentially provide the same seamless user experience as the TCP discovery SPI: no separate roles, simple deployment.
> > > > > >>> >> > >>>   For deployments where people want more fine-grained control over the nodes' assignments, we will provide a runtime configuration that allows pinning the metastorage group to certain nodes, thus eliminating the latency concerns.
> > > > > >>> >> > >>> - *Whether there are any TLA+ specs for the PacificA protocol.* Not to my knowledge, but it is known to be used in production by Microsoft and other projects, e.g. [1].
> > > > > >>> >> > >>>
> > > > > >>> >> > >>> I would like to collect general feedback on the IEP, as well as feedback on specific parts of it, such as:
> > > > > >>> >> > >>>
> > > > > >>> >> > >>> - Metastorage API
> > > > > >>> >> > >>> - Any existing library that can be used to avoid re-implementing the protocol ourselves? Perhaps, porting the existing implementation to Java (the way TiKV did with etcd-raft, which it ported to Rust [2] [3]?
> > > > > >>> >> > >>>   This is a very neat approach, in my opinion, because I like the finite-automaton-like design of the replication module; additionally, we could sync bug fixes and improvements from the upstream project)
> > > > > >>> >> > >>>
> > > > > >>> >> > >>> Thanks,
> > > > > >>> >> > >>> --AG
> > > > > >>> >> > >>>
> > > > > >>> >> > >>> [1] https://cwiki.apache.org/confluence/display/INCUBATOR/PegasusProposal
> > > > > >>> >> > >>> [2] https://github.com/etcd-io/etcd/tree/master/raft
> > > > > >>> >> > >>> [3] https://github.com/tikv/raft-rs
> > > > > >>> >> > >>
> > > > > >>> >> > >> --
> > > > > >>> >> > >> Sincerely yours, Ivan Daschinskiy
> > > > > >>> >> > >>
> > > > > >>> >> > >> --
> > > > > >>> >> > >> Sincerely yours, Ivan Daschinskiy
> > > > > >>> >> > >
> > > > > >>> >> > > --
> > > > > >>> >> > > Sincerely yours, Ivan Daschinskiy
> > > > > >>> >> >
> > > > > >>> >> > --
> > > > > >>> >> > Sincerely yours, Ivan Daschinskiy
> > > > > >>> >
> > > > > >>> > --
> > > > > >>> > Sincerely yours, Ivan Daschinskiy
> > > > > >>>
> > > > > >>> --
> > > > > >>> Sincerely yours, Ivan Daschinskiy
> > > > > >>
> > > > >
> > > >
> > > > --
> > > > Sincerely yours, Ivan Daschinskiy
> >
> > --
> > Sincerely yours, Ivan Daschinskiy
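Alexey's description in the thread of the etcd raft module (a single-threaded automaton that consumes messages and ticks, and emits messages to send and entries to persist) can be sketched in Java. This is a hypothetical, election-only toy loosely modeled on the Tick/Step/Ready split of etcd's raft package: the names RaftAutomaton, Msg and Ready are illustrative, log replication and heartbeats are omitted, and "persisted state" is just a list of strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Message kinds for this election-only sketch; real Raft also needs append/heartbeat messages. */
enum MsgType { VOTE_REQUEST, VOTE_RESPONSE }
/** Immutable message; {@code granted} is meaningful only for VOTE_RESPONSE. */
record Msg(int from, int to, MsgType type, long term, boolean granted) {}
/** One output batch: state to persist first, then messages to send. */
record Ready(List<String> toPersist, List<Msg> messages) {}

/** Single-threaded, I/O-free Raft-like automaton: the caller owns timers, disk and network,
 *  feeds tick() and step(), then drains ready(). Only leader election is sketched. */
final class RaftAutomaton {
    enum Role { FOLLOWER, CANDIDATE, LEADER }

    private final int id;
    private final List<Integer> peers; // the other voters
    private final int electionTimeoutTicks;
    private final Set<Integer> votes = new HashSet<>();
    private Role role = Role.FOLLOWER;
    private long term;
    private Integer votedFor;
    private int elapsedTicks;
    private List<String> persist = new ArrayList<>();
    private List<Msg> outbox = new ArrayList<>();

    RaftAutomaton(int id, List<Integer> peers, int electionTimeoutTicks) {
        this.id = id;
        this.peers = List.copyOf(peers);
        this.electionTimeoutTicks = electionTimeoutTicks;
    }

    Role role() { return role; }
    long term() { return term; }

    /** Logical clock input; the caller decides how long a tick is. */
    void tick() {
        if (role != Role.LEADER && ++elapsedTicks >= electionTimeoutTicks) campaign();
    }

    /** Message input. */
    void step(Msg m) {
        if (m.term() > term) becomeFollower(m.term());
        switch (m.type()) {
            case VOTE_REQUEST -> {
                // No log in this sketch, so Raft's "candidate log is up to date" check is omitted.
                boolean grant = m.term() == term && (votedFor == null || votedFor == m.from());
                if (grant) {
                    votedFor = m.from();
                    elapsedTicks = 0;
                    persistHardState();
                }
                outbox.add(new Msg(id, m.from(), MsgType.VOTE_RESPONSE, term, grant));
            }
            case VOTE_RESPONSE -> {
                if (role == Role.CANDIDATE && m.term() == term && m.granted()) {
                    votes.add(m.from());
                    maybeBecomeLeader();
                }
            }
        }
    }

    /** Drains pending output; the caller must persist toPersist before sending messages. */
    Ready ready() {
        Ready r = new Ready(persist, outbox);
        persist = new ArrayList<>();
        outbox = new ArrayList<>();
        return r;
    }

    private void campaign() {
        term++;
        role = Role.CANDIDATE;
        votedFor = id;
        elapsedTicks = 0;
        votes.clear();
        votes.add(id);
        persistHardState();
        for (int p : peers) outbox.add(new Msg(id, p, MsgType.VOTE_REQUEST, term, false));
        maybeBecomeLeader(); // a single-voter group wins immediately
    }

    private void maybeBecomeLeader() {
        if (votes.size() >= (peers.size() + 1) / 2 + 1) role = Role.LEADER;
    }

    private void becomeFollower(long newTerm) {
        term = newTerm;
        votedFor = null;
        role = Role.FOLLOWER;
        persistHardState();
    }

    private void persistHardState() { persist.add("term=" + term + ",votedFor=" + votedFor); }
}
```

For instance, with a 3-node group, ticking node 1 three times (timeout of 3 ticks) yields two VOTE_REQUEST messages and one persisted term record; feeding node 2's granted VOTE_RESPONSE back into node 1 makes it leader for term 1. Because the automaton never blocks or spawns threads, the concurrency model lives entirely in the driver (a single thread, an executor, or an event loop), which is what makes porting from Go tractable.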
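The SWIM material Ivan links describes failure detection by randomized probing: ping a member directly, and if no ack arrives, ask k other members to ping it on your behalf before suspecting it. One SWIM protocol period can be sketched as follows, under stated assumptions: Prober is a hypothetical stand-in for the network, indirect probes run sequentially rather than in parallel, and the suspicion timeout and piggybacked dissemination from the paper are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical transport hook: true if {@code target} acked a ping sent by {@code via}. */
@FunctionalInterface
interface Prober {
    boolean probe(String via, String target);
}

/** SWIM-style failure detection as seen from one member; dissemination and timeouts omitted. */
final class SwimDetector {
    enum Status { ALIVE, SUSPECT }

    private final String self;
    private final int indirectProbes; // "k" in the SWIM paper
    private final Prober prober;
    private final Random rnd;
    private final Map<String, Status> members = new HashMap<>();

    SwimDetector(String self, List<String> others, int indirectProbes, Prober prober, long seed) {
        this.self = self;
        this.indirectProbes = indirectProbes;
        this.prober = prober;
        this.rnd = new Random(seed);
        for (String m : others) {
            members.put(m, Status.ALIVE);
        }
    }

    Status status(String member) {
        return members.get(member);
    }

    /** One protocol period: probe a randomly chosen member; returns it, or null if alone. */
    String runPeriod() {
        if (members.isEmpty()) {
            return null;
        }
        List<String> all = new ArrayList<>(members.keySet());
        String target = all.get(rnd.nextInt(all.size()));
        probeTarget(target);
        return target;
    }

    void probeTarget(String target) {
        // 1. Direct ping from this member.
        if (prober.probe(self, target)) {
            members.put(target, Status.ALIVE);
            return;
        }
        // 2. Indirect ping-req through up to k other members (sequential here, parallel in SWIM).
        List<String> helpers = new ArrayList<>(members.keySet());
        helpers.remove(target);
        Collections.shuffle(helpers, rnd);
        for (String helper : helpers.subList(0, Math.min(indirectProbes, helpers.size()))) {
            if (prober.probe(helper, target)) {
                members.put(target, Status.ALIVE);
                return;
            }
        }
        // 3. Nobody got an ack: mark as suspect instead of declaring it failed right away.
        members.put(target, Status.SUSPECT);
    }
}
```

The indirect step is what makes SWIM tolerant of a single lossy link: if only the path from a to b is broken, another member can still confirm that b is alive, so a does not suspect it.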
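The two metastorage placement modes Alexey proposes in the thread (automatic by default, pinned through runtime configuration) could look roughly like the following Java sketch. The selection rule (lowest node IDs, odd group size) and all names are assumptions made for illustration; the thread does not specify how Ignite picks the nodes.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical policy for choosing the metastorage Raft group members. */
final class MetastorageNodeSelector {
    private MetastorageNodeSelector() {
    }

    /**
     * @param liveNodes  consistent IDs of the nodes currently in the cluster
     * @param pinned     operator-supplied pinning from runtime configuration, if any
     * @param targetSize desired group size in automatic mode (for example 3 or 5)
     */
    static List<String> select(Set<String> liveNodes, Optional<List<String>> pinned, int targetSize) {
        if (pinned.isPresent()) {
            // Explicit mode: the operator decides, e.g. to keep the group in one low-latency zone.
            return List.copyOf(pinned.get());
        }
        // Automatic mode: a deterministic rule, so every node computes the same answer.
        int size = Math.min(targetSize, liveNodes.size());
        if (size > 1 && size % 2 == 0) {
            size--; // an even-sized group tolerates no more failures than one member fewer
        }
        return liveNodes.stream().sorted().limit(size).collect(Collectors.toList());
    }
}
```

With four live nodes and a target of 5, this picks the three lowest IDs; with two live nodes it picks one, since a second voter would add a failure point without adding fault tolerance.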