Hi Team,
Greetings, Apologies for the delay in reply as I was down with flu. We actually reached out to you for IT/ SAP/ Oracle/ Infor / Microsoft “VOTEC IT SERVICE PARTNERSHIP” “IT SERVICE OUTSOURCING” “ “PARTNER SERVICE SUBCONTRACTING” We have very attractive newly introduce reasonably price PARTNER IT SERVICE ODC SUBCONTRACTING MODEL in USA, Philippines, India and Singapore etc with White Label Model. Our LOW COST IT SERVICE ODC MODEL eliminate the cost of expensive employee payroll, Help partner to get profit more than 50% on each project.. ..We really mean it. We are already working with platinum partner like NTT DATA, NEC Singapore, Deloitte, Hitachi consulting. ACCENTURE, Abeam Singapore etc. Are u keen to understand VOTEC IT SERVICE MODEL PARTNERSHIP offerings? Let us know your availability this week OR Next week?? We can arrange discussion with Partner Manager. > On 01/25/2024 7:50 AM +08 José Armando García Sancio > <jsan...@confluent.io.invalid> wrote: > > > Thanks for the feedback Luke. See my comments below: > > On Wed, Jan 24, 2024 at 4:20 AM Luke Chen <show...@gmail.com> wrote: > > 1. About "VotersRecord": > > > > > When a KRaft voter becomes leader it will write a KRaftVersionRecord and > > VotersRecord to the log if the log or the latest snapshot doesn't contain > > any VotersRecord. This is done to make sure that the voter set in the > > bootstrap snapshot gets replicated to all of the voters and to not rely on > > all of the voters being configured with the same bootstrapped voter set. > > > > > This record will also be written to the log if it has never been written > > to the log in the past. This semantic is nice to have to consistently > > replicate the bootstrapping snapshot, at > > 00000000000000000000-0000000000.checkpoint, of the leader to all of the > > voters. > > > > If the `VotersRecord` has written into > > 00000000000000000000-0000000000.checkpoint, > > later, a new voter added. Will we write a new checkpoint to the file? > > If so, does that mean the `metadata.log.max.snapshot.interval.ms` will be > > ignored? > > KRaft (KafkaRaftClient) won't initiate the snapshot generation. The > snapshot generation will be initiated by the state machine (controller > or broker) using the RaftClient::createSnapshot method. When the state > machine calls into RaftClient::createSnapshot the KafkaRaftClient will > compute the set of voters at the provided offset and epoch, and write > the VotersRecord after the SnapshotHeaderRecord. This does mean that > the KafkaRaftClient needs to store in memory all of the voter set > configurations between the RaftClient::latestSnapshotId and the LEO > for the KRaft partition. > > > If not, then how could we make sure the voter set in the bootstrap snapshot > > gets replicated to all of the voters and to not rely on all of the voters > > being configured with the same bootstrapped voter set? > > I think my answer above should answer your question. VoterRecord-s > will be in the log (log segments) and the snapshots so they will be > replicated by Fetch and FetchSnapshot. When the voter set is changed > or bootstrapped, the leader will write the VotersRecord to the log > (active log segment). When the state machine (controller or broker) > asks to create a snapshot, KRaft will write the VotersRecord at the > start to the snapshot after the SnapshotHeaderRecord. > > > 2. After "RemoveVoter", what is the role of the node? > > It looks like after the voter got removed from the voter set, it is not a > > voter anymore. But I think it can still fetch with the leader. So it should > > be an observer, with a "process.role=controller"? And if the node was > > originally "process.role=controller,broker", it'll become a broker-only > > node? > > Kafka nodes need to allow for controllers that are not voters. I don't > expect too many issues from an implementation point of view. Most of > it may just be aggressive validation in KafkaConfig. I think the > easier way to explain this state is that there will be controllers > that will never become active controllers. If we want, we can have a > monitor that turns on (1) if a node is in this state. What do you > think? > > > 3. "UpdateVoter" RPC V.S. "ControllerRegistration" RPC? > > To me, the purpose of "UpdateVoter" is basically identical with > > "ControllerRegistration", to register the voter info in the active > > controller/metadata. Should we re-use the existing "ControllerRegistration" > > one? > > Yeah. I thought about this option. There are two issues here. > 1) The semantics are a bit different. I want the UpdateVoter RPC to > only update the endpoints and kraft.version. I don't want the RPC to > add the controller to the voter set. ControllerRegistration RPC is an > upsert. It creates a registration if it doesn't exist or it updates a > registration if one already exists. > > 2) These are two different layers. Metadata is the application layers. > KRaft is the consensus (control) layer. To give a precise example, the > controller is determining if it should send a > ControllerRegistrationRequest based on the state of the cluster > metadata (ControllerRegistrationRecord) while KRaft needs to send this > information based on the state of the VotesRecord control record. It > is beneficial to keep these layers separate even if they duplicate > information as it leads to a better implementation that we can test, > simulate and verify. > > > 4. > When the kraft.version is upgraded to a version greater 0 and the > > version is supported by the voters, the leader will write a control record > > batch that will include the kraft.version record and all of the AddVoter > > records for all of the voters... > > > > When can't we just write a "VoterRecord" including all voters? > > Yes. I need to update the KIP to remove the AddVoterRecord and > RemoveVoterRecord control records. Jason had a similar question and I > provided this answer just to give some context on the current state: > > "Yes. We can remove them and I'll remove them. For context, I > originally only had AddVoterRecord and RemoveVoterRecord. When > fleshing out the details of the bootstrapping process, I realized that > I needed a control record that would override the voter set and not do > an incremental change. I needed this functionality because there is no > guarantee that the content of the bootstrap checkpoint > (...0000-...0000.checkpoint) matches for all of the voters. > > After I added the VotersRecord, my thinking for keeping AddVoterRecord > and RemoveVoterRecord was to make it explicitly that the protocol only > allows changing one voter at a time. I can instead write a comparison > function that KRaft can use whenever it attempts to write or read a > VotersRecord. The comparison function would fail if all of the > possible majorities of the old voter set don't intersect with all of > the possible majority of the new voter set." > > > 5. In "LeaderChangeMessage", You said: > > > Add an optional VoterUuid to Voter. This change is not needed for > > correctness but it is nice to have for tracing and debugging. The leader > > will write version 0 if the kraft.version is 0. The leader will write > > version 1 if the kraft.version is 1. > > > > And in the code change below: > > + { "name": "VoterUuid", "type": "int32", "versions": "1+" } > > So, a. this VoterUuid field is a int32 field? > > No, it is not. It is a uuid. Let me fix that. Note that this value > comes from the directory.id in meta.properties which was introduced by > the JBOD KIP. > > > b. Where do we write "Kraft.version" to? > > The kraft.version is written in the KRaftVersionRecord control record > This record is first written by the KRaft leader to the log (active > segment). This record is also included in the snapshot and will be > added by KRaft when the state machine (controller and broker) calls > RaftClient::createSnapshot. > > If the control record doesn't exist (in the partition; log or > snapshot), the kraft.version is 0. If it exists then the kraft.version > is the latest version specified in the KRaftVersion field. > > > 6. In the "UpdateVoter", the handling section contains many `AddVoter` > > names, which is confusing. > > Okay. Let me review that section and fix it. > > > 7. About this: > > > This KIP requires the KRaft implementation now read uncommitted data from > > log and snapshot to discover the voter set. This also means that the KRaft > > implementation needs to handle this uncommitted state getting truncated and > > reverted. > > > > I understand why this is required now, but I imagine it will make the > > protocol more complicated. Could you elaborate more about how we're going > > to handle the uncommitted data in the KIP? > > I don't think it should complicate the protocol. It will complicate > the implementation. I suspect that my first implementation will just > keep the voter sets in-memory from the latest snapshot to the LEO. If > the KafkaRaftClient needs to truncate the log then it will also remove > from heap memory the voter sets that were truncated. > > Note that the latest snapshot id is less than or equal to the HWM > which is less than or equal to the LEO. All of these offsets are > exclusive. In the rare case (data-loss) that the KafkaRaftClient > truncates beyond the latest snapshot, then the raft client will > reconstruct the entire voter sets state by reading the latest snapshot > and the control records after that snapshot. When reading the latest > snapshot the KRaft only needs to read the first few control records > (SnapshotHeaderRecord, KRaftVersionRecord and VotersRecord). > > Thanks for the feedback Luke. It is greatly appreciated. Let me make > all of the suggested changes and I'll ping the discussion thread when > I am finished. > -- > -José