Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

Igor Soarez Tue, 27 Sep 2022 14:00:26 -0700

Hi Ziming,

Thank you for having another look at this KIP, and please accept my apologies 
as to my delay in replying.

The migration introduces JBOD support, so before the migration there should 
only be one log directory configured per broker. This assumption simplifies how 
the controller learns about the initial log directory placement for existing 
partitions. The first broker registration referencing log directories must 
declare a single log directory - which must be online as brokers cannot start 
without any online log dirs. If there are no previous log dirs registered for 
this  broker, the controller assigns all existing partitions in the broker to 
this single directory.

Since we assume a single log dir per broker as a migration starting point and 
brokers cannot start without any online log directories, the 
`OfflineLogDirectories` field simply cannot be used in the first registration 
after the upgrade. However, the issue you described applies to 
`OnlineLogDirectories`. If use of this field is gated by metadata.version then 
we do require a double roll for brokers to make their first registration with a 
log directory.

As you describe, we could move the list of online log directories from the 
broker registration request to the heartbea request, but I believe we shouldn't 
do this because:
  a) The intention is to address a transient problem — the migration into the 
feature, which should happen only once — by accepting a long term commitment as 
to how the online log dirs are signaled.
  b) While the log directory status can change at any point from online to 
offline, the list of existing log directories is static for broker's lifetime, 
so it makes little sense to send this information over and over again to the 
controller.

We should also avoid gating on metadata.version to include the 
`OnlineLogDirectories` field in the broker registration. If the controllers are 
upgraded first, the existing versioning in the message schemas should suffice 
to guarantee compatibility. After the controllers are upgraded, the brokers can 
then be upgraded and each of them can register their single online log 
directory from the first instantiation.

Please, let me know if this makes sense and any other thoughts you might have.

Best,

--
Igor

On Thu, Sep 1, 2022, at 2:04 AM, deng ziming wrote:
> Hi Igor,
>
> I think this KIP can solve the current problems, I have some problems 
> relating to the migration section.
>
> Since we have bumped broker RPC version and metadata record version, 
> there will be some problems between brokers/controllers of different 
> versions. In ZK mode we use IBP as a flag to help solve this, in KRaft 
> mode we use a feature flag(metadata.version) as a flag for using new 
> RPC/metadata or not. 
>
> Assuming that we are upgrading from 3.3 to 3.4, firstly the finalized 
> version of metadata.version is 3.3, brokers will use version 1 of 
> `BrokerRegistrationRequest` which contains no `OfflineLogDirectories`, 
> finally the finalized version of metadata.version is 3.4, but brokers 
> will no longer send `BrokerRegistrationRequest` unless we restart the 
> broker, so controllers can’t be aware of the `OfflineLogDirectories` of 
> each broker, so we should reconsider the suggestion of Jason to use 
> `BrokerHeartbeatRequest` to communicate `OfflineLogDirectories`.
>
> Of course this problem can be solved through a double roll(restart 
> broker twice when upgrading), but we should trying to avoid it since we 
> now have feature flag.
>
> One solution is that we include `OfflineLogDirectories` both in 
> `BrokerRegistrationRequest` and `BrokerHeartbeatRequest` requests, when 
> upgrading from 3.3 to 3.4 the 
> `BrokerRegistrationRequest.OfflineLogDirectories` is empty whereas when 
> upgrading from 3.4 to 3.5 it will not be empty. And maybe we can also 
> remove `LogDirectoriesOfflineRequest` you proposed in this KIP.
>
> --
> Best,
> Ziming
>
>
>> On Aug 18, 2022, at 11:24 PM, Igor Soarez <i...@soarez.me> wrote:
>> 
>> Hi Ziming,
>> 
>> I'm sorry it took me a while to reply.
>> 
>> Thank you for having a look at this KIP and providing feedback.
>> 
>>> 1. We have a version field in meta.properties, currently it’s 1, and we can
>>> set it to 2 in this KIP, and we can give an example of server.properties and
>>> it’s corresponding meta.properties generated by the storage command tool.
>> 
>>> 2. When using storage format tool we should specify cluster-id, it will be 
>>> an
>>> arduous work if we should manually specify directory-ids for all log dirs,
>>> I think you can make it more clear about this change that the directory-ids
>>> are generated automatically.
>> 
>> Thank you for these suggestions. I've updated the KIP to:
>> 
>> * include an example
>> * clarify that the version field would be changed to 2
>> * clarify that the log directory UUIDs are automatically generated
>> 
>>> 3. When controller place a replica, it will select a broker as well as a log
>>> directory, the latter is currently accomplished in the broker side, so this
>>> will be a big change?
>> 
>> I think there can be benefits, as Jason described previously, if we change 
>> how
>> log directories are assigned as follow-up work.
>> 
>> From a codebase surface area perspective, it is definitely a big change
>> because there are many models, types and interfaces that assume replicas are
>> identified solely by a broker id.
>> That will have to change to include the directory UUID, many lines of code 
>> will
>> be affected.
>> 
>> But in terms of behavior it shouldn't be a big change at all. Brokers 
>> currently
>> select the log directory with the least logs in it. This isn't a very nice
>> policy, as logs can have wildly different sizes, and log directories can have
>> different capacities. But it is a policy that the controller can keep.
>> 
>> If we decide to extend the selection policy and keep it in the broker,
>> the broker will continue to be able to override the controller's selection
>> of log directory, using the `AssignReplicasToDirectories` RPC.
>> 
>>> 4. When handling log directory failures we will update Leader and ISR using
>>> the existing replica state machine, what is this state machine referring to,
>>> do you mean the AlterPartition RPC?
>> 
>> No, I mean that it will use the same rules and mechanism
>> (`PartitionChangeBuilder`) that is used when a broker is fenced, shutdown or
>> unregistered.
>> 
>> I think maybe the term "replica state machine" isn't the very appropriate.
>> I've updated the KIP to rephrase this section.
>> 
>> Thanks,
>> 
>> --
>> Igor

Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

Reply via email to