### Motivation

A user reported the following data availability problem.

1. They have two racks and a rack-aware placement policy that ensures
every write is spread across both racks.
2. They had data on a topic with long retention.
3. They ran a disaster recovery (DR) test, during which they shut down
one rack.
4. Auto-recovery ran during the DR test. Because only one rack was active,
and because auto-recovery by default applies rack awareness on a
best-effort basis, it restored the expected number of replicas using only
bookies on the surviving rack.
5. They stopped the DR test and all appeared well, but the ledger was now
stored on only one rack.
6. They ran another DR test, this time effectively moving data to the other
zone, and data was missing because all of it was on a single rack.

We should provide a feature that handles this case.

#### Auditor placement policy check logic

Currently, the `auditorPeriodicPlacementPolicyCheckInterval` config already
lets the `Auditor` check whether each ledger segment's ensemble adheres to
the placement policy. If the value of
`auditorPeriodicPlacementPolicyCheckInterval` is greater than 0, the
`Auditor` runs the check as a scheduled task. The default value is 0, which
disables the placement policy check.

This feature was introduced by [BP-34](https://bookkeeper.apache.org/bps/BP-34-cluster-metadata-checker/).

#### Drawbacks

The BP-34 implementation detects which ledger fragments have an ensemble
that does not adhere to the placement policy, but it only records them in
`LoggerState`. It does not repair the data so that it adheres to the
placement policy.



### Proposal

To address the issue above, we introduce a new config
`repairedPlacementPolicyNotAdheringBookieEnabled`.

In the `Auditor`, if the user sets
`auditorPeriodicPlacementPolicyCheckInterval` > 0, the scheduled task checks
whether each ledger fragment's ensemble adheres to the placement policy. If
it does not and `repairedPlacementPolicyNotAdheringBookieEnabled` is true,
the `Auditor` marks the ledger as underreplicated.
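
The decision described above can be sketched as follows. This is a minimal illustration, not the `Auditor` implementation: the class `AuditorPlacementCheckSketch`, the `Action` enum, and the "ensemble must span at least `minRacks` racks" rule are assumptions made for the example, and bookies and racks are modelled as plain strings.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the proposed Auditor decision; not BookKeeper code. */
final class AuditorPlacementCheckSketch {

    /** Possible outcomes of one placement policy check for a ledger fragment. */
    enum Action { SKIP_CHECK, NONE, LOG_ONLY, MARK_UNDERREPLICATED }

    /**
     * Simplified stand-in for a rack-aware placement policy (an assumption):
     * the ensemble adheres if its bookies span at least minRacks distinct racks.
     */
    static boolean adheresToPlacementPolicy(
            List<String> ensemble, Map<String, String> rackOf, int minRacks) {
        Set<String> racks = new HashSet<>();
        for (String bookie : ensemble) {
            racks.add(rackOf.get(bookie));
        }
        return racks.size() >= minRacks;
    }

    /**
     * checkInterval mirrors auditorPeriodicPlacementPolicyCheckInterval and
     * repairEnabled mirrors repairedPlacementPolicyNotAdheringBookieEnabled.
     */
    static Action decide(long checkInterval, boolean repairEnabled, boolean adheres) {
        if (checkInterval <= 0) {
            return Action.SKIP_CHECK;            // periodic check is disabled
        }
        if (adheres) {
            return Action.NONE;                  // nothing to do
        }
        // Without the new flag the violation is only recorded, as in BP-34.
        return repairEnabled ? Action.MARK_UNDERREPLICATED : Action.LOG_ONLY;
    }
}
```

With the flag left at false, a violating fragment yields `LOG_ONLY`, which corresponds to the existing BP-34 behavior.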

Today, the `ReplicationWorker` picks up an underreplicated ledger, checks
its data integrity, and tries to move data to bookies that are alive. If
`repairedPlacementPolicyNotAdheringBookieEnabled` is true, it will also
check whether each ledger fragment's ensemble adheres to the placement
policy.

A ledger fragment may have lost data and violate the placement policy at
the same time. In that case we do not repair the placement policy violation
in that pass. We only replicate the data to active bookies and update the
ensemble info, because data integrity is more important. If the ensemble
still does not adhere to the placement policy, the `Auditor` will mark the
ledger again, and the `ReplicationWorker` will then repair the placement
policy violation.

If the ledger fragment only violates the placement policy, the
`ReplicationWorker` selects a bookie on another rack to replace an old
bookie that shares a rack with other bookies in the ensemble. If no bookie
on another rack is available, it does not repair the fragment and records
the lack of a suitable bookie in `LoggerState`.
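
A minimal sketch of the replacement step, under stated assumptions: `PlacementRepairSketch` and `pickReplacement` are hypothetical names, bookies and racks are plain strings, and "adhering" is reduced to "no two ensemble members share a rack". The real `ReplicationWorker` would delegate this choice to the configured placement policy.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of choosing a replacement bookie; not BookKeeper code. */
final class PlacementRepairSketch {

    /**
     * Returns (ensemble index to replace, replacement bookie), or empty when the
     * ensemble has no rack conflict or no candidate on an unused rack exists.
     */
    static Optional<Map.Entry<Integer, String>> pickReplacement(
            List<String> ensemble, Map<String, String> rackOf, List<String> candidates) {
        Set<String> usedRacks = new HashSet<>();
        int duplicateIndex = -1;
        for (int i = 0; i < ensemble.size(); i++) {
            // add() returns false when an earlier bookie already occupies this rack.
            boolean firstOnRack = usedRacks.add(rackOf.get(ensemble.get(i)));
            if (!firstOnRack && duplicateIndex < 0) {
                duplicateIndex = i;
            }
        }
        if (duplicateIndex < 0) {
            return Optional.empty();             // ensemble already spans distinct racks
        }
        for (String candidate : candidates) {
            boolean onNewRack = !usedRacks.contains(rackOf.get(candidate));
            if (onNewRack && !ensemble.contains(candidate)) {
                return Optional.of(Map.entry(duplicateIndex, candidate));
            }
        }
        // No bookie on another rack: the caller records this and leaves the fragment as-is.
        return Optional.empty();
    }
}
```

An empty result covers both "nothing to repair" and "no bookie on another rack", so a caller that needs to record the second case would have to distinguish the two.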

### Changes

1. Add a new config `repairedPlacementPolicyNotAdheringBookieEnabled`
that controls whether ensembles that do not adhere to the placement policy
are repaired.
2. In the `Auditor` placement policy check, mark the ledger as
underreplicated if its ensemble does not adhere to the placement policy
and `repairedPlacementPolicyNotAdheringBookieEnabled` is true.
3. In the `ReplicationWorker` re-replication process, repair the ledger
fragment so that it adheres to the placement policy.
4. Document this feature.

### Compatibility, Deprecation, and Migration Plan

`repairedPlacementPolicyNotAdheringBookieEnabled` defaults to false, so
upgrading to the new release does not change any existing behavior.
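
The effect of the defaults can be illustrated with a small sketch. It uses `java.util.Properties` as a stand-in for BookKeeper's own configuration classes, and `RepairConfigDefaultsSketch` and its methods are hypothetical names. Only the two config keys and their default values come from this proposal.

```java
import java.util.Properties;

/** Hypothetical sketch of how the two settings combine; not BookKeeper code. */
final class RepairConfigDefaultsSketch {

    static final String CHECK_INTERVAL = "auditorPeriodicPlacementPolicyCheckInterval";
    static final String REPAIR_ENABLED = "repairedPlacementPolicyNotAdheringBookieEnabled";

    /** Default 0: the periodic placement policy check is disabled. */
    static long checkInterval(Properties conf) {
        return Long.parseLong(conf.getProperty(CHECK_INTERVAL, "0"));
    }

    /** Default false: violations are not repaired. */
    static boolean repairEnabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(REPAIR_ENABLED, "false"));
    }

    /** Repair only happens when the periodic check runs and the new flag is set. */
    static boolean repairActive(Properties conf) {
        return checkInterval(conf) > 0 && repairEnabled(conf);
    }
}
```

A configuration that sets neither key, or only the existing check interval, leaves `repairActive` false, which matches the behavior before this change.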


### Test Plan

We will add tests for the following modules.

1. `Auditor`: test that the ledger is marked underreplicated when a ledger
fragment's ensemble does not adhere to the placement policy.
2. `ReplicationWorker`: test that a fragment that does not adhere to the
placement policy is repaired so that it adheres.
