> I am not very familiar with bookkeeper and auditor history (so please let
> me know if this understanding doesn't work), but it seems to me that the
> process responsible for draining the bookie could be local to the bookie
> itself to limit network hops.

This is a very good point. One of the reasons we did the data
integrity work was that if there were entries missing from a bookie,
they would have to be copied to a replicator process and then copied
to the destination. The data integrity checker (which I promise we
will push upstream soon) runs on the bookie and only does one copy:
from another node to the local node.
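
To illustrate what I mean by one copy, here is a rough sketch of the
shape of that check. All of the types and methods below are made up
for illustration, not the code we'll be upstreaming: the bookie
compares what the ledger metadata says it should store against its
local storage, and reads any missing entry from another replica
straight into local storage.

  // Hypothetical sketch, not the actual checker code. Each missing
  // entry crosses the network exactly once: remote read, local write.
  void checkAndRepair(long ledgerId,
                      LedgerInfo info,       // made-up interface
                      EntryStore storage,    // made-up interface
                      PeerReader peers,      // made-up interface
                      String selfBookieId) throws Exception {
      for (long entryId = 0; entryId <= info.lastEntryId(); entryId++) {
          // Skip entries this bookie isn't supposed to store.
          if (!info.shouldStore(selfBookieId, entryId)) {
              continue;
          }
          // Skip entries that are already present locally.
          if (storage.hasEntry(ledgerId, entryId)) {
              continue;
          }
          // The single copy: read from another replica, write locally.
          byte[] entry = peers.readFromAnotherReplica(ledgerId, entryId);
          storage.addEntry(ledgerId, entryId, entry);
      }
  }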

One thing that running the draining on the local bookie doesn't cover
is the case where the bookie is down and unrecoverable. Such a bookie
will never drain itself, so the data it held would remain
underreplicated.

Perhaps this is a different case, and needs to be handled differently,
but it could also be handled by a mechanism similar to the data
integrity. There could be an auditor-like process that scans all
ledger metadata to find ledger segments where any of the bookies are
missing ("draining" could be considered another missing state). When
it finds a ledger with unreplicated data, it selects a bookie from the
remaining bookies to take the data. This bookie is then marked as an
extra replica in the metadata. From this point the mechanism is the
data integrity mechanism. A bookie periodically checks if it has all
the data it is supposed to have (taking the extra replica metadata
into account), and copies anything that is missing.
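
To make the idea concrete, here is a rough sketch of what that scan
could look like. All types and helpers here (including
pickReplacement) are made up for illustration:

  // Hypothetical sketch of the auditor-like scan: find ledger
  // segments where a bookie in the ensemble is missing or draining,
  // pick a replacement, and record it as an extra replica in the
  // metadata. The actual copying is left to the data integrity check
  // running on the chosen bookie.
  void scanForUnderreplication(MetadataView store,               // made-up
                               Set<String> missingOrDraining) {  // bookie ids
      for (long ledgerId : store.allLedgerIds()) {
          LedgerInfo info = store.readLedgerInfo(ledgerId);
          for (EnsembleSegment segment : info.segments()) {
              for (String bookie : segment.ensemble()) {
                  if (!missingOrDraining.contains(bookie)) {
                      continue;
                  }
                  // Choose a bookie that isn't already in this ensemble.
                  String replacement = pickReplacement(segment.ensemble());
                  // Record it in the metadata; from here the data
                  // integrity mechanism on 'replacement' takes over.
                  store.addExtraReplica(ledgerId, segment, bookie, replacement);
              }
          }
      }
  }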

> Do you see the bookkeeper tiered storage being used in every case?

No, I doubt it. For us, the thinking is that we want to use tiered
storage anyhow. So if it's there, we may as well use it for
autoscaling, and not spend too many cycles on another mechanism.

Another aspect of this is cost. Without tiered storage, the variable
that decides the number of nodes you need is
(throughput)*(retention). So, assuming that throughput doesn't change
much, or is cyclical, there'll be very few autoscaling decisions taken
(nodes need to stick around for retention).
If you use tiered storage, the number of nodes needed is purely based
on throughput. You'll have fewer bookies, but autoscaling will need to
respond more frequently to variations in throughput.
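
As a rough illustration, with made-up numbers: at 1 GB/s of incoming
data and 7 days of retention, the bookies have to hold about 600 TB
before replication, so capacity (and therefore node count) is
dominated by retention and throughput swings barely move it. With
tiered storage the bookies only hold data until it has been offloaded,
so the node count tracks the 1 GB/s write rate plus whatever short
local window you keep, and autoscaling has to follow the throughput
curve instead.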

-Ivan
