Re: FYI - Apache ZooKeeper Backup, a Treatise

Flavio Junqueira Sat, 18 Jun 2016 07:50:27 -0700

For ACLs, you can simply re-run the ACL commands to re-introduce them, unless 
you assume that no record of the ACLs is kept once they are introduced. If 
that's the case, another option is to periodically read the zk state and keep 
a copy somewhere else to be extra safe. This seems easier than dealing with raw 
zk backups. For topic configs, we would need to contact the servers to 
reconstruct the data.
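
The periodic read of the zk state mentioned above could look roughly like the following minimal Python sketch. It rests on several assumptions:

- The state worth keeping is ordinary znode data under one root path, as with the ACLs and configs Ismael mentions.
- The client object offers `get(path)` and `get_children(path)`. kazoo's `KazooClient` has methods of that shape, but every helper name here (`snapshot_subtree`, `backup_periodically`, and so on) is hypothetical.

The sketch does not capture ZooKeeper's own per-znode ACLs. The tree walk is also not atomic, so the copy can be inconsistent in the way the post warns about.

```python
import base64
import json
import time
from typing import Any, Callable, Dict, List, Optional, Protocol, Tuple


class ZkReader(Protocol):
    """Read calls the sketch assumes; kazoo's KazooClient has methods of this shape."""

    def get_children(self, path: str) -> List[str]: ...

    def get(self, path: str) -> Tuple[Optional[bytes], Any]: ...


def child_path(parent: str, name: str) -> str:
    # Join a parent path and a child name without doubling the slash at the root.
    return "/" + name if parent == "/" else parent + "/" + name


def snapshot_subtree(zk: ZkReader, root: str) -> Dict[str, Optional[bytes]]:
    """Copy the data of every znode under `root` (inclusive) into a dict keyed by path.

    NOT atomic: znodes may change, appear, or vanish during the walk, so the result
    can mix states from different moments, and a real client may raise if a listed
    child is deleted before it is read.
    """
    snapshot: Dict[str, Optional[bytes]] = {}
    stack = [root]
    while stack:
        path = stack.pop()
        data, _stat = zk.get(path)
        snapshot[path] = data
        stack.extend(child_path(path, name) for name in zk.get_children(path))
    return snapshot


def dump_snapshot(snapshot: Dict[str, Optional[bytes]]) -> str:
    # Znode data is raw bytes, so base64-encode it to store the copy as plain JSON.
    encoded = {
        path: None if data is None else base64.b64encode(data).decode("ascii")
        for path, data in sorted(snapshot.items())
    }
    return json.dumps(encoded, indent=2)


def load_snapshot(text: str) -> Dict[str, Optional[bytes]]:
    # Inverse of dump_snapshot.
    raw = json.loads(text)
    return {path: None if value is None else base64.b64decode(value) for path, value in raw.items()}


def backup_periodically(
    zk: ZkReader,
    root: str,
    write: Callable[[str], None],
    interval_s: float,
    rounds: int,
) -> None:
    # Take `rounds` snapshots `interval_s` seconds apart and hand each serialized copy
    # to `write`, e.g. a (hypothetical) function that stores it outside the ensemble.
    for i in range(rounds):
        write(dump_snapshot(snapshot_subtree(zk, root)))
        if i + 1 < rounds:
            time.sleep(interval_s)
```

With kazoo, a started `KazooClient` could be passed in as `zk`. Scheduling could just as well be left to cron, calling `backup_periodically` with `rounds=1`.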


It is important to keep in mind that this happens quite rarely, though, and if 
such a daunting scenario does happen, it is quite possible that recovering the 
zk state is the least of our problems. If you do worry about losing too many 
replicas of anything, be it zk servers or Kafka brokers, to the point of not 
being able to recover, then it is indeed important to have a plan to restore 
data. Typically we try to avoid these scenarios by having enough replicas, for 
some definition of enough, and by reducing the chance of correlated failures 
(e.g., with remote replicas or rack awareness).
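
One possible shape for such a restore plan is sketched below in Python, under stated assumptions:

- A saved copy exists as a hypothetical dict mapping znode paths to their data.
- The copy is written back parents-first through a client with `exists`, `create`, and `set` methods. kazoo's `KazooClient` has methods of that shape; `restore_snapshot` itself is a hypothetical helper.

The sketch leaves several things out. It does not recreate ephemeral nodes or per-znode ACLs. It also does not carry over the counter behind sequential znodes, which is kept by the parent. As a result, newly created sequential children may receive numbers that were already handed out before the disaster.

```python
from typing import Dict, Optional, Protocol, Tuple


class ZkWriter(Protocol):
    """Write-side calls the sketch assumes; kazoo's KazooClient has methods of this shape."""

    def exists(self, path: str) -> object: ...

    def create(self, path: str, value: bytes) -> object: ...

    def set(self, path: str, value: bytes) -> object: ...


def restore_snapshot(zk: ZkWriter, snapshot: Dict[str, Optional[bytes]]) -> Tuple[int, int]:
    """Write every znode in `snapshot` (path -> data) back, parents before children.

    Existing znodes are overwritten and missing ones are created, so re-running the
    restore after a partial failure is safe. Returns (created, updated).
    Assumes the parent of the snapshot's top-level path already exists.
    """
    created = updated = 0
    # Sorting by depth (number of slashes) writes each parent before its children.
    for path in sorted(snapshot, key=lambda p: (p.count("/"), p)):
        data = snapshot[path]
        value = b"" if data is None else data  # null data is restored as empty bytes
        if zk.exists(path):
            zk.set(path, value)
            updated += 1
        else:
            zk.create(path, value)
            created += 1
    return created, updated
```

Because each step is an overwrite-or-create, a restore interrupted halfway can simply be run again.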

-Flavio

> On 16 Jun 2016, at 22:47, Ismael Juma <ism...@juma.me.uk> wrote:
> 
> Hi Jordan,
> 
> Kafka stores ACLs as well as client and topic configs in ZooKeeper, so that
> lends credence to your argument, I think.
> 
> Ismael
> 
> On Thu, Jun 16, 2016 at 11:41 PM, Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:
> 
>> Contrary to recommendations everywhere, my experience is that almost
>> everyone is storing source-of-truth data in ZooKeeper. It’s just too
>> tempting. You have a distributed file system just sitting there and it’s
>> too easy to use. You get a lot of great features like watches, etc. People
>> are using it to store configuration data, sequence numbers, etc. They are
>> storing these things without a good means of reproducing them in case of a
>> catastrophic outage. Further, I’ve heard of several orgs who just back up
>> the transaction logs and think they can restore them for DR. Anyway, that’s
>> the genesis of my blog post.
>> 
>> -Jordan
>> 
>>> On Jun 16, 2016, at 2:39 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
>>> 
>>> Yes, thank you to Jordan for the article!
>>> 
>>> Like Flavio, I personally have never come across the requirement for
>>> ZooKeeper backups.  I've generally followed the pattern that data stored
>>> in ZooKeeper is truly transient, and applications are built either to
>>> tolerate loss of that data or to reconstruct it from first principles if it
>>> goes missing.  Adding observers in a second data center would give a
>>> rudimentary approximation of off-site backup in the case of a data center
>>> disaster, with the usual caveats around propagation delays.
>>> 
>>> Jordan, I'd be curious if you can share more specific details about the
>>> kind of data that you have that necessitates a backup/restore.  (If you're
>>> not at liberty to share this, then I can understand that.)  It might
>>> tell us whether we have a motivating use case for backup/restore features
>>> within ZooKeeper, such as some of the transaction log filtering that the
>>> article mentions.
>>> 
>>> --Chris Nauroth
>>> 
>>> 
>>> 
>>> 
>>> On 6/16/16, 1:03 AM, "Flavio Junqueira" <f...@apache.org> wrote:
>>> 
>>>> Great write-up, Jordan, thanks!
>>>> 
>>>> Whether to back up zk data or not is possibly an open topic for this
>>>> community, even though we have discussed it at times. My sense has been
>>>> that precisely because of the issues you mention in your post, it is
>>>> typically best to have a way to recreate the data upon a disaster rather
>>>> than back up the data. I think there could be three general scenarios in
>>>> which folks would prefer to back up data, but please correct me if these
>>>> aren't accurate:
>>>> 
>>>> - The data in zk isn't elsewhere, so it can't be recreated: zk isn't a
>>>> regular database, so I'd think it is best not to store such data there and
>>>> to focus on cluster data or metadata.
>>>> - There is just a lot of data and I'd rather have a shorter time to
>>>> recover: zk in general shouldn't hold that much data in its database, but
>>>> let's go with the assumption that, for the requirements of the
>>>> application, it is a lot. In such a case, it probably depends on whether
>>>> your application can efficiently and effectively recover from a backup.
>>>> Basically, as pointed out in the post, the data could be inconsistent and
>>>> cause trouble if you don't think about the corner cases.
>>>> - The code to recreate the zk metadata for my application is super
>>>> complex: if you decide to code against zk, it is good to think about
>>>> whether reconstructing the state after a disaster is doable and, if it is,
>>>> to design and implement that reconstruction.
>>>> 
>>>> Also, we typically provision enough replicas, often replicating across
>>>> data centers, to make sure that the data isn't all gone. Having more
>>>> replicas does not completely rule out the possibility of a disaster, but
>>>> in such rare cases we resort to the expensive path.
>>>> 
>>>> I personally have never worked with an application that was taking
>>>> backups of zk data in prod, so I'm really interested in what others
>>>> think.
>>>> 
>>>> -Flavio
>>>> 
>>>> 
>>>>> On 16 Jun 2016, at 00:43, Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:
>>>>> 
>>>>> FYI - I wrote a blog about backing up ZooKeeper:
>>>>> 
>>>>> https://www.elastic.co/blog/zookeeper-backup-a-treatise
>>>>> 
>>>>> -Jordan
>>>> 
>>> 
>> 
>>
