HBase stores replication peering configuration in ZK. We're working on undoing that, but for now that information exists nowhere else.
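When data like this lives only in ZooKeeper, one low-tech option is a logical export: walk the tree and copy every znode's data somewhere else. The sketch below is a minimal Python illustration, not a tested tool. It assumes a client exposing `get_children(path)` (child names) and `get(path)` (a `(data, stat)` pair), which are the method names kazoo's `KazooClient` uses. `dump_tree`, `FakeZK`, and the `/myapp/...` paths are hypothetical, made up so the example runs without a server. As Flavio points out in the thread, a copy taken this way is not point-in-time consistent, so restoring it still means thinking through the corner cases.

```python
from typing import Any, Dict, Iterable, List, Protocol, Tuple


class ZNodeReader(Protocol):
    """The two read calls the walk needs (same names as kazoo's KazooClient)."""

    def get_children(self, path: str) -> List[str]: ...

    def get(self, path: str) -> Tuple[bytes, Any]: ...


def _join(parent: str, child: str) -> str:
    # Avoid producing "//child" when the parent is the root.
    return parent.rstrip("/") + "/" + child


def dump_tree(client: ZNodeReader, root: str = "/",
              skip: Iterable[str] = ("/zookeeper",)) -> Dict[str, bytes]:
    """Walk `root` depth-first and return {path: data} for every znode.

    Each read is a separate request, so writes that land mid-walk can leave
    the result inconsistent: this is NOT a point-in-time snapshot. Against a
    live ensemble, a node deleted mid-walk makes kazoo raise NoNodeError,
    which a real tool would need to handle.
    """
    skipped = set(skip)
    out: Dict[str, bytes] = {}
    stack = [root]
    while stack:
        path = stack.pop()
        if path in skipped:  # e.g. ZooKeeper's own reserved /zookeeper subtree
            continue
        data, _stat = client.get(path)
        out[path] = data if data is not None else b""
        stack.extend(_join(path, c) for c in client.get_children(path))
    return out


class FakeZK:
    """Hypothetical in-memory stand-in so the sketch runs without a server."""

    def __init__(self, nodes: Dict[str, bytes]) -> None:
        self.nodes = dict(nodes)

    def get(self, path: str) -> Tuple[bytes, Any]:
        return self.nodes[path], None

    def get_children(self, path: str) -> List[str]:
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):] for p in self.nodes
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


# Illustrative paths and payload only; not HBase's real znode layout or encoding.
zk = FakeZK({
    "/": b"",
    "/myapp": b"",
    "/myapp/peers": b"",
    "/myapp/peers/peer1": b"peer-config",
    "/zookeeper": b"",
    "/zookeeper/quota": b"",
})
snapshot = dump_tree(zk)
print(sorted(snapshot))  # ['/', '/myapp', '/myapp/peers', '/myapp/peers/peer1']
```

Skipping `/zookeeper` by default keeps ZooKeeper's own reserved subtree out of the export; pass `skip=()` to include it.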
On Thu, Jun 16, 2016 at 2:47 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Jordan,
>
> Kafka stores ACLs as well as client and topic configs in ZooKeeper, so that lends credence to your argument, I think.
>
> Ismael
>
> On Thu, Jun 16, 2016 at 11:41 PM, Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:
>
> > Contrary to recommendations everywhere, my experience is that almost everyone is storing source-of-truth data in ZooKeeper. It’s just too tempting. You have a distributed file system just sitting there and it’s too easy to use. You get a lot of great features like watches, etc. People are using it to store configuration data, sequence numbers, etc. They are storing these things without a good means of reproducing them in case of a catastrophic outage. Further, I’ve heard of several orgs who just back up the transaction logs and think they can restore them for DR. Anyway, that’s the genesis of my blog post.
> >
> > -Jordan
> >
> > > On Jun 16, 2016, at 2:39 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
> > >
> > > Yes, thank you to Jordan for the article!
> > >
> > > Like Flavio, I personally have never come across the requirement for ZooKeeper backups. I've generally followed the pattern that data stored in ZooKeeper is truly transient, and applications are built either to tolerate loss of that data or to reconstruct it from first principles if it goes missing. Adding observers in a second data center would give a rudimentary approximation of off-site backup in the case of a data center disaster, with the usual caveats around propagation delays.
> > >
> > > Jordan, I'd be curious if you can share more specific details about the kind of data that you have that necessitates a backup/restore. (If you're not at liberty to share this, then I can understand that.) It might inform whether we have a motivating use case for backup/restore features within ZooKeeper, such as some of the transaction log filtering that the article mentions.
> > >
> > > --Chris Nauroth
> > >
> > > On 6/16/16, 1:03 AM, "Flavio Junqueira" <f...@apache.org> wrote:
> > >
> > >> Great write-up, Jordan, thanks!
> > >>
> > >> Whether to back up zk data or not is possibly an open topic for this community, even though we have discussed it at times. My sense has been that, precisely because of the issues you mention in your post, it is typically best to have a way to recreate its data upon a disaster rather than back up the data. I think there could be three general scenarios in which folks would prefer to back up data, but correct me if these aren't accurate:
> > >>
> > >> - The data in zk isn't elsewhere, so it can't be recreated: zk isn't a regular database, so I'd think it is best not to store data there and to focus on cluster data or metadata.
> > >> - There is just a lot of data and I'd rather have a shorter time to recover: zk in general shouldn't have that much data in its db, but let's go with the assumption that for the requirements of the application it is a lot. For such a case, it probably depends on whether your application can efficiently and effectively recover from a backup. Basically, as pointed out in the post, the data could be inconsistent and cause trouble if you don't think about the corner cases.
> > >> - The code to recreate the zk metadata for my application is super complex: if you decide to code against zk, it is good to think about whether reconstructing in the case of a disaster is doable and, if it is, to design and implement a way to reconstruct the state upon a disaster.
> > >>
> > >> Also, we typically provision enough replicas, often replicating across data centers, to make sure that the data isn't all gone. Having more replicas does not completely rule out the possibility of a disaster, but in such rare cases we resort to the expensive path.
> > >>
> > >> I personally have never worked with an application that was taking backups of zk data in prod, so I'm really interested in what others think.
> > >>
> > >> -Flavio
> > >>
> > >>> On 16 Jun 2016, at 00:43, Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:
> > >>>
> > >>> FYI - I wrote a blog post about backing up ZooKeeper:
> > >>>
> > >>> https://www.elastic.co/blog/zookeeper-backup-a-treatise
> > >>>
> > >>> -Jordan

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)