Folks: Our group has been approached by a customer who asked if we could back up/archive 50 petabytes of data. And yes, they are serious.
We've begun building a list of questions for the customer, but since this is roughly 1000 times the amount of data we currently back up, we are on unfamiliar turf here. At a high level, here are some of the questions we are asking:

1) Is 50 petabytes the initial data size, or the envisioned one? If envisioned, how big is the initial data load and how fast will it grow?
2) What makes up the data: databases, video/audio files, something else? (Subtext: how many objects are involved, and what are the opportunities to compress/deduplicate?)
3) How is the data distributed: spread over a number of systems, or coming from a single supercluster?
4) Is the data static, changing slowly, or changing rapidly? (Subtext: is this a backup scenario or an archive scenario?)
5) What are the security requirements?
6) What are the restore (aka RTO) requirements? (A rough back-of-envelope restore-time sketch is included below.)

We are planning to approach vendors to get some sense of the probable data center requirements (cooling, power, footprint). If anyone in the community has experience managing petabytes of backup data, we'd appreciate any feedback we could incorporate. Thanks in advance!
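For what it's worth, here is the kind of back-of-envelope arithmetic sitting behind questions 1, 2, and 6, as a small Python sketch. The growth rate, dedup ratio, and restore throughput used here are purely hypothetical placeholders, not customer numbers; the point is just to show how quickly the RTO math gets uncomfortable at this scale.

    # Rough back-of-envelope sizing for a ~50 PB backup/archive project.
    # All figures below (growth rate, dedup ratio, restore throughput) are
    # illustrative assumptions only -- replace with the customer's answers.

    PB = 10**15  # bytes per petabyte (decimal)

    initial_bytes = 50 * PB     # question 1: initial vs. envisioned size
    annual_growth = 0.20        # assumed 20% growth per year
    dedup_ratio = 3.0           # assumed 3:1 compress/dedup (question 2)
    restore_gbps = 100          # assumed aggregate restore throughput, Gbit/s (question 6)

    # Capacity actually stored after compression/deduplication
    stored_bytes = initial_bytes / dedup_ratio
    print(f"Stored after {dedup_ratio:.0f}:1 reduction: {stored_bytes / PB:.1f} PB")

    # Projected logical size after three years of growth
    for year in range(1, 4):
        projected = initial_bytes * (1 + annual_growth) ** year
        print(f"Year {year}: {projected / PB:.1f} PB logical")

    # Time to restore the full data set at the assumed throughput
    restore_bytes_per_sec = restore_gbps * 1e9 / 8
    restore_days = initial_bytes / restore_bytes_per_sec / 86400
    print(f"Full restore at {restore_gbps} Gbit/s: {restore_days:.0f} days")

Even at a sustained 100 Gbit/s, a full restore of 50 PB runs to roughly 46 days, which is why pinning down the real RTO (and whether restores are ever of the full set or only of subsets) is near the top of our list.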