On Wed, May 27, 2015 at 6:57 PM, Christian Balzer <ch...@gol.com> wrote:
> On Wed, 27 May 2015 14:06:43 -0700 Gregory Farnum wrote:
>
>> On Tue, May 19, 2015 at 7:35 PM, John Peebles <johnp...@gmail.com> wrote:
>> > Hi,
>> >
>> > I'm hoping for advice on whether Ceph could be used in an atypical use
>> > case. Specifically, I have about ~20TB of files that need replicated
>> > to 2 different sites. Each site has its own internal gigabit ethernet
>> > network. However, the connection between the sites is only ~320kbits.
>> > I'm trying to find a solution where I set up one server at each site
>> > which has its own full copy of the data, and when changes are made
>> > they are synced between the sites.
>> >
>> > At first, this might seem hopeless because of the low bandwidth.
>> > However, the long-term average rate of writes to the files is actually
>> > substantially smaller than the available bandwidth, so this might not
>> > actually be a problem.
>> >
>> > Off hand, does it seem like Ceph could yield decent performance in
>> > this use case? In particular, I had a few questions:
>> >
>> > (1) Will clients at each site automatically prefer connecting to a
>> > site-local Ceph node for reading files or will they try and pull files
>> > over the slow site-to-site connection even when they are available
>> > site-locally? If preferring a site-local node doesn't happen
>> > automatically, can it be forced manually?
>> > (2) When doing blocking IO to things backed by Ceph, will it block
>> > until the data has been replicated? In other words, will my write
>> > speeds be effectively be limited to 320kbits even if I am writing to a
>> > site-local node?
>>
>> What kind of storage system are you looking for?
>> In the raw RADOS sense, this is pretty unsuitable. There's a
>> read-from-replica feature you can enable under specific circumstances,
>> but that's basically only for snapshotted RBD "parent" images. And
>> everything is replicated synchronously.
>>
> It would be nice (I think this has come up before) if somewhere down the
> road Ceph (as in RADOS) would acquire this capability.
> Most likely/preferably with something akin to the DRBD proxy, a
> smarty-pants box (cluster ^o^) that ACKs things as they come in (I know, I
> know, data consistency) and streams stuff in an optimized fashion over the
> slow link.

It's never going to work quite like that, but we are (oh so slowly)
working on asynchronous replication based on time-based snapshots.
...I've just realized the student project we had working on an
algorithm for that doesn't have results online. I'll try and make that
happen.

In the meantime: We've got some clock sync algorithms to get consistent
snapshots across a RADOS cluster. Work hasn't started yet on actually
storing and moving those snapshots between geographically disparate
clusters, but being based on snapshots it will deal well with
frequently-overwritten data, and although it won't be streaming in real
time it will be self-consistent, which means you can do failover if
necessary.

Two-way active-active stuff at the RADOS layer is going to require
something entirely different and I don't see that ever being
compressible. We've talked vaguely about creating a PaxosPG (my
personal favorite is one based on ePaxos) which would allow local reads
and create consistent updates, but you'll still have pretty bad
latencies and need a ton of bandwidth.
-Greg
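
To make the point about frequently-overwritten data concrete, here is a toy Python sketch. It is not Ceph code and not the design described in this thread: the `Write` tuple and the two function names are hypothetical, every write is assumed to replace the whole object, and a "snapshot" is modeled as shipping the latest contents of each object changed since the previous snapshot.

```python
from typing import Dict, List, Tuple

# Hypothetical record: (timestamp in seconds, object name, full new contents).
Write = Tuple[int, str, bytes]


def bytes_streamed(writes: List[Write]) -> int:
    """Bytes sent if every write is forwarded over the link as it happens."""
    return sum(len(data) for _, _, data in writes)


def bytes_snapshot_shipped(writes: List[Write], interval: int) -> int:
    """Bytes sent if, every `interval` seconds, only the latest contents of
    each object changed since the previous snapshot are shipped."""
    sent = 0
    dirty: Dict[str, bytes] = {}  # latest contents per changed object
    next_snap = interval
    for ts, name, data in sorted(writes, key=lambda w: w[0]):
        # Take every snapshot that falls due before this write is applied.
        while ts >= next_snap:
            sent += sum(len(d) for d in dirty.values())
            dirty.clear()
            next_snap += interval
        dirty[name] = data  # an overwrite replaces the pending copy
    # Final snapshot for whatever is still pending.
    sent += sum(len(d) for d in dirty.values())
    return sent


if __name__ == "__main__":
    # One 1000-byte object rewritten every 10 s for 10 minutes,
    # plus a single 5000-byte object written once at t=300.
    writes = [(t, "log", b"x" * 1000) for t in range(0, 600, 10)]
    writes.append((300, "doc", b"y" * 5000))
    print("streamed:", bytes_streamed(writes))
    print("snapshots every 300 s:", bytes_snapshot_shipped(writes, 300))
```

In this toy model the repeated overwrites between two snapshots collapse into a single transfer per object, which is the property that matters on a slow inter-site link. The price is that the remote copy lags by up to one snapshot interval.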