On Wed, May 27, 2015 at 6:57 PM, Christian Balzer <ch...@gol.com> wrote:
> On Wed, 27 May 2015 14:06:43 -0700 Gregory Farnum wrote:
>
>> On Tue, May 19, 2015 at 7:35 PM, John Peebles <johnp...@gmail.com> wrote:
>> > Hi,
>> >
>> > I'm hoping for advice on whether Ceph could be used in an atypical use
>> > case. Specifically, I have about 20TB of files that need to be
>> > replicated to 2 different sites. Each site has its own internal gigabit
>> > ethernet network. However, the connection between the sites is only
>> > ~320kbits.
>> > I'm trying to find a solution where I set up one server at each site
>> > which has its own full copy of the data, and when changes are made
>> > they are synced between the sites.
>> >
>> > At first, this might seem hopeless because of the low bandwidth.
>> > However, the long-term average rate of writes to the files is actually
>> > substantially smaller than the available bandwidth, so this might not
>> > actually be a problem.
>> >
>> > Offhand, does it seem like Ceph could yield decent performance in
>> > this use case?  In particular, I had a few questions:
>> >
>> > (1) Will clients at each site automatically prefer connecting to a
>> > site-local Ceph node for reading files or will they try and pull files
>> > over the slow site-to-site connection even when they are available
>> > site-locally? If preferring a site-local node doesn't happen
>> > automatically, can it be forced manually?
>> > (2) When doing blocking IO to things backed by Ceph, will it block
>> > until the data has been replicated? In other words, will my write
>> > speeds effectively be limited to 320kbits even if I am writing to a
>> > site-local node?
>>
>> What kind of storage system are you looking for?
>> In the raw RADOS sense, this is pretty unsuitable. There's a
>> read-from-replica feature you can enable under specific circumstances,
>> but that's basically only for snapshotted RBD "parent" images. And
>> everything is replicated synchronously.
>>
> It would be nice (I think this has come up before) if somewhere down the
> road Ceph (as in RADOS) would acquire this capability.
> Most likely/preferably with something akin to the DRBD proxy, a
> smarty-pants box (cluster ^o^) that ACKs things as they come in (I know, I
> know, data consistency) and streams stuff in an optimized fashion over the
> slow link.

It's never going to work quite like that, but we are (oh so slowly)
working on asynchronous replication based on time-based snapshots.
...I've just realized the student project we had working on an
algorithm for that doesn't have results online. I'll try and make that
happen. In the meantime:

We've got some clock sync algorithms to get consistent snapshots
across a RADOS cluster. Work hasn't started yet on actually storing
and moving those snapshots between geographically disparate clusters,
but being based on snapshots it will deal well with
frequently-overwritten data, and although it won't be streaming in
real time it will be self-consistent, which means you can do failover
if necessary.
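
To illustrate why a snapshot-based scheme copes well with frequently
overwritten data, here is a minimal sketch. It is not Ceph code: the
in-memory dict standing in for an object store and the helper names
(snapshot, snapshot_delta, apply_delta) are all hypothetical. The point
is only that many overwrites between two snapshots collapse into one
transfer of the final state.

```python
# Hypothetical sketch, not Ceph code: the data model and all names
# below are assumptions made for illustration only.

def snapshot(store):
    """Freeze the current state of the store (object name -> bytes)."""
    return dict(store)

def snapshot_delta(prev, curr):
    """Return what must cross the slow link to turn `prev` into `curr`."""
    changed = {k: v for k, v in curr.items() if prev.get(k) != v}
    deleted = [k for k in prev if k not in curr]
    return changed, deleted

def apply_delta(replica, changed, deleted):
    """Bring the remote copy up to the newer snapshot."""
    replica.update(changed)
    for k in deleted:
        replica.pop(k, None)

primary = {"a": b"v0", "b": b"v0"}
replica = snapshot(primary)   # both sites start in sync
last = snapshot(primary)      # last snapshot that was shipped

# 1000 overwrites of the same object between two snapshots...
for i in range(1000):
    primary["a"] = b"v%d" % (i + 1)
del primary["b"]
primary["c"] = b"new"

# ...result in a single copy of the final state being sent.
curr = snapshot(primary)
changed, deleted = snapshot_delta(last, curr)
apply_delta(replica, changed, deleted)
```

After the delta is applied, the replica matches the newer snapshot
exactly, which is the self-consistency property mentioned above: the
remote side is always at some snapshot, never at a half-applied state
in between.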

Two-way active-active stuff at the RADOS layer is going to require
something entirely different and I don't see that ever being
compressible. We've talked vaguely about creating a PaxosPG (my
personal favorite is one based on ePaxos) which would allow local
reads and create consistent updates, but you'll still have pretty bad
latencies and need a ton of bandwidth.
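
For a sense of scale, here is some back-of-the-envelope arithmetic for
the link described at the top of the thread. It assumes "320kbits"
means 320 kbit/s, decimal units (1 TB = 10^12 bytes), and no protocol
overhead, so treat the results as rough upper bounds rather than
measurements.

```python
# Rough arithmetic only; unit interpretation is an assumption (see above).
LINK_BITS_PER_S = 320_000        # ~320 kbit/s site-to-site link
DATASET_BYTES = 20 * 10**12      # ~20 TB of files

def transfer_seconds(num_bytes, link_bits_per_s=LINK_BITS_PER_S):
    """Time to push num_bytes over the link at full line rate."""
    return num_bytes * 8 / link_bits_per_s

# Seeding the second site over the link itself.
initial_sync_years = transfer_seconds(DATASET_BYTES) / (365 * 24 * 3600)

# With synchronous replication, sustained writes cannot exceed this.
max_sync_write_bytes_per_s = LINK_BITS_PER_S / 8

# Most changed data the link can carry per day, sync or async.
daily_change_budget_gb = max_sync_write_bytes_per_s * 86400 / 10**9
```

Under those assumptions the initial copy would take roughly 16 years
over the wire, synchronous writes top out at about 40 kB/s, and the
link can carry at most about 3.5 GB of changes per day.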
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
