This was very helpful, thanks.  However, I'm still trying to reconcile this
with something that Sage mentioned a while back on a similar topic.
Apparently you can disable the journal if you're using btrfs.  Is that
possible because btrfs takes care of things like atomic object writes and
updates to the osd metadata?


-----Original Message-----
From: ceph-users-boun...@lists.ceph.com [mailto:
ceph-users-boun...@lists.ceph.com] On Behalf Of Sage Weil
Sent: Thursday, July 11, 2013 8:39 PM
To: Mark Nelson
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Turning off ceph journaling with xfs ?



Note that you *can* disable the journal if you use btrfs, but your write
latency will tend to be pretty terrible.  This is only viable for
bulk-storage use cases where throughput trumps all and latency is not an
issue at all (it may be seconds).



We are planning on eliminating the double-write for at least large writes
when using btrfs by cloning data out of the journal and into the target
file.  This is not a hugely complex task (although it is non-trivial) but
it hasn't made it to the top of the priority list yet.



sage


On Mon, Aug 26, 2013 at 4:05 PM, Samuel Just <sam.j...@inktank.com> wrote:

> ceph-osd builds a transactional interface on top of the usual posix
> operations so that we can do things like atomically perform an object
> write and update the osd metadata.  The current implementation
> requires our own journal and some metadata ordering (which is provided
> by the backing filesystem's own journal) to implement our own atomic
> operations.  It's true that in some cases you might be able to get
> away with having the client replay the operation (which we do anyway
> for other reasons), but that wouldn't be enough to ensure consistency
> of the filesystem's own internal structures.  It also wouldn't be
> enough to ensure that the OSD's internal structures remain consistent
> in the case of a crash.  Also, if the client is unavailable to do the
> replay, you'd have a problem.
>
> In summary, it's actually really hard to detect partial/corrupted
> writes after a crash without journaling of some form.
> -Sam
>
>
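To make the point about journaling concrete, here is a minimal sketch of the
general write-ahead idea: record the whole transaction (object write plus
metadata update) durably first, then apply it, and replay complete records
after a crash.  This is not ceph-osd's actual code or on-disk format; the
function names, the record layout, and the file names "object_A" and
"osd_meta" are all hypothetical and chosen only for illustration.

```python
import json
import os
import zlib

HEADER_LEN = 18  # 8 hex digits (length), space, 8 hex digits (crc32), newline


def journal_append(journal_path, txn):
    """Durably record a whole transaction before touching the real files."""
    payload = json.dumps(txn, sort_keys=True).encode("utf-8")
    header = "{:08x} {:08x}\n".format(len(payload), zlib.crc32(payload))
    with open(journal_path, "ab") as f:
        f.write(header.encode("ascii") + payload)
        f.flush()
        os.fsync(f.fileno())  # the record must be on disk before we apply it


def journal_read(journal_path):
    """Return complete transactions; stop at the first torn or corrupt record."""
    if not os.path.exists(journal_path):
        return []
    with open(journal_path, "rb") as f:
        data = f.read()
    txns, pos = [], 0
    while pos + HEADER_LEN <= len(data):
        header = data[pos:pos + HEADER_LEN]
        try:
            length = int(header[0:8].decode("ascii"), 16)
            crc = int(header[9:17].decode("ascii"), 16)
        except ValueError:
            break  # unreadable header: treat the rest as garbage
        payload = data[pos + HEADER_LEN:pos + HEADER_LEN + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # partial or corrupted write detected via length/checksum
        txns.append(json.loads(payload.decode("utf-8")))
        pos += HEADER_LEN + length
    return txns


def apply_txn(store_dir, txn):
    """Apply one transaction; whole-file overwrites make replay idempotent."""
    for name, content in txn["writes"].items():
        with open(os.path.join(store_dir, name), "w") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())


def recover(store_dir, journal_path):
    """After a crash, re-apply every complete journaled transaction."""
    txns = journal_read(journal_path)
    for txn in txns:
        apply_txn(store_dir, txn)
    return len(txns)
```

In this sketch, a crash between the object write and the metadata write is
repaired by recover(), because the journal still holds the full transaction.
A record that was only partly written is detected by its length and checksum
and skipped, which is the kind of partial-write detection that is hard to get
without a journal of some form.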
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
