Hi,
   First of all, the data is safe since it is persisted in the journal; if an error
occurs on the OSD data partition, replaying the journal will bring the data back.
 
   Also, there is a wbthrottle: you can configure how much data (IOs, bytes,
inodes) you want to remain in memory. A background thread will start flushing
data to disk when any of these values exceeds
"filestore_wbthrottle_[xfs,btrfs]_[bytes,ios,inodes]_start_flusher", and it will
block the filestore op thread when the hard limit is exceeded. You could set these
values to something smaller if you still don't feel comfortable :)
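
   For example (a rough sketch only: the option names follow the pattern above,
the hard-limit names are my reading of that same scheme, and the numbers are
purely illustrative, not recommended defaults), you might lower the XFS limits
in ceph.conf like this:

[osd]
# start the background flusher earlier (illustrative values)
filestore_wbthrottle_xfs_bytes_start_flusher = 10485760
filestore_wbthrottle_xfs_ios_start_flusher = 500
# block the filestore op thread sooner (hard limits)
filestore_wbthrottle_xfs_bytes_hard_limit = 41943040
filestore_wbthrottle_xfs_ios_hard_limit = 5000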


                                                                Xiaoxi
    
-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Paweł
Sadowski
Sent: Tuesday, December 30, 2014 4:10 PM
To: ceph-users
Subject: [ceph-users] Ceph data consistency

Hi,

On our Ceph cluster we get some inconsistent PGs from time to time (after
deep-scrub). We have issues with disks/SATA cables/the LSI controller that cause
IO errors every now and then (but that's not the point in this case).

When an IO error occurs on the OSD journal partition everything works as it
should -> the OSD crashes and that's ok - Ceph will handle it.

But when an IO error occurs on the OSD data partition during journal flush, the
OSD continues to work. After calling *writev* (in buffer::list::write_fd) the OSD
does check the return code from this call but does NOT verify whether the write
actually made it to disk (the data is still only in memory and there is no fsync).
That way the OSD thinks the data has been stored on disk, but it might be
discarded (during sync the dirty page will be reclaimed and you'll see "lost page
write due to I/O error" in dmesg).

Since there is no checksumming of data, I just wanted to make sure this is by
design. Maybe there is a way to tell the OSD to call fsync after the write and
keep the data consistent?


--
Cheers,
PS
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com