Hi,
       My $0.02:

> Secondly, I'm unclear about how OSDs use the journal. It appears they
> write to the journal (in all cases, can't be turned off), ack to the
> client and then read the journal later to write to backing storage. Is
> that correct?


I would say NO: the journal will NEVER be read back, except during recovery 
(the journal is replayed in that case).
There are two modes, selected by the options 'filestore journal parallel' and 
'filestore journal writeahead'.
With 'journal parallel', the data is written to the journal and the OSD in 
parallel; as soon as either write finishes, Ceph acks to the client. This 
mode is ONLY for btrfs, since btrfs has a built-in mechanism that helps keep 
the filestore consistent.
With 'journal writeahead', the data is first written to the journal, the 
client is acked, and the data is then written to the OSD. Note that the data 
is always kept in memory until it has been written to both the OSD and the 
journal, so the write to the OSD goes directly from memory, not by re-reading 
the journal. This mode suits XFS and ext4.
The term "write to journal" means the data is physically written into the 
journal, but this is not true of "write to OSD": Ceph opens the files on the 
OSD withOUT O_DIRECT, so those writes go to the page cache (kernel cache) 
first.
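For reference, the two modes above are chosen in ceph.conf. A minimal sketch 
(the mode option names are the ones mentioned in this thread; the journal 
path and size are illustrative assumptions, not recommendations):

```ini
[osd]
; Where the journal lives and how large it is (example values only)
osd journal = /dev/sdd1
osd journal size = 10240        ; in MB

; Writeahead mode for XFS/ext4: write to the journal first, ack the
; client, then write the (still in-memory) data to the OSD filestore.
filestore journal writeahead = true

; On btrfs you could instead run both writes in parallel:
; filestore journal parallel = true
```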

>On a similar note, I am using XFS on the OSDs which also journals, does this 
>affect performance in any way?
Again, NO. The XFS journal only journals file system metadata; it never 
journals the data extents, so you cannot rely on the XFS journal for data 
consistency.

> Can you share any information on the SSD you are using, is it PCIe
> connected?

It depends. If you use HDDs as your OSD data disks, a SATA/SAS SSD is enough 
for you. Instead of the Intel 520, I would suggest the Intel DC S3700, since 
it provides better write durability. A DC S3700 can sustain 400~500 MB/s of 
writes while an HDD can only do ~100 MB/s, so it is safe for one DC S3700 to 
hold the journals for 4~5 HDDs.
And if you have some insight into (or assumptions about) your workload, say 
"I don't care about throughput at all, all my workload is random access", 
then you can use a much higher HDD:SSD ratio; 8:1 or even 10:1 will also be 
fine.
But if you want to use SSDs as data disks, you will need something really, 
really fast to journal those SSDs. A high-end PCIe SSD or NVRAM may be the 
choice.
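The 4~5 HDDs per journal SSD figure above is just the ratio of the two 
sequential write bandwidths; a quick sketch of that arithmetic (the MB/s 
figures are the rough numbers quoted in this thread, not benchmarks):

```python
def max_hdds_per_journal_ssd(ssd_write_mbps: float, hdd_write_mbps: float) -> int:
    """Upper bound on how many HDD-backed OSDs one journal SSD can feed
    before its sequential write bandwidth becomes the bottleneck."""
    return int(ssd_write_mbps // hdd_write_mbps)

# Intel DC S3700 (~400-500 MB/s writes) journaling for ~100 MB/s HDDs:
print(max_hdds_per_journal_ssd(450, 100))  # prints 4
```

For a random-access workload the HDDs never come close to their sequential 
rate, which is why the ratio can stretch to 8:1 or 10:1 as noted above.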

                                                                                
              Xiaoxi


From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Charles 'Boyo
Sent: Monday, July 22, 2013 5:04 AM
To: Mikaël Cluseau
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] SSD recommendations for OSD journals

Thank you for the information, Mikael.
Counting on the kernel's cache, it appears I will be best served purchasing 
write-optimized SSDs?
Can you share any information on the SSD you are using, is it PCIe connected?
Another question: since the intention of this storage cluster is relatively 
cheap storage on commodity hardware, what's the balance between cheap SSDs 
and reliability? Will a journal failure result in data loss, or will such an 
event just 'down' the affected OSDs? On a similar note, I am using XFS on the 
OSDs, which also journals; does this affect performance in any way?

Charles

On Sun, Jul 21, 2013 at 9:27 PM, Mikaël Cluseau <mclus...@isi.nc> wrote:
Hi,


On 07/22/13 06:05, Charles 'Boyo wrote:

Secondly, I'm unclear about how OSDs use the journal. It appears they write to 
the journal (in all cases, can't be turned off), ack to the client and then 
read the journal later to write to backing storage. Is that correct?

Yes



I'm coming from enterprise ZFS with an SSD is also used for write journalling 
but data flushes are from the disk cache in memory, hence the use of write 
optimized SSDs. Why can't Ceph be configured to write from RAM instead of 
reading the journal on flush?

From my stats I can tell that the journal flushes use the kernel's cache and 
do not hit the SSD. Here, sdd is my journal SSD:



_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
