Re: [ceph-users] Impact of fancy striping

2013-12-12 Thread nicolasc
Hi James, Robert, Craig, Thank you for those informative answers! You all pointed out interesting issues. I know losing 1 SAS disk in RAID0 means losing all journals, but this is for testing so I do not care. I do not think sequential write speed to the RAID0 array is the bottleneck (I be

Re: [ceph-users] Impact of fancy striping

2013-12-10 Thread Craig Lewis
A general rule of thumb for separate journal devices is to use 1 SSD for every 4 OSDs. Since SSDs have no seek penalty, 4 partitions are fine. Going much above the 1:4 ratio can saturate the SSD. On your SAS journal device, by creating 9 partitions, you're forcing head seeks for every journa
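
For illustration only, a minimal sketch of that 1:4 layout; the device name /dev/sdb, the 10 GiB partition size, and the OSD ids are placeholders, not details from the thread:

    # carve four equal journal partitions out of the SSD
    for i in 1 2 3 4; do
        sgdisk --new=${i}:0:+10G /dev/sdb
    done

    # ceph.conf: point each OSD's journal at its own partition
    [osd.0]
        osd journal = /dev/sdb1
    [osd.1]
        osd journal = /dev/sdb2
    [osd.2]
        osd journal = /dev/sdb3
    [osd.3]
        osd journal = /dev/sdb4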

Re: [ceph-users] Impact of fancy striping

2013-12-06 Thread Robert van Leeuwen
If I understand correctly you have one SAS disk as a journal for multiple OSDs. If you do small synchronous writes it will become an IO bottleneck pretty quickly: due to multiple journals on the same disk it will no longer be sequential writes to one journal but 4k writes to x journals mak
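
A hedged way to reproduce that effect with fio (the device /dev/sdX and the sizes are illustrative, and both runs write to the raw device, so use a scratch disk only):

    # one sequential stream: roughly what a single journal looks like
    fio --name=one-journal --filename=/dev/sdX --rw=write --bs=4k \
        --direct=1 --runtime=30 --time_based

    # nine interleaved streams: roughly what 9 journal partitions look like
    fio --name=nine-journals --filename=/dev/sdX --rw=write --bs=4k \
        --direct=1 --numjobs=9 --size=10g --offset_increment=10g \
        --runtime=30 --time_based --group_reporting

On a spinning disk the second run typically collapses to seek-bound throughput, which is exactly the point above.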

Re: [ceph-users] Impact of fancy striping

2013-12-06 Thread James Pearce
Hopefully a Ceph developer will be able to clarify how small writes are journaled? The write-through 'bug' seems to explain the small-block performance I've measured in various configurations (I find similar results to you). I still have not tested the patch cited, but it would be *very* interesti

Re: [ceph-users] Impact of fancy striping

2013-12-06 Thread nicolasc
Hi James, Thank you for this clarification. I am quite aware of that, which is why the journals are on SAS disks in RAID0 (SSDs out of scope). I still have trouble believing that fast-but-not-super-fast journals are the main reason for the poor performance observed. Maybe I am mistaken? Bes

Re: [ceph-users] Impact of fancy striping

2013-12-03 Thread James Pearce
I would really appreciate it if someone could: - explain why the journal setup is way more important than striping settings; I'm not sure if it's what you're asking, but any write must be physically written to the journal before the operation is acknowledged. So the overall cluster performa
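
Back-of-the-envelope: since each direct small write waits for a journal commit before it is acknowledged, a single stream is bounded by commit latency rather than bandwidth (the numbers below are illustrative, not measurements from this thread):

    throughput ≈ block_size / journal_commit_latency
    e.g. 4 KB / 1 ms ≈ 4 MB/s per stream, however fast the disks are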

Re: [ceph-users] Impact of fancy striping

2013-12-03 Thread nicolasc
Hi Kyle, All OSDs are SATA drives in JBOD. The journals are all on a pair of SAS in RAID0. All of those are on a shared backplane with a single RAID controller (8 ports -> 12 disks). I also have a pair of SAS in RAID1 holding the OS, which may be on a different port/data-path. I am going to
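
If the shared controller is a suspect, one simple check is to watch per-device utilization while the benchmark runs:

    # extended stats in megabytes, refreshed every second
    iostat -xm 1

If %util pegs near 100% on the journal RAID0 while the OSD disks stay mostly idle, the journal path is the bottleneck; if everything degrades together, the shared controller or backplane deserves a closer look.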

Re: [ceph-users] Impact of fancy striping

2013-11-30 Thread Kyle Bader
> This journal problem is a bit of wizardry to me, I even had weird intermittent issues with OSDs not starting because the journal was not found, so please do not hesitate to suggest a better journal setup. You mentioned using SAS for journal; if your OSDs are SATA and an expander is in the data pa
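
As an aside on the "journal was not found" symptom: a common workaround (a suggestion here, not something from this thread) is to reference journals by stable names, so that device enumeration order cannot move them between boots:

    # find stable names for the journal partitions
    ls -l /dev/disk/by-partuuid/

    # ceph.conf: use the stable path instead of a /dev/sdb1-style name
    [osd.0]
        osd journal = /dev/disk/by-partuuid/<uuid-of-journal-partition>

The <uuid-of-journal-partition> is a placeholder to fill in from the ls output.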

Re: [ceph-users] Impact of fancy striping

2013-11-29 Thread James Pearce
I will try to look into this issue of device cache flush. Do you have a tracker link for the bug? How I wish this were a forum! But here is a link: http://www.spinics.net/lists/ceph-users/msg05966.html And this: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?

Re: [ceph-users] Impact of fancy striping

2013-11-29 Thread nicolasc
Hi James, Unfortunately, SSDs are out of budget. Currently there are 2 SAS disks in RAID0 on each node, split into 9 partitions: one for each OSD journal on the node. I benchmarked the RAID0 volumes at around 500 MB/s in sustained sequential write, so that's not bad; maybe access latency is a
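
For comparison, the two kinds of load might be measured like this (assuming the RAID0 volume appears as /dev/md0, which is a placeholder; adjust for a hardware controller, and note that both commands overwrite the device, so only run them before the journals are in use):

    # large sequential blocks: roughly what the 500 MB/s figure measures
    dd if=/dev/zero of=/dev/md0 oflag=direct bs=1M count=4096

    # small direct blocks: closer to what journal traffic looks like
    dd if=/dev/zero of=/dev/md0 oflag=direct bs=4k count=100000

The gap between those two numbers is usually where the fast-but-not-super-fast journal pain lives.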

Re: [ceph-users] Impact of fancy striping

2013-11-29 Thread James Pearce
Did you try moving the journals to separate SSDs? It was recently discovered that, due to a kernel bug/design, journal writes are translated into device cache flush commands; thinking about that, I also wonder whether there would be a performance improvement in the case that journal and OSD
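
If cache flushes are the suspect, one crude experiment (at your own risk, and purely a suggestion here) is to check whether the drive's volatile write cache is enabled, since with it disabled a flush is essentially a no-op:

    # SATA drives
    hdparm -W /dev/sdX         # query write-cache state
    hdparm -W 0 /dev/sdX       # disable the write cache

    # SAS drives
    sdparm --get=WCE /dev/sdX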

[ceph-users] Impact of fancy striping

2013-11-29 Thread nicolasc
Hi everyone, I am currently testing a use-case with large rbd images (several TB), each containing an XFS filesystem, which I mount on local clients. I have been testing the throughput when writing to a single file in the XFS mount, using "dd oflag=direct", for various block sizes. With a defaul
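
For reference, a sketch of the kind of setup being described; the image names, sizes, striping values, and the /dev/rbd0 device are examples only, not details from the thread:

    # the kind of test described: direct writes at various block sizes
    rbd create testimg --size 102400        # default striping
    rbd map testimg
    mkfs.xfs /dev/rbd0
    mount /dev/rbd0 /mnt/test
    dd if=/dev/zero of=/mnt/test/file bs=4k count=100000 oflag=direct

    # a format-2 image with non-default ("fancy") striping, for comparison
    # (kernel rbd support for this was very limited at the time;
    # a librbd client such as qemu avoids that restriction)
    rbd create striped-img --size 102400 --image-format 2 \
        --stripe-unit 65536 --stripe-count 8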