Because Ceph's data placement is not perfectly uniform, some drives end up
with more PGs/objects than others, and those drives become a bottleneck for
the entire cluster. The current IO scheduler poses some challenges in this
regard. I've implemented a new scheduler with which I've seen much better
drive utilization across the cluster, a 3-17% performance increase, and a
substantial reduction in client performance deviation (all clients get
roughly the same amount of performance). Hopefully we will be able to get
that into Jewel.
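
If you want to see how uneven the placement is on your own cluster,
something like this should give a rough idea (just a sketch; the awk bit
assumes the pgs_brief layout where the last column is the acting primary,
so adjust for your release):

    # Per-OSD utilization and variance from the cluster average
    ceph osd df

    # Count how many PGs each OSD is primary for
    ceph pg dump pgs_brief | awk '$1 ~ /^[0-9]+\./ {print $NF}' \
        | sort -n | uniq -c | sort -rn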

Robert LeBlanc

Sent from a mobile device, please excuse any typos.
On Dec 31, 2015 12:20 AM, "Francois Lafont" <flafdiv...@free.fr> wrote:

> Hi,
>
> On 30/12/2015 10:23, Yan, Zheng wrote:
>
> >> And it seems to me that I can see the bottleneck of my little cluster
> >> (only 5 OSD servers, each with 4 OSD daemons). According to the "atop"
> >> command, I can see that some disks (4TB SATA 7200rpm Western Digital
> >> WD4000FYYZ) are very busy. It's curious because during the bench some
> >> disks are very busy and some other disks are not so busy. But I think
> >> the reason is that it's a little cluster, and with just 15 OSDs (the 5
> >> other OSDs are full-SSD OSDs dedicated to CephFS metadata) I can't have
> >> a perfect distribution of data, especially when the bench concerns just
> >> a specific file of a few hundred MB.
> >
> > Do these disks have the same size and performance? Large disks (with
> > higher weights) or slow disks are likely to be busy.
>
> The disks are exactly the same model with the same size (4TB SATA 7200rpm
> Western Digital WD4000FYYZ). I'm not completely sure, but it seems to me
> that in one specific node I have a disk which is a little slower than the
> others (maybe ~50-75 IOPS less), and it seems to me that it's the busiest
> disk during a bench.
>
> Is it possible (or frequent) to have differences in performance between
> disks of exactly the same model?
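>
> For what it's worth, I suppose I could check that outside of Ceph with
> something like this (just a sketch, the device name is hypothetical, and
> read-only of course on a disk that carries OSD data):
>
>     # Raw random-read test of a single drive, bypassing the page cache
>     fio --name=disk-check --filename=/dev/sdX --readonly --direct=1 \
>         --readwrite=randread --bs=4k --iodepth=32 --ioengine=libaio \
>         --runtime=60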
>
> >> That being said, when you talk about "using buffered IO", I'm not sure I
> >> understand which fio option that refers to. Is it the --buffered option?
> >> Because with this option I have noticed no change in IOPS. Personally, I
> >> was only able to increase global IOPS with the --numjobs option.
> >>
> >
> > I didn't make it clear. I actually meant buffered writes (add the
> > --rwmixread=0 option to fio).
>
> But with fio, if I set "--readwrite=randrw --rwmixread=0", it's completely
> equivalent to just setting "--readwrite=randwrite", no?
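>
> For example (just a sketch of what I mean, the mount point is
> hypothetical), I would expect these two commands to produce the same
> workload:
>
>     fio --name=a --directory=/mnt/cephfs --size=512m --bs=4k \
>         --ioengine=libaio --iodepth=16 --readwrite=randrw --rwmixread=0
>     fio --name=b --directory=/mnt/cephfs --size=512m --bs=4k \
>         --ioengine=libaio --iodepth=16 --readwrite=randwrite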
>
> > In your test case, writes mix with reads.
>
> Yes indeed.
>
> > Reads are synchronous on a cache miss.
>
> You mean that I get synchronous IO for reads if I set --direct=0, is that
> correct? Is that valid for any file system, or just for CephFS?
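>
> In other words (again just a sketch, with a hypothetical mount point), I'm
> comparing something like these two read cases:
>
>     # Buffered reads: served from the page cache when possible, a miss
>     # becomes a synchronous read to the OSDs
>     fio --name=buffered-read --directory=/mnt/cephfs --size=512m --bs=4k \
>         --ioengine=libaio --iodepth=16 --readwrite=randread --direct=0
>     # Direct reads: bypass the page cache, every read goes to the cluster
>     fio --name=direct-read --directory=/mnt/cephfs --size=512m --bs=4k \
>         --ioengine=libaio --iodepth=16 --readwrite=randread --direct=1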
>
> Regards.
>
> --
> François Lafont
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
