Because Ceph's data placement is not perfectly uniform, some drives end up with more PGs/objects than others, and the most loaded drive becomes a bottleneck for the entire cluster. The current IO scheduler poses some challenges in this regard. I've implemented a new scheduler with which I've seen much better drive utilization across the cluster, a 3-17% performance increase, and a substantial reduction in client performance deviation (all clients get roughly the same amount of performance). Hopefully we will be able to get that into Jewel.
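If you want a rough picture of that imbalance on a running cluster, something like the following should show per-OSD utilization (the exact columns vary between releases, and the second command only exists on newer ones):

  # Per-OSD weight and disk utilization; the VAR column shows how far each
  # OSD sits from the cluster average (newer releases also print a PGS column).
  ceph osd df tree

  # Short min/avg/max summary of PGs per OSD, if your release has it.
  ceph osd utilization

On a small cluster, a handful of OSDs carrying a few extra PGs is usually enough to explain why they show up busiest in atop.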
Robert LeBlanc

Sent from a mobile device, please excuse any typos.

On Dec 31, 2015 12:20 AM, "Francois Lafont" <flafdiv...@free.fr> wrote:
> Hi,
>
> On 30/12/2015 10:23, Yan, Zheng wrote:
>
> >> And it seems to me that I can see the bottleneck of my little cluster
> >> (only 5 OSD servers with 4 osd daemons each). According to the "atop"
> >> command, I can see that some disks (4TB SATA 7200rpm Western Digital
> >> WD4000FYYZ) are very busy. It's curious because during the bench I have
> >> some disks very busy and some other disks not so busy. But I think the
> >> reason is that it is a little cluster and, with just 15 osds (the 5
> >> other osds are full SSD osds dedicated to cephfs metadata), I can't
> >> have a perfect distribution of data, especially when the bench concerns
> >> just a specific file of a few hundred MB.
> >
> > do these disks have the same size and performance? large disks (with
> > higher weights) or slow disks are likely busy.
>
> The disks are exactly the same model with the same size (4TB SATA 7200rpm
> Western Digital WD4000FYYZ). I'm not completely sure, but it seems to me
> that in a specific node I have a disk which is a little slower than the
> others (maybe ~50-75 iops less), and it seems to me that it's the busiest
> disk during a bench.
>
> Is it possible (or frequent) to have a difference in performance between
> disks of exactly the same model?
>
> >> That being said, when you talk about "using buffered IO" I'm not sure I
> >> understand which fio option is concerned by that. Is it the --buffered
> >> option? Because with this option I have noticed no change concerning
> >> iops. Personally, I was able to increase global iops only with the
> >> --numjobs option.
> >
> > I didn't make it clear. I actually meant buffered write (add
> > --rwmixread=0 option to fio).
>
> But with fio, if I set "--readwrite=randrw --rwmixread=0", it's completely
> equivalent to just setting "--readwrite=randwrite", no?
>
> > In your test case, writes mix with reads.
>
> Yes indeed.
>
> > read is synchronous when cache miss.
>
> You mean that I have SYNC IO for reading if I set --direct=0, is that
> correct? Is it valid for any file system or just for cephfs?
>
> Regards.
>
> --
> François Lafont
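For what it's worth, here is a rough sketch of the two fio workloads being compared above; the /mnt/cephfs/bench directory, file size, job count and runtime are just placeholders, not what François actually ran. And yes, as far as the generated workload goes, "--readwrite=randrw --rwmixread=0" should be equivalent to plain "--readwrite=randwrite".

  # Mixed random read/write (a mixed case like the one discussed above):
  # even with --direct=0, a read that misses the page cache is synchronous,
  # so the busiest/slowest disk tends to dominate latency.
  fio --name=mixedrw --directory=/mnt/cephfs/bench --size=512M \
      --readwrite=randrw --rwmixread=30 --bs=4k --direct=0 \
      --numjobs=4 --runtime=60 --time_based --group_reporting

  # Pure buffered random write (what adding --rwmixread=0 reduces to):
  # the page cache absorbs the writes, so an individual slow disk is
  # much less visible to the client.
  fio --name=bufwrite --directory=/mnt/cephfs/bench --size=512M \
      --readwrite=randwrite --bs=4k --direct=0 \
      --numjobs=4 --runtime=60 --time_based --group_reporting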
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com