Hi Ilya,
Thanks for your explanation. This makes sense. Will you make max_segments
configurable? Could you please point me to the fix you have made? We might be
able to help test it.
Thanks.
David Zhang 

> Date: Fri, 26 Jun 2015 18:21:55 +0300
> Subject: Re: [ceph-users] krbd splitting large IO's into smaller IO's
> From: idryo...@gmail.com
> To: zhangz.da...@outlook.com
> CC: ceph-users@lists.ceph.com
> 
> On Fri, Jun 26, 2015 at 3:17 PM, Z Zhang <zhangz.da...@outlook.com> wrote:
> > Hi Ilya,
> >
> > I saw your recent email about krbd splitting large IO's into smaller IO's;
> > see the link below.
> >
> > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg20587.html
> >
> > I just tried it on my ceph cluster using kernel 3.10.0-1. I adjusted both
> > max_sectors_kb and max_hw_sectors_kb of the rbd device to 4096.
> >
> > Use fio with 4M block size for read:
> >
> > Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> > rbd3             81.00     0.00  135.00    0.00   108.00     0.00  1638.40     2.72   20.15   20.15    0.00   7.41 100.00
> >
> >
> > Use fio with 1M or 2M block size for read:
> >
> > Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> > rbd3              0.00     0.00  213.00    0.00   106.50     0.00  1024.00     2.56   12.02   12.02    0.00   4.69 100.00
> >
> >
> > Use fio with 4M block size for write:
> >
> > Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> > rbd3              0.00    40.00    0.00   40.00     0.00    40.00  2048.00     2.87   70.90    0.00   70.90  24.90  99.60
> >
> >
> > Use fio with 1M or 2M block size for write:
> >
> > Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> > rbd3              0.00     0.00    0.00   80.00     0.00    40.00  1024.00     3.55   48.20    0.00   48.20  12.50 100.00
> >
> >
> > So why is the IO size here far less than 4096 (with the default value of 512,
> > every IO size is 1024)? Are there other parameters that need adjusting, or
> > is it about this kernel version?
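
To make the units in the iostat output above easier to compare with max_sectors_kb, here is a rough sketch of the arithmetic. It assumes avgrq-sz is reported in 512-byte sectors (as the iostat man page describes) and that the MB/s columns use 1024*1024 bytes; the function names are made up for illustration.

```python
SECTOR_BYTES = 512  # assumption: iostat reports avgrq-sz in 512-byte sectors


def avgrq_kib(avgrq_sz_sectors):
    """Average request size in KiB, from iostat's avgrq-sz column."""
    return avgrq_sz_sectors * SECTOR_BYTES / 1024


def throughput_mib(requests_per_sec, avgrq_sz_sectors):
    """Throughput in MiB/s implied by requests/s and the average request size."""
    return requests_per_sec * avgrq_sz_sectors * SECTOR_BYTES / (1024 * 1024)


# (label, r/s or w/s, avgrq-sz) copied from the four iostat samples above
samples = [
    ("4M read", 135.00, 1638.40),
    ("1M/2M read", 213.00, 1024.00),
    ("4M write", 40.00, 2048.00),
    ("1M/2M write", 80.00, 1024.00),
]

for label, rate, avgrq in samples:
    print(f"{label:12s} {avgrq_kib(avgrq):7.1f} KiB/request "
          f"{throughput_mib(rate, avgrq):6.1f} MiB/s")
```

Under those assumptions the 4M reads average about 819 KiB per request and the 4M writes 1024 KiB, both well short of the 4096 KiB that max_sectors_kb would allow, and the computed throughput agrees with the rMB/s and wMB/s columns.
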
> 
> It's about this kernel version.  Assuming you are doing direct I/Os
> with fio, setting max_sectors_kb to 4096 is really the only thing you
> can do, and that's enough to *sometimes* see 8192 sector (i.e. 4M) I/Os.
> The problem is the max_segments value, which in 3.10 is 128 and which
> you cannot adjust via sysfs.
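
For anyone who wants to check these limits on their own device, a minimal sketch follows. It assumes the usual /sys/block/<dev>/queue/ layout; the helper names and the sysfs_root parameter are hypothetical and exist only to keep the example self-contained.

```python
from pathlib import Path

# Queue limit files under /sys/block/<dev>/queue/ discussed in this thread.
LIMITS = ("max_sectors_kb", "max_hw_sectors_kb", "max_segments")


def read_queue_limits(device, sysfs_root="/sys/block"):
    """Return {limit name: int value} for a block device such as "rbd3".

    A limit whose sysfs file is missing is reported as None.
    """
    queue_dir = Path(sysfs_root) / device / "queue"
    limits = {}
    for name in LIMITS:
        path = queue_dir / name
        limits[name] = int(path.read_text().strip()) if path.is_file() else None
    return limits


def max_request_sectors(limits):
    """Largest request, in 512-byte sectors, that max_sectors_kb permits."""
    return limits["max_sectors_kb"] * 1024 // 512


if __name__ == "__main__":
    # "rbd3" is the device name from the iostat output earlier in the thread.
    print(read_queue_limits("rbd3"))
```

With max_sectors_kb set to 4096, max_request_sectors gives 8192, which is the 4M request size mentioned above; max_segments is only read here because, as noted, it cannot be adjusted via sysfs.
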
> 
> It all comes down to a memory allocator.  To get a 4M I/O, the total
> number of segments (physically contiguous chunks of memory) in the
> 8 bios (8*512k = 4M) that need to be merged has to be <= 128.  When you
> are allocated such nice and contiguous bios, you get 4M I/Os.  In other
> cases you don't.
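
The segment arithmetic above can be sketched roughly as follows. This is a simplified model, not the kernel's merge logic: it assumes 4 KiB pages, treats a segment as a run of consecutive page frame numbers, and ignores other limits such as the maximum segment size. All names are invented for illustration.

```python
PAGE_SIZE = 4096     # assumption: 4 KiB pages
MAX_SEGMENTS = 128   # the max_segments value quoted above for 3.10


def count_segments(page_frames):
    """Count runs of consecutive page frame numbers (physically contiguous chunks)."""
    segments = 0
    previous = None
    for pfn in page_frames:
        if previous is None or pfn != previous + 1:
            segments += 1
        previous = pfn
    return segments


def can_merge(bio_segment_counts, max_segments=MAX_SEGMENTS):
    """True if the bios' combined segment count fits in a single request."""
    return sum(bio_segment_counts) <= max_segments


pages_per_bio = 512 * 1024 // PAGE_SIZE       # pages in one 512k bio
worst_case_bytes = MAX_SEGMENTS * PAGE_SIZE   # every segment is a single page

print(pages_per_bio, worst_case_bytes)
print(can_merge([16] * 8), can_merge([17] * 8))
```

In this model a 4M request spans 1024 pages, so the 8 bios may average at most 16 segments each, meaning at least 8 contiguous pages per segment on average. If every page is its own segment, a request tops out at 128 pages, or 512 KiB.
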
> 
> This will be fixed in 4.2, along with a bunch of other things.  This
> particular max_segments fix is a one-liner, so we will probably backport
> it to older kernels, including 3.10.
> 
> Thanks,
> 
>                 Ilya
                                          
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
