Ilya, 

I will try that again tonight, as this is a production cluster: when the dds
trigger that dmesg error, the cluster's I/O becomes very bad and I have to
reboot the server to get things back on track. Most of my VMs sit at 70-90%
iowait until that server is rebooted.

I actually checked what you asked for the last time I ran the test.

When I run 4 dds concurrently, nothing appears in the dmesg output. No
messages at all.

The kern.log file that I sent last time is what I got about a minute after
I started 8 dds. I pasted the full output. The 8 dds did actually complete,
but it took a rather long time. I was getting about 6MB/s per dd process,
compared to around 70MB/s per dd process when 4 dds were running. Do you
still want me to run this, or is the information I've provided enough?
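
For the 4-vs-8 comparison, a small wrapper along these lines could start N dds
at once and keep one log per process. This is only a sketch: the function name
run_dds and the ddtest.* file names are made up for illustration, and the
defaults simply mirror the bs=4M count=5K oflag=direct command quoted below.

```shell
# Hypothetical helper: run N concurrent dd writers, one log file per process.
run_dds() {
    n=$1                       # number of concurrent dd processes
    dir=${2:-.}                # target directory (e.g. the NFS/cephfs mount)
    count=${3:-5K}             # blocks written by each dd
    bs=${4:-4M}                # block size
    flags=${5-oflag=direct}    # pass "" to drop direct I/O
    i=0
    while [ "$i" -lt "$n" ]; do
        (
            start=$(date +%s)
            # dd prints its own throughput summary on stderr; it lands in the log
            dd if=/dev/zero of="$dir/ddtest.$i" bs="$bs" count="$count" $flags
            end=$(date +%s)
            echo "elapsed_seconds=$((end - start))"
        ) > "$dir/ddtest.$i.log" 2>&1 &
        i=$((i + 1))
    done
    wait                       # return only after every dd has finished
}
```

Running `run_dds 4` and then `run_dds 8` in the same directory, and saving the
output of `dmesg` to a file after each run, would give the per-dd timings and
the kernel messages side by side.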

Cheers 

Andrei 

----- Original Message -----

> From: "Ilya Dryomov" <ilya.dryo...@inktank.com>
> To: "Andrei Mikhailovsky" <and...@arhont.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "Gregory Farnum"
> <g...@gregs42.com>
> Sent: Monday, 1 December, 2014 8:22:08 AM
> Subject: Re: [ceph-users] Giant + nfs over cephfs hang tasks

> On Mon, Dec 1, 2014 at 12:30 AM, Andrei Mikhailovsky
> <and...@arhont.com> wrote:
> >
> > Ilya, further to your email, I have switched back to the 3.18 kernel
> > that you sent, and I got dmesg output similar to what I had on the 3.17
> > kernel. Please find it attached for your reference. As before, this is
> > the command I ran on the client:
> >
> >
> > time dd if=/dev/zero of=4G00 bs=4M count=5K oflag=direct &
> > time dd if=/dev/zero of=4G11 bs=4M count=5K oflag=direct &
> > time dd if=/dev/zero of=4G22 bs=4M count=5K oflag=direct &
> > time dd if=/dev/zero of=4G33 bs=4M count=5K oflag=direct &
> > time dd if=/dev/zero of=4G44 bs=4M count=5K oflag=direct &
> > time dd if=/dev/zero of=4G55 bs=4M count=5K oflag=direct &
> > time dd if=/dev/zero of=4G66 bs=4M count=5K oflag=direct &
> > time dd if=/dev/zero of=4G77 bs=4M count=5K oflag=direct &

> Can you run that command again - on the 3.18 kernel, to completion - and
> paste

> - the entire dmesg
> - "time" results for each dd

> ?

> Compare those to your results with four dds (or any other number which
> doesn't trigger page allocation failures).

> Thanks,

> Ilya
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com