On 05/17/2016 01:27 AM, Andrus, Brian Contractor wrote:
Yes, I use the fuse client because the kernel client isn't happy with our 
SELinux settings.
I have experienced the same symptoms with both clients, however.

Yes, the clients that "had nothing" were merely mounted; nothing, not even an 
'ls', was done on the filesystem. I did run 'df' on some of the clients, but all 
of them ended up with the message.
"Let it clear" for me meant waiting until I saw "HEALTH_OK".
When the messages show up, I notice the I/O write-speed line stops showing up 
when I run 'ceph -s'. I assume there are little to no writes going on, and I 
see no progress on the rsync command, so I stop it, unmount CephFS, and wait.
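
For what it's worth, this is roughly how I watch it (standard Ceph CLI, nothing 
cluster-specific):

    # Stream cluster status and events; the client I/O line
    # (e.g. "client io ... wr") disappears when writes stall:
    ceph -w

    # Or poll the one-shot status every few seconds:
    watch -n 5 ceph -s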

As far as layout, I do have a bit of a unique setup in that my OSDs are served 
via SRP over InfiniBand from a DDN system. They are also multipathed. I 
currently have only 4 nodes, and I map 4 OSDs to each one. The nodes are pretty 
beefy, and I can (and have) increased inode_max to (temporarily) alleviate 
the cache pressure messages.
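
For reference, a sketch of how I bump it at runtime on Jewel (mds_cache_size is 
counted in inodes; 300000 here is purely illustrative, and <id> stands in for 
the actual MDS name):

    # Compare current cache usage against the limit (inodes vs. inode_max):
    ceph daemon mds.<id> perf dump mds

    # Raise the limit at runtime (not persisted across MDS restarts):
    ceph tell mds.0 injectargs '--mds-cache-size 300000'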

That older CephFS setup we tested at ORNL was on DDN SFA10Ks. We had a *ton* of trouble getting good performance, but in the end made it work fairly well. A couple of tips from experience:

- Make sure you have cache mirroring disabled (assuming your hardware has it). 
  This was a huge source of problems.
- RAID5 LUNs worked better than RAID6 LUNs. Single-disk RAID0 LUNs would likely 
  have been better still, but we didn't have beefy enough server nodes to pull 
  it off. Granted, InfiniBand to the DDN was the limiting factor in that setup 
  (at least for writes).

We were also hitting a really annoying bug in the kernel around this time that was greatly hurting CephFS read performance:

http://lwn.net/Articles/517082/

Hopefully you are on a new enough kernel that this isn't an issue though.

Mark


I will be rebuilding the entire filesystem tomorrow with the latest release 
(10.2.1), at which point I will start the rsync job again and watch what happens.
If there is anything in particular you think I should keep an eye out for, 
please let me know and I will collect data where I can.


Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238






-----Original Message-----
From: John Spray [mailto:jsp...@redhat.com]
Sent: Monday, May 16, 2016 7:36 AM
To: Andrus, Brian Contractor
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] failing to respond to cache pressure

On Mon, May 16, 2016 at 3:11 PM, Andrus, Brian Contractor <bdand...@nps.edu> 
wrote:
Both client and server are Jewel 10.2.0

So the fuse client, correct?  If you are up for investigating further: with 
potential client bugs (or performance issues), it is often useful to compare the 
fuse and kernel clients (using the most recent kernel you can) to work out 
what's misbehaving.
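
For example, the two mounts look roughly like this (monitor address, mount 
point, and keyring path are illustrative; adjust to your cluster):

    # FUSE client (picks up ceph.conf and keyring from /etc/ceph by default):
    ceph-fuse /mnt/cephfs

    # Kernel client:
    mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret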

"All kinds of issues"  include that EVERY node ended up with the cache pressure 
message, even if they had done no access at all.

Hmm, interesting.  I wonder if we do have a bug where inactive clients are being 
"unfairly" asked to clear some cache content but are appearing not to do so because there 
isn't anything much in their cache.  To be clear, when you say "no access at all", you 
mean a client that was mounted and then just sat there (i.e. not even an ls), right?

Are any of the clients holding a lot of files open?  Roughly what is the 
workload doing?
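
One way to check (a sketch; <id> is your MDS name): the MDS admin socket lists 
client sessions with per-client capability counts (num_caps), which is a rough 
proxy for how many files each client is holding open or cached:

    # Run on the MDS host:
    ceph daemon mds.<id> session ls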

I ended up with some 200 degraded pgs.

That's extremely unlikely to be related to CephFS, other than that CephFS will 
be sending lots of IOs to the OSDs.  You should investigate the health of your 
RADOS cluster (i.e. your OSDs) to work out why you're seeing degraded PGs.
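
A reasonable starting point (standard commands; the PG ID is whatever the 
health output names):

    # Summarize why PGs are unhealthy:
    ceph health detail

    # List stuck PGs by state:
    ceph pg dump_stuck unclean
    ceph pg dump_stuck degraded

    # Then dig into one problem PG:
    ceph pg <pgid> query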

Quite a few had other of the 'standard' errors, such as stuck/waiting states 
and such.

It might be useful if you just paste your ceph status so that we can see exactly which 
warnings you're getting.  If you're getting "slow OSD request" type messages 
then that may also be something at the RADOS level that needs investigating.
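
If slow requests do appear, the admin socket on the OSD named in the health 
message shows what those ops are blocked on (<N> is that OSD's ID):

    # Run on the host carrying that OSD:
    ceph daemon osd.<N> dump_ops_in_flight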

I ended up disconnecting all mounted clients and waiting about 45 minutes for 
it to clear. I couldn't effectively do any writes until I let it clear.

When you say "let it clear", do you mean the cluster going completely healthy, 
or some particular message clearing?  What happened when you tried to do writes in the 
interim?

I am watching my write speeds and, while I can get them to peak at a couple 
hundred MB/s, they are usually below 10 MB/s and often below 1 MB/s.
That isn't the kind of performance I would expect from a parallel file system, 
hence my questioning whether it should be used in my environment.

Performance is a whole other question.  Knowing nothing at all about your 
disks, servers, network or workload, I have no clue whether you're seeing 
expected performance or you're seeing the outcome of a bug.
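
One way to separate RADOS-level performance from CephFS-level behaviour is to 
benchmark the data pool directly; a sketch below, assuming your CephFS data 
pool is named cephfs_data (substitute your actual pool name):

    # 30-second write benchmark against the data pool:
    rados bench -p cephfs_data 30 write --no-cleanup

    # Sequential reads of the objects just written, then clean up:
    rados bench -p cephfs_data 30 seq
    rados -p cephfs_data cleanup

If rados bench is also slow, the problem is below CephFS (OSDs, network, 
multipath); if it's fast, the clients or MDS are worth a closer look.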

John



Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238




-----Original Message-----
From: John Spray [mailto:jsp...@redhat.com]
Sent: Monday, May 16, 2016 2:28 AM
To: Andrus, Brian Contractor
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] failing to respond to cache pressure

On Mon, May 16, 2016 at 5:42 AM, Andrus, Brian Contractor <bdand...@nps.edu> 
wrote:
So this ‘production ready’ CephFS for Jewel seems a little not quite there….



Currently I have a single system mounting CephFS and merely scp-ing
data to it.

The CephFS mount has 168 TB used, 345 TB / 514 TB avail.



Every so often, I get a HEALTH_WARN message of mds0: Client failing
to respond to cache pressure

What client, what version?
Even if I stop the scp, it will not go away until I umount/remount
the filesystem.



For testing, I had CephFS mounted on about 50 systems, and when updatedb 
started on them, I got all kinds of issues with it all.

All kinds of issues...?  Need more specific bug reports than that to fix things.

John

I figured having updatedb run on a few systems would be a good ‘see 
what happens’ test for when there is a fair amount of access to it.
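
As an aside: if updatedb turns out to be the culprit, mlocate can be told to 
skip CephFS entirely. A sketch, assuming the kernel mount registers as fstype 
"ceph" and the fuse mount as "fuse.ceph-fuse":

    # /etc/updatedb.conf -- append the CephFS types to the existing PRUNEFS list:
    PRUNEFS="nfs nfs4 ... ceph fuse.ceph-fuse"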



So, should I not even be considering CephFS as a large storage 
mount for a compute cluster? Is there a sweet spot for what CephFS 
would be good for?





Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238






_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
