I have a small cluster of 4 machines and quite a few drives.  After about
2-3 weeks cephfs fails: it is no longer properly mounted in /mnt/cephfs,
which of course causes the running VMs to fail too.

In /var/log/syslog I have "/mnt/cephfs: File exists at
/usr/share/perl5/PVE/Storage/DirPlugin.pm line 52" repeatedly.
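When the error starts, a quick way I confirm what is going on is to check
whether /mnt/cephfs (the mount point from above) is still an actual mount
or just an empty directory on the root filesystem:

# findmnt /mnt/cephfs
# mount | grep ceph
# ls -la /mnt/cephfs

If findmnt returns nothing and the directory is empty, the cephfs mount has
dropped away even though ceph itself reports healthy.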

There doesn't seem to be anything wrong with ceph at the time.

# ceph -s
    cluster 40f26838-4760-4b10-a65c-b9c1cd671f2f
     health HEALTH_WARN clock skew detected on mon.s1
     monmap e2: 2 mons at {h1=192.168.121.30:6789/0,s1=192.168.121.33:6789/0},
election epoch 312, quorum 0,1 h1,s1
     mdsmap e401: 1/1/1 up {0=s3=up:active}, 1 up:standby
     osdmap e5577: 19 osds: 19 up, 19 in
      pgmap v11191838: 384 pgs, 3 pools, 774 GB data, 455 kobjects
            1636 GB used, 9713 GB / 11358 GB avail
                 384 active+clean
  client io 12240 kB/s rd, 1524 B/s wr, 24 op/s

# ceph osd tree
# id  weight   type name    up/down  reweight
-1    11.13    root default
-2     8.14        host h1
 1     0.9             osd.1    up    1
 3     0.9             osd.3    up    1
 4     0.9             osd.4    up    1
 5     0.68            osd.5    up    1
 6     0.68            osd.6    up    1
 7     0.68            osd.7    up    1
 8     0.68            osd.8    up    1
 9     0.68            osd.9    up    1
10     0.68            osd.10   up    1
11     0.68            osd.11   up    1
12     0.68            osd.12   up    1
-3     0.45        host s3
 2     0.45            osd.2    up    1
-4     0.9         host s2
13     0.9             osd.13   up    1
-5     1.64        host s1
14     0.29            osd.14   up    1
 0     0.27            osd.0    up    1
15     0.27            osd.15   up    1
16     0.27            osd.16   up    1
17     0.27            osd.17   up    1
18     0.27            osd.18   up    1

When I "umount -l /mnt/cephfs" and then "mount -a" after that, the
ceph volume is loaded again.  I can restart the VMs and all seems well.
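For completeness, the fstab entry that "mount -a" picks up looks roughly
like this (the monitor address is the one from the monmap above; the
secretfile path is a placeholder for my actual keyring location):

192.168.121.30:6789:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime  0  2

So the manual workaround boils down to:

# umount -l /mnt/cephfs
# mount -a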

I can't find errors pertaining to cephfs in the other logs either.

System information:

Linux s1 2.6.32-34-pve #1 SMP Fri Dec 19 07:42:04 CET 2014 x86_64 GNU/Linux

I can't upgrade to kernel v3.13 since I'm using containers.

Of course, I want to prevent this from happening!  How do I troubleshoot
that?  What is causing this?
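For what it's worth, this is roughly where I have been looking so far
(standard Debian/PVE log locations):

# dmesg | grep -i ceph
# grep -i ceph /var/log/kern.log
# grep DirPlugin /var/log/syslog

Apart from the repeating DirPlugin line quoted above, nothing ceph-related
turns up.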

regards


Roland Giesler
