I have a small cluster of 4 machines and quite a few drives. After about 2-3 weeks cephfs fails: it is no longer properly mounted at /mnt/cephfs, which of course causes the running VMs to fail too.
In /var/log/syslog I have "/mnt/cephfs: File exists at /usr/share/perl5/PVE/Storage/DirPlugin.pm line 52" repeatedly. There doesn't seem to be anything wrong with ceph at the time:

# ceph -s
    cluster 40f26838-4760-4b10-a65c-b9c1cd671f2f
     health HEALTH_WARN clock skew detected on mon.s1
     monmap e2: 2 mons at {h1=192.168.121.30:6789/0,s1=192.168.121.33:6789/0}, election epoch 312, quorum 0,1 h1,s1
     mdsmap e401: 1/1/1 up {0=s3=up:active}, 1 up:standby
     osdmap e5577: 19 osds: 19 up, 19 in
      pgmap v11191838: 384 pgs, 3 pools, 774 GB data, 455 kobjects
            1636 GB used, 9713 GB / 11358 GB avail
                 384 active+clean
  client io 12240 kB/s rd, 1524 B/s wr, 24 op/s

# ceph osd tree
# id    weight  type name       up/down reweight
-1      11.13   root default
-2      8.14            host h1
1       0.9                     osd.1   up      1
3       0.9                     osd.3   up      1
4       0.9                     osd.4   up      1
5       0.68                    osd.5   up      1
6       0.68                    osd.6   up      1
7       0.68                    osd.7   up      1
8       0.68                    osd.8   up      1
9       0.68                    osd.9   up      1
10      0.68                    osd.10  up      1
11      0.68                    osd.11  up      1
12      0.68                    osd.12  up      1
-3      0.45            host s3
2       0.45                    osd.2   up      1
-4      0.9             host s2
13      0.9                     osd.13  up      1
-5      1.64            host s1
14      0.29                    osd.14  up      1
0       0.27                    osd.0   up      1
15      0.27                    osd.15  up      1
16      0.27                    osd.16  up      1
17      0.27                    osd.17  up      1
18      0.27                    osd.18  up      1

When I run "umount -l /mnt/cephfs" and then "mount -a", the ceph volume is mounted again. I can restart the VMs and all seems well. I can't find errors pertaining to cephfs in the other logs either.

System information:
Linux s1 2.6.32-34-pve #1 SMP Fri Dec 19 07:42:04 CET 2014 x86_64 GNU/Linux

I can't upgrade to kernel 3.13 since I'm using containers.

Of course, I want to prevent this from happening! How do I troubleshoot this? What is causing it?

regards

Roland Giesler
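P.S. Until I find the root cause, I'm considering cronning a small stopgap that simply repeats the manual fix above whenever the mount has dropped out. This is only a sketch and assumes the cephfs entry is in /etc/fstab and that /mnt/cephfs is the mount point:

#!/bin/sh
# Stopgap only: re-apply the manual remount if /mnt/cephfs is no longer a mount point.
MNT=/mnt/cephfs

if ! mountpoint -q "$MNT"; then
    logger -t cephfs-watchdog "$MNT is not mounted, remounting"
    # Lazy-unmount any stale leftover, then remount from fstab,
    # exactly as I do by hand today.
    umount -l "$MNT" 2>/dev/null
    mount -a || logger -t cephfs-watchdog "mount -a failed for $MNT"
fi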