On 15 June 2015 at 13:09, Gregory Farnum <g...@gregs42.com> wrote:
> On Mon, Jun 15, 2015 at 4:03 AM, Roland Giesler <rol...@giesler.za.net> wrote:
> > I have a small cluster of 4 machines and quite a few drives. After
> > about 2-3 weeks cephfs fails. It's not properly mounted anymore in
> > /mnt/cephfs, which of course causes the running VMs to fail too.
> >
> > In /var/log/syslog I have "/mnt/cephfs: File exists at
> > /usr/share/perl5/PVE/Storage/DirPlugin.pm line 52" repeatedly.
> >
> > There doesn't seem to be anything wrong with ceph at the time.
> >
> > # ceph -s
> >     cluster 40f26838-4760-4b10-a65c-b9c1cd671f2f
> >      health HEALTH_WARN clock skew detected on mon.s1
> >      monmap e2: 2 mons at {h1=192.168.121.30:6789/0,s1=192.168.121.33:6789/0},
> >             election epoch 312, quorum 0,1 h1,s1
> >      mdsmap e401: 1/1/1 up {0=s3=up:active}, 1 up:standby
> >      osdmap e5577: 19 osds: 19 up, 19 in
> >       pgmap v11191838: 384 pgs, 3 pools, 774 GB data, 455 kobjects
> >             1636 GB used, 9713 GB / 11358 GB avail
> >                  384 active+clean
> >   client io 12240 kB/s rd, 1524 B/s wr, 24 op/s
> >
> > # ceph osd tree
> > # id    weight  type name       up/down reweight
> > -1      11.13   root default
> > -2      8.14            host h1
> > 1       0.9                     osd.1   up      1
> > 3       0.9                     osd.3   up      1
> > 4       0.9                     osd.4   up      1
> > 5       0.68                    osd.5   up      1
> > 6       0.68                    osd.6   up      1
> > 7       0.68                    osd.7   up      1
> > 8       0.68                    osd.8   up      1
> > 9       0.68                    osd.9   up      1
> > 10      0.68                    osd.10  up      1
> > 11      0.68                    osd.11  up      1
> > 12      0.68                    osd.12  up      1
> > -3      0.45            host s3
> > 2       0.45                    osd.2   up      1
> > -4      0.9             host s2
> > 13      0.9                     osd.13  up      1
> > -5      1.64            host s1
> > 14      0.29                    osd.14  up      1
> > 0       0.27                    osd.0   up      1
> > 15      0.27                    osd.15  up      1
> > 16      0.27                    osd.16  up      1
> > 17      0.27                    osd.17  up      1
> > 18      0.27                    osd.18  up      1
> >
> > When I "umount -l /mnt/cephfs" and then "mount -a", the ceph volume is
> > loaded again. I can restart the VMs and all seems well. (The exact
> > commands are sketched below.)
> >
> > I can't find errors pertaining to cephfs in the other logs either.
> >
> > System information:
> >
> > Linux s1 2.6.32-34-pve #1 SMP Fri Dec 19 07:42:04 CET 2014 x86_64 GNU/Linux

> I'm not sure what version of Linux this really is (I assume it's a
> vendor kernel of some kind!), but it's definitely an old one! CephFS
> sees pretty continuous improvements to stability and it could be any
> number of resolved bugs.
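To spell out the workaround I mentioned above, this is roughly what I run
when the mount goes stale (a sketch only; it assumes the cephfs entry for
/mnt/cephfs lives in /etc/fstab, which is how my nodes are set up):

    # lazily detach the stale mountpoint (doesn't block on hung I/O)
    umount -l /mnt/cephfs

    # remount everything from /etc/fstab, including the cephfs entry
    mount -a

    # confirm cephfs is back before restarting the VMs
    mount | grep /mnt/cephfs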
This is the stock standard installation of Proxmox with CephFS.

> If you can't upgrade the kernel, you might try out the ceph-fuse
> client instead, as you can run a much newer and more up-to-date
> version of it, even on the old kernel.

I'm under the impression that CephFS is the filesystem implemented by
ceph-fuse. Is it not?

> Other than that, can you include more information about exactly what
> you mean when saying CephFS unmounts itself?

Everything runs fine for weeks. Then suddenly a user reports that a VM
is not functioning anymore. On investigation it transpires that CephFS
is not mounted anymore and the error I reported is logged. I can't see
anything else wrong at this stage: ceph is running and the OSDs are
all up.

Thanks again

Roland

> -Greg
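PS: re-reading the ceph-fuse suggestion — if ceph-fuse is just a
userspace (FUSE) client for the same CephFS filesystem, as opposed to
the in-kernel client my fstab mount uses, then I assume switching would
look roughly like this (untested on my side; monitor address taken from
my cluster output above):

    # unmount the kernel-client mount first
    umount /mnt/cephfs

    # mount the same filesystem through the userspace client instead;
    # -m takes a monitor address, and ceph.conf/keyring are read from
    # /etc/ceph by default
    ceph-fuse -m 192.168.121.30:6789 /mnt/cephfs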
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com