Hey Zheng,

I've been in the #ceph IRC channel all day about this.

We did that: we set max_mds back to 1, but instead of stopping mds 1, we
ran "ceph mds rmfailed 1". Running "ceph mds stop 1" now produces:

# ceph mds stop 1
Error EEXIST: mds.1 not active (???)


Our MDS is stuck in the resolve state and will not come back.
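
For reference, this is roughly how we have been checking where things
stand (just the standard status commands; the grep is only to trim the
dump output):

# ceph mds stat
# ceph mds dump | grep -E 'max_mds|failed|stopped'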

We then tried to roll back the MDS map to the epoch just before we set
max_mds to 2, but that command crashes all but one of our monitors and
never completes.

We do not know what to do at this point. If there were a way to get the
MDS back up just long enough for us to back it up, we would be okay with
rebuilding; we just need the data back.
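
To be concrete about the backup plan: once /ceph is mountable again, we
would just copy everything off with something like this (the destination
path below is only a placeholder):

# rsync -aHAX /ceph/ /mnt/backup-target/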

Mike C



On Thu, Jan 14, 2016 at 3:33 PM, Yan, Zheng <uker...@gmail.com> wrote:

> On Fri, Jan 15, 2016 at 3:28 AM, Mike Carlson <m...@bayphoto.com> wrote:
> > Thank you for the reply, Zheng.
> >
> > We tried setting mds bal frag to true, but the end result was less than
> > desirable: NFS and SMB clients could no longer browse the share; they
> > would hang on any directory with more than a few hundred files.
> >
> > We then tried to back out the active/active MDS change, with no luck;
> > stopping one of the MDSes (mds 1) prevented us from mounting the cephfs
> > filesystem.
> >
> > So we failed and removed the secondary MDS, and now our primary MDS is
> > stuck in a "resolve" state:
> >
> > # ceph -s
> >     cluster cabd1728-2eca-4e18-a581-b4885364e5a4
> >      health HEALTH_WARN
> >             clock skew detected on mon.lts-mon
> >             mds cluster is degraded
> >             Monitor clock skew detected
> >      monmap e1: 4 mons at
> > {lts-mon=10.5.68.236:6789/0,lts-osd1=10.5.68.229:6789/0,lts-osd2=10.5.68.230:6789/0,lts-osd3=10.5.68.203:6789/0}
> >             election epoch 1282, quorum 0,1,2,3
> > lts-osd3,lts-osd1,lts-osd2,lts-mon
> >      mdsmap e7892: 1/2/1 up {0=lts-mon=up:resolve}
> >      osdmap e10183: 102 osds: 101 up, 101 in
> >       pgmap v6714309: 4192 pgs, 7 pools, 31748 GB data, 23494 kobjects
> >             96188 GB used, 273 TB / 367 TB avail
> >                 4188 active+clean
> >                    4 active+clean+scrubbing+deep
> >
> > Now we are really down for the count. We cannot get our MDS back up in an
> > active state and none of our data is accessible.
>
> You can't remove an active MDS this way; you need to:
>
> 1. make sure all active mds are running
> 2. run 'ceph mds set max_mds 1'
> 3. run 'ceph mds stop 1'
>
> Step 3 changes the second MDS's state to stopping. Wait a while and the
> second MDS will go to the standby state. Occasionally the second MDS can
> get stuck in the stopping state; if that happens, restart all MDS daemons
> and then repeat step 3.
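>
> For example, the rank 1 state change can be watched with something like
> this (a minimal sketch using the standard status command):
>
> # watch -n 5 'ceph mds stat'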
>
> Regards
> Yan, Zheng
>
>
>
> >
> >
> > On Wed, Jan 13, 2016 at 7:05 PM, Yan, Zheng <uker...@gmail.com> wrote:
> >>
> >> On Thu, Jan 14, 2016 at 3:37 AM, Mike Carlson <m...@bayphoto.com> wrote:
> >> > Hey Greg,
> >> >
> >> > The inconsistent view is only over nfs/smb on top of our /ceph mount.
> >> >
> >> > When I look directly at the /ceph mount (which uses the cephfs kernel
> >> > module), everything looks fine.
> >> >
> >> > It is possible that this issue simply went unnoticed before, and that
> >> > its being an infernalis problem is a red herring. That said, it is
> >> > oddly coincidental that we only just started seeing issues.
> >>
> >> This looks like the seekdir bug in the kernel client; could you try a
> >> 4.0+ kernel?
> >>
> >> Also, do you have "mds bal frag" enabled for ceph-mds?
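> >>
> >> If it is not set, it can be enabled in the [mds] section of ceph.conf,
> >> roughly like this (a minimal sketch; the mds daemons would need a
> >> restart to pick it up):
> >>
> >> [mds]
> >>     mds bal frag = true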
> >>
> >>
> >> Regards
> >> Yan, Zheng
> >>
> >>
> >>
> >> >
> >> > On Wed, Jan 13, 2016 at 11:30 AM, Gregory Farnum <gfar...@redhat.com>
> >> > wrote:
> >> >>
> >> >> On Wed, Jan 13, 2016 at 11:24 AM, Mike Carlson <m...@bayphoto.com>
> >> >> wrote:
> >> >> > Hello.
> >> >> >
> >> >> > Since we upgraded to Infernalis, we have noticed a severe problem
> >> >> > with cephfs when we have it shared over Samba and NFS.
> >> >> >
> >> >> > Directory listings are showing an inconsistent view of the files:
> >> >> >
> >> >> >
> >> >> > $ ls /lts-mon/BD/xmlExport/ | wc -l
> >> >> >      100
> >> >> > $ sudo umount /lts-mon
> >> >> > $ sudo mount /lts-mon
> >> >> > $ ls /lts-mon/BD/xmlExport/ | wc -l
> >> >> >     3507
> >> >> >
> >> >> >
> >> >> > The only workaround I have found is unmounting and re-mounting the
> >> >> > NFS share; that seems to clear it up. The same goes for Samba. I'd
> >> >> > post the full output here, but it's thousands of lines; I can add
> >> >> > additional details on request.
> >> >> >
> >> >> > This happened after our upgrade to infernalis. Is it possible the
> >> >> > MDS is in an inconsistent state?
> >> >>
> >> >> So this didn't happen to you until after you upgraded? Are you seeing
> >> >> missing files when looking at cephfs directly, or only over the
> >> >> NFS/Samba re-exports? Are you also sharing Samba by re-exporting the
> >> >> kernel cephfs mount?
> >> >>
> >> >> Zheng, any ideas about kernel issues which might cause this or be
> >> >> more visible under infernalis?
> >> >> -Greg
> >> >>
> >> >> >
> >> >> > We have cephfs mounted on a server using the built-in cephfs kernel
> >> >> > module:
> >> >> >
> >> >> > lts-mon:6789:/ /ceph ceph name=admin,secretfile=/etc/ceph/admin.secret,noauto,_netdev
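> >> >> >
> >> >> > (For reference, the equivalent manual mount would be roughly the
> >> >> > following; same options, just without fstab:)
> >> >> >
> >> >> > mount -t ceph lts-mon:6789:/ /ceph -o name=admin,secretfile=/etc/ceph/admin.secret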
> >> >> >
> >> >> >
> >> >> > We are running all of our Ceph nodes on Ubuntu 14.04 LTS. Samba is
> >> >> > up to date (4.1.6), and we export NFSv3 to Linux and FreeBSD systems.
> >> >> > All clients seem to exhibit the same behavior.
> >> >> >
> >> >> > system info:
> >> >> >
> >> >> > # uname -a
> >> >> > Linux lts-osd1 3.13.0-63-generic #103-Ubuntu SMP Fri Aug 14 21:42:59 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> >> >> > root@lts-osd1:~# lsb_release -a
> >> >> > No LSB modules are available.
> >> >> > Distributor ID: Ubuntu
> >> >> > Description: Ubuntu 14.04.3 LTS
> >> >> > Release: 14.04
> >> >> > Codename: trusty
> >> >> >
> >> >> >
> >> >> > package info:
> >> >> >
> >> >> > # dpkg -l | grep ceph
> >> >> > ii  ceph            9.2.0-1trusty  amd64  distributed storage and file system
> >> >> > ii  ceph-common     9.2.0-1trusty  amd64  common utilities to mount and interact with a ceph storage cluster
> >> >> > ii  ceph-fs-common  9.2.0-1trusty  amd64  common utilities to mount and interact with a ceph file system
> >> >> > ii  ceph-mds        9.2.0-1trusty  amd64  metadata server for the ceph distributed file system
> >> >> > ii  libcephfs1      9.2.0-1trusty  amd64  Ceph distributed file system client library
> >> >> > ii  python-ceph     9.2.0-1trusty  amd64  Meta-package for python libraries for the Ceph libraries
> >> >> > ii  python-cephfs   9.2.0-1trusty  amd64  Python libraries for the Ceph libcephfs library
> >> >> >
> >> >> > What is interesting is that a directory or file will not show up in
> >> >> > a listing; however, if we access the file directly, it does show up:
> >> >> >
> >> >> >
> >> >> > # ls -al |grep SCHOOL
> >> >> > # ls -alnd SCHOOL667055
> >> >> > drwxrwsr-x  1 21695  21183  2962751438 Jan 13 09:33 SCHOOL667055
> >> >> >
> >> >> >
> >> >> > Any tips are appreciated!
> >> >> >
> >> >> > Thanks,
> >> >> > Mike C
> >> >> >
> >> >> >
> >> >> >
> >> >
> >> >
> >> >
> >> >
> >
> >
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
