John,

This seems to have worked.  I rebooted my client and restarted ceph on the
MDS hosts after giving them more RAM.  After remounting the cephfs filesystem
I restarted the rsyncs that were running on the client, and things seem to be
working.  I can access the files, so that is a relief.

What is risky about enabling mds_bal_frag on a cluster that already has data
on it, and will there be any performance degradation once it is enabled?
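
For what it's worth, this is how I'm planning to look at the current
fragmentation-related settings before touching anything.  It's only a sketch:
the daemon name is specific to my setup, and the exact set of mds_bal_*
options may differ between Ceph versions.

    # assumes the admin socket is available on the MDS host and that the
    # daemon is named mds.cephmds02 (adjust for your setup)
    ceph daemon mds.cephmds02 config show | grep mds_bal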

Thanks again for the help.

On Tue, Aug 11, 2015 at 2:25 PM, John Spray <jsp...@redhat.com> wrote:

> On Tue, Aug 11, 2015 at 6:23 PM, Bob Ababurko <b...@ababurko.net> wrote:
> > Here is the backtrace from the core dump.
> >
> > (gdb) bt
> > #0  0x00007f71f5404ffb in raise () from /lib64/libpthread.so.0
> > #1  0x000000000087065d in reraise_fatal (signum=6) at
> > global/signal_handler.cc:59
> > #2  handle_fatal_signal (signum=6) at global/signal_handler.cc:109
> > #3  <signal handler called>
> > #4  0x00007f71f40235d7 in raise () from /lib64/libc.so.6
> > #5  0x00007f71f4024cc8 in abort () from /lib64/libc.so.6
> > #6  0x00007f71f49279b5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
> > #7  0x00007f71f4925926 in ?? () from /lib64/libstdc++.so.6
> > #8  0x00007f71f4925953 in std::terminate() () from /lib64/libstdc++.so.6
> > #9  0x00007f71f4925b73 in __cxa_throw () from /lib64/libstdc++.so.6
> > #10 0x000000000077d0fc in operator new (num_bytes=2408) at mds/CInode.h:120
> > Python Exception <type 'exceptions.IndexError'> list index out of range:
> > #11 CDir::_omap_fetched (this=0x90af04f8, hdrbl=..., omap=std::map with 65536 elements, want_dn="", r=<optimized out>) at mds/CDir.cc:1700
> > #12 0x00000000007d7d44 in complete (r=0, this=0x502b000) at include/Context.h:65
> > #13 MDSIOContextBase::complete (this=0x502b000, r=0) at mds/MDSContext.cc:59
> > #14 0x0000000000894818 in Finisher::finisher_thread_entry (this=0x5108698) at common/Finisher.cc:59
> > #15 0x00007f71f53fddf5 in start_thread () from /lib64/libpthread.so.0
> > #16 0x00007f71f40e41ad in clone () from /lib64/libc.so.6
>
> If we believe the line numbers here, then it's a malloc failure.  Are
> you running out of memory?
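>
> A quick sanity check on the MDS host would be something along these lines;
> the daemon name is assumed, and the counter names can vary a little between
> versions:
>
>     # perf dump shows the MDS's view of its cache; ps shows actual memory use
>     ceph daemon mds.cephmds02 perf dump | python -m json.tool | grep -i inode
>     ps -o pid,rss,vsz,cmd -C ceph-mds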
>
> The MDS is loading a bunch of these 64k file directories (presumably a
> characteristic of your workload), and ending up with an unusually
> large number of inodes in cache (this is all happening during the
> "rejoin" phase so no trimming of the cache is done and we merrily
> exceed the default mds_cache_size limit of 100k inodes).
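>
> If the MDS host has the headroom for it, raising that limit is just a config
> change, roughly like the following (the number is purely illustrative; size
> it to the RAM you actually have):
>
>     # ceph.conf on the MDS hosts, [mds] section
>     mds cache size = 1000000
>     # or at runtime via the admin socket (daemon name assumed)
>     ceph daemon mds.cephmds02 config set mds_cache_size 1000000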
>
> The thing triggering the load of the dirs is clients replaying
> requests that refer to inodes by inode number, and the MDS's procedure
> for handling that involves fully loading the relevant dirs.  That
> might be something we can improve; it doesn't seem obviously necessary
> to load all the dentries in a dirfrag during this phase.
>
> Anyway, you can hopefully recover from this state by forcibly
> unmounting your clients.  Since you're using the kernel client it may
> be easiest to hard reset the client boxes.  When you next restart your
> MDS, the clients won't be present, so the MDS will be able to make it
> all the way up without trying to load a bunch of directory fragments.
> If you've got some more RAM for the MDS box, that wouldn't hurt either.
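>
> (On the clients, a forced unmount is usually something along these lines;
> the mount point here is only an example:)
>
>     # /mnt/cephfs is a hypothetical mount point; -f forces the unmount and
>     # -l detaches it lazily if the forced unmount hangs
>     umount -f /mnt/cephfs || umount -l /mnt/cephfs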
>
> One of the less well tested (but relevant here) features we have is
> directory fragmentation, where large dirs like these are internally
> split up (partly to avoid memory management issues like this).  It
> might be a risky business on a system that you've already got real
> data on, but once your MDS is back up and running you can try enabling
> the mds_bal_frag setting.
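>
> If you do try it, enabling it would look roughly like this; treat it as a
> sketch rather than a recipe, and the daemon name is an assumption:
>
>     # ceph.conf on the MDS hosts, [mds] section
>     mds bal frag = true
>     # then restart the MDS, or inject the setting at runtime
>     ceph tell mds.cephmds02 injectargs '--mds_bal_frag=true'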
>
> This is not a use case we have particularly strong coverage of in our
> automated tests, so thanks for your experimentation and persistence.
>
> John
>
> >
> > I have also gotten a log file with debug mds = 20.  It was 1.2GB, so I
> > bzip2'd it with max compression and got it down to 75MB.  I wasn't sure
> > where to upload it, so if there is a better place to put it, please let me
> > know.
> >
> > https://mega.nz/#!5V4z3A7K!0METjVs5t3DAQAts8_TYXWrLh2FhGHcb7oC4uuhr2T8
> >
> > thanks,
> > Bob
> >
> >
> > On Mon, Aug 10, 2015 at 8:05 PM, Yan, Zheng <uker...@gmail.com> wrote:
> >>
> >> On Tue, Aug 11, 2015 at 9:21 AM, Bob Ababurko <b...@ababurko.net> wrote:
> >> > I had a dual MDS server configuration and have been copying data via the
> >> > cephfs kernel module to my cluster for the past 3 weeks, and I just had an
> >> > MDS crash halting all IO.  Leading up to the crash, I ran a test dd that
> >> > increased the throughput by about 2x and then stopped it, but about 10
> >> > minutes later the MDS server crashed and did not fail over to the standby
> >> > properly.  I have been using an active/standby MDS configuration, but
> >> > neither of the MDS servers will stay running at this point; both crash
> >> > shortly after being started.
> >> >
> >> > [bababurko@cephmon01 ~]$ sudo ceph -s
> >> >     cluster f25cb23f-2293-4682-bad2-4b0d8ad10e79
> >> >      health HEALTH_WARN
> >> >             mds cluster is degraded
> >> >             mds cephmds02 is laggy
> >> >             noscrub,nodeep-scrub flag(s) set
> >> >      monmap e1: 3 mons at {cephmon01=10.15.24.71:6789/0,cephmon02=10.15.24.80:6789/0,cephmon03=10.15.24.135:6789/0}
> >> >             election epoch 4, quorum 0,1,2 cephmon01,cephmon02,cephmon03
> >> >      mdsmap e2760: 1/1/1 up {0=cephmds02=up:rejoin(laggy or crashed)}
> >> >      osdmap e324: 30 osds: 30 up, 30 in
> >> >             flags noscrub,nodeep-scrub
> >> >       pgmap v1555346: 2112 pgs, 3 pools, 4993 GB data, 246 Mobjects
> >> >             14051 GB used, 13880 GB / 27931 GB avail
> >> >                 2112 active+clean
> >> >
> >> >
> >> > I am not sure what information is relevant, so I will try to cover what I
> >> > think is relevant based on posts I have read through:
> >> >
> >> > Cluster:
> >> > running ceph-0.94.1 on CentOS 7.1
> >> > [root@mdstest02 bababurko]$ uname -r
> >> > 3.10.0-229.el7.x86_64
> >> >
> >> > Here is my ceph-mds log with 'debug objecter = 10':
> >> >
> >> >
> >> > https://www.zerobin.net/?179a6789dfc9eb86#AHAS3YEkpHTj6CSQg8u4hk+jHBasejQNLDc9/KYkYVQ=
> >>
> >>
> >> Could you use gdb to check where the crash happened?  (gdb
> >> /usr/local/bin/ceph-mds /core.xxxxx; you may need to re-compile the MDS
> >> with debuginfo.)
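> >>
> >> For example (paths here are only illustrative; use whatever core file your
> >> system actually produced):
> >>
> >>     # point gdb at the ceph-mds binary and its core file, then get a backtrace
> >>     gdb /usr/bin/ceph-mds /var/lib/ceph/core.12345
> >>     (gdb) bt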
> >>
> >> Yan, Zheng
> >>
> >> >
> >> > cat /sys/kernel/debug/ceph/*/mdsc output:
> >> >
> >> >
> >> > https://www.zerobin.net/?ed238ce77b20583d#CK7Yt6yC1VgHfDee7y/CGkFh5bfyLkhwZB6i5R6N/8g=
> >> >
> >> > ceph.conf :
> >> >
> >> >
> >> > https://www.zerobin.net/?62a125349aa43c92#5VH3XRR4P7zjhBHNWmTHrFYmwE0TZEig6r2EU6X1q/U=
> >> >
> >> > I have copied almost 5TB of small files to this cluster, which has taken
> >> > the better part of three weeks, so I am really hoping that there is a way
> >> > to recover from this.  This is our POC cluster.
> >> >
> >> > I'm sure I have missed something relevant as I'm just getting my mind
> >> > back
> >> > after nearly losing it, so feel free to ask for anything to assist.
> >> >
> >> > Any help would be greatly appreciated.
> >> >
> >> > thanks,
> >> > Bob
> >> >
> >> >
> >> >
> >> >
> >
> >
> >
> >
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
