Good day! I tried wiping this OSD and re-adding it, without success. It works for a little while and then crashes again.
Regards, Artem Silenkov, 2GIS TM.
---
2GIS LLC
http://2gis.ru
a.silen...@2gis.ru
gtalk:artem.silen...@gmail.com
cell:+79231534853

2013/6/5 Artem Silenkov <artem.silen...@gmail.com>

> Hello!
> We have a simple setup, as follows:
>
> Debian GNU/Linux 6.0 x64
> Linux h08 2.6.32-19-pve #1 SMP Wed May 15 07:32:52 CEST 2013 x86_64 GNU/Linux
>
> ii  ceph            0.61.2-1~bpo60+1  distributed storage and file system
> ii  ceph-common     0.61.2-1~bpo60+1  common utilities to mount and interact with a ceph storage cluster
> ii  ceph-fs-common  0.61.2-1~bpo60+1  common utilities to mount and interact with a ceph file system
> ii  ceph-fuse       0.61.2-1~bpo60+1  FUSE-based client for the Ceph distributed file system
> ii  ceph-mds        0.61.2-1~bpo60+1  metadata server for the ceph distributed file system
> ii  libcephfs1      0.61.2-1~bpo60+1  Ceph distributed file system client library
> ii  libc-bin        2.11.3-4          Embedded GNU C Library: Binaries
> ii  libc-dev-bin    2.11.3-4          Embedded GNU C Library: Development binaries
> ii  libc6           2.11.3-4          Embedded GNU C Library: Shared libraries
> ii  libc6-dev       2.11.3-4          Embedded GNU C Library: Development Libraries and Header Files
>
> All programs are running fine except osd.2, which is crashing repeatedly.
> All other nodes run the same operating system, and the system
> environment is essentially identical.
>
> #cat /etc/ceph/ceph.conf
> [global]
>     pid file = /var/run/ceph/$name.pid
>     auth cluster required = none
>     auth service required = none
>     auth client required = none
>     max open files = 65000
>
> [mon]
> [mon.0]
>     host = h01
>     mon addr = 10.1.1.3:6789
> [mon.1]
>     host = h07
>     mon addr = 10.1.1.10:6789
> [mon.2]
>     host = h08
>     mon addr = 10.1.1.11:6789
>
> [mds]
> [mds.3]
>     host = h09
>
> [mds.4]
>     host = h06
>
> [osd]
>     osd journal size = 10000
>     osd journal = /var/lib/ceph/journal/$cluster-$id/journal
>     osd mkfs type = xfs
>
> [osd.0]
>     host = h01
>     addr = 10.1.1.3
>     devs = /dev/sda3
> [osd.1]
>     host = h07
>     addr = 10.1.1.10
>     devs = /dev/sda3
> [osd.2]
>     host = h08
>     addr = 10.1.1.11
>     devs = /dev/sda3
> [osd.3]
>     host = h09
>     addr = 10.1.1.12
>     devs = /dev/sda3
>
> [osd.4]
>     host = h06
>     addr = 10.1.1.9
>     devs = /dev/sda3
>
>
> ~#ceph osd tree
>
> # id    weight  type name               up/down reweight
> -1      5       root default
> -3      5         rack unknownrack
> -2      1           host h01
> 0       1             osd.0             up      1
> -4      1           host h07
> 1       1             osd.1             up      1
> -5      1           host h08
> 2       1             osd.2             down    0
> -6      1           host h09
> 3       1             osd.3             up      1
> -7      1           host h06
> 4       1             osd.4             up      1
>
>
> When it crashes, the ceph-osd process can end up in a zombie state,
> making it impossible even to unmount the OSD partition.
>
> gdb shows the following:
>
> #gdb /usr/bin/ceph-osd /core
> GNU gdb (GDB) 7.0.1-debian
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/bin/ceph-osd...(no debugging symbols found)...done.
> [New Thread 809630]
> [New Thread 809628]
> [New Thread 809631]
> [New Thread 809632]
> [New Thread 809633]
> [New Thread 809634]
> [New Thread 809672]
> [New Thread 809629]
> [New Thread 809524]
> [New Thread 809421]
> [New Thread 137559]
> [New Thread 809636]
> [New Thread 809635]
> [New Thread 809677]
> [New Thread 809679]
> [New Thread 809527]
> [New Thread 137560]
> [New Thread 809420]
> [New Thread 809637]
> [New Thread 809685]
> [New Thread 809525]
> [New Thread 809638]
> [New Thread 99663]
> [New Thread 809523]
> [New Thread 809639]
> [New Thread 809522]
> [New Thread 809640]
> [New Thread 809644]
> [New Thread 809641]
> [New Thread 809643]
> [New Thread 809648]
> [New Thread 809668]
> [New Thread 809669]
> [New Thread 809671]
> [New Thread 809676]
> [New Thread 809680]
> [New Thread 809681]
> [New Thread 56075]
> [New Thread 809682]
> [New Thread 107924]
> [New Thread 809683]
> [New Thread 108037]
> [New Thread 809684]
> [New Thread 119704]
> [New Thread 809686]
> [New Thread 809537]
> [New Thread 56073]
> [New Thread 85231]
> [New Thread 85232]
> [New Thread 99661]
> [New Thread 809535]
> [New Thread 99662]
> [New Thread 107922]
> [New Thread 119705]
> [New Thread 107928]
> [New Thread 108035]
> [New Thread 809410]
> [New Thread 809528]
> [New Thread 809530]
> [New Thread 809531]
> [New Thread 809533]
> [New Thread 809536]
> [New Thread 809642]
> [New Thread 809534]
> [New Thread 809411]
> [New Thread 809645]
> [New Thread 809667]
> [New Thread 809670]
> [New Thread 809526]
> [New Thread 809521]
> [New Thread 809532]
> [New Thread 809529]
>
> warning: Can't read pathname for load map: Input/output error.
> Reading symbols from /lib/libaio.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib/libaio.so.1
> Reading symbols from /usr/lib/libnss3.so.1d...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libnss3.so.1d
> Reading symbols from /usr/lib/libnspr4.so.0d...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libnspr4.so.0d
> Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
> Loaded symbols for /lib/libpthread.so.0
> Reading symbols from /lib/libuuid.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib/libuuid.so.1
> Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib/librt.so.1
> Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib/libdl.so.2
> Reading symbols from /usr/lib/libtcmalloc.so.0...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libtcmalloc.so.0
> Reading symbols from /usr/lib/libboost_thread.so.1.42.0...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libboost_thread.so.1.42.0
> Reading symbols from /usr/lib/libleveldb.so.1...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libleveldb.so.1
> Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libstdc++.so.6
> Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib/libm.so.6
> Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib/libgcc_s.so.1
> Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib/libc.so.6
> Reading symbols from /usr/lib/libnssutil3.so.1d...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libnssutil3.so.1d
> Reading symbols from /usr/lib/libplc4.so.0d...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libplc4.so.0d
> Reading symbols from /usr/lib/libplds4.so.0d...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libplds4.so.0d
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /usr/lib/libunwind.so.7...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libunwind.so.7
> Reading symbols from /usr/lib/libsnappy.so.1...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libsnappy.so.1
> Reading symbols from /usr/lib/nss/libsoftokn3.so...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/nss/libsoftokn3.so
> Reading symbols from /usr/lib/libsqlite3.so.0...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libsqlite3.so.0
> Reading symbols from /usr/lib/nss/libfreebl3.so...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/nss/libfreebl3.so
> Reading symbols from /usr/lib/rados-classes/libcls_lock.so...done.
> Loaded symbols for /usr/lib/rados-classes/libcls_lock.so
> Reading symbols from /usr/lib/libboost_system.so.1.42.0...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libboost_system.so.1.42.0
> Reading symbols from /usr/lib/rados-classes/libcls_rgw.so...done.
> Loaded symbols for /usr/lib/rados-classes/libcls_rgw.so
>
> warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff87fe000
> Core was generated by `/usr/bin/ceph-osd -i 2 --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.con'.
> Program terminated with signal 6, Aborted.
> #0  0x00007f7e994b9ebb in raise () from /lib/libpthread.so.0
>
> (gdb) bt
> #0  0x00007f7e994b9ebb in raise () from /lib/libpthread.so.0
> #1  0x00000000007a16c7 in ?? ()
> #2  <signal handler called>
> #3  0x00007f7e97cf21b5 in raise () from /lib/libc.so.6
> #4  0x00007f7e97cf4fc0 in abort () from /lib/libc.so.6
> #5  0x00007f7e98586dc5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
> #6  0x00007f7e98585166 in ?? () from /usr/lib/libstdc++.so.6
> #7  0x00007f7e98585193 in std::terminate() () from /usr/lib/libstdc++.so.6
> #8  0x00007f7e9858528e in __cxa_throw () from /usr/lib/libstdc++.so.6
> #9  0x00000000007f9f79 in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) ()
> #10 0x0000000000763ca1 in SyncEntryTimeout::finish(int) ()
> #11 0x00000000005b828a in Context::complete(int) ()
> #12 0x00000000008b3793 in SafeTimer::timer_thread() ()
> #13 0x00000000008b595d in SafeTimerThread::entry() ()
> #14 0x00007f7e994b18ca in start_thread () from /lib/libpthread.so.0
> #15 0x00007f7e97d8fb6d in clone () from /lib/libc.so.6
> #16 0x0000000000000000 in ?? ()
> (gdb)
>
> The problem affects only osd.2; all other services are running fine.
> I have plenty of core dumps if anyone needs them.
>
> Please help us fix this issue. Our cluster status is as follows:
> #ceph -w
>    health HEALTH_WARN 2 pgs backfilling; 2 pgs degraded; 3 pgs recovering; 39 pgs recovery_wait; 44 pgs stuck unclean; recovery 157580/1744054 degraded (9.035%); recovering 105 o/s, 7442KB/s; 1 mons down, quorum 0,1 0,1
>    monmap e1: 3 mons at {0=10.1.1.3:6789/0,1=10.1.1.10:6789/0,2=10.1.1.11:6789/0}, election epoch 112, quorum 0,1 0,1
>    osdmap e200: 6 osds: 4 up, 4 in
>    pgmap v1133760: 1208 pgs: 1164 active+clean, 39 active+recovery_wait, 2 active+degraded+backfilling, 3 active+recovering; 88915 MB data, 170 GB used, 573 GB / 744 GB avail; 119KB/s rd, 763KB/s wr, 18op/s; 157580/1744054 degraded (9.035%); recovering 105 o/s, 7442KB/s
>    mdsmap e16: 1/1/1 up {0=4=up:active}, 1 up:standby
>
> Regards, Artem Silenkov, 2GIS TM.
> ---
> 2GIS LLC
> http://2gis.ru
> a.silen...@2gis.ru
> gtalk:artem.silen...@gmail.com
> cell:+79231534853
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com