Good day!

I tried to wipe (nullify) this OSD and re-add it to the cluster, with no
success. It runs for a short while and then crashes again.


Regards, Artem Silenkov, 2GIS TM.
---
2GIS LLC
http://2gis.ru
a.silen...@2gis.ru
gtalk:artem.silen...@gmail.com
cell:+79231534853


2013/6/5 Artem Silenkov <artem.silen...@gmail.com>

> Hello!
> We have simple setup as follows:
>
> Debian GNU/Linux 6.0 x64
> Linux h08 2.6.32-19-pve #1 SMP Wed May 15 07:32:52 CEST 2013 x86_64
> GNU/Linux
>
> ii  ceph                             0.61.2-1~bpo60+1             distributed storage and file system
> ii  ceph-common                      0.61.2-1~bpo60+1             common utilities to mount and interact with a ceph storage cluster
> ii  ceph-fs-common                   0.61.2-1~bpo60+1             common utilities to mount and interact with a ceph file system
> ii  ceph-fuse                        0.61.2-1~bpo60+1             FUSE-based client for the Ceph distributed file system
> ii  ceph-mds                         0.61.2-1~bpo60+1             metadata server for the ceph distributed file system
> ii  libcephfs1                       0.61.2-1~bpo60+1             Ceph distributed file system client library
> ii  libc-bin                         2.11.3-4                     Embedded GNU C Library: Binaries
> ii  libc-dev-bin                     2.11.3-4                     Embedded GNU C Library: Development binaries
> ii  libc6                            2.11.3-4                     Embedded GNU C Library: Shared libraries
> ii  libc6-dev                        2.11.3-4                     Embedded GNU C Library: Development Libraries and Header Files
>
> Everything is running fine except osd.2, which crashes repeatedly.
> All other nodes run the same operating system, and their system
> environments are practically identical.
>
> #cat /etc/ceph/ceph.conf
> [global]
>         pid file = /var/run/ceph/$name.pid
>         auth cluster required = none
>         auth service required = none
>         auth client required = none
>         max open files = 65000
>
> [mon]
> [mon.0]
>         host = h01
>         mon addr = 10.1.1.3:6789
> [mon.1]
>         host = h07
>         mon addr = 10.1.1.10:6789
> [mon.2]
>         host = h08
>         mon addr = 10.1.1.11:6789
>
> [mds]
> [mds.3]
>         host = h09
>
> [mds.4]
>         host = h06
>
> [osd]
>         osd journal size = 10000
>         osd journal = /var/lib/ceph/journal/$cluster-$id/journal
>         osd mkfs type = xfs
>
> [osd.0]
>         host = h01
>         addr = 10.1.1.3
>         devs = /dev/sda3
> [osd.1]
>         host = h07
>         addr = 10.1.1.10
>         devs = /dev/sda3
> [osd.2]
>         host = h08
>         addr = 10.1.1.11
>         devs = /dev/sda3
> [osd.3]
>         host = h09
>         addr = 10.1.1.12
>         devs = /dev/sda3
>
> [osd.4]
>         host = h06
>         addr = 10.1.1.9
>         devs = /dev/sda3
>
>
> ~#ceph osd tree
>
> # id    weight  type name       up/down reweight
> -1      5       root default
> -3      5               rack unknownrack
> -2      1                       host h01
> 0       1                               osd.0   up      1
> -4      1                       host h07
> 1       1                               osd.1   up      1
> -5      1                       host h08
> 2       1                               osd.2   down    0
> -6      1                       host h09
> 3       1                               osd.3   up      1
> -7      1                       host h06
> 4       1                               osd.4   up      1
>
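As an aside, here is a minimal Python sketch for picking the down OSDs out of plain `ceph osd tree` output like the tree above. The function name `down_osds` is hypothetical, and the column layout (id, weight, name, up/down, reweight) is assumed from this one listing; other Ceph versions may print it differently.

```python
def down_osds(tree_text):
    """Return the names of OSDs marked 'down' in plain `ceph osd tree` output.

    Assumes OSD rows look like: <id> <weight> osd.<n> <up|down> <reweight>
    """
    down = []
    for line in tree_text.splitlines():
        fields = line.split()
        # Skip the header and the root/rack/host rows; keep only OSD rows.
        if len(fields) >= 4 and fields[2].startswith("osd.") and fields[3] == "down":
            down.append(fields[2])
    return down


# Excerpt of the listing above, used as sample input.
sample = """\
# id    weight  type name       up/down reweight
-1      5       root default
-5      1                       host h08
2       1                               osd.2   down    0
-6      1                       host h09
3       1                               osd.3   up      1
"""

print(down_osds(sample))
```

Feeding it the full listing should report only osd.2, which matches what we see on h08.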
>
> When it crashes, the ceph-osd process can end up as a zombie, and then it
> is not even possible to unmount the OSD partition.
>
> gdb shows the following:
>
> #gdb /usr/bin/ceph-osd /core
> GNU gdb (GDB) 7.0.1-debian
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/bin/ceph-osd...(no debugging symbols
> found)...done.
> [New Thread 809630]
> [New Thread 809628]
> [New Thread 809631]
> [New Thread 809632]
> [New Thread 809633]
> [New Thread 809634]
> [New Thread 809672]
> [New Thread 809629]
> [New Thread 809524]
> [New Thread 809421]
> [New Thread 137559]
> [New Thread 809636]
> [New Thread 809635]
> [New Thread 809677]
> [New Thread 809679]
> [New Thread 809527]
> [New Thread 137560]
> [New Thread 809420]
> [New Thread 809637]
> [New Thread 809685]
> [New Thread 809525]
> [New Thread 809638]
> [New Thread 99663]
> [New Thread 809523]
> [New Thread 809639]
> [New Thread 809522]
> [New Thread 809640]
> [New Thread 809644]
> [New Thread 809641]
> [New Thread 809643]
> [New Thread 809648]
> [New Thread 809668]
> [New Thread 809669]
> [New Thread 809671]
> [New Thread 809676]
> [New Thread 809680]
> [New Thread 809681]
> [New Thread 56075]
> [New Thread 809682]
> [New Thread 107924]
> [New Thread 809683]
> [New Thread 108037]
> [New Thread 809684]
> [New Thread 119704]
> [New Thread 809686]
> [New Thread 809537]
> [New Thread 56073]
> [New Thread 85231]
> [New Thread 85232]
> [New Thread 99661]
> [New Thread 809535]
> [New Thread 99662]
> [New Thread 107922]
> [New Thread 119705]
> [New Thread 107928]
> [New Thread 108035]
> [New Thread 809410]
> [New Thread 809528]
> [New Thread 809530]
> [New Thread 809531]
> [New Thread 809533]
> [New Thread 809536]
> [New Thread 809642]
> [New Thread 809534]
> [New Thread 809411]
> [New Thread 809645]
> [New Thread 809667]
> [New Thread 809670]
> [New Thread 809526]
> [New Thread 809521]
> [New Thread 809532]
> [New Thread 809529]
>
> warning: Can't read pathname for load map: Input/output error.
> Reading symbols from /lib/libaio.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/libaio.so.1
> Reading symbols from /usr/lib/libnss3.so.1d...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libnss3.so.1d
> Reading symbols from /usr/lib/libnspr4.so.0d...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libnspr4.so.0d
> Reading symbols from /lib/libpthread.so.0...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/libpthread.so.0
> Reading symbols from /lib/libuuid.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/libuuid.so.1
> Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib/librt.so.1
> Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib/libdl.so.2
> Reading symbols from /usr/lib/libtcmalloc.so.0...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libtcmalloc.so.0
> Reading symbols from /usr/lib/libboost_thread.so.1.42.0...(no debugging
> symbols found)...done.
> Loaded symbols for /usr/lib/libboost_thread.so.1.42.0
> Reading symbols from /usr/lib/libleveldb.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libleveldb.so.1
> Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libstdc++.so.6
> Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib/libm.so.6
> Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/libgcc_s.so.1
> Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib/libc.so.6
> Reading symbols from /usr/lib/libnssutil3.so.1d...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libnssutil3.so.1d
> Reading symbols from /usr/lib/libplc4.so.0d...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libplc4.so.0d
> Reading symbols from /usr/lib/libplds4.so.0d...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libplds4.so.0d
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /usr/lib/libunwind.so.7...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libunwind.so.7
> Reading symbols from /usr/lib/libsnappy.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libsnappy.so.1
> Reading symbols from /usr/lib/nss/libsoftokn3.so...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/nss/libsoftokn3.so
> Reading symbols from /usr/lib/libsqlite3.so.0...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libsqlite3.so.0
> Reading symbols from /usr/lib/nss/libfreebl3.so...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/nss/libfreebl3.so
> Reading symbols from /usr/lib/rados-classes/libcls_lock.so...done.
> Loaded symbols for /usr/lib/rados-classes/libcls_lock.so
> Reading symbols from /usr/lib/libboost_system.so.1.42.0...(no debugging
> symbols found)...done.
> Loaded symbols for /usr/lib/libboost_system.so.1.42.0
> Reading symbols from /usr/lib/rados-classes/libcls_rgw.so...done.
> Loaded symbols for /usr/lib/rados-classes/libcls_rgw.so
>
> warning: no loadable sections found in added symbol-file system-supplied
> DSO at 0x7ffff87fe000
> Core was generated by `/usr/bin/ceph-osd -i 2 --pid-file
> /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.con'.
> Program terminated with signal 6, Aborted.
> #0  0x00007f7e994b9ebb in raise () from /lib/libpthread.so.0
>
> (gdb) bt
> #0  0x00007f7e994b9ebb in raise () from /lib/libpthread.so.0
> #1  0x00000000007a16c7 in ?? ()
> #2  <signal handler called>
> #3  0x00007f7e97cf21b5 in raise () from /lib/libc.so.6
> #4  0x00007f7e97cf4fc0 in abort () from /lib/libc.so.6
> #5  0x00007f7e98586dc5 in __gnu_cxx::__verbose_terminate_handler() () from
> /usr/lib/libstdc++.so.6
> #6  0x00007f7e98585166 in ?? () from /usr/lib/libstdc++.so.6
> #7  0x00007f7e98585193 in std::terminate() () from /usr/lib/libstdc++.so.6
> #8  0x00007f7e9858528e in __cxa_throw () from /usr/lib/libstdc++.so.6
> #9  0x00000000007f9f79 in ceph::__ceph_assert_fail(char const*, char
> const*, int, char const*) ()
> #10 0x0000000000763ca1 in SyncEntryTimeout::finish(int) ()
> #11 0x00000000005b828a in Context::complete(int) ()
> #12 0x00000000008b3793 in SafeTimer::timer_thread() ()
> #13 0x00000000008b595d in SafeTimerThread::entry() ()
> #14 0x00007f7e994b18ca in start_thread () from /lib/libpthread.so.0
> #15 0x00007f7e97d8fb6d in clone () from /lib/libc.so.6
> #16 0x0000000000000000 in ?? ()
> (gdb)
>
> The problem affects only osd.2; all other services are running fine.
> I have plenty of core dumps if anyone needs them.
>
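Since there are many core files, here is a minimal sketch for dumping every thread's backtrace from a core in batch mode, so the output can be attached to a report. It assumes gdb is on the PATH. The helper names `gdb_backtrace_cmd` and `dump_backtrace` are hypothetical; `-batch`, `-ex` and `thread apply all bt` are standard gdb.

```python
import shlex
import subprocess


def gdb_backtrace_cmd(binary, core):
    """Build a gdb command line that prints all thread backtraces and exits."""
    return ["gdb", "-batch", "-ex", "thread apply all bt", binary, core]


def dump_backtrace(binary, core):
    """Run gdb on one core file and return its combined output (needs gdb installed)."""
    result = subprocess.run(
        gdb_backtrace_cmd(binary, core),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Show the shell-quoted command for the binary and core used above.
    cmd = gdb_backtrace_cmd("/usr/bin/ceph-osd", "/core")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Without debugging symbols for ceph-osd the frames will still show up as `??` in places, as in the trace above.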
> Please help us fix this issue. Our cluster is currently in this state:
> #ceph -w
>    health HEALTH_WARN 2 pgs backfilling; 2 pgs degraded; 3 pgs recovering;
> 39 pgs recovery_wait; 44 pgs stuck unclean; recovery 157580/1744054
> degraded (9.035%);  recovering 105 o/s, 7442KB/s; 1 mons down, quorum 0,1
> 0,1
>    monmap e1: 3 mons at
> {0=10.1.1.3:6789/0,1=10.1.1.10:6789/0,2=10.1.1.11:6789/0},
> election epoch 112, quorum 0,1 0,1
>    osdmap e200: 6 osds: 4 up, 4 in
>     pgmap v1133760: 1208 pgs: 1164 active+clean, 39 active+recovery_wait,
> 2 active+degraded+backfilling, 3 active+recovering; 88915 MB data, 170 GB
> used, 573 GB / 744 GB avail; 119KB/s rd, 763KB/s wr, 18op/s; 157580/1744054
> degraded (9.035%);  recovering 105 o/s, 7442KB/s
>    mdsmap e16: 1/1/1 up {0=4=up:active}, 1 up:standby
>
> Regards, Artem Silenkov, 2GIS TM.
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
