Good day!

Thank you, but it's not clear to me what the bottleneck is here. The candidates I can think of are (rough checks for each are sketched just below):

- the hardware node itself: load average, disk IO

- an underlying file system problem on the OSD, or a bad disk

- a Ceph journal problem
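
For each of these, roughly what I am checking (commands from memory; device and journal path are the ones from our ceph.conf quoted below):

#uptime
#iostat -x 1 5
#dmesg | tail -n 100
#smartctl -a /dev/sda
#ls -l /var/lib/ceph/journal/

uptime and iostat -x for node load and per-disk utilisation/await, dmesg and smartctl for file system or disk errors, and the last one just to confirm the journal directory is in place.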

The Ceph OSD partition is part of a block device that has practically no load:

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda              12,00         0,00         0,12          0          0

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda              12,00         0,00         0,14          0          0

The disk holding the OSD is fine; I just checked it and it shows good r/w
speed with reasonable IOPS and latency.
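
What I mean by "checked" is roughly this (only a sketch; the test file name is just an example):

#dd if=/dev/sda3 of=/dev/null bs=1M count=1024 iflag=direct
#dd if=/dev/zero of=/var/tmp/osd2-test bs=4k count=50000 oflag=direct conv=fsync

The first line gives raw sequential read speed from the OSD partition, the second does small direct synced writes on the same physical disk, and the await column of iostat -x covers the latency side.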

But the hardware node is working hard and has a high load average. I'm
afraid the ceph-osd process is starved of resources. Is there any way to
fix this? Maybe raise some kind of timeout for syncing, or give this OSD
a lower weight, or something along those lines?

Or is it better to move this OSD to another server?
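
To be concrete, I mean something like the following (only a guess at the right knobs, please correct me if they are wrong). If the sync timeout in question is "filestore commit timeout" (600 seconds by default, as far as I understand), it could be raised for this OSD only:

[osd.2]
        host = h08
        addr = 10.1.1.11
        devs = /dev/sda3
        filestore commit timeout = 1200

And lowering the weight so that osd.2 receives less data would be something like:

#ceph osd crush reweight osd.2 0.5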


Regards, Artem Silenkov, 2GIS TM.
---
2GIS LLC
http://2gis.ru
a.silenkov at 2gis.ru
gtalk:artem.silenkov at gmail.com
cell:+79231534853




2013/6/5 Gregory Farnum <g...@inktank.com>

> This would be easier to see with a log than with all the GDB stuff, but
> the reference in the backtrace to "SyncEntryTimeout::finish(int)" tells
> me that the filesystem is taking too long to sync things to disk. Either
> this disk is bad or you're somehow subjecting it to a much heavier load
> than the others.
> -Greg
>
> On Wednesday, June 5, 2013, Artem Silenkov wrote:
>
>> Good day!
>>
>> Tried to nullify this OSD and reinject it, with no success. It works for a
>> little while and then crashes again.
>>
>>
>> Regards, Artem Silenkov, 2GIS TM.
>> ---
>> 2GIS LLC
>> http://2gis.ru
>> a.silen...@2gis.ru
>>  gtalk:artem.silen...@gmail.com
>> cell:+79231534853
>>
>>
>> 2013/6/5 Artem Silenkov <artem.silen...@gmail.com>
>>
>> Hello!
>> We have a simple setup as follows:
>>
>> Debian GNU/Linux 6.0 x64
>> Linux h08 2.6.32-19-pve #1 SMP Wed May 15 07:32:52 CEST 2013 x86_64
>> GNU/Linux
>>
>> ii  ceph                             0.61.2-1~bpo60+1
>> distributed storage and file system
>> ii  ceph-common                      0.61.2-1~bpo60+1             common
>> utilities to mount and interact with a ceph storage cluster
>> ii  ceph-fs-common                   0.61.2-1~bpo60+1             common
>> utilities to mount and interact with a ceph file system
>> ii  ceph-fuse                        0.61.2-1~bpo60+1
>> FUSE-based client for the Ceph distributed file system
>> ii  ceph-mds                         0.61.2-1~bpo60+1
>> metadata server for the ceph distributed file system
>> ii  libcephfs1                       0.61.2-1~bpo60+1             Ceph
>> distributed file system client library
>> ii  libc-bin                         2.11.3-4
>> Embedded GNU C Library: Binaries
>> ii  libc-dev-bin                     2.11.3-4
>> Embedded GNU C Library: Development binaries
>> ii  libc6                            2.11.3-4
>> Embedded GNU C Library: Shared libraries
>> ii  libc6-dev                        2.11.3-4
>> Embedded GNU C Library: Development Libraries and Header Files
>>
>> All daemons are running fine except osd.2, which keeps crashing. All other
>> nodes have the same operating system and an essentially identical system
>> environment.
>>
>> #cat /etc/ceph/ceph.conf
>> [global]
>>         pid file = /var/run/ceph/$name.pid
>>         auth cluster required = none
>>         auth service required = none
>>         auth client required = none
>>         max open files = 65000
>>
>> [mon]
>> [mon.0]
>>         host = h01
>>         mon addr = 10.1.1.3:6789
>> [mon.1]
>>         host = h07
>>         mon addr = 10.1.1.10:6789
>> [mon.2]
>>         host = h08
>>         mon addr = 10.1.1.11:6789
>>
>> [mds]
>> [mds.3]
>>         host = h09
>>
>> [mds.4]
>>         host = h06
>>
>> [osd]
>>         osd journal size = 10000
>>         osd journal = /var/lib/ceph/journal/$cluster-$id/journal
>>         osd mkfs type = xfs
>>
>> [osd.0]
>>         host = h01
>>         addr = 10.1.1.3
>>         devs = /dev/sda3
>> [osd.1]
>>         host = h07
>>         addr = 10.1.1.10
>>         devs = /dev/sda3
>> [osd.2]
>>         host = h08
>>         addr = 10.1.1.11
>>         devs = /dev/sda3
>> [osd.3]
>>         host = h09
>>         addr = 10.1.1.12
>>         devs = /dev/sda3
>>
>> [osd.4]
>>         host = h06
>>         addr = 10.1.1.9
>>         devs = /dev/sda3
>>
>>
>> ~#ceph osd tree
>>
>> # id    weight  type name       up/down reweight
>> -1      5       root default
>> -3      5               rack unknownrack
>> -2      1                       host h01
>> 0       1                               osd.0   up      1
>> -4      1                       host h07
>> 1       1                               osd.1   up      1
>> -5      1                       host h08
>> 2       1                               osd.2   down    0
>> -6      1                       host h09
>> 3       1                               osd.3   up      1
>> -7      1                       host h06
>> 4       1                               osd.4   up      1
>>
>>
>
> --
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
