Hi, last night one of the PCI SSD drives that we use as an OSD journal
disk died, so we had to replace it, in this case with an 800 GB SATA SSD.
After recreating the journals, 9 of the 11 OSDs on the server no longer
stay up (each one starts, but goes down after about a minute).

Looking at the logs, I see that the service dies after a
"*** Caught signal (Aborted) **" message.

Extract from ceph-osd.6.log:

   -11> 2015-08-15 09:33:16.937820 7f32e6167700  5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937294, event: header_read, op: pg_info(1 pgs e10407:18.2c)
   -10> 2015-08-15 09:33:16.937822 7f32e6167700  5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937296, event: throttled, op: pg_info(1 pgs e10407:18.2c)
    -9> 2015-08-15 09:33:16.937826 7f32e6167700  5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937369, event: all_read, op: pg_info(1 pgs e10407:18.2c)
    -8> 2015-08-15 09:33:16.937830 7f32e6167700  5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937819, event: dispatched, op: pg_info(1 pgs e10407:18.2c)
    -7> 2015-08-15 09:33:16.937834 7f32e6167700  5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937834, event: waiting_for_osdmap, op: pg_info(1 pgs e10407:18.2c)
    -6> 2015-08-15 09:33:16.937837 7f32e6167700  5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937837, event: started, op: pg_info(1 pgs e10407:18.2c)
    -5> 2015-08-15 09:33:16.937848 7f32e6167700  5 -- op tracker -- seq: 484, time: 2015-08-15 09:33:16.937848, event: done, op: pg_info(1 pgs e10407:18.2c)
    -4> 2015-08-15 09:33:16.937860 7f32e6167700  1 -- 172.18.4.6:6800/21878 <== osd.11 172.18.4.7:6840/7934 44 ==== pg_info(1 pgs e10407:21.78) v4 ==== 759+0+0 (2132192505 0 0) 0x1da59fe0 con 0x15e9a520
    -3> 2015-08-15 09:33:16.937869 7f32e6167700  5 -- op tracker -- seq: 485, time: 2015-08-15 09:33:16.928344, event: header_read, op: pg_info(1 pgs e10407:21.78)
    -2> 2015-08-15 09:33:16.937871 7f32e6167700  5 -- op tracker -- seq: 485, time: 2015-08-15 09:33:16.928346, event: throttled, op: pg_info(1 pgs e10407:21.78)
    -1> 2015-08-15 09:33:16.937876 7f32e6167700  5 -- op tracker -- seq: 485, time: 2015-08-15 09:33:16.928388, event: all_read, op: pg_info(1 pgs e10407:21.78)
     0> 2015-08-15 09:33:16.937829 7f32de958700 -1 *** Caught signal (Aborted) **
 in thread 7f32de958700


I have the full log in case you need more information.

Thank you for your help.

-- 
*Francisco J. Araya Maggiolo*
Devops Engineer & Cloud Specialist
KIO Networks
Mexico City
Phone: +52 (55) 8503 2600 ext. 3901
Mobile: +52 (1) (55) 6066 9025
http://www.kionetworks.com