Hi,

> This is the actual logfile for osd.10
> - http://slexy.org/view/s21lhpkLGQ

Unfortunately this log does not contain any new data -- for some reason the
log levels haven't changed (see line 36369).
Could you please try the following command:

ceph-osd -d --flush-journal --debug_filestore 20/20 --debug_journal 20/20 -i 10
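
(-d keeps the daemon in the foreground and mirrors the log output to
stderr.) With the higher debug levels you should see filestore/journal
messages at level 20 in the OSD log; a quick way to check, assuming the
default log path (adjust if yours differs):

grep -E 'filestore|journal' /var/log/ceph/ceph-osd.10.log | tail -n 50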

Best regards,
     Alexey


On Fri, Sep 9, 2016 at 11:24 AM, Mehmet <c...@elchaka.de> wrote:

> Hello Alexey,
>
> thank you for your mail - my answers inline :)
>
> On 2016-09-08 16:24, Alexey Sheplyakov wrote:
>
>> Hi,
>>
>> root@:~# ceph-osd -i 12 --flush-journal
>>>
>>  > SG_IO: questionable sense data, results may be incorrect
>>  > SG_IO: questionable sense data, results may be incorrect
>>
>> As far as I understand, these lines are an hdparm warning (the OSD uses
>> the hdparm command to query the journal device's write cache state).
>>
>> The message means hdparm is unable to reliably figure out if the drive
>> write cache is enabled. This might indicate a hardware problem.
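>>
>> If you want to double-check the write cache state yourself, something
>> along these lines should work (the device names below are only
>> placeholders for your setup):
>>
>> hdparm -W /dev/sdX       # SATA/SAS disk: shows whether write-caching is on
>> nvme get-feature /dev/nvme0 -f 6   # NVMe, needs nvme-cli: feature 0x06 is the volatile write cache
>>
>> hdparm speaks ATA, so an NVMe journal device may not answer its queries
>> cleanly, which can produce exactly this kind of warning.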
>>
>
> I guess this has to do with the NVMe device (Intel DC P3700 NVMe)
> which is used for journaling.
> So is this normal behavior?
>
> ceph-osd -i 12 --flush-journal
>>>
>>
>> I think it's a good idea to
>> a) check the journal drive (smartctl),
>>
>
> The disks are all fine - checked 2-3 weeks ago.
>
> b) capture a more verbose log,
>>
>> i.e. add this to ceph.conf
>>
>> [osd]
>> debug filestore = 20/20
>> debug journal = 20/20
>>
>> and try flushing the journal once more (note: this won't fix the
>> problem, the point is to get a useful log)
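>>
>> For completeness: on a running OSD the same levels can be raised on the
>> fly with something like
>>
>> ceph tell osd.12 injectargs '--debug-filestore 20/20 --debug-journal 20/20'
>>
>> but for the offline flush the ceph.conf entries (or the equivalent
>> command-line flags) are what matter.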
>>
>
> I flushed the journal at ~09:55:26 today and got these lines:
>
> root@:~# ceph-osd -i 10 --flush-journal
> SG_IO: questionable sense data, results may be incorrect
> SG_IO: questionable sense data, results may be incorrect
> *** Caught signal (Segmentation fault) **
>  in thread 7f38a2ecf700 thread_name:ceph-osd
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (()+0x96bdde) [0x560356296dde]
>  2: (()+0x113d0) [0x7f38a81b03d0]
>  3: [0x560360f79f00]
> 2016-09-09 09:55:26.446925 7f38a2ecf700 -1 *** Caught signal (Segmentation
> fault) **
>  in thread 7f38a2ecf700 thread_name:ceph-osd
>
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (()+0x96bdde) [0x560356296dde]
>  2: (()+0x113d0) [0x7f38a81b03d0]
>  3: [0x560360f79f00]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
>      0> 2016-09-09 09:55:26.446925 7f38a2ecf700 -1 *** Caught signal
> (Segmentation fault) **
>  in thread 7f38a2ecf700 thread_name:ceph-osd
>
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (()+0x96bdde) [0x560356296dde]
>  2: (()+0x113d0) [0x7f38a81b03d0]
>  3: [0x560360f79f00]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> Segmentation fault
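>
> In case it is useful: the per-frame offsets in the backtrace (e.g.
> 0x96bdde) can probably be resolved once the matching ceph debug symbols
> for 10.2.2 are installed, along the lines of:
>
> addr2line -e /usr/bin/ceph-osd -f -C 0x96bdde
>
> (the binary path and debug package name may differ on Ubuntu).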
>
>
> This is the actual logfile for osd.10
> - http://slexy.org/view/s21lhpkLGQ
>
> By the way:
> I did "ceph osd set noout" before stopping and flushing the OSD.
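> For completeness, the flag gets cleared again once the OSDs are back up:
>
> ceph osd unset noout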
>
> Hope this is useful for you!
>
> - Mehmet
>
> Best regards,
>>       Alexey
>>
>> On Wed, Sep 7, 2016 at 6:48 PM, Mehmet <c...@elchaka.de> wrote:
>>
>> Hey again,
>>>
>>> now I have stopped my osd.12 via
>>>
>>> root@:~# systemctl stop ceph-osd@12
>>>
>>> and when I flush the journal...
>>>
>>> root@:~# ceph-osd -i 12 --flush-journal
>>> SG_IO: questionable sense data, results may be incorrect
>>> SG_IO: questionable sense data, results may be incorrect
>>> *** Caught signal (Segmentation fault) **
>>>  in thread 7f421d49d700 thread_name:ceph-osd
>>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>>  1: (()+0x96bdde) [0x564545e65dde]
>>>  2: (()+0x113d0) [0x7f422277e3d0]
>>>  3: [0x56455055a3c0]
>>> 2016-09-07 17:42:58.128839 7f421d49d700 -1 *** Caught signal
>>> (Segmentation fault) **
>>>  in thread 7f421d49d700 thread_name:ceph-osd
>>>
>>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>>  1: (()+0x96bdde) [0x564545e65dde]
>>>  2: (()+0x113d0) [0x7f422277e3d0]
>>>  3: [0x56455055a3c0]
>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>>
>>>      0> 2016-09-07 17:42:58.128839 7f421d49d700 -1 *** Caught
>>> signal (Segmentation fault) **
>>>  in thread 7f421d49d700 thread_name:ceph-osd
>>>
>>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>>  1: (()+0x96bdde) [0x564545e65dde]
>>>  2: (()+0x113d0) [0x7f422277e3d0]
>>>  3: [0x56455055a3c0]
>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>>
>>> Segmentation fault
>>>
>>> The logfile with further information
>>> - http://slexy.org/view/s2T8AohMfU [4]
>>>
>>>
>>> I guess I will get the same message when I flush the other journals.
>>>
>>> - Mehmet
>>>
>>> On 2016-09-07 13:23, Mehmet wrote:
>>>
>>> Hello ceph people,
>>>>
>>>> yesterday I stopped one of my OSDs via
>>>>
>>>> root@:~# systemctl stop ceph-osd@10
>>>>
>>>> and tried to flush the journal for this osd via
>>>>
>>>> root@:~# ceph-osd -i 10 --flush-journal
>>>>
>>>> but got this output on the screen:
>>>>
>>>> SG_IO: questionable sense data, results may be incorrect
>>>> SG_IO: questionable sense data, results may be incorrect
>>>> *** Caught signal (Segmentation fault) **
>>>>  in thread 7fd846333700 thread_name:ceph-osd
>>>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>>>  1: (()+0x96bdde) [0x55f33b862dde]
>>>>  2: (()+0x113d0) [0x7fd84b6143d0]
>>>>  3: [0x55f345bbff80]
>>>> 2016-09-06 22:12:51.850739 7fd846333700 -1 *** Caught signal
>>>> (Segmentation fault) **
>>>>  in thread 7fd846333700 thread_name:ceph-osd
>>>>
>>>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>>>  1: (()+0x96bdde) [0x55f33b862dde]
>>>>  2: (()+0x113d0) [0x7fd84b6143d0]
>>>>  3: [0x55f345bbff80]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>`
>>>> is
>>>> needed to interpret this.
>>>>
>>>>      0> 2016-09-06 22:12:51.850739 7fd846333700 -1 *** Caught
>>>> signal
>>>> (Segmentation fault) **
>>>>  in thread 7fd846333700 thread_name:ceph-osd
>>>>
>>>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>>>  1: (()+0x96bdde) [0x55f33b862dde]
>>>>  2: (()+0x113d0) [0x7fd84b6143d0]
>>>>  3: [0x55f345bbff80]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>`
>>>> is
>>>> needed to interpret this.
>>>>
>>>> Segmentation fault
>>>>
>>>> This is the logfile from my osd.10 with further information
>>>> - http://slexy.org/view/s21tfwQ1fZ [1]
>>>>
>>>> Today I stopped another OSD (osd.11)
>>>>
>>>> root@:~# systemctl stop ceph-osd@11
>>>>
>>>> I did not get the above-mentioned error, but this:
>>>>
>>>> root@:~# ceph-osd -i 11 --flush-journal
>>>> SG_IO: questionable sense data, results may be incorrect
>>>> SG_IO: questionable sense data, results may be incorrect
>>>> 2016-09-07 13:19:39.729894 7f3601a298c0 -1 flushed journal
>>>> /var/lib/ceph/osd/ceph-11/journal for object store
>>>> /var/lib/ceph/osd/ceph-11
>>>>
>>>> This is the logfile from my osd.11 with further information
>>>> - http://slexy.org/view/s2AlEhV38m [2]
>>>>
>>>>
>>>> This is not really an issue for me at the moment, because I will set up
>>>> the journal partitions again with 20GB (instead of the current 5GB) and
>>>> then bring the OSDs back up -- see the sketch below.
>>>> But I thought I should report this error to the mailing list.
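>>>>
>>>> Roughly, the per-OSD steps I have in mind (just a sketch, not carried
>>>> out yet):
>>>>
>>>> systemctl stop ceph-osd@10
>>>> ceph-osd -i 10 --flush-journal    # drain the old 5GB journal
>>>> # recreate the NVMe journal partition with 20GB, then:
>>>> ceph-osd -i 10 --mkjournal        # initialize the new journal
>>>> systemctl start ceph-osd@10
>>>>
>>>> with "osd journal size = 20480" set in ceph.conf beforehand.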
>>>>
>>>> This is my Setup:
>>>>
>>>> *Software/OS*
>>>> - Jewel
>>>> #> ceph tell osd.* version | grep version | uniq
>>>> "version": "ceph version 10.2.2
>>>> (45107e21c568dd033c2f0a3107dec8f0b0e58374)"
>>>>
>>>> #> ceph tell mon.* version
>>>> [...] ceph version 10.2.2
>>>> (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>>>
>>>> - Ubuntu 16.04 LTS on all OSD and MON servers
>>>> #> uname -a   (as of 31.08.2016)
>>>> Linux reilif 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:01:55
>>>> UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>> *Server*
>>>> 3x OSD servers, each with
>>>>
>>>> - 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz ==> 12 Cores, no
>>>> Hyper-Threading
>>>>
>>>> - 64GB RAM
>>>> - 10x 4TB HGST 7K4000 SAS2 (6Gb/s) disks as OSDs
>>>>
>>>> - 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling
>>>> Device
>>>> for 10-12 Disks
>>>>
>>>> - 1x Samsung SSD 840/850 Pro only for the OS
>>>>
>>>> 3x MON servers
>>>> - Two of them with 1x Intel(R) Xeon(R) CPU E3-1265L V2 @ 2.50GHz
>>>> (4 Cores, 8 Threads)
>>>> - The third one has 2x Intel(R) Xeon(R) CPU L5430 @ 2.66GHz
>>>> ==> 8 Cores, no Hyper-Threading
>>>>
>>>> - 32 GB RAM
>>>> - 1x Raid 10 (4 Disks)
>>>>
>>>> *Network*
>>>> - Currently each server and client has one active connection at 1x 1GbE;
>>>> shortly this will be changed to 2x 10GbE fibre, perhaps with LACP where
>>>> possible.
>>>>
>>>> - We do not use Jumbo Frames yet.
>>>>
>>>> - Both public and cluster-network Ceph traffic currently goes through
>>>> this one active 1GbE interface on each server.
>>>>
>>>> hf
>>>> - Mehmet
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [3]
>>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [3]
>>>
>>
>>
>>
>> Links:
>> ------
>> [1] http://slexy.org/view/s21tfwQ1fZ
>> [2] http://slexy.org/view/s2AlEhV38m
>> [3] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> [4] http://slexy.org/view/s2T8AohMfU
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
