Hi,

> root@:~# ceph-osd -i 12 --flush-journal
> SG_IO: questionable sense data, results may be incorrect
> SG_IO: questionable sense data, results may be incorrect

As far as I understand, these lines are hdparm warnings (the OSD uses the
hdparm command to query the journal device's write cache state).
The message means hdparm is unable to reliably determine whether the drive's
write cache is enabled, which might indicate a hardware problem.
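You can reproduce what the OSD sees by querying the device directly; a quick
sketch, assuming the journal sits on /dev/sdX (replace with your actual
journal device):

root@:~# smartctl -a /dev/sdX   # overall drive health and SMART/error log
root@:~# hdparm -W /dev/sdX     # the write-cache state as hdparm reports it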

> ceph-osd -i 12 --flush-journal

I think it's a good idea to a) check the health of the journal drive (e.g.
with smartctl), and b) capture a more verbose log, i.e. add this to ceph.conf:

[osd]
debug filestore = 20/20
debug journal = 20/20

and then try flushing the journal once more (note: this won't fix the problem;
the point is to get a useful log).
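If you prefer not to edit ceph.conf, the same debug levels can usually be
passed on the command line, and the log should end up in the default location
(the option names and path below assume a standard Jewel install; adjust to
your setup):

root@:~# ceph-osd -i 12 --flush-journal --debug-journal 20 --debug-filestore 20
root@:~# less /var/log/ceph/ceph-osd.12.log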

Best regards,
      Alexey


On Wed, Sep 7, 2016 at 6:48 PM, Mehmet <c...@elchaka.de> wrote:

> Hey again,
>
> now I have stopped my osd.12 via
>
> root@:~# systemctl stop ceph-osd@12
>
> and when I flush the journal...
>
> root@:~# ceph-osd -i 12 --flush-journal
> SG_IO: questionable sense data, results may be incorrect
> SG_IO: questionable sense data, results may be incorrect
> *** Caught signal (Segmentation fault) **
>  in thread 7f421d49d700 thread_name:ceph-osd
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (()+0x96bdde) [0x564545e65dde]
>  2: (()+0x113d0) [0x7f422277e3d0]
>  3: [0x56455055a3c0]
> 2016-09-07 17:42:58.128839 7f421d49d700 -1 *** Caught signal (Segmentation
> fault) **
>  in thread 7f421d49d700 thread_name:ceph-osd
>
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (()+0x96bdde) [0x564545e65dde]
>  2: (()+0x113d0) [0x7f422277e3d0]
>  3: [0x56455055a3c0]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
>      0> 2016-09-07 17:42:58.128839 7f421d49d700 -1 *** Caught signal
> (Segmentation fault) **
>  in thread 7f421d49d700 thread_name:ceph-osd
>
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (()+0x96bdde) [0x564545e65dde]
>  2: (()+0x113d0) [0x7f422277e3d0]
>  3: [0x56455055a3c0]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> Segmentation fault
>
> The logfile with further information
> - http://slexy.org/view/s2T8AohMfU
>
> I guess I will get the same message when I flush the other journals.
>
> - Mehmet
>
>
> On 2016-09-07 13:23, Mehmet wrote:
>
>> Hello ceph people,
>>
>> yesterday I stopped one of my OSDs via
>>
>> root@:~# systemctl stop ceph-osd@10
>>
>> and tried to flush the journal for this OSD via
>>
>> root@:~# ceph-osd -i 10 --flush-journal
>>
>> but got this output on the screen:
>>
>> SG_IO: questionable sense data, results may be incorrect
>> SG_IO: questionable sense data, results may be incorrect
>> *** Caught signal (Segmentation fault) **
>>  in thread 7fd846333700 thread_name:ceph-osd
>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>  1: (()+0x96bdde) [0x55f33b862dde]
>>  2: (()+0x113d0) [0x7fd84b6143d0]
>>  3: [0x55f345bbff80]
>> 2016-09-06 22:12:51.850739 7fd846333700 -1 *** Caught signal
>> (Segmentation fault) **
>>  in thread 7fd846333700 thread_name:ceph-osd
>>
>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>  1: (()+0x96bdde) [0x55f33b862dde]
>>  2: (()+0x113d0) [0x7fd84b6143d0]
>>  3: [0x55f345bbff80]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>>      0> 2016-09-06 22:12:51.850739 7fd846333700 -1 *** Caught signal
>> (Segmentation fault) **
>>  in thread 7fd846333700 thread_name:ceph-osd
>>
>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>  1: (()+0x96bdde) [0x55f33b862dde]
>>  2: (()+0x113d0) [0x7fd84b6143d0]
>>  3: [0x55f345bbff80]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>> Segmentation fault
>>
>> This is the logfile from my osd.10 with further information:
>> - http://slexy.org/view/s21tfwQ1fZ
>>
>> Today I stopped another OSD (osd.11)
>>
>> root@:~# systemctl stop ceph-osd@11
>>
>> I did not get the above-mentioned error, but this:
>>
>> root@:~# ceph-osd -i 11 --flush-journal
>> SG_IO: questionable sense data, results may be incorrect
>> SG_IO: questionable sense data, results may be incorrect
>> 2016-09-07 13:19:39.729894 7f3601a298c0 -1 flushed journal
>> /var/lib/ceph/osd/ceph-11/journal for object store
>> /var/lib/ceph/osd/ceph-11
>>
>> This is the logfile from my osd.11 with further information:
>> - http://slexy.org/view/s2AlEhV38m
>>
>> This is not really an issue for me right now, because I am going to set up
>> the journal partitions again with 20 GB (instead of the current 5 GB) and
>> then bring the OSDs back up.
>> But I thought I should report this error to the mailing list.
>>
>> This is my Setup:
>>
>> *Software/OS*
>> - Jewel
>> #> ceph tell osd.* version | grep version | uniq
>> "version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)"
>>
>> #> ceph tell mon.* version
>> [...] ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>
>> - Ubuntu 16.04 LTS on all OSD and MON servers
>> #> uname -a
>> 31.08.2016: Linux reilif 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11
>> 18:01:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>
>> *Server*
>> 3x OSD Server, each with
>>
>> - 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz ==> 12 Cores, no
>> Hyper-Threading
>>
>> - 64GB RAM
>> - 10x 4TB HGST 7K4000 SAS2 (6 Gb/s) disks as OSDs
>>
>> - 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device
>> for 10-12 Disks
>>
>> - 1x Samsung SSD 840/850 Pro only for the OS
>>
>> 3x MON Server
>> - Two of them with 1x Intel(R) Xeon(R) CPU E3-1265L V2 @ 2.50GHz (4
>> Cores, 8 Threads)
>> - The third one with 2x Intel(R) Xeon(R) CPU L5430 @ 2.66GHz ==> 8 Cores,
>> no Hyper-Threading
>>
>> - 32 GB RAM
>> - 1x Raid 10 (4 Disks)
>>
>> *Network*
>> - Currently each server and client has one active 1 GbE connection; this
>> will soon be changed to 2x 10 GbE fibre, perhaps with LACP where possible.
>>
>> - We do not use jumbo frames yet.
>>
>> - Both public and cluster-network Ceph traffic currently goes through this
>> one active 1 GbE interface on each server.
>>
>> hf
>> - Mehmet
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
