Hi,

> This is the actual logfile for osd.10
> - http://slexy.org/view/s21lhpkLGQ

Unfortunately this log does not contain any new data -- for some reason
the log levels haven't changed (see line 36369). Could you please try
the following command:

ceph-osd -d --flush-journal --debug_filestore 20/20 --debug_journal 20/20 -i 10
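For completeness, the whole capture sequence might look like this (just
a sketch: the noout flag and the systemd unit name are taken from your
earlier mails, and the /tmp log path is only an example):

# keep the cluster from rebalancing while the OSD is down
ceph osd set noout
systemctl stop ceph-osd@10

# -d runs ceph-osd in the foreground and logs to stderr, so the verbose
# filestore/journal output can be captured directly
ceph-osd -d --flush-journal --debug_filestore 20/20 --debug_journal 20/20 -i 10 \
    2>&1 | tee /tmp/osd.10-flush.log

ceph osd unset noout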
Best regards,
Alexey

On Fri, Sep 9, 2016 at 11:24 AM, Mehmet <c...@elchaka.de> wrote:

> Hello Alexey,
>
> thank you for your mail - my answers inline :)
>
> On 2016-09-08 16:24, Alexey Sheplyakov wrote:
>
>> Hi,
>>
>>> root@:~# ceph-osd -i 12 --flush-journal
>>> SG_IO: questionable sense data, results may be incorrect
>>> SG_IO: questionable sense data, results may be incorrect
>>
>> As far as I understand these lines are an hdparm warning (the OSD uses
>> the hdparm command to query the journal device's write cache state).
>>
>> The message means hdparm is unable to reliably figure out whether the
>> drive write cache is enabled. This might indicate a hardware problem.
>
> I guess this has to do with the NVMe device (Intel DC P3700 NVMe)
> which is used for journaling.
> So is this normal behavior?
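Most likely, yes: NVMe devices do not implement the ATA command set, so
hdparm's SG_IO query is not expected to succeed on the journal device,
which would make the warning harmless. If you want to double-check the
write cache state by hand, something along these lines should work (the
device names below are assumptions for your setup):

# SATA/SAS data disks: -W without a value *queries* the write cache state
hdparm -W /dev/sdb

# NVMe journal device: recent smartmontools (6.5+) can talk to NVMe directly
smartctl -a /dev/nvme0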
>>> ceph-osd -i 12 --flush-journal
>>
>> I think it's a good idea to
>> a) check the journal drive (smartctl),
>
> The disks are all fine - checked 2-3 weeks ago.
>
>> b) capture a more verbose log,
>>
>> i.e. add this to ceph.conf
>>
>> [osd]
>> debug filestore = 20/20
>> debug journal = 20/20
>>
>> and try flushing the journal once more (note: this won't fix the
>> problem, the point is to get a useful log)
>
> I flushed the journal at ~09:55:26 today and got these lines:
>
> root@:~# ceph-osd -i 10 --flush-journal
> SG_IO: questionable sense data, results may be incorrect
> SG_IO: questionable sense data, results may be incorrect
> *** Caught signal (Segmentation fault) **
>  in thread 7f38a2ecf700 thread_name:ceph-osd
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (()+0x96bdde) [0x560356296dde]
>  2: (()+0x113d0) [0x7f38a81b03d0]
>  3: [0x560360f79f00]
> 2016-09-09 09:55:26.446925 7f38a2ecf700 -1 *** Caught signal
> (Segmentation fault) **
>  in thread 7f38a2ecf700 thread_name:ceph-osd
>
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (()+0x96bdde) [0x560356296dde]
>  2: (()+0x113d0) [0x7f38a81b03d0]
>  3: [0x560360f79f00]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>  needed to interpret this.
>
>      0> 2016-09-09 09:55:26.446925 7f38a2ecf700 -1 *** Caught signal
> (Segmentation fault) **
>  in thread 7f38a2ecf700 thread_name:ceph-osd
>
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (()+0x96bdde) [0x560356296dde]
>  2: (()+0x113d0) [0x7f38a81b03d0]
>  3: [0x560360f79f00]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>  needed to interpret this.
>
> Segmentation fault
>
> This is the actual logfile for osd.10:
> - http://slexy.org/view/s21lhpkLGQ
>
> By the way:
> I ran "ceph osd set noout" before stopping and flushing.
>
> Hope this is useful for you!
>
> - Mehmet
>
>> Best regards,
>> Alexey
>>
>> On Wed, Sep 7, 2016 at 6:48 PM, Mehmet <c...@elchaka.de> wrote:
>>
>>> Hey again,
>>>
>>> now I have stopped my osd.12 via
>>>
>>> root@:~# systemctl stop ceph-osd@12
>>>
>>> and when I flush the journal...
>>>
>>> root@:~# ceph-osd -i 12 --flush-journal
>>> SG_IO: questionable sense data, results may be incorrect
>>> SG_IO: questionable sense data, results may be incorrect
>>> *** Caught signal (Segmentation fault) **
>>>  in thread 7f421d49d700 thread_name:ceph-osd
>>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>>  1: (()+0x96bdde) [0x564545e65dde]
>>>  2: (()+0x113d0) [0x7f422277e3d0]
>>>  3: [0x56455055a3c0]
>>> 2016-09-07 17:42:58.128839 7f421d49d700 -1 *** Caught signal
>>> (Segmentation fault) **
>>>  in thread 7f421d49d700 thread_name:ceph-osd
>>>
>>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>>  1: (()+0x96bdde) [0x564545e65dde]
>>>  2: (()+0x113d0) [0x7f422277e3d0]
>>>  3: [0x56455055a3c0]
>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>  needed to interpret this.
>>>
>>>      0> 2016-09-07 17:42:58.128839 7f421d49d700 -1 *** Caught signal
>>> (Segmentation fault) **
>>>  in thread 7f421d49d700 thread_name:ceph-osd
>>>
>>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>>  1: (()+0x96bdde) [0x564545e65dde]
>>>  2: (()+0x113d0) [0x7f422277e3d0]
>>>  3: [0x56455055a3c0]
>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>  needed to interpret this.
>>>
>>> Segmentation fault
>>>
>>> The logfile with further information:
>>> - http://slexy.org/view/s2T8AohMfU
>>>
>>> I guess I will get the same message when I flush the other journals.
>>>
>>> - Mehmet
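The backtraces above contain only raw addresses, so they are meaningful
only with debug symbols installed. A possible way to decode frame 1
(the package name and binary path are assumptions -- on Ubuntu the
symbols ship in a separate package such as ceph-osd-dbg or ceph-dbg,
depending on the repository):

# install the matching debug symbols first, e.g.:
# apt-get install ceph-osd-dbg

# (()+0x96bdde) is an offset inside the ceph-osd binary; with symbols
# available, addr2line can map it to a function name and source line
addr2line -Cfe /usr/bin/ceph-osd 0x96bdde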
>>> On 2016-09-07 13:23, Mehmet wrote:
>>>
>>>> Hello ceph people,
>>>>
>>>> yesterday I stopped one of my OSDs via
>>>>
>>>> root@:~# systemctl stop ceph-osd@10
>>>>
>>>> and tried to flush the journal for this OSD via
>>>>
>>>> root@:~# ceph-osd -i 10 --flush-journal
>>>>
>>>> but got this output on the screen:
>>>>
>>>> SG_IO: questionable sense data, results may be incorrect
>>>> SG_IO: questionable sense data, results may be incorrect
>>>> *** Caught signal (Segmentation fault) **
>>>>  in thread 7fd846333700 thread_name:ceph-osd
>>>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>>>  1: (()+0x96bdde) [0x55f33b862dde]
>>>>  2: (()+0x113d0) [0x7fd84b6143d0]
>>>>  3: [0x55f345bbff80]
>>>> 2016-09-06 22:12:51.850739 7fd846333700 -1 *** Caught signal
>>>> (Segmentation fault) **
>>>>  in thread 7fd846333700 thread_name:ceph-osd
>>>>
>>>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>>>  1: (()+0x96bdde) [0x55f33b862dde]
>>>>  2: (()+0x113d0) [0x7fd84b6143d0]
>>>>  3: [0x55f345bbff80]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>  needed to interpret this.
>>>>
>>>>      0> 2016-09-06 22:12:51.850739 7fd846333700 -1 *** Caught signal
>>>> (Segmentation fault) **
>>>>  in thread 7fd846333700 thread_name:ceph-osd
>>>>
>>>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>>>  1: (()+0x96bdde) [0x55f33b862dde]
>>>>  2: (()+0x113d0) [0x7fd84b6143d0]
>>>>  3: [0x55f345bbff80]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>  needed to interpret this.
>>>>
>>>> Segmentation fault
>>>>
>>>> This is the logfile from my osd.10 with further information:
>>>> - http://slexy.org/view/s21tfwQ1fZ
>>>>
>>>> Today I stopped another OSD (osd.11)
>>>>
>>>> root@:~# systemctl stop ceph-osd@11
>>>>
>>>> This time I did not get the above-mentioned error - but this:
>>>>
>>>> root@:~# ceph-osd -i 11 --flush-journal
>>>> SG_IO: questionable sense data, results may be incorrect
>>>> SG_IO: questionable sense data, results may be incorrect
>>>> 2016-09-07 13:19:39.729894 7f3601a298c0 -1 flushed journal
>>>> /var/lib/ceph/osd/ceph-11/journal for object store
>>>> /var/lib/ceph/osd/ceph-11
>>>>
>>>> This is the logfile from my osd.11 with further information:
>>>> - http://slexy.org/view/s2AlEhV38m
>>>>
>>>> This is not really an issue for me right now, because I am going to
>>>> set up the journal partitions again at 20 GB (currently 5 GB) and
>>>> then bring the OSDs back up.
>>>> But I thought I should report this error to the mailing list.
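For reference, the usual journal re-creation sequence would be roughly
the following -- a sketch under the assumption that the flush succeeds
(which is exactly what fails here), with osd_journal_size given in MB:

# in ceph.conf beforehand:  [osd]  osd journal size = 20480   (20 GB)
systemctl stop ceph-osd@10
ceph-osd -i 10 --flush-journal    # must complete cleanly first
# ... recreate the journal partition on the NVMe device at 20 GB ...
ceph-osd -i 10 --mkjournal
systemctl start ceph-osd@10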
>>>> This is my setup:
>>>>
>>>> *Software/OS*
>>>> - Jewel
>>>> #> ceph tell osd.* version | grep version | uniq
>>>> "version": "ceph version 10.2.2
>>>> (45107e21c568dd033c2f0a3107dec8f0b0e58374)"
>>>>
>>>> #> ceph tell mon.* version
>>>> [...] ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>>>
>>>> - Ubuntu 16.04 LTS on all OSD and MON servers
>>>> #> uname -a (as of 31.08.2016)
>>>> Linux reilif 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11
>>>> 18:01:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>> *Server*
>>>> 3x OSD servers, each with:
>>>> - 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz ==> 12 cores, no
>>>>   Hyper-Threading
>>>> - 64 GB RAM
>>>> - 10x 4 TB HGST 7K4000 SAS2 (6 Gb/s) disks as OSDs
>>>> - 1x Intel SSDPEDMD400G4 (Intel DC P3700 NVMe) as journaling device
>>>>   for 10-12 disks
>>>> - 1x Samsung SSD 840/850 Pro only for the OS
>>>>
>>>> 3x MON servers:
>>>> - two of them with 1x Intel(R) Xeon(R) CPU E3-1265L V2 @ 2.50GHz
>>>>   (4 cores, 8 threads); the third one has 2x Intel(R) Xeon(R) CPU
>>>>   L5430 @ 2.66GHz ==> 8 cores, no Hyper-Threading
>>>> - 32 GB RAM
>>>> - 1x RAID 10 (4 disks)
>>>>
>>>> *Network*
>>>> - Currently each server and client has one active 1 GbE connection;
>>>>   this will soon be changed to 2x 10 GbE fibre, perhaps with LACP
>>>>   where possible.
>>>> - We do not use jumbo frames yet.
>>>> - Public and cluster-network Ceph traffic currently goes through
>>>>   this one active 1 GbE interface on each server.
>>>>
>>>> hf
>>>> - Mehmet

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com