"you use XFS on your OSDs?" This OSD was formatted in BTRFS as a whole block device /dev/sdc (with no partition table). Then I moved from BTRFS to XFS /dev/sdc1 (with partition table), because BTRFS was v-v-very slow. Maybe partprober sees some old signatures from first sectors of that disk... By "move" I mean removing OSD from cluster as described in official docs http://ceph.com/docs/master/rados/operations/add-or-rm-osds/, and then adding it as new OSD.
Fri, 7 Aug 2015 at 14:46, Megov Igor Aleksandrovich <me...@yuterra.ru>:

> Hi!
>
> >If there are cable or SATA controller issues, will they be shown in
> >/var/log/dmesg?
>
> If they lead to read errors, they will be logged in dmesg. If they cause
> only SATA command retransmission, they may not be logged in dmesg,
> but they should show up in the SMART attributes.
>
> And anyway, we have faced only a few (<10) inconsistencies during a year
> of production usage, not nearly as many as you have. So maybe
> it is not a disk error issue.
>
> Also, there are two strange things I can see:
> - Do you need SELinux enabled? AFAIK, SELinux security attributes are
>   stored in extended attributes (xattrs), and Ceph also uses xattrs
>   intensively. Maybe there are some problems between them? Try disabling
>   SELinux at boot - I have seen this suggestion quite often when
>   troubleshooting.
> - As far as I can see, you use XFS on your OSDs? But in dmesg we see
>   records about the same /dev/sdc:
>
> >BTRFS: device fsid a3176b6c-2e2b-4121-9dc3-f191d469eccc devid 1
> >transid 287115 /dev/sdc
>
> and
>
> >XFS (sdc1): Mounting V4 Filesystem
> >XFS (sdc1): Ending clean mount
>
> Is it btrfs or XFS? Personally, I haven't had much success with btrfs.
> Some of my test setups ended with completely hung machines.
>
> Megov Igor
> CIO, Yuterr
>
> block0 dmesg:
> [ 5.296302] XFS (sdb1): Mounting V4 Filesystem
> [ 5.316487] XFS (sda1): Mounting V4 Filesystem
> [ 5.464439] XFS (sdb1): Ending clean mount
> [ 5.464466] SELinux: initialized (dev sdb1, type xfs), uses xattr
> [ 5.506185] XFS (sda1): Ending clean mount
> [ 5.506214] SELinux: initialized (dev sda1, type xfs), uses xattr
> [ 5.649101] XFS (sdc1): Ending clean mount
> [ 5.649129] SELinux: initialized (dev sdc1, type xfs), uses xattr
> [ 5.668062] systemd-journald[500]: Received request to flush runtime journal from PID 1
> [ 5.786559] type=1305 audit(1438754525.356:4): audit_pid=629 old=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
> [ 5.870809] snd_hda_intel 0000:00:1b.0: Codec #2 probe error; disabling it...
> [ 5.976594] sound hdaudioC0D0: autoconfig: line_outs=1 (0x14/0x0/0x0/0x0/0x0) type:line
> [ 5.976604] sound hdaudioC0D0: speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
> [ 5.976609] sound hdaudioC0D0: hp_outs=1 (0x1b/0x0/0x0/0x0/0x0)
> [ 5.976613] sound hdaudioC0D0: mono: mono_out=0x0
> [ 5.976617] sound hdaudioC0D0: dig-out=0x11/0x0
> [ 5.976621] sound hdaudioC0D0: inputs:
> [ 5.976627] sound hdaudioC0D0: Rear Mic=0x18
> [ 5.976632] sound hdaudioC0D0: Front Mic=0x19
> [ 5.976637] sound hdaudioC0D0: Line=0x1a
> [ 5.976641] sound hdaudioC0D0: CD=0x1c
>
> block2 dmesg:
> [ 4.987126] XFS (sda1): Mounting V4 Filesystem
> [ 4.990737] raid6: sse2x1 601 MB/s
> [ 5.007765] raid6: sse2x2 632 MB/s
> [ 5.024711] raid6: sse2x4 1316 MB/s
> [ 5.024717] raid6: using algorithm sse2x4 (1316 MB/s)
> [ 5.024720] raid6: using ssse3x2 recovery algorithm
> [ 5.046099] XFS (sda1): Ending clean mount
> [ 5.046126] SELinux: initialized (dev sda1, type xfs), uses xattr
> [ 5.074257] Btrfs loaded
> [ 5.075283] BTRFS: device fsid a3176b6c-2e2b-4121-9dc3-f191d469eccc devid 1 transid 287115 /dev/sdc
> [ 5.098721] Adding 4882428k swap on /dev/mapper/root-swap00. Priority:-1 extents:1 across:4882428k SSFS
> [ 5.157349] XFS (sdc1): Mounting V4 Filesystem
> [ 5.222992] XFS (sdb1): Mounting V4 Filesystem
> [ 5.362687] XFS (sdc1): Ending clean mount
> [ 5.362738] SELinux: initialized (dev sdc1, type xfs), uses xattr
> [ 5.417178] XFS (sdb1): Ending clean mount
> [ 5.417208] SELinux: initialized (dev sdb1, type xfs), uses xattr
> [ 5.429280] systemd-journald[497]: Received request to flush runtime journal from PID 1
> [ 5.533922] type=1305 audit(1438754526.111:4): audit_pid=626 old=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
>
> Fri, 7 Aug 2015 at 14:08, Megov Igor Aleksandrovich <me...@yuterra.ru>:
>
>> Hi!
>>
>> Do you have any disk errors in the dmesg output? In our practice, every
>> time a deep scrub found an inconsistent PG, we also found a disk error
>> that was the cause. Sometimes it was media errors (bad sectors), one time
>> a bad SATA cable, and we also had some RAID/HBA firmware issues. But in
>> all cases, when we saw an inconsistent PG, we also saw disk errors. So
>> please check for them on your cluster first.
>>
>> Megov Igor
>> CIO, Yuterra
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com