Christoph, do you have any links to the bug?

On Fri, Dec 21, 2018 at 11:07 AM Christoph Adomeit
<christoph.adom...@gatworks.de> wrote:
> Hi,
>
> same here, but also for PGs in cephfs pools.
>
> As far as I know there is a known bug where, under memory pressure, some
> reads return zeros, and this leads to the error message.
>
> I have set nodeep-scrub and I am waiting for 12.2.11.
>
> Thanks
> Christoph
>
> On Fri, Dec 21, 2018 at 03:23:21PM +0100, Hervé Ballans wrote:
> > Hi Frank,
> >
> > I encounter exactly the same issue with the same disks as yours. Every
> > day, after a batch of deep scrubbing operations, there are generally
> > between 1 and 3 inconsistent PGs, and each time on different OSDs.
> >
> > That could point to a problem with these disks, but:
> >
> > - it concerns only the PGs of the rbd pool, not those of the cephfs
> > pools (the same disk model is used)
> >
> > - I saw this when I was running 12.2.5, not after I upgraded to 12.2.8,
> > but the problem appeared again after upgrading to 12.2.10
> >
> > - on my side, smartctl and dmesg do not show any media errors, so I'm
> > pretty sure the physical media are not the cause...
> >
> > Small precision: each disk is configured as RAID0 on a PERC740P; is this
> > also the case for you, or are your disks in JBOD mode?
> >
> > Another question: in your case, is the OSD involved in the inconsistent
> > PGs always the same one, or is it a different one every time?
> >
> > For information, so far the manual 'ceph pg repair' command has worked
> > well every time...
> >
> > Context: Luminous 12.2.10, BlueStore OSDs with the data block on SATA
> > disks and WAL/DB on NVMe, rbd configuration replica 3/2
> >
> > Cheers,
> > rv
> >
> > A few outputs:
> >
> > $ sudo ceph -s
> >   cluster:
> >     id:     838506b7-e0c6-4022-9e17-2d1cf9458be6
> >     health: HEALTH_ERR
> >             3 scrub errors
> >             Possible data damage: 3 pgs inconsistent
> >
> >   services:
> >     mon: 3 daemons, quorum inf-ceph-mon0,inf-ceph-mon1,inf-ceph-mon2
> >     mgr: inf-ceph-mon0(active), standbys: inf-ceph-mon1, inf-ceph-mon2
> >     mds: cephfs_home-2/2/2 up {0=inf-ceph-mon1=up:active,1=inf-ceph-mon0=up:active}, 1 up:standby
> >     osd: 126 osds: 126 up, 126 in
> >
> >   data:
> >     pools:   3 pools, 4224 pgs
> >     objects: 23.35M objects, 20.9TiB
> >     usage:   64.9TiB used, 136TiB / 201TiB avail
> >     pgs:     4221 active+clean
> >              3    active+clean+inconsistent
> >
> >   io:
> >     client: 2.62KiB/s rd, 2.25MiB/s wr, 0op/s rd, 118op/s wr
> >
> > $ sudo ceph health detail
> > HEALTH_ERR 3 scrub errors; Possible data damage: 3 pgs inconsistent
> > OSD_SCRUB_ERRORS 3 scrub errors
> > PG_DAMAGED Possible data damage: 3 pgs inconsistent
> >     pg 9.27 is active+clean+inconsistent, acting [78,107,96]
> >     pg 9.260 is active+clean+inconsistent, acting [84,113,62]
> >     pg 9.6b9 is active+clean+inconsistent, acting [79,107,80]
> >
> > $ sudo rados list-inconsistent-obj 9.27 --format=json-pretty | grep error
> > "errors": [],
> > "union_shard_errors": [
> > "read_error"
> > "errors": [
> > "read_error"
> > "errors": [],
> > "errors": [],
> >
> > $ sudo rados list-inconsistent-obj 9.260 --format=json-pretty | grep error
> > "errors": [],
> > "union_shard_errors": [
> > "read_error"
> > "errors": [],
> > "errors": [],
> > "errors": [
> > "read_error"
> >
> > $ sudo rados list-inconsistent-obj 9.6b9 --format=json-pretty | grep error
> > "errors": [],
> > "union_shard_errors": [
> > "read_error"
> > "errors": [
> > "read_error"
> > "errors": [],
> > "errors": [],
> >
> > $ sudo ceph pg repair 9.27
> > instructing pg 9.27 on osd.78 to repair
> > $ sudo ceph pg repair 9.260
> > instructing pg 9.260 on osd.84 to repair
> > $ sudo ceph pg repair 9.6b9
> > instructing pg 9.6b9 on osd.79 to repair
> > $ sudo ceph -s
> >   cluster:
> >     id:     838506b7-e0c6-4022-9e17-2d1cf9458be6
> >     health: HEALTH_OK
> >
> >   services:
> >     mon: 3 daemons, quorum inf-ceph-mon0,inf-ceph-mon1,inf-ceph-mon2
> >     mgr: inf-ceph-mon0(active), standbys: inf-ceph-mon1, inf-ceph-mon2
> >     mds: cephfs_home-2/2/2 up {0=inf-ceph-mon1=up:active,1=inf-ceph-mon0=up:active}, 1 up:standby
> >     osd: 126 osds: 126 up, 126 in
> >
> >   data:
> >     pools:   3 pools, 4224 pgs
> >     objects: 23.35M objects, 20.9TiB
> >     usage:   64.9TiB used, 136TiB / 201TiB avail
> >     pgs:     4224 active+clean
> >
> >   io:
> >     client: 195KiB/s rd, 7.19MiB/s wr, 17op/s rd, 127op/s wr
> >
> >
> > On 19/12/2018 at 04:48, Frank Ritchie wrote:
> > > Hi all,
> > >
> > > I have been receiving alerts for:
> > >
> > > Possible data damage: 1 pg inconsistent
> > >
> > > almost daily for a few weeks now. When I check:
> > >
> > > rados list-inconsistent-obj $PG --format=json-pretty
> > >
> > > I will always see a read_error. When I run a deep scrub on the PG I
> > > will see:
> > >
> > > head candidate had a read error
> > >
> > > When I check dmesg on the OSD node I see:
> > >
> > > blk_update_request: critical medium error, dev sdX, sector 123
> > >
> > > I will also see a few uncorrected read errors in smartctl.
> > >
> > > Info:
> > > Ceph: ceph version 12.2.4-30.el7cp
> > > OSD: Toshiba 1.8TB SAS 10K
> > > 120 OSDs total
> > >
> > > Has anyone else seen these alerts occur almost daily? Can the errors
> > > possibly be due to deep scrubbing too aggressively?
> > >
> > > I realize these errors indicate potentially failing drives, but I
> > > can't replace a drive daily.
> > >
> > > thx
> > > Frank
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
> --
> Kein Backup - kein Mitleid
> Christoph Adomeit
> GATWORKS GmbH
> Reststrauch 191
> 41199 Moenchengladbach
> Sitz: Moenchengladbach
> Amtsgericht Moenchengladbach, HRB 6303
> Geschaeftsfuehrer: Christoph Adomeit, Hans Wilhelm Terstappen
>
> christoph.adom...@gatworks.de     Internetloesungen vom Feinsten
> Fon. +49 2166 9149-32             Fax. +49 2166 9149-10
>
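For anyone hitting this pattern repeatedly, the per-PG checks shown above can
be scripted instead of run by hand. A minimal sketch, assuming the Luminous
CLI used in this thread, jq available on the admin node, and the JSON layout
that rados list-inconsistent-obj produced above (an illustration of the loop,
not a drop-in tool):

#!/bin/bash
# Find inconsistent PGs from "ceph health detail" and, for each one,
# report which shard (OSD) returned an error such as read_error.
set -e

pgs=$(ceph health detail | grep 'active+clean+inconsistent' | awk '{print $2}')

for pg in $pgs; do
    echo "=== pg $pg ==="
    rados list-inconsistent-obj "$pg" --format=json |
        jq -r '.inconsistents[].shards[]
               | select(.errors | length > 0)
               | "osd.\(.osd): \(.errors | join(","))"'
    # Only repair once the affected shard/disk has been checked, e.g.:
    # ceph pg repair "$pg"
done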
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
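Frank's dmesg and smartctl checks can be scripted in a similar way, per OSD.
A rough sketch, assuming "ceph osd metadata" on this release exposes the
hostname and the BlueStore device node (field names can differ between
versions), passwordless ssh from the admin node, and smartmontools on the
OSD hosts:

#!/bin/bash
# For a given OSD id, locate its host and backing device, then look for
# media errors in the kernel log and in the SMART data.
osd_id="$1"

meta=$(ceph osd metadata "$osd_id" -f json)
host=$(jq -r '.hostname' <<< "$meta")
dev=$(jq -r '.bluestore_bdev_dev_node' <<< "$meta")
dev=${dev##*/}   # strip any leading /dev/ path

echo "osd.${osd_id} lives on ${host}:${dev}"

# Kernel-level read failures (the blk_update_request lines quoted above).
ssh "$host" "dmesg | grep -i 'medium error' | grep -w '$dev'"

# SMART error counters for the same device. Behind a RAID controller
# (e.g. single-disk RAID0 on a PERC) smartctl usually needs an extra
# '-d megaraid,N' to reach the physical disk; for LVM/dm devices point
# it at the underlying disk instead.
ssh "$host" "smartctl -a /dev/$dev | grep -iE 'error|reallocated|pending'"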