[ceph-users] Re: Bluestore caching oddities, again

Christian Balzer Wed, 07 Aug 2019 20:26:59 -0700


Hello Sage,



On Thu, 8 Aug 2019 02:23:15 +0000 (UTC) Sage Weil wrote:

> On Thu, 8 Aug 2019, Christian Balzer wrote:
> > 
> > Hello again,
> > 
> > Getting back to this:
> > On Sun, 4 Aug 2019 10:47:27 +0900 Christian Balzer wrote:
> >   
> > > Hello,
> > > 
> > > While preparing the first production BlueStore, Nautilus (latest) based
> > > cluster, I've run into the same things other people and I ran into before.
> > > 
> > > Firstly HW, 3 nodes with 12 SATA HDDs each, IT mode LSI 3008, wal/db on
> > > 40GB SSD partitions. (boy do I hate the inability of ceph-volume to deal
> > > with raw partitions).
> > > SSDs aren't a bottleneck in any scenario.
> > > Single E5-1650 v3 @ 3.50GHz, cpu isn't a bottleneck in any scenario, less
> > > than 15% of a core per OSD.
> > > 
> > > Connection is via 40Gb/s InfiniBand, IPoIB, no issues here as numbers
> > > later will show.
> > > 
> > > Clients are KVMs on Epyc based compute nodes, maybe some more speed could
> > > be squeezed out here with different VM configs, but the cpu isn't an issue
> > > in the problem cases.
> > > 
> > > 
> > > 
> > > 1. 4k random I/O can cause degraded PGs
> > > I've run into the same/similar issue as Nathan Fish here:
> > > https://www.spinics.net/lists/ceph-users/msg526
> > > During the first 2 tests with 4k random I/O I briefly got degraded PGs as
> > > well, with nothing in CPU or SSD utilization accounting for this.
> > > HDDs were of course busy at that time.
> > > Wasn't able to reproduce this so far, but it leaves me less than
> > > confident. 
> > > 
> > >   
> > This happened again yesterday when rsyncing 260GB of average 4MB files
> > into a Ceph image backed VM.
> > Given the nature of this rsync nothing on the ceph nodes was the least bit
> > busy, the HDDs were all below 15% utilization, CPU bored, etc.
> > 
> > Still we got:
> > ---
> > 2019-08-07 15:38:23.452580 osd.21 (osd.21) 651 : cluster [DBG] 1.125 starting backfill to osd.9 from (0'0,0'0] MAX to 1297'21584
> > 2019-08-07 15:38:24.454942 mon.ceph-05 (mon.0) 182756 : cluster [WRN] Health check failed: Reduced data availability: 2 pgs peering (PG_AVAILABILITY)
> > 2019-08-07 15:38:25.396756 mon.ceph-05 (mon.0) 182757 : cluster [DBG] osdmap e1302: 36 total, 36 up, 36 in
> > 2019-08-07 15:38:23.452026 osd.12 (osd.12) 767 : cluster [DBG] 1.105 starting backfill to osd.25 from (0'0,0'0] MAX to 1297'6782
> > ---  
> 
> Is the balancer enabled?  Maybe it is adjusting the PG distribution a bit.
> 
It is indeed, and that would explain things. I did run it manually a few
times, though, and the PG counts are all within one of each other, so I
didn't really expect any further adjustments, as this cluster has only a
single pool, RBD.

It would be nice if the balancer spoke up somewhere other than just the audit.log:
---
2019-08-07 15:38:21.092104 mon.ceph-05 (mon.0) 182680 : audit [INF] from='mgr.196195 10.0.8.25:0/960' entity='mgr.ceph-05' cmd=[{"item": "osd.0", "prefix": "osd crush weight-set reweight-compat", "weight": [2.504257053831929], "format": "json"}]: dispatch
---

I turned it off now, as I don't expect significant variances going forward.
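
For anyone who wants to see what the balancer did after the fact, here is a
minimal Python sketch that pulls mgr-issued weight-set changes out of audit log
lines. It assumes each log entry sits on a single line (the excerpt in this mail
was wrapped by the mail client) and that the layout matches the excerpt. The
function name `mgr_reweights` and the regular expression are illustrative
assumptions, not anything that ships with Ceph.

```python
import json
import re

# Assumed layout, taken from the audit.log excerpt in this mail:
#   <timestamp> ... entity='<name>' cmd=[<json>]: <stage>
AUDIT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*?"
    r"entity='(?P<entity>[^']*)' cmd=(?P<cmd>\[.*\]): (?P<stage>\w+)\s*$"
)

# Command prefix seen in the excerpt.
REWEIGHT_PREFIX = "osd crush weight-set reweight-compat"


def mgr_reweights(lines):
    """Yield (timestamp, item, weight) for reweight-compat commands sent by a mgr."""
    for line in lines:
        m = AUDIT_RE.match(line)
        if not m or not m.group("entity").startswith("mgr."):
            continue
        try:
            cmds = json.loads(m.group("cmd"))
        except json.JSONDecodeError:
            continue  # skip entries whose cmd payload is not valid JSON
        for cmd in cmds:
            if cmd.get("prefix") == REWEIGHT_PREFIX:
                yield m.group("ts"), cmd.get("item"), cmd.get("weight")


# The audit entry from this mail, joined back onto one line.
SAMPLE = (
    "2019-08-07 15:38:21.092104 mon.ceph-05 (mon.0) 182680 : audit [INF] "
    "from='mgr.196195 10.0.8.25:0/960' entity='mgr.ceph-05' "
    'cmd=[{"item": "osd.0", "prefix": "osd crush weight-set reweight-compat", '
    '"weight": [2.504257053831929], "format": "json"}]: dispatch'
)

if __name__ == "__main__":
    for ts, item, weight in mgr_reweights([SAMPLE]):
        print(ts, item, weight)
```

Feeding it the lines of an audit log file instead of `[SAMPLE]` would list every
such adjustment with its timestamp, which can then be compared against the
backfill and peering messages in the cluster log.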

Thanks,

Christian

> > Unfortunately all I have in the OSD log is this:
> > ---
> > 2019-08-07 15:38:23.461 7f155e71b700  1 osd.9 pg_epoch: 1299 pg[1.125( empty local-lis/les=0/0 n=0 ec=189/189 lis/c 1286/1286 les/c/f 1287/1287/0 1298/1299/189) [21,9,28]/[21,28,3] r=-1 lpr=1299 pi=[1286,1299)/1 crt=0'0 unknown mbc={}] state<Start>: transitioning to Stray
> > 2019-08-07 15:38:24.353 7f155e71b700  1 osd.9 pg_epoch: 1301 pg[1.125( v 1297'21584 (1246'18584,1297'21584] local-lis/les=1299/1300 n=5 ec=189/189 lis/c 1299/1299 les/c/f 1300/1300/0 1298/1301/189) [21,9,28] r=1 lpr=1301 pi=[1299,1301)/1 luod=0'0 crt=1297'21584 active mbc={}] start_peering_interval up [21,9,28] -> [21,9,28], acting [21,28,3] -> [21,9,28], acting_primary 21 -> 21, up_primary 21 -> 21, role -1 -> 1, features acting 4611087854031667199 upacting 4611087854031667199
> > 2019-08-07 15:38:24.353 7f155e71b700  1 osd.9 pg_epoch: 1301 pg[1.125( v 1297'21584 (1246'18584,1297'21584] local-lis/les=1299/1300 n=5 ec=189/189 lis/c 1299/1299 les/c/f 1300/1300/0 1298/1301/189) [21,9,28] r=1 lpr=1301 pi=[1299,1301)/1 crt=1297'21584 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
> > ---
> > 
> > How can I find out what happened here? Given that it might not happen
> > again anytime soon, cranking up debug levels now is a tad late.
> 
> In the past we had "problems" where the degraded count would increase in
> cases where we were migrating PGs, even though there weren't actually any
> objects with too few replicas.  I think David Zafman ironed most/all
> of these out, but perhaps the fixes weren't all in Nautilus? I can't quite
> remember.
> 
> s
> 
> 


-- 
Christian Balzer        Network/Systems Engineer                
ch...@gol.com           Rakuten Mobile Inc.
