On Mon, 10 Dec 2018 21:01:31 -0500 Tyler Bishop wrote:
> LVM | dm-0 | busy 101% | read 137 | write 1761 | KiB/r 4 | KiB/w 30 | MBr/s 0.1 | MBw/s 5.3 | avq 185.42 | avio 5.31 ms |
> DSK | sdb | busy 100% | read 127 | write 1208 | KiB/r 4 | KiB/w 32 | MBr/s 0.1 | MBw/s 3.9 | avq 58.39 | avio 7.49 ms |
>
OK, after stretching that back out to its original length: you have
100% busy SSDs at just 5 MB/s of writes.
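The busy figures are consistent with the per-request latency atop reports:
%busy is roughly IOPS times avio. A quick cross-check, assuming the
read/write counts above cover atop's default 10-second sampling interval:

```python
# Rough cross-check: %busy ~= IOPS * average service time (avio).
# Assumption: the read/write counts span atop's default 10-second interval.
interval_s = 10.0

for dev, reads, writes, avio_ms in [("dm-0", 137, 1761, 5.31),
                                    ("sdb", 127, 1208, 7.49)]:
    iops = (reads + writes) / interval_s
    busy = iops * (avio_ms / 1000.0)
    print(f"{dev}: {busy:.0%} busy")  # dm-0 -> ~101%, sdb -> ~100%
```

So the devices really are saturated by per-operation latency, not by
throughput.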
Hitting "1" in atop will give you per-second results like this for an
Intel DC S3600 in a cache tier (filestore):
---
DSK | sde | busy 3% | read 0/s | write 476/s | KiB/r 0 | KiB/w 17 | MBr/s 0.00 | MBw/s 8.34 | avq 49.35 | avio 0.06 ms |
---
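If you want to diff such samples between a filestore and a bluestore run,
the DSK line splits cleanly on the pipe separators. A throwaway sketch
(field layout assumed from the sample above; adjust for your atop version):

```python
# Split an atop DSK line on "|" into a {metric: value} dict, so two
# samples (e.g. filestore vs. bluestore) can be diffed programmatically.
# Field layout is assumed from the sample above, not from any atop spec.
def parse_dsk(line):
    fields = {}
    cells = [c.strip() for c in line.split("|")]
    fields["dev"] = cells[1]            # device name, e.g. "sde"
    for cell in cells[2:]:
        parts = cell.split()
        if len(parts) >= 2:             # e.g. "busy 3%" or "avio 0.06 ms"
            fields[parts[0]] = parts[1]
    return fields

sample = ("DSK | sde | busy 3% | read 0/s | write 476/s | KiB/r 0 | "
          "KiB/w 17 | MBr/s 0.00 | MBw/s 8.34 | avq 49.35 | avio 0.06 ms |")
print(parse_dsk(sample))
```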
If these SSDs worked decently with filestore, I can't really see how they
would be this much worse with bluestore. Do you still have an original
setup to compare against?
Is there anything else, like controller cache, battery, or BIOS settings,
that might have changed during the migration?
Christian
> Tyler Bishop
> EST 2007
>
>
> O: 513-299-7108 x1000
> M: 513-646-5809
> http://BeyondHosting.net
>
>
>
>
> On Mon, Dec 10, 2018 at 8:57 PM Christian Balzer <ch...@gol.com> wrote:
> >
> > Hello,
> >
> > On Mon, 10 Dec 2018 20:43:40 -0500 Tyler Bishop wrote:
> >
> > > I don't think that's my issue here, because I don't see any IO to
> > > justify the latency. Unless the IO is minimal and it's Ceph issuing a
> > > bunch of discards to the SSD, and that's causing it to slow down.
> > >
> >
> > What does atop have to say?
> >
> > Discards/trims are usually visible in it; this is during an fstrim of a
> > RAID1 / :
> > ---
> > DSK | sdb | busy 81% | read 0 | write 8587 | MBw/s 2323.4 | avio 0.47 ms |
> > DSK | sda | busy 70% | read 2 | write 8587 | MBw/s 2323.4 | avio 0.41 ms |
> > ---
> >
> > The numbers tend to be a lot higher than what the actual interface is
> > capable of; clearly the SSD is reporting its internal activity.
> >
> > In any case, it should give good insight into what is going on
> > activity-wise.
> > Also, for posterity and curiosity: what kind of SSDs are they?
> >
> > Christian
> >
> > > The log isn't showing anything useful, and I have most debugging disabled.
> > >
> > >
> > >
> > > On Mon, Dec 10, 2018 at 7:43 PM Mark Nelson <mnel...@redhat.com> wrote:
> > >
> > > > Hi Tyler,
> > > >
> > > > I think we had a user a while back who reported background
> > > > deletion work going on after upgrading their OSDs from filestore to
> > > > bluestore, due to PGs having been moved around. Is it possible that your
> > > > cluster is doing a bunch of work (deletion or otherwise) beyond the
> > > > regular client load? I don't remember how to check for this off the top
> > > > of my head, but it might be something to investigate. If that's what it
> > > > is, we recently added the ability to throttle background deletes:
> > > >
> > > > https://github.com/ceph/ceph/pull/24749
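> > > >
> > > > The throttle is an OSD-side sleep between delete operations; roughly
> > > > like this (option name and value here are illustrative, taken from my
> > > > recollection of that work; check the PR for the exact knob and the
> > > > release it shipped in):

```shell
# Illustrative only: throttle background PG deletes by sleeping between
# delete operations. The option name is assumed from the PR above; verify
# it exists in your release with "ceph config help osd_delete_sleep" first.
ceph config set osd osd_delete_sleep 1.0
```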
> > > >
> > > >
> > > > If the logs/admin socket don't tell you anything, you could also try
> > > > using our wallclock profiler to see what the OSD is spending its time
> > > > doing:
> > > >
> > > > https://github.com/markhpc/gdbpmp/
> > > >
> > > >
> > > > ./gdbpmp -t 1000 -p`pidof ceph-osd` -o foo.gdbpmp
> > > >
> > > > ./gdbpmp -i foo.gdbpmp -t 1
> > > >
> > > >
> > > > Mark
> > > >
> > > > On 12/10/18 6:09 PM, Tyler Bishop wrote:
> > > > > Hi,
> > > > >
> > > > > I have an SSD-only cluster that I recently converted from filestore to
> > > > > bluestore, and performance has totally tanked. It was fairly decent
> > > > > before, with only a little more latency than expected. Since
> > > > > converting to bluestore the latency is extremely high: SECONDS.
> > > > > I am trying to determine whether it is an issue with the SSDs, or
> > > > > bluestore treating them differently than filestore did... potentially
> > > > > garbage collection? After 24+ hours?
> > > > >
> > > > > I am now seeing constant 100% IO utilization on ALL of the devices and
> > > > > performance is terrible!
> > > > >
> > > > > IOSTAT
> > > > >
> > > > > avg-cpu: %user %nice %system %iowait %steal %idle
> > > > > 1.37 0.00 0.34 18.59 0.00 79.70
> > > > >
> > > > > Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> > > > > sda 0.00 0.00 0.00 9.50 0.00 64.00 13.47 0.01 1.16 0.00 1.16 1.11 1.05
> > > > > sdb 0.00 96.50 4.50 46.50 34.00 11776.00 463.14 132.68 1174.84 782.67 1212.80 19.61 100.00
> > > > > dm-0 0.00 0.00 5.50 128.00 44.00 8162.00 122.94 507.84 1704.93 674.09 1749.23 7.49 100.00
> > > > >
> > > > > avg-cpu: %user %nice %system %iowait %steal %idle
> > > > > 0.85 0.00 0.30 23.37 0.00 75.48
> > > > >
> > > > > Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> > > > > sda 0.00 0.00 0.00 3.00 0.00 17.00 11.33 0.01 2.17 0.00 2.17 2.17 0.65
> > > > > sdb 0.00 24.50 9.50 40.50 74.00 10000.00 402.96 83.44 2048.67 1086.11 2274.46 20.00 100.00
> > > > > dm-0 0.00 0.00 10.00 33.50 78.00 2120.00 101.06 287.63 8590.47 1530.40 10697.96 22.99 100.00
> > > > >
> > > > > avg-cpu: %user %nice %system %iowait %steal %idle
> > > > > 0.81 0.00 0.30 11.40 0.00 87.48
> > > > >
> > > > > Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> > > > > sda 0.00 0.00 0.00 6.00 0.00 40.25 13.42 0.01 1.33 0.00 1.33 1.25 0.75
> > > > > sdb 0.00 314.50 15.50 72.00 122.00 17264.00 397.39 61.21 1013.30 740.00 1072.13 11.41 99.85
> > > > > dm-0 0.00 0.00 10.00 427.00 78.00 27728.00 127.26 224.12 712.01 1147.00 701.82 2.28 99.85
> > > > >
> > > > > avg-cpu: %user %nice %system %iowait %steal %idle
> > > > > 1.22 0.00 0.29 4.01 0.00 94.47
> > > > >
> > > > > Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> > > > > sda 0.00 0.00 0.00 3.50 0.00 17.00 9.71 0.00 1.29 0.00 1.29 1.14 0.40
> > > > > sdb 0.00 0.00 1.00 39.50 8.00 10112.00 499.75 78.19 1711.83 1294.50 1722.39 24.69 100.00
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > ceph-users mailing list
> > > > > ceph-users@lists.ceph.com
> > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > >
> >
> >
>
--
Christian Balzer Network/Systems Engineer
ch...@gol.com Rakuten Communications