Hi Tyler,
I suspect you have BlueStore DB/WAL on these drives as well, don't you?
If so, you may be hitting performance issues with fsync/fdatasync
requests, which DB/WAL issue pretty frequently.
See the following links for details:
https://www.percona.com/blog/2018/02/08/fsync-performance-storage-devices/
https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
The latter link shows pretty poor numbers for M500DC drives.
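If you want to check your own drives, the second link boils down to a
single-job, queue-depth-1 O_DSYNC write test; something along these
lines with fio should reproduce it (a sketch only, and destructive if
pointed at a raw device, so use a spare partition or scratch file):

# single-threaded 4k sync writes, the pattern DB/WAL generates
fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
    --rw=write --bs=4k --numjobs=1 --iodepth=1 \
    --runtime=60 --time_based --group_reporting

Drives that do well as DB/WAL devices typically sustain thousands of
IOPS on this test; consumer drives without power-loss protection often
manage only a few hundred.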
Thanks,
Igor
On 12/11/2018 4:58 AM, Tyler Bishop wrote:
Older Crucial/Micron M500/M600
_____________________________________________
*Tyler Bishop*
EST 2007
O:513-299-7108 x1000
M:513-646-5809
http://BeyondHosting.net
On Mon, Dec 10, 2018 at 8:57 PM Christian Balzer <ch...@gol.com> wrote:
Hello,
On Mon, 10 Dec 2018 20:43:40 -0500 Tyler Bishop wrote:
> I don't think that's my issue here because I don't see any I/O to
> justify the latency. Unless the I/O is minimal and it's Ceph issuing
> a bunch of discards to the SSD, which slows it down while doing that.
>
What does atop have to say?
Discards/Trims are usually visible in it; this is during an fstrim of
a RAID1 /:
---
DSK | sdb | busy 81% | read 0 | write 8587 | MBw/s 2323.4 | avio 0.47 ms |
DSK | sda | busy 70% | read 2 | write 8587 | MBw/s 2323.4 | avio 0.41 ms |
---
The numbers tend to be a lot higher than what the actual interface is
capable of; clearly the SSD is reporting its internal activity. In any
case, it should give good insight into what is going on activity-wise.
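To watch this yourself, the stock tools are enough (assuming
util-linux's fstrim and atop are installed); run atop at a one-second
interval in one terminal and trim a filesystem in another:

# terminal 1: watch the DSK lines for discard/write activity
atop 1
# terminal 2: trim the root filesystem, verbosely
fstrim -v /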
Also for posterity and curiosity, what kind of SSDs?
Christian
> Log isn't showing anything useful and I have most debugging
> disabled.
>
>
>
> On Mon, Dec 10, 2018 at 7:43 PM Mark Nelson <mnel...@redhat.com> wrote:
>
> > Hi Tyler,
> >
> > I think we had a user a while back who reported background deletion
> > work going on after upgrading their OSDs from filestore to
> > bluestore, due to PGs having been moved around. Is it possible that
> > your cluster is doing a bunch of work (deletion or otherwise) beyond
> > the regular client load? I don't remember how to check for this off
> > the top of my head, but it might be something to investigate. If
> > that's what it is, we just recently added the ability to throttle
> > background deletes:
> >
> > https://github.com/ceph/ceph/pull/24749
> >
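> > One quick way to see whether PG deletion is still in flight is the
> > OSD perf counters; numpg_removing should be the relevant one,
> > assuming your build exposes it:
> >
> > ceph daemon osd.0 perf dump | grep numpg
> >
> > And if the throttle from that PR is in your version, it should be
> > tunable at runtime; presumably something like the following, though
> > the osd_delete_sleep name is an assumption here, so check the
> > config options your build actually has first:
> >
> > ceph tell osd.* injectargs '--osd_delete_sleep 1'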
> >
> > If the logs/admin socket don't tell you anything, you could also
> > try using our wallclock profiler to see what the OSD is spending
> > its time doing:
> >
> > https://github.com/markhpc/gdbpmp/
> >
> >
> > ./gdbpmp -t 1000 -p `pidof ceph-osd` -o foo.gdbpmp
> >
> > ./gdbpmp -i foo.gdbpmp -t 1
> >
> >
> > Mark
> >
> > On 12/10/18 6:09 PM, Tyler Bishop wrote:
> > > Hi,
> > >
> > > I have an SSD-only cluster that I recently converted from
> > > filestore to bluestore, and performance has totally tanked. It was
> > > fairly decent before, with only a little more latency than
> > > expected. Since converting to bluestore the latency is extremely
> > > high: SECONDS. I am trying to determine if it is an issue with the
> > > SSDs or with Bluestore treating them differently than
> > > filestore... potential garbage collection? 24+ hrs???
> > >
> > > I am now seeing constant 100% I/O utilization on ALL of the
> > > devices, and performance is terrible!
> > >
> > > IOSTAT
> > >
> > > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> > >            1.37    0.00    0.34   18.59    0.00   79.70
> > >
> > > Device: rrqm/s  wrqm/s   r/s    w/s  rkB/s    wkB/s avgrq-sz avgqu-sz    await r_await  w_await  svctm  %util
> > > sda       0.00    0.00  0.00   9.50   0.00    64.00    13.47     0.01     1.16    0.00     1.16   1.11   1.05
> > > sdb       0.00   96.50  4.50  46.50  34.00 11776.00   463.14   132.68  1174.84  782.67  1212.80  19.61 100.00
> > > dm-0      0.00    0.00  5.50 128.00  44.00  8162.00   122.94   507.84  1704.93  674.09  1749.23   7.49 100.00
> > >
> > > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> > >            0.85    0.00    0.30   23.37    0.00   75.48
> > >
> > > Device: rrqm/s  wrqm/s   r/s    w/s  rkB/s    wkB/s avgrq-sz avgqu-sz    await r_award  w_await  svctm  %util
> > > sda       0.00    0.00  0.00   3.00   0.00    17.00    11.33     0.01     2.17    0.00     2.17   2.17   0.65
> > > sdb       0.00   24.50  9.50  40.50  74.00 10000.00   402.96    83.44  2048.67 1086.11  2274.46  20.00 100.00
> > > dm-0      0.00    0.00 10.00  33.50  78.00  2120.00   101.06   287.63  8590.47 1530.40 10697.96  22.99 100.00
> > >
> > > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> > >            0.81    0.00    0.30   11.40    0.00   87.48
> > >
> > > Device: rrqm/s  wrqm/s   r/s    w/s  rkB/s    wkB/s avgrq-sz avgqu-sz    await r_await  w_await  svctm  %util
> > > sda       0.00    0.00  0.00   6.00   0.00    40.25    13.42     0.01     1.33    0.00     1.33   1.25   0.75
> > > sdb       0.00  314.50 15.50  72.00 122.00 17264.00   397.39    61.21  1013.30  740.00  1072.13  11.41  99.85
> > > dm-0      0.00    0.00 10.00 427.00  78.00 27728.00   127.26   224.12   712.01 1147.00   701.82   2.28  99.85
> > >
> > > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> > >            1.22    0.00    0.29    4.01    0.00   94.47
> > >
> > > Device: rrqm/s  wrqm/s   r/s    w/s  rkB/s    wkB/s avgrq-sz avgqu-sz    await r_await  w_await  svctm  %util
> > > sda       0.00    0.00  0.00   3.50   0.00    17.00     9.71     0.00     1.29    0.00     1.29   1.14   0.40
> > > sdb       0.00    0.00  1.00  39.50   8.00 10112.00   499.75    78.19  1711.83 1294.50  1722.39  24.69 100.00
> > >
> > >
> >
--
Christian Balzer Network/Systems Engineer
ch...@gol.com           Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com