Is this a clean (new) cluster and RBD image you are using for your test, or has it been burned in? When possible (i.e. when it has enough free space), bluestore will essentially turn your random RBD image writes into sequential writes on disk. This optimization doesn't help random reads unless your read pattern matches your original random write pattern.
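If you want to see that effect in isolation, here is a rough sketch using the python rados/rbd bindings. The pool name, image name and conf path are placeholders for your setup, and this is a toy illustration rather than a benchmark (fio with the rbd engine is still the right tool for real numbers): it writes 4K blocks at random offsets, reads them back in the same order, then reads them again in a freshly shuffled order.

import random

import rados
import rbd

POOL = 'rbd'            # placeholder pool name
IMAGE = 'testimage'     # placeholder image name
CONF = '/etc/ceph/ceph.conf'
BLOCK = 4096            # 4K blocks, like your fio test
COUNT = 1024            # number of blocks to touch

with rados.Rados(conffile=CONF) as cluster:
    with cluster.open_ioctx(POOL) as ioctx:
        with rbd.Image(ioctx, IMAGE) as image:
            nblocks = image.size() // BLOCK
            offsets = [random.randrange(nblocks) * BLOCK
                       for _ in range(COUNT)]

            # random-offset writes; bluestore is free to lay these out
            # more or less in the order they arrive
            for off in offsets:
                image.write(b'x' * BLOCK, off)

            # reads issued in the same order as the writes -> should end
            # up close to sequential on the underlying disk
            for off in offsets:
                image.read(off, BLOCK)

            # reads issued in a different random order -> seeks all over
            # the disk, which is the slow case you are hitting
            random.shuffle(offsets)
            for off in offsets:
                image.read(off, BLOCK)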
Note that with the default "stupid" allocator, this optimization will at some point hit a massive performance cliff, because the allocator aggressively tries to re-use free slots that best match the IO size, even if that means massive seeking around the disk. Hopefully the "bitmap" allocator will address this once it becomes the stable default in a future release of Ceph. (A quick way to check which allocator your OSDs are currently running is sketched below the quoted message.)

On Mon, Aug 13, 2018 at 8:56 AM Emmanuel Lacour <elac...@easter-eggs.com> wrote:
> Dear ceph users,
>
> I set up a new cluster:
>
> - Debian stretch
> - ceph 12.2.7
> - 3 nodes with mixed mon/osd
> - 4 hdd 4TB osd per node
> - 2 SSDs per node shared among the osds for db/wal
> - each OSD alone in a RAID0 + writeback
>
> Inside a VM I get really good writes (200 MB/s, 5k IOPS for direct 4K
> random writes), but with random reads the device sits at 100% I/O wait
> with only ~150 IOPS at an average size of 128K.
>
> I tried the same workload using fio on an RBD volume, same results :(
>
> I played with the VM read_ahead setting without any change. I also
> disabled most of the ceph debug options, no change.
>
> Any hints to solve this?
>
> Here is the ceph.conf used:
> https://owncloud.home-dn.net/index.php/s/swZsgeFGF2ZfPB2

--
Jason
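To check which allocator an OSD is currently running, you can ask its admin socket. A minimal sketch, assuming you run it on the OSD host with the ceph CLI installed and adjust the osd id:

import json
import subprocess

def bluestore_allocator(osd_id):
    # Query the OSD's admin socket for its current allocator
    # ("stupid" is the default on 12.2.x).
    out = subprocess.check_output(
        ['ceph', 'daemon', 'osd.%d' % osd_id,
         'config', 'get', 'bluestore_allocator'])
    return json.loads(out.decode())['bluestore_allocator']

print(bluestore_allocator(0))   # e.g. prints "stupid" or "bitmap"

Whether it is worth experimenting with bluestore_allocator = bitmap on 12.2.x is something I would check against the release notes first; the reworked bitmap allocator only landed in later releases.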