Is this a clean (new) cluster and RBD image you are using for your test, or
has it been burned in? When possible (i.e. when it has enough free space),
BlueStore will essentially turn your random RBD image writes into
sequential writes on disk. This optimization doesn't help random reads
unless your read pattern matches your original random write pattern.
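
A rough, untested fio sketch to check whether that is what you are seeing
(the pool name "rbd", the image name "fio_test", the client.admin keyring,
the 10g size cap and the seed value are all my assumptions -- adjust to your
setup). It burns the image in with random writes and then reads it back with
the same random offset sequence, so the read pattern matches the on-disk
layout:

  # sketch only -- adjust pool/rbdname/clientname/size to your environment
  [global]
  ioengine=rbd
  clientname=admin
  pool=rbd
  rbdname=fio_test
  bs=4k
  iodepth=32
  size=10g
  # fixed seed so both jobs generate the same offset sequence
  randseed=1234

  [burn-in-randwrite]
  rw=randwrite

  [matching-randread]
  # wait for the write pass to finish before reading
  stonewall
  rw=randread

Re-running the read job with a different randseed (i.e. a read pattern that
no longer matches the write pattern) should show whether the on-disk layout
is what is hurting your random reads.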

Note that with the default "stupid" allocator, this optimization will at
some point hit a massive performance cliff, because the allocator
aggressively tries to re-use free slots that best match the IO size, even
if that means massive seeking around the disk. Hopefully the "bitmap"
allocator will address this issue once it becomes the stable default in a
future Ceph release.
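
If you want to see which allocator your OSDs are running, you can query it
via the admin socket, and (purely as an experiment -- bitmap is not the
stable default on 12.2.x, so try it on a non-production cluster first) you
can select it in ceph.conf. A minimal sketch, assuming your build accepts
"bitmap" for this option:

  # query the running value on one OSD (run on the node hosting osd.0)
  ceph daemon osd.0 config get bluestore_allocator

  # ceph.conf sketch -- requires restarting the OSDs to take effect
  [osd]
  bluestore allocator = bitmap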

On Mon, Aug 13, 2018 at 8:56 AM Emmanuel Lacour <elac...@easter-eggs.com>
wrote:

> Dear ceph users,
>
>
> I set up a new cluster:
>
> - Debian stretch
> - ceph 12.2.7
> - 3 nodes with mixed mon/osd
> - 4 HDD (4 TB) OSDs per node
> - 2 SSDs per node, shared among the OSDs for DB/WAL
> - each OSD alone in a RAID0 volume with writeback cache
>
> Inside a VM I get really good writes (200 MB/s, 5k IOPS for direct 4K
> random writes), but with random reads the device sits at 100% I/O wait
> with only ~150 IOPS at an average size of 128K.
>
> I tried the same workload using fio on an RBD volume, same results :(
>
> I played with the VM's read_ahead without any change. I also disabled
> most of the Ceph debug logging, no change.
>
> Any hints to solve this?
>
> Here is the ceph.conf used:
> https://owncloud.home-dn.net/index.php/s/swZsgeFGF2ZfPB2
>


-- 
Jason
