Thanks, Sage, for pointing out the PR and ceph branch. I will take a closer 
look.

Yes, I am trying the KVStore backend. We are trying it because some of our 
users can tolerate occasional data loss. The KVStore backend with an 
unsynchronized WAL seems to achieve better performance than filestore. And 
if we keep the WAL but skip the sync, only data still in the page cache 
would be lost, and only when the machine crashes, not when the process 
crashes. What do you think?
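
To make the trade-off concrete, here is a minimal Python sketch. It is not 
Ceph or rocksdb code, and the name append_iops is made up for illustration. 
It appends 4K blocks to a log file, either leaving them in the page cache 
or calling fsync after each one, which is roughly what the sync: false and 
sync: true cases do for a single writer. If each sync takes ~30ms, one 
synchronous writer cannot exceed about 1000/30 = 33 writes per second, 
which is in line with the ~25 IOPS reported below.

```python
import os
import tempfile
import time


def append_iops(path, n_writes=50, block_size=4096, sync=False):
    """Append n_writes blocks to path and return appends per second.

    With sync=False the data is only handed to the OS (page cache), so it
    survives a process crash but not a machine crash. With sync=True each
    append is followed by fsync, so it is pushed to the device before the
    next one starts.
    """
    block = b"\0" * block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(n_writes):
            os.write(fd, block)
            if sync:
                os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return n_writes / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for sync in (False, True):
            path = os.path.join(d, f"wal_sync_{sync}.log")
            print(f"sync={sync}: {append_iops(path, sync=sync):.0f} appends/s")
```

The absolute numbers depend entirely on the disk and filesystem, so this 
only shows the shape of the difference, not what an OSD would achieve.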

    
Thanks.
Zhi Zhang (David)

Date: Tue, 20 Oct 2015 05:47:44 -0700
From: s...@newdream.net
To: zhangz.da...@outlook.com
CC: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org
Subject: Re: [ceph-users] Write performance issue under rocksdb kvstore

On Tue, 20 Oct 2015, Z Zhang wrote:
> Hi Guys,
> 
> I am trying the latest ceph-9.1.0 with rocksdb 4.1 and ceph-9.0.3 with 
> rocksdb 3.11 as the OSD backend. I use rbd to test performance, and my 
> cluster info follows.
> 
> [ceph@xxx ~]$ ceph -s
>     cluster b74f3944-d77f-4401-a531-fa5282995808
>      health HEALTH_OK
>      monmap e1: 1 mons at {xxx=xxx.xxx.xxx.xxx:6789/0}
>             election epoch 1, quorum 0 xxx
>      osdmap e338: 44 osds: 44 up, 44 in
>             flags sortbitwise
>       pgmap v1476: 2048 pgs, 1 pools, 158 MB data, 59 objects
>             1940 MB used, 81930 GB / 81932 GB avail
>                 2048 active+clean
> 
> All the disks are spinning disks with write cache turned on. Rocksdb's 
> WAL and sst files are on the same disk as each OSD.
 
Are you using the KeyValueStore backend?
 
> Using fio to generate the following write load: 
> fio -direct=1 -rw=randwrite -ioengine=sync -size=10M -bs=4K -group_reporting 
> -directory /mnt/rbd_test/ -name xxx.1 -numjobs=1  
> 
> Test result:
> WAL enabled + sync: false + disk write cache: on  will get ~700 IOPS.
> WAL enabled + sync: true (default) + disk write cache: on|off  will get only 
> ~25 IOPS.
> 
> I tuned some other rocksdb options, but with no luck.
 
The wip-newstore-frags branch sets some defaults for rocksdb that I think 
look pretty reasonable (at least given how newstore is using rocksdb).
 
> I traced through the rocksdb code and found that each writer's Sync 
> operation takes ~30ms to finish. And as shown above, it is strange that 
> performance does not differ much whether the disk write cache is on or 
> off.
> 
> Have you guys encountered a similar issue? Or am I missing something 
> that causes rocksdb's poor write performance?
 
Yes, I saw the same thing.  This PR addresses the problem and is nearing 
merge upstream:
 
        https://github.com/facebook/rocksdb/pull/746
 
There is also an XFS performance bug that is contributing to the problem, 
but it looks like Dave Chinner just put together a fix for that.
 
But... we likely won't be using KeyValueStore in its current form over 
rocksdb (or any other kv backend).  It stripes object data over key/value 
pairs, which IMO is not the best approach.
 
sage

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com                          
          