Hi Christian,

Thanks for your quick reply.


> On Jun 5, 2017, at 2:01 PM, Christian Balzer <ch...@gol.com> wrote:
> 
> 
> Hello,
> 
> On Mon, 5 Jun 2017 12:25:25 +0800 TYLin wrote:
> 
>> Hi all,
>> 
>> We’re using cache-tier with write-back mode but the write throughput is not 
>> as good as we expect. 
> 
> Numbers (what did you see and what did you expect?), versions, cluster
> HW/SW, etc etc.
> 

We use Kraken 11.2.0. Our cluster has 8 nodes; each node has 7 HDDs for the 
storage pool (CephFS data and metadata), 3 SSDs for the data-pool cache tier, 
and 1 SSD for the metadata-pool cache tier. The public and cluster networks 
share the same 10G NIC. We mount CephFS with the kernel client on one of the 
nodes and use dd/fio to test performance. The throughput when creating a new 
file is about 400 MB/s, while overwriting an existing file can reach more than 
800 MB/s. We did not expect creating a new file and overwriting an existing 
one to differ that much. 
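
For anyone who wants to reproduce the comparison, the test looks roughly like 
this (the mount point, job names, and block size are illustrative, not our 
exact invocation):

    # First run creates the 20GB file (the slow case, ~400 MB/s for us)
    fio --name=newfile --filename=/mnt/cephfs/testfile \
        --rw=write --bs=4M --size=20G --direct=1

    # Second run overwrites the same, now fully allocated, file (~800 MB/s)
    fio --name=overwrite --filename=/mnt/cephfs/testfile \
        --rw=write --bs=4M --size=20G --direct=1

    # In another shell, watch per-disk utilization while fio runs
    iostat -x 1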


>> We use CephFS and create a 20GB file in it. While data is writing, we use 
>> iostat to get the disk statistics. From iostat, we saw that ssd (cache-tier) 
>> is idle most of the time and hdd (storage-tier) is busy all the time. From 
>> the document
> 
> While having no real experience with CephFS (with or w/o cache-tiers), I
> do think I know what you're seeing here, see below.
> 
>> 
>> “When admins configure tiers with writeback mode, Ceph clients write data to 
>> the cache tier and receive an ACK from the cache tier. In time, the data 
>> written to the cache tier migrates to the storage tier and gets flushed from 
>> the cache tier.”
>> 
>> So the data is written to the cache tier and then flushed to the storage 
>> tier when the dirty ratio exceeds 0.4? The phrase “in time” in the document 
>> confused us. 
>> 
>> We found that the throughput of creating a new file is lower than that of 
>> overwriting an existing file, and the SSDs see more writes during 
>> overwrites. We then looked into the source code and logs. A newly created 
>> file goes through proxy_write, which is followed by a promote_object. Does 
>> this mean that the object actually goes to the storage pool directly and 
>> is then promoted to the cache tier when creating a new file?
>> 
> 
> Creating a new file means creating new Ceph objects, which need to be
> present on both the backing store and the cache-tier. 
> That overhead of creating them is the difference in time you see.
> The actual data of the initial write will still be only on the cache-tier,
> btw.

You mean that when we create a new object, the client will not get an ACK 
until the data is written to the storage pool (only the journal?) and the 
object is then promoted to the cache tier? If so, why must we wait for the 
object to be written to both the storage pool and the cache tier? Is there 
any configuration that forces writes to go to the cache tier only, flushing 
to the storage pool once the dirty ratio is reached, just as happens when 
overwriting an existing file? 
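
These are the promotion-related settings we have been looking at; 
“cephfs_data_cache” is a placeholder for our actual cache pool name, and we 
are not sure whether they can avoid the initial proxy_write/promote_object 
path:

    # A hit_set is required for recency-based promotion decisions
    ceph osd pool set cephfs_data_cache hit_set_type bloom
    ceph osd pool set cephfs_data_cache hit_set_count 1
    ceph osd pool set cephfs_data_cache hit_set_period 3600

    # 0 = promote an object into the cache on its first read/write
    ceph osd pool set cephfs_data_cache min_read_recency_for_promote 0
    ceph osd pool set cephfs_data_cache min_write_recency_for_promote 0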

> 
> Once a file exists and is properly (not sparsely) allocated, writes should
> indeed just go to the cache-tier until flushing (space/time/object#)
> becomes necessary. 
> That of course also requires the cache being big enough and not too busy
> so that things stay actually in it.
> Otherwise those objects need to be promoted back in from the HDDs, making
> things slow again.
> 
> Tuning a cache-tier (both parameters and size in general) isn't easy and
> with some workloads pretty impossible to get desirable results.
> 
> 
> Christian
> -- 
> Christian Balzer        Network/Systems Engineer                
> ch...@gol.com         Rakuten Communications
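
For the archives: as far as we understand them, the flushing triggers 
Christian mentions (space / time / object count) map to these pool settings. 
The pool name and values below are examples only, not recommendations:

    # Space: start flushing at 40% dirty, start evicting at 80% full
    ceph osd pool set cephfs_data_cache cache_target_dirty_ratio 0.4
    ceph osd pool set cephfs_data_cache cache_target_full_ratio 0.8

    # Time: minimum object age (seconds) before flushing/eviction
    ceph osd pool set cephfs_data_cache cache_min_flush_age 600
    ceph osd pool set cephfs_data_cache cache_min_evict_age 1800

    # Bytes/objects: absolute ceilings the ratios above are relative to
    ceph osd pool set cephfs_data_cache target_max_bytes 1099511627776
    ceph osd pool set cephfs_data_cache target_max_objects 1000000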

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
