Hi Christian,

Thanks for your quick reply.
> On Jun 5, 2017, at 2:01 PM, Christian Balzer <ch...@gol.com> wrote:
>
> Hello,
>
> On Mon, 5 Jun 2017 12:25:25 +0800 TYLin wrote:
>
>> Hi all,
>>
>> We’re using cache-tier with write-back mode but the write throughput is
>> not as good as we expect.
>
> Numbers (what did you see and what did you expect?), versions, cluster
> HW/SW, etc etc.
>

We use kraken 11.2.0. Our cluster has 8 nodes; each node has 7 HDDs for the storage pool (CephFS data and metadata), 3 SSDs for the data-pool cache, and 1 SSD for the metadata-pool cache. The public and cluster networks share the same 10G NIC. We mount CephFS with the kernel client on one of the nodes and use dd/fio to measure its performance. The throughput of creating a new file is about 400 MB/s, while the throughput of overwriting an existing file can exceed 800 MB/s. In our view, creating a new file and overwriting an existing one should not differ this much.

>> We use CephFS and create a 20GB file in it. While data is writing, we use
>> iostat to get the disk statistics. From iostat, we saw that ssd (cache-tier)
>> is idle most of the time and hdd (storage-tier) is busy all the time. From
>> the document
>
> While having no real experience with CephFS (with or w/o cache-tiers), I
> do think I know what you're seeing here, see below.
>
>>
>> “When admins configure tiers with writeback mode, Ceph clients write data to
>> the cache tier and receive an ACK from the cache tier. In time, the data
>> written to the cache tier migrates to the storage tier and gets flushed from
>> the cache tier.”
>>
>> So the data is write to cache-tier and then flush to storage tier when dirty
>> ratio is more than 0.4? The word “in time” in the document confused me.
>>
>> We found that the throughput of creating a new file is slower than overwrite
>> an existing file, and ssd has more write when doing overwrite. We then look
>> into the source code and log.
>> A newly created file goes to proxy_write,
>> which is followed by a promote_object. Does this means that the object
>> actually goes to storage pool directly and then be promoted to the
>> cache-tier when creating a new file?
>>
>
> Creating a new file means creating new Ceph objects, which need to be
> present on both the backing store and the cache-tier.
> That overhead of creating them is the difference in time you see.
> The actual data of the initial write will still be only on the cache-tier,
> btw.

Do you mean that when we create a new object, the client does not get an ACK until the data has been written to the storage pool (journal only?) and the object then promoted to the cache-tier? If so, why must we wait for the object to be written to both the storage pool and the cache-tier? Is there any configuration that forces the write to go to the cache-tier only, and to flush to the storage pool once the dirty ratio is reached, just as happens when overwriting an existing file?

> Once a file exists and is properly (not sparsely) allocated, writes should
> indeed just go to the cache-tier until flushing (space/time/object#)
> becomes necessary.
> That of course also requires the cache being big enough and not too busy
> so that things stay actually in it.
> Otherwise those objects need to be promoted back in from the HDDs, making
> things slow again.
>
> Tuning a cache-tier (both parameters and size in general) isn't easy and
> with some workloads pretty impossible to get desirable results.
>
>
> Christian
> --
> Christian Balzer                Network/Systems Engineer
> ch...@gol.com                   Rakuten Communications

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
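For anyone following along, the flushing thresholds Christian mentions (space/time/object#) correspond to per-pool settings on the cache pool. A sketch of the relevant knobs, using a placeholder pool name ("cache-pool") and example values rather than recommendations:

```shell
# Flush dirty objects once 40% of the cache's target capacity is dirty
ceph osd pool set cache-pool cache_target_dirty_ratio 0.4
# Start evicting clean objects at 80% of target capacity
ceph osd pool set cache-pool cache_target_full_ratio 0.8

# Capacity targets the ratios above are computed against
ceph osd pool set cache-pool target_max_bytes 1000000000000
ceph osd pool set cache-pool target_max_objects 1000000

# Time-based flushing/eviction, in seconds
ceph osd pool set cache-pool cache_min_flush_age 600
ceph osd pool set cache-pool cache_min_evict_age 1800

# Controls how readily writes promote objects into the cache tier
ceph osd pool set cache-pool min_write_recency_for_promote 0
```

Note that the ratios are relative to target_max_bytes/target_max_objects, not to the raw SSD capacity, so if those targets are unset the ratios have nothing to act on.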