The prefetch cache (pcache) aims to improve the performance of sequential reads. Of most interest here are small sequential read requests: such requests can be optimized by extending them and serving the data from the prefetch cache. However, there are two issues:

- In aggregate, only a small portion of requests is sequential, so the delays caused by reading larger volumes of data would lead to an overall decrease in performance.

- With a large number of random requests, redundant data would accumulate in the cache memory.

This pcache implementation solves the above and other problems of prefetching data. The pcache algorithm can be summarised by the following main steps.
1. Monitor I/O requests to identify typical sequences.
This implementation of the prefetch cache works at the storage system level and has information only about the physical block addresses of I/O requests. Statistics are collected only from read requests up to a maximum size of 64KB (by default); each request that matches the criteria is placed into a pool of requests. An rb-tree is used to store the request statistics; it is a simple data structure, but quite efficient for this task.

2. Identify sequential I/O streams.
For each read request, an attempt is made to build from the statistics tree a chain of consecutive requests in which the current request would be an element. The search key for consecutive requests is the area of bytes immediately preceding the current request. This area should not be too small, to avoid false readahead. A sequential stream of requests can be identified even among a large number of random requests. For example, given accesses to blocks 100, 1157, 27520, 4, 101, 312, 1337, 102, the processing of request 102 identifies the sequential chain 100, 101, 102, and a decision can then be made to do readahead. A situation may also arise where multiple applications A, B, C simultaneously perform sequential reads. Each application on its own reads sequentially, A(100, 101, 102), B(300, 301, 302), C(700, 701, 702), but to the block device this may look like random reading: 100, 300, 700, 101, 301, 701, 102, 302, 702. In this case the sequential streams are still recognised, because the placement of requests in the rb-tree makes it possible to separate the sequential I/O streams.

3. Do readahead into the cache for recognised sequential data streams.
Once the detection problem is solved, larger requests are needed to bring the data into the cache.
In this implementation, pcache issues a separate readahead request instead of extending the original request, so the original request is passed through as is. There is no reason to put data into the cache that will never be picked up, but that would always happen with extended requests. An rb-tree is also used to store the ranges of cached blocks; again, simple but quite efficient for this task.

4. Control the size of the prefetch cache pool and the request statistics pool.
The request statistics pool is bounded by placing and replacing entries according to the FIFO principle, which is straightforward. The memory cache is bounded by an LRU list, which limits the maximum amount of memory that pcache can allocate; however, the LRU is there mainly to prevent displacement of cache blocks that have only been read partially. The primary eviction path frees memory immediately after use: as soon as a chunk of cached memory has been completely read, it is dropped, since the probability of the same request repeating is very low. Cases where the same portion of cache memory is read several times are not optimized; they are not among the cases pcache is meant to optimize. Thus, using a cache of small volume, by optimizing the readahead and memory-release operations, entire volumes of data can be read with a 100% cache hit rate, while the performance of random read requests does not decrease.

PCache is implemented as a qemu block filter driver and has several configurable parameters: total cache size, statistics size, readahead size, and the maximum size of a request that can be processed.

For performance evaluation, several test cases with different mixes of sequential and random reads were run on an SSD disk.
Here are the test results and qemu parameters:

qemu parameters:
-machine pc,accel=kvm,usb=off,vmport=off -m 1024 -smp 8
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4
-drive file=/img/harddisk.hdd,format=pcache,if=none,id=drive-scsi0-0-0-0,cache=none,aio=native
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0

****************************************************************
* Testcase                         *      Results in iops      *
*                                  *****************************
*                                  *  clean qemu  *   pcache   *
****************************************************************
* Create/open 16 file(s) of total  * 21645 req/s  * 74793 req/s*
* size 2048.00 MB named            * 20385 req/s  * 66481 req/s*
* /tmp/tmp.tmp, start 4 thread(s)  * 20616 req/s  * 69007 req/s*
* and do uncached sequential read  *              *            *
* by 4KB blocks                    *              *            *
****************************************************************
* Create/open 16 file(s) of total  * 84033 req/s  * 87828 req/s*
* size 2048.00 MB named            * 84602 req/s  * 89678 req/s*
* /tmp/tmp.tmp, start 4 thread(s)  * 83163 req/s  * 96202 req/s*
* and do uncached sequential read  *              *            *
* by 4KB blocks with constant      *              *            *
* queue len 32                     *              *            *
****************************************************************
* Create/open 16 file(s) of total  * 14104 req/s  * 14164 req/s*
* size 2048.00 MB named            * 14130 req/s  * 14232 req/s*
* /tmp/tmp.tmp, start 4 thread(s)  * 14183 req/s  * 14080 req/s*
* and do uncached random read by   *              *            *
* 4KB blocks                       *              *            *
****************************************************************
* Create/open 16 file(s) of total  * 23480 req/s  * 23483 req/s*
* size 2048.00 MB named            * 23070 req/s  * 22432 req/s*
* /tmp/tmp.tmp, start 4 thread(s)  * 24090 req/s  * 23499 req/s*
* and do uncached random read by   *              *            *
* 4KB blocks with constant queue   *              *            *
* len 32                           *              *            *
****************************************************************

Pavel Butsykin (18):
  block/io: add bdrv_aio_{preadv, pwritev}
  block/pcache: empty pcache driver filter
  util/rbtree: add rbtree from linux kernel
  util/rbcache: range-based cache core
  tests/test-rbcache: add test cases
  block/pcache: statistics collection read requests
  block/pcache: skip large aio read
  block/pcache: updating statistics for overlapping requests
  block/pcache: add AIO readahead
  block/pcache: skip readahead for unallocated clusters
  block/pcache: cache invalidation on AIO write requests
  block/pcache: add reading data from the cache
  block/pcache: inflight readahead request waiting for aio read
  block/pcache: pick up parts of the cache
  block/pcache: drop used pcache nodes
  block/pcache: write through
  block/pcache: add tracepoints
  block/pcache: debug build

 MAINTAINERS                     |  13 +
 block/Makefile.objs             |   1 +
 block/io.c                      |  16 +
 block/pcache.c                  | 771 ++++++++++++++++++++++++++++++++++++++++
 block/trace-events              |   9 +
 include/block/block.h           |   7 +
 include/qemu/rbcache.h          | 105 ++++++
 include/qemu/rbtree.h           | 107 ++++++
 include/qemu/rbtree_augmented.h | 235 ++++++++++++
 tests/Makefile.include          |   3 +
 tests/test-rbcache.c            | 308 ++++++++++++++++
 util/Makefile.objs              |   2 +
 util/rbcache.c                  | 246 +++++++++++++
 util/rbtree.c                   | 570 +++++++++++++++++++++++++++++
 14 files changed, 2393 insertions(+)
 create mode 100644 block/pcache.c
 create mode 100644 include/qemu/rbcache.h
 create mode 100644 include/qemu/rbtree.h
 create mode 100644 include/qemu/rbtree_augmented.h
 create mode 100644 tests/test-rbcache.c
 create mode 100644 util/rbcache.c
 create mode 100644 util/rbtree.c

--
2.10.1