Hmm, that is strange. You said you have already cleared the caches on both the client and the OSD nodes, so the data should be coming directly from the disks. Let's wait for others' ideas.
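In the meantime, one quick way to double-check is to watch the OSD data disks while the 4K random read test runs; if they show essentially no reads per second, the data is still being served from a cache on the client side. A rough sketch, where sdb/sdc are only placeholders for whatever your OSD data disks actually are:

# on the client and on every OSD node, right before the run
sync; echo 3 > /proc/sys/vm/drop_caches

# on each OSD node, while fio/vdbench is running on the client
iostat -x sdb sdc 1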
2015-02-03 11:44 GMT+08:00 Bruce McFarland <bruce.mcfarl...@taec.toshiba.com>:
> Yes, I'm using the kernel rbd in Ubuntu 14.04, which makes calls into
> libceph.
>
> root@essperf3:/etc/ceph# lsmod | grep rbd
> rbd                    63707  1
> libceph               225026  1 rbd
> root@essperf3:/etc/ceph#
>
> I'm doing raw device I/O with either fio or vdbench (my preferred tool), and
> there is no filesystem on top of /dev/rbd1. Yes, I did invalidate the kmem
> pages by writing to drop_caches, and I've also allocated huge pages up to the
> maximum allowable based on free memory. The huge page allocation should
> minimize any system caching. I have a relatively small storage pool, since
> this is a development environment: only ~4 TB total, and the rbd image is
> 3 TB. On my lab system with 320 TB I don't see this problem, since the data
> set is orders of magnitude larger than the available system cache.
>
> Maybe I should try testing after removing DIMMs from the client system to
> physically limit kernel caching.
>
> -----Original Message-----
> From: Nicheal [mailto:zay11...@gmail.com]
> Sent: Monday, February 02, 2015 7:35 PM
> To: Bruce McFarland
> Cc: ceph-us...@ceph.com; Prashanth Nednoor
> Subject: Re: [ceph-users] RBD caching on 4K reads???
>
> It seems you are using the kernel rbd, so rbd_cache has no effect; it is only
> used by librbd. The kernel rbd goes directly through the system page cache.
> You said that you have already run something like
> echo 3 > /proc/sys/vm/drop_caches to invalidate all pages cached in the
> kernel. Are you testing /dev/rbd1 on top of a filesystem such as ext4 or xfs?
> If so, and you run a test tool like fio, first with a write test and
> file_size=10G, then fio creates a 10G file that is full of holes, and your
> read test may read those holes; the filesystem can tell that they contain
> nothing, so it never needs to access the physical disk to get the data. You
> can check the fiemap of the file to see whether it contains holes, or just
> remove the file and recreate it with a read test.
>
> Ning Yao
>
> 2015-01-31 4:51 GMT+08:00 Bruce McFarland <bruce.mcfarl...@taec.toshiba.com>:
>> I have a cluster and have created an rbd device - /dev/rbd1. It shows
>> up as expected with 'rbd --image test info' and rbd showmapped. I have
>> been looking at cluster performance with the usual Linux block device
>> tools - fio and vdbench. When I look at writes and large block
>> sequential reads, I'm seeing what I'd expect, with performance limited
>> by either my cluster interconnect bandwidth or the backend device
>> throughput - a 1 GbE frontend and cluster network, and 7200 rpm SATA
>> OSDs with one SSD per OSD for the journal. Everything looks good EXCEPT
>> 4K random reads. There is caching occurring somewhere in my system that
>> I haven't been able to detect and suppress - yet.
>>
>> I've set 'rbd_cache=false' in the [client] section of ceph.conf on the
>> client, monitor, and storage nodes. I've flushed the system caches on
>> the client and storage nodes before each test run (i.e. vm.drop_caches=3)
>> and set huge pages to the maximum available, so that free system memory
>> can't be used for the system cache. I've also disabled read-ahead on all
>> of the HDD/OSDs.
>>
>> When I run a 4K random read workload on the client, the most I could
>> expect would be ~100 IOPS/OSD x the number of OSDs. I'm seeing an order
>> of magnitude more than that, AND running iostat on the storage nodes
>> shows no read activity on the OSD disks.
>>
>> Any ideas on what I've overlooked? There appears to be some read-ahead
>> caching that I've missed.
>>
>> Thanks,
>> Bruce
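On the read-ahead suspicion above: besides rbd_cache (librbd only) and the OSD-side disk read-ahead, the client kernel keeps its own per-block-device read-ahead setting for /dev/rbd1, and any buffered read still goes through the client page cache. A rough sketch of checking both, assuming the image is mapped as rbd1; the fio queue depth and runtime are just illustrative values:

# kernel read-ahead for the mapped rbd device, in KB (0 disables prefetching)
cat /sys/block/rbd1/queue/read_ahead_kb
echo 0 > /sys/block/rbd1/queue/read_ahead_kb

# 4K random reads opened with O_DIRECT, so the client page cache is bypassed
fio --name=randread4k --filename=/dev/rbd1 --direct=1 --rw=randread --bs=4k \
    --ioengine=libaio --iodepth=32 --runtime=60 --time_based

If the IOPS fall back toward ~100 per OSD with direct=1 and read_ahead_kb=0, the extra reads were coming from the client page cache and read-ahead rather than from the cluster.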