Hi Steve and Kate,

Thanks again for the great suggestions.
Increasing the allocsize did not help us in the situation relating to my current testing (poor read performance). However, allocsize is a great parameter for overall performance tuning and I intend to use it. :)

After discussion with colleagues and reading this article on the ubuntu drive io scheduler - <http://askubuntu.com/questions/784442/why-does-ubuntu-16-04-set-all-drive-io-schedulers-to-deadline> - I decided to try out the cfq io scheduler (ubuntu now defaults to deadline). This made a significant difference - it actually doubled the overall read performance. I suggest anyone using ubuntu 14.04 or higher with high density osd nodes (we have 48 osds per osd node) might like to test out cfq. It's also a pretty easy test to perform :) and can be done on the fly - I've put a rough sketch of the commands at the bottom of this mail.

Cheers,
Tom

On Wed, Nov 30, 2016 at 5:50 PM, Steve Taylor <steve.tay...@storagecraft.com> wrote:

> We're using Ubuntu 14.04 on x86_64. We just added 'osd mount options xfs = rw,noatime,inode64,allocsize=1m' to the [osd] section of our ceph.conf so XFS allocates 1M blocks for new files. That only affected new files, so manual defragmentation was still necessary to clean up older data, but once that was done everything got better and stayed better.
>
> You can use the xfs_db command to check fragmentation on an XFS volume and xfs_fsr to perform a defragmentation. The defragmentation can run on a mounted filesystem too, so you don't even have to rely on Ceph to avoid downtime. I probably wouldn't run it everywhere at once though, for performance reasons. A single OSD at a time would be ideal, but that's a matter of preference.
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of *Thomas Bennett
> *Sent:* Wednesday, November 30, 2016 5:58 AM
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] Is there a setting on Ceph that we can use to fix the minimum read size?
>
> Hi Kate and Steve,
>
> Thanks for the replies. Always good to hear back from a community :)
>
> I'm using Linux on x86_64 architecture, so the block size is limited to the page size, which is 4k. It looks like I'm hitting a hard limit in any attempt to increase the block size.
>
> I found this out by running the following commands:
>
> $ mkfs.xfs -f -b size=8192 /dev/sda1
>
> $ mount -v /dev/sda1 /tmp/disk/
> mount: Function not implemented  # huh???
>
> Checking out the man page:
>
> $ man mkfs.xfs
> -b block_size_options
>     ... XFS on Linux currently only supports pagesize or smaller blocks.
>
> I'm hesitant to implement btrfs as it's still experimental, and ext4 seems to have the same limitation.
>
> Our current approach is to exclude the hard drive model that's giving us the poor read rates from our procurement process, but it would still be nice to find out how much control we have over how the ceph-osd daemons read from the drives. I may attempt an strace on an osd daemon while we read, to see what read request size is actually being passed to the kernel.
>
> Cheers,
> Tom
>
> On Tue, Nov 29, 2016 at 11:53 PM, Steve Taylor <steve.tay...@storagecraft.com> wrote:
>
> We configured XFS on our OSDs to use 1M blocks (our use case is RBDs with 1M blocks) due to massive fragmentation in our filestores a while back. We were having to defrag all the time and cluster performance was noticeably degraded. We also create and delete lots of RBD snapshots on a daily basis, so that likely contributed to the fragmentation as well.
> It's been MUCH better since we switched XFS to use 1M allocations. Virtually no fragmentation and performance is consistently good.
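
P.S. For anyone who wants to try the scheduler switch, the on-the-fly change looks roughly like this (device names are just examples - substitute your own OSD disks and repeat for each one):

$ cat /sys/block/sdb/queue/scheduler                    # shows e.g. "noop [deadline] cfq", brackets mark the active scheduler
$ echo cfq | sudo tee /sys/block/sdb/queue/scheduler    # switch to cfq on the fly, no remount or daemon restart needed
$ cat /sys/block/sdb/queue/scheduler                    # confirm it now reads "noop deadline [cfq]"

To keep it across reboots we'd add elevator=cfq to the kernel command line (GRUB_CMDLINE_LINUX in /etc/default/grub, then run update-grub) or set it from a udev rule.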
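
P.P.S. Steve's xfs_db/xfs_fsr suggestion from the quoted thread would look something like this (the device and mount point here are examples only - point them at your own OSD):

$ sudo xfs_db -r -c frag /dev/sdb1            # read-only fragmentation report: "actual N, ideal M, fragmentation factor X%"
$ sudo xfs_fsr -v /var/lib/ceph/osd/ceph-0    # defragment the mounted filestore in place, one OSD at a time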
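
And the strace I mentioned in the quoted mail would be something along these lines (pick the pid of whichever ceph-osd you want to watch):

$ sudo strace -f -e trace=read,pread64 -p <pid of a ceph-osd> -o /tmp/osd-reads.trace   # -f follows threads, -o writes the trace to a file
$ less /tmp/osd-reads.trace                   # the count argument of each read()/pread64() call is the size requested from the kernel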