I recently added a 3rd node to my cluster, and increased the pool size to 3.
Latency was initially so bad that OSDs were being kicked out for being
unresponsive. I searched the mailing list archives and changed:

    osd max backfills = 1
    osd recovery op priority = 1

That's helped. OSDs aren't so slow that they get kicked out of the cluster.
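For anyone else in the same spot: I believe these can also be injected at
runtime rather than waiting for an OSD restart, something along these
lines (treat it as a sketch, not a tested procedure):

    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-op-priority 1'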
I'm not super sensitive to latency, but I'm still seeing > 2s for RGW
GET and PUT operations. Does anybody have some suggestions for other
ways I can give more priority to clients and less to backfill?
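A couple of other knobs I've been looking at, though I haven't confirmed
how much they help (the values are just what I'm considering, not tested
recommendations):

    osd recovery max active = 1    # limit concurrent recovery ops per OSD
    osd client op priority = 63    # the default; keeping it high relative to recovery op priority = 1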
I know the ultimate answer is "add more nodes", and I'm planning on it.
For the next expansion (in a few months), I'm thinking about only adding
a single OSD at a time. Since my cluster is so small, most objects are
striped across all of the disks, so having one slow OSD instead of all of
them slow should go a long way toward reducing total latency. It'll take
two months to finish, but that's not a big deal.
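My rough plan is to bring each new OSD in at a low CRUSH weight and step
it up as backfill settles, so only a small fraction of PGs move at any
one time. Something like the following, where osd.24 and the intermediate
weights are placeholders, not a tested procedure:

    ceph osd crush reweight osd.24 0.5
    # wait for backfill to complete, then step up again
    ceph osd crush reweight osd.24 1.5
    ceph osd crush reweight osd.24 3.64    # full weight for a 4 TB drive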
I have 3 nodes with 8x 4TB HDDs each, and the journals are on the same
disks as the data. The cluster network is gigabit, but it's only pushing
~300 Mbps per node.
In general, I'm more concerned with read performance than write. Using
all the drive bays for spindles gave better read throughput and latency,
at the expense of write latency. I didn't anticipate how much that write
latency would hurt during a recovery.
Thanks for any suggestions.
--
*Craig Lewis*
Senior Systems Engineer