Hello,
We'll soon be building out four new Luminous clusters with Bluestore.
Our current clusters run Filestore, so we're not very familiar with
Bluestore yet, and I'd like to have an idea of what to expect.
Here are the OSD node hardware specs (five nodes per cluster):
2x 3.0 GHz CPUs (18c/36t each)
22x 1.8TB 10K SAS (RAID1 OS + 20 OSDs)
5x 480GB Intel S4610 SSDs (WAL and DB)
192 GB RAM
4x Mellanox 25GbE NICs
PERC H730p
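As a rough sizing aid, the per-OSD share of each node's resources can be
worked out from the specs above. This is a minimal Python sketch that
assumes the five SSDs are split evenly (four OSDs per SSD) and that RAM is
shared evenly across the 20 OSDs with nothing reserved for the OS; the
constant and function names are made up for illustration.

```python
# Per-node figures taken from the OSD hardware specs above.
NODES_PER_CLUSTER = 5
OSDS_PER_NODE = 20      # 22 SAS drives minus the RAID1 OS pair
SSDS_PER_NODE = 5       # Intel S4610, shared for WAL + DB
SSD_SIZE_GB = 480
HDD_SIZE_TB = 1.8
RAM_GB = 192


def per_osd_budget() -> dict:
    """Split shared node resources evenly across OSDs.

    Assumption: SSDs are partitioned evenly and no RAM is reserved for the OS.
    """
    osds_per_ssd = OSDS_PER_NODE // SSDS_PER_NODE
    db_wal_gb = SSD_SIZE_GB / osds_per_ssd
    return {
        "osds_per_ssd": osds_per_ssd,
        "db_wal_gb_per_osd": db_wal_gb,
        "db_wal_pct_of_hdd": 100 * db_wal_gb / (HDD_SIZE_TB * 1000),
        "ram_gb_per_osd": RAM_GB / OSDS_PER_NODE,
        "raw_tb_per_cluster": NODES_PER_CLUSTER * OSDS_PER_NODE * HDD_SIZE_TB,
    }


if __name__ == "__main__":
    for name, value in per_osd_budget().items():
        print(f"{name}: {value:g}")
```

With these figures each OSD would get about 120 GB of SSD for WAL+DB
(roughly 6.7% of its 1.8 TB HDD) and about 9.6 GB of RAM, and each cluster
would have about 180 TB of raw HDD capacity before replication.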
With Filestore we've found that we can achieve sub-millisecond write
latency by running very fast journals (currently Intel S4610s). My
main concern is that Bluestore doesn't use journals and instead writes
directly to the higher-latency HDDs, which in theory results in slower
acks and higher write latency. How does Bluestore handle this? Can we
expect similar or better performance than our current Filestore
clusters?
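The concern boils down to which device sits on the acknowledgement path.
The toy Python model below compares the two cases under hypothetical
latencies (0.1 ms for an SSD journal write, 5 ms for a direct 10K SAS
write); these numbers, the function name and the workload are assumptions
for illustration, not measurements, and the model ignores network and
replication overhead.

```python
from statistics import mean

# Hypothetical sync-write latencies in milliseconds (illustrative assumptions,
# not measurements of the S4610 or the 10K SAS drives).
SSD_JOURNAL_MS = 0.1
HDD_DIRECT_MS = 5.0


def mean_ack_ms(write_sizes, ack_from_ssd):
    """Mean client ack latency for a batch of writes.

    ack_from_ssd(size) returns True if a write of `size` bytes is acked once it
    is on the SSD journal, False if the ack waits for the HDD write. Latency is
    assumed independent of write size, which is a deliberate simplification.
    """
    return mean(SSD_JOURNAL_MS if ack_from_ssd(size) else HDD_DIRECT_MS
                for size in write_sizes)


# Mostly 4 KiB writes with a few 4 MiB ones (hypothetical workload).
workload = [4 * 1024] * 8 + [4 * 1024 * 1024] * 2

# Filestore: every write is acked from the SSD journal.
filestore_ms = mean_ack_ms(workload, lambda size: True)
# The feared case: every write is acked only after reaching the HDD.
feared_ms = mean_ack_ms(workload, lambda size: False)
print(f"Filestore ~{filestore_ms:.2f} ms vs. all-HDD acks ~{feared_ms:.2f} ms")
```

The `ack_from_ssd` hook is there so a size-based rule can be plugged in
instead of the two all-or-nothing cases.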
I've heard it repeated that Bluestore performs better than Filestore,
but I've also heard some people claim this is not always the case
with HDDs. Is there any truth to that? If so, is there a
configuration we can use to achieve the same kind of performance with
Bluestore?
Bluestore does use a journal for small writes, but not for big ones. You
can make it journal bigger writes as well by increasing
bluestore_prefer_deferred_size (writes smaller than this size are
deferred: committed to the journal first and written to the HDD later),
but it's generally pointless, because in Bluestore the "journal" is
RocksDB's write-ahead log (WAL), which creates far too much extra write
amplification when big data chunks are put into it. This adds load on
the SSDs, and write performance does not improve compared to the default.
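The trade-off can be sketched as a simple routing rule. This hypothetical
Python model assumes that writes smaller than prefer_deferred_size are
copied into the RocksDB WAL on the SSD before reaching the HDD, while
larger writes go to the HDD directly. It counts only data bytes, so it
understates the real RocksDB overhead (metadata, compaction) that makes
large deferred writes expensive. The 32 KiB threshold in the example is an
assumed value, not a recommendation.

```python
def device_bytes(write_sizes, prefer_deferred_size):
    """Return (ssd_bytes, hdd_bytes) of data written for a list of client writes.

    Hypothetical model: a write smaller than prefer_deferred_size is first
    committed to the RocksDB WAL on the SSD and copied to the HDD later; a
    larger write goes straight to the HDD. Metadata and RocksDB compaction
    are ignored, so real SSD traffic would be higher.
    """
    ssd_bytes = hdd_bytes = 0
    for size in write_sizes:
        if size < prefer_deferred_size:
            ssd_bytes += size  # deferred: data lands in the WAL ("journal") first
        hdd_bytes += size      # every write eventually reaches the HDD
    return ssd_bytes, hdd_bytes


KIB, MIB = 1024, 1024 * 1024
# Hypothetical workload: 1000 x 4 KiB random writes plus 10 x 4 MiB writes.
workload = [4 * KIB] * 1000 + [4 * MIB] * 10

# 32 KiB is an assumed HDD threshold; 8 MiB forces the 4 MiB writes through the WAL too.
for threshold in (32 * KIB, 8 * MIB):
    ssd, hdd = device_bytes(workload, threshold)
    print(f"threshold {threshold // KIB} KiB: SSD {ssd / MIB:.1f} MiB, "
          f"HDD {hdd / MIB:.1f} MiB")
```

Raising the threshold from 32 KiB to 8 MiB in this model leaves the HDD
bytes unchanged but pushes every byte of the 4 MiB writes through the SSD
as well, which is the extra SSD load with no gain in HDD work.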
Bluestore is always better in terms of linear write throughput, because
it doesn't double-write big data chunks. In terms of 4K random writes,
however, it's roughly on par with Filestore and may sometimes be even
slightly worse.
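The throughput argument comes down to how many times each client byte is
written to disk. This simplified Python sketch assumes Filestore journals
every write (2x) and Bluestore defers only writes below an assumed 32 KiB
threshold. It ignores metadata, RocksDB and XFS overhead, and replication,
so it explains only the bytes-written side of the comparison, not latency.

```python
# Simplified data-write amplification: device bytes written per client byte.
# Ignores metadata, RocksDB/XFS overhead and replication (assumptions).
DEFERRED_THRESHOLD = 32 * 1024  # assumed Bluestore deferred-write threshold for HDDs


def filestore_amplification(write_size: int) -> float:
    # Filestore writes every byte twice, once to the journal and once to the
    # data disk, regardless of write_size.
    return 2.0


def bluestore_amplification(write_size: int) -> float:
    # Small writes are deferred (WAL copy + later HDD write); big writes go to the HDD once.
    return 2.0 if write_size < DEFERRED_THRESHOLD else 1.0


for label, size in (("4K random", 4 * 1024), ("4M linear", 4 * 1024 * 1024)):
    print(f"{label}: Filestore {filestore_amplification(size):.1f}x, "
          f"Bluestore {bluestore_amplification(size):.1f}x")
```

In this model a 4 MiB sequential write costs 1x on Bluestore versus 2x on
Filestore, while a 4K random write costs 2x on both, which is consistent
with the two being roughly on par for small random writes.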
--
With best regards,
Vitaliy Filippov
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com