Hi Mehmet!
On 08/16/2017 11:12 AM, Mehmet wrote:
:( no suggestions or recommendations on this?
On 14 August 2017 16:50:15 MESZ, Mehmet <c...@elchaka.de> wrote:
Hi friends,
my current hardware setup per OSD node is as follows:
# 3 OSD-Nodes with
- 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz ==> 12 Cores, no
Hyper-Threading
- 64GB RAM
- 12x 4TB HGST 7K4000 SAS2 (6Gb/s) Disks as OSDs
- 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device for
12 Disks (20G Journal size)
- 1x Samsung SSD 840/850 Pro only for the OS
# and 1x OSD Node with
- 1x Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz (10 Cores 20 Threads)
- 64GB RAM
- 23x 2TB TOSHIBA MK2001TRKB SAS2 (6Gb/s) Disks as OSDs
- 1x SEAGATE ST32000445SS SAS2 (6Gb/s) Disk as OSD
- 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device for
24 Disks (15G Journal size; see the ceph.conf sketch after this list)
- 1x Samsung SSD 850 Pro only for the OS
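For reference, a filestore journal size like the 20G/15G partitions
above is typically controlled by "osd journal size" in ceph.conf before
the OSDs are prepared. A minimal sketch for the 20G case (the value is
in MB; adjust to your own layout):

cat >> /etc/ceph/ceph.conf <<'EOF'
[osd]
# 20 GB journal partitions carved out of the P3700 (value is in MB)
osd journal size = 20480
EOF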
The single P3700 for 23 spinning disks is pushing it. The P3700 has high
write endurance, but judging by the model number that is the 400GB
version? If you are doing a lot of writes you might wear it out pretty
fast, and it is a single point of failure for the entire node (if it
dies, a lot of data dies with it). Unbalanced setups like this are also
generally trickier to get performing well.
As you can see, I am using one NVMe device (Intel DC P3700 NVMe, 400G),
partitioned, for all of the spinning disks on each OSD node.
When Luminous is available (as the next LTS) I plan to switch from
filestore to bluestore 😊
As far as I have read, bluestore consists of
- "the device" (the main block device)
- "block.db": device that stores RocksDB metadata
- "block.wal": device that stores the RocksDB write-ahead log
Which setup would be useful in my case?
I would set up the disks via "ceph-deploy".
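For what it's worth, a rough sketch of what that could look like with a
Luminous-era ceph-deploy (exact flags depend on your ceph-deploy
version; the device names /dev/sdb, /dev/nvme0n1p1, /dev/nvme0n1p2 and
the hostname "osdnode1" are placeholders, not your actual layout):

ceph-deploy osd create \
    --data /dev/sdb \
    --block-db /dev/nvme0n1p1 \
    --block-wal /dev/nvme0n1p2 \
    osdnode1

repeated once per OSD, each with its own pair of pre-created NVMe
partitions.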
So typically we recommend something like a 1-2GB WAL partition per OSD
on the NVMe drive, and using the remaining space for the DB. If you run
out of DB space, bluestore will start storing KV data on the spinning
disks instead. I suspect this is still the advice you will want to
follow, though at some point having so many WAL and DB partitions on
the NVMe may start to become a bottleneck. Something like 63K sequential
writes to heavily fragmented objects might be worth testing, but in most
cases I suspect DB and WAL on the NVMe will still be faster.
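As a rough sketch of what that carve-up could look like on the 12-OSD
nodes (assuming the 400GB P3700 shows up as /dev/nvme0n1; roughly 2GB
WAL plus 28GB DB per OSD, with sizes and device name illustrative only):

for i in $(seq 1 12); do
    sgdisk --new=0:0:+2G  /dev/nvme0n1   # ~2 GB WAL partition for OSD $i
    sgdisk --new=0:0:+28G /dev/nvme0n1   # ~28 GB DB partition for OSD $i
done

On the 24-OSD node the same 400GB drive only leaves roughly 14-15GB of
DB per OSD (after a 1-2GB WAL each), which is part of why that node is
the tighter fit.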
Thanks in advance for your suggestions!
- Mehmet
------------------------------------------------------------------------
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com