Thanks for your advice, Maged and Chris.
I'll answer below.
On 08/22/2017 04:30 PM, Mazzystr wrote:
Also examine your network layout. Any saturation in the private
cluster network or client facing network will be felt in clients /
libvirt / virtual machines
As OSD count increases...
* Ensure the client network and the private cluster network are
separated - different NICs, different wires, different switches
* Add more NICs on both the client side and the private cluster
network side and LAG them.
* If/When your dept's budget suddenly swells...implement 10 gig-e.
We have different NICs for each network but they are connected to the
same switch. In that switch both nets are logically separated by VLANs.
The switch does not look saturated for now (it is a 10gbit-e), but using
the same switch may become a problem as the OSD count increases.
Monitor, capacity plan, execute :)
/Chris C
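One way to approach the "monitor" step for the network side is to sample
the interface byte counters twice and compare the rate with the link
speed. The sketch below is only an illustration: it assumes a Linux host
that exposes /proc/net/dev, and the function names, the sample counter
values, and the interface name eth0 are made up for the example.

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        # Column 0 is received bytes, column 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def link_utilization(before, after, interval_s, link_bits_per_s):
    """Return {interface: (rx_percent, tx_percent)} of link capacity used."""
    result = {}
    for iface, (rx1, tx1) in after.items():
        if iface not in before:
            continue  # interface appeared between the two samples
        rx0, tx0 = before[iface]
        rx_pct = (rx1 - rx0) * 8 / interval_s / link_bits_per_s * 100
        tx_pct = (tx1 - tx0) * 8 / interval_s / link_bits_per_s * 100
        result[iface] = (rx_pct, tx_pct)
    return result


# Hypothetical samples taken 1 second apart on a 10 Gbit/s link.
sample_1 = "  eth0: 1000 0 0 0 0 0 0 0 2000 0 0 0 0 0 0 0\n"
sample_2 = "  eth0: 500001000 0 0 0 0 0 0 0 250002000 0 0 0 0 0 0 0\n"
usage = link_utilization(parse_net_dev(sample_1), parse_net_dev(sample_2),
                         interval_s=1.0, link_bits_per_s=10_000_000_000)
for iface, (rx, tx) in usage.items():
    print(f"{iface}: rx {rx:.1f}% tx {tx:.1f}%")
```

On a real node the two samples would come from reading /proc/net/dev a
few seconds apart. Sustained values close to 100% on either the client
or the cluster interface would be the saturation described above.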
On Tue, Aug 22, 2017 at 3:02 PM, Maged Mokhtar <mmokh...@petasan.org>
wrote:
It is likely your 2 spinning disks cannot keep up with the load.
Things are likely to improve if you double your OSDs hooking them
up to your existing SSD journal. Technically it would be nice to
run a load/performance tool (either atop/collectl/sysstat) and
measure how busy your resources are, but it is most likely your 2
spinning disks will show near 100% busy utilization.
We have a monitoring "stack" made up of collectd/graphite/grafana, and
I can see the spinning disks almost saturated when performing IO-heavy
tasks on the cluster.
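For readers without such a stack, the "busy" percentage that tools like
iostat report can be approximated from two samples of /proc/diskstats.
This is a minimal sketch, assuming a Linux host; the function names, the
device name sda, and the sample numbers are hypothetical and are not
measurements from this cluster.

```python
def parse_diskstats(text):
    """Return {device: io_ticks_ms} from /proc/diskstats content."""
    ticks = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 13:
            continue  # skip blank or malformed lines
        # fields[2] is the device name; fields[12] is the total time
        # in milliseconds the device has spent doing I/O.
        ticks[fields[2]] = int(fields[12])
    return ticks


def busy_percent(before, after, interval_s):
    """Return {device: percent of the interval the device was busy}."""
    result = {}
    for dev, t1 in after.items():
        if dev in before:
            result[dev] = (t1 - before[dev]) * 100.0 / (interval_s * 1000.0)
    return result


# Hypothetical samples taken 1 second apart.
sample_1 = "8 0 sda 10 0 80 5 20 0 160 9 0 1000 1400\n"
sample_2 = "8 0 sda 15 0 120 8 90 0 720 40 1 1950 2900\n"
busy = busy_percent(parse_diskstats(sample_1), parse_diskstats(sample_2), 1.0)
for dev, pct in busy.items():
    print(f"{dev}: {pct:.1f}% busy")
```

A spinning disk that stays near 100% busy during client IO while the
SSD journal stays low would match the bottleneck suggested above.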
filestore_max_sync_interval: I do not recommend decreasing this to
0.1; I would keep it at 5 sec.
I'll increase this parameter today, since we have some maintenance work
to do.
osd_op_threads: do not increase this unless you have enough cores.
I'll look into this today too.
but adding disks is the way to go
Maged
On 2017-08-22 20:08, fcid wrote:
Hello everyone,
I've been using Ceph to provide storage using RBD for 60 KVM
virtual machines running on Proxmox.
The Ceph cluster we have is very small (2 OSDs + 1 mon per node,
and a total of 3 nodes) and we are having some performance issues,
such as high latency (apply lat: ~0.5 s; commit lat: 0.001 s),
which gets worse during the weekly deep-scrubs.
I wonder if doubling the number of OSDs would improve latency,
or if there is any other configuration tweak recommended
for such a small cluster. Also, I'm looking forward to reading about
the experience of other users with a similar configuration.
Some technical info:
- Ceph version: 10.2.5
- OSDs have an SSD journal (one SSD per 2 OSDs) and use a
spinning disk as the backing device.
- Using CFQ disk queue scheduler
- OSD configuration excerpt:
osd_recovery_max_active = 1
osd_recovery_op_priority = 63
osd_client_op_priority = 1
osd_mkfs_options = -f -i size=2048 -n size=64k
osd_mount_options_xfs = inode64,noatime,logbsize=256k
osd_journal_size = 20480
osd_op_threads = 12
osd_disk_threads = 1
osd_disk_thread_ioprio_class = idle
osd_disk_thread_ioprio_priority = 7
osd_scrub_begin_hour = 3
osd_scrub_end_hour = 8
osd_scrub_during_recovery = false
filestore_merge_threshold = 40
filestore_split_multiple = 8
filestore_xattr_use_omap = true
filestore_queue_max_ops = 2500
filestore_min_sync_interval = 0.01
filestore_max_sync_interval = 0.1
filestore_journal_writeahead = true
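The two settings questioned elsewhere in this thread
(filestore_max_sync_interval and osd_op_threads) can be checked
mechanically against an excerpt like the one above. The sketch below is
only an illustration: the 5 second figure comes from Maged's reply, while
the rule "op threads times OSDs per node should not exceed the core
count", the function names, and the core counts are assumptions made for
the example, not Ceph guidance.

```python
def parse_excerpt(text):
    """Parse 'key = value' lines into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        # Split on the first '=' only, since values may contain '='.
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def review(settings, cores, osds_per_node=2, recommended_sync_s=5.0):
    """Return a list of notes for settings that differ from the advice."""
    notes = []
    sync = float(settings.get("filestore_max_sync_interval",
                              recommended_sync_s))
    if sync < recommended_sync_s:
        notes.append(f"filestore_max_sync_interval is {sync} s; "
                     f"the thread suggests {recommended_sync_s} s")
    threads = int(settings.get("osd_op_threads", 0))
    # Assumed rule of thumb, not an official limit.
    if threads * osds_per_node > cores:
        notes.append(f"osd_op_threads = {threads} on {osds_per_node} OSDs "
                     f"may be too many for {cores} cores")
    return notes


excerpt = """
osd_mkfs_options = -f -i size=2048 -n size=64k
osd_op_threads = 12
filestore_max_sync_interval = 0.1
"""
for note in review(parse_excerpt(excerpt), cores=8):
    print(note)
```

The core count passed to review() is a placeholder; the thread does not
state how many cores the nodes have.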
Best regards,
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Fernando Cid O.
Ingeniero de Operaciones
AltaVoz S.A.
http://www.altavoz.net
Viña del Mar, Valparaiso:
2 Poniente 355 of 53
+56 32 276 8060
Santiago:
San Pío X 2460, oficina 304, Providencia
+56 2 2585 4264