Hi, some news: it seems things have finally been stable for me for about a week now (around 0.7 ms average commit latency). http://odisoweb1.odiso.net/osdstable.png
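For reference, a minimal sketch of the tuning I describe below, assuming a Luminous-era ceph.conf (the option names and the sysfs path are standard; the exact values are just what worked on my cluster, not a recommendation):

    [osd]
    # cap the rocksdb key/value cache inside bluestore (1 GiB)
    bluestore_cache_kv_max = 1073741824
    # osd_memory_target deliberately left at its default

    # THP disabled at the kernel level, e.g.:
    #   echo never > /sys/kernel/mm/transparent_hugepage/enabled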
The biggest change was on 18/02, when I finished rebuilding all my OSDs: 2 OSDs of 3TB each per 6TB NVMe drive. (Previously I had only done this on one node, so maybe with replication I didn't see the benefit.) I have also pushed bluestore_cache_kv_max to 1G, kept osd_memory_target at its default, and disabled THP. The various buffers seem more constant too.

But clearly, 2 smaller 3TB OSDs with 3G osd_memory_target each behave differently from 1 big 6TB OSD with 6G osd_memory_target. (Maybe fragmentation, maybe rocksdb, maybe the number of objects in cache; I really don't know.)

----- Original Message -----
From: "Stefan Kooman" <[email protected]>
To: "Wido den Hollander" <[email protected]>
Cc: "aderumier" <[email protected]>, "Igor Fedotov" <[email protected]>, "ceph-users" <[email protected]>, "ceph-devel" <[email protected]>
Sent: Thursday, 28 February 2019 21:57:05
Subject: Re: [ceph-users] ceph osd commit latency increase over time, until restart

Quoting Wido den Hollander ([email protected]):
> Just wanted to chime in, I've seen this with Luminous+BlueStore+NVMe
> OSDs as well. Over time their latency increased until we started to
> notice I/O-wait inside VMs.

On a Luminous 12.2.8 cluster with only SSDs we also hit this issue, I guess. After restarting the OSD servers the latency would drop to normal values again. See https://owncloud.kooman.org/s/BpkUc7YM79vhcDj

Reboots were finished at ~ 19:00.

Gr. Stefan

--
| BIT BV http://www.bit.nl/ Kamer van Koophandel 09090351
| GPG: 0xD14839C6 +31 318 648 688 / [email protected]

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
