On 6/11/20 11:30 AM, Stephan wrote:
Hi Mark,
Thanks for your comprehensive response!
Our tests basically match the linked results (we are testing with 2 OSDs per
NVMe and fio/librbd too, but with a much smaller setup). Sometimes we see
smaller or larger improvements from Nautilus to Octopus, but it is similar
overall. Only the random write IOPS are the other way round, namely a lot
slower in our setup …
Meanwhile we have gone through some more testing:
@1) Increasing osd_memory_target from the default (which is 4GB as far as we
know) to 16GB doesn't change the results.
Ok, not likely due to onode cache misses then!
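For anyone following along, bumping it at runtime looks roughly like this
(a sketch assuming the central config store is in use; 17179869184 bytes is
16GB):

  # set a 16GB memory target for all OSDs (value is in bytes)
  ceph config set osd osd_memory_target 17179869184
  # confirm what is configured for the osd section
  ceph config get osd osd_memory_target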
@2/3) The CPUs are configured for high performance in the BIOS and we ensured
that it is set in the kernel as well (performance governor). Each node in our
test setup has one Intel Xeon E5-2690 v3 with 12 cores/24 threads running
constantly at 3.1 GHz.
Ok, that's good. FWIW, this typically makes a fairly substantial
difference with high-performance NVMe drives, so it might be worth just
verifying that you are actually seeing an improvement versus letting the
CPUs drop into low-power C-states.
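A quick way to double-check from the shell (a rough sketch, assuming the
cpupower tool is installed; frequency-set needs root):

  # current scaling governor on every core
  cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
  # current frequency and available C-states
  cpupower frequency-info
  cpupower idle-info
  # force the performance governor on all cores
  cpupower frequency-set -g performance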
@4) Yes, we have tested bluefs_buffered_io without success. We did some
profiling using gdbpmp; collecting 100 samples shows that 0.5%-1% of the time
is spent in io_submit. There is an extreme performance impact while profiling
(reducing IOPS to a few hundred operations/second), so we are uncertain
whether this is relevant information. Can we improve the profiling (we used
gdbpmp.py -p … -n 100 -m bstore_kv_sync,bstore_kv_final -o … as in the
example on GitHub)? We would gladly provide the collected sample data if that
could be helpful. Furthermore, we checked iostat, which seems to be okay
(w_await mostly below 1 ms).
Yeah, gdbpmp will have a big effect on performance, which in some cases
could affect the results (especially if you are profiling the client and
the OSD at the same time). Having said that, the way it works is that
it periodically stops the process and takes a sample of what it was
doing when it was stopped. Between those pauses it runs normally. So if
you end up with a bunch of samples all in lock contention, there's
probably a decent chance that the OSD really is spending a lot of time
in lock contention during normal execution too. You can adjust how long
it sleeps between sample collections with the -s parameter if you want to
maximize the time between pauses (though it will make it take even
longer to gather samples).
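Something along these lines, for example (the PID, file name, and -s value
are just placeholders; double-check the exact flags against the gdbpmp
README):

  # 1000 samples of a single OSD; -s sets the sleep (in seconds) between samples
  ./gdbpmp.py -p <osd-pid> -n 1000 -s 0.01 -m bstore_kv_sync,bstore_kv_final -o osd0-octopus.gdbpmp
  # inspect the collected profile afterwards
  ./gdbpmp.py -i osd0-octopus.gdbpmp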
@5) We have set noscrub and norebalance and disabled automatic scaling of the
PG count during all our tests.
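Concretely, that was done with commands along these lines (the pool name is a
placeholder):

  ceph osd set noscrub
  ceph osd set norebalance
  # disable automatic PG count scaling for the test pool
  ceph osd pool set <pool> pg_autoscale_mode off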
As the results are reproducible when switching between Nautilus and Octopus,
there must clearly be something going on in Octopus. Maybe this only affects
very small setups like ours? As far as we can see, you have been testing with
8 nodes/64 NVMe drives in total, whereas our setup only consists of 3 nodes
with one NVMe each.
It's possible, though even in small setups we were seeing quite a bit
better throughput while testing master post-Nautilus. If you could try a
1000-sample gdbpmp profile of one of your OSDs on Octopus (and, even
better, another one on Nautilus), that would be most helpful!
Please also include the benchmark command line that was run if possible.
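For example, something roughly like the following fio/librbd invocation
(the pool, image, and client names here are only placeholders):

  fio --ioengine=rbd --clientname=admin --pool=rbd --rbdname=bench-img \
      --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 --direct=1 \
      --time_based --runtime=60 --name=4k-randwrite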
Thanks,
Mark