On 24/04/2025 13:57, 段世博 wrote:
We are currently using the Ceph Octopus version and have some questions
regarding a specific commit in librbd (
https://github.com/ceph/ceph/commit/081d28ae7ca46fd1f40034cc558def77a95a9294
).

Could you please clarify why Ceph implements ordering for overlapping IO?
Our understanding is that overlapping IO in block storage can typically be
parallelized. We are curious about which components of Ceph depend on this
feature. Additionally, we noticed that this feature was removed in the
subsequent Pacific version. Could you provide some insight into the reasons
behind this decision?

Thank you for your assistance.

Best regards,

shibo




1) Generally, and not specific to Ceph:
i) If a client issues two overlapping write I/Os, A then B, and both return, the client knows both were written to storage, but it cannot assume A was written before B: since the two were in flight concurrently, no ordering can be assumed between them. ii) If, after A returns, the client issues I/O C and it returns, the client can assume C was written after A.
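The two rules can be illustrated with a toy sketch (plain Python, not Ceph code): a stand-in "device" applies each write atomically, two overlapping in-flight writes may land in either order, and a write issued only after another has completed is ordered after it.

```python
# Toy illustration of rules (i) and (ii): not Ceph code.
import threading

disk = bytearray(8)        # stand-in for a block device
lock = threading.Lock()    # the "device" applies each write atomically

def write(offset, data):
    with lock:
        disk[offset:offset + len(data)] = data

# (i) A and B overlap and are in flight together: no order guarantee.
a = threading.Thread(target=write, args=(0, b"AAAA"))
b = threading.Thread(target=write, args=(0, b"BBBB"))
a.start(); b.start(); a.join(); b.join()
assert bytes(disk[:4]) in (b"AAAA", b"BBBB")   # either outcome is legal

# (ii) C is issued only after A and B completed, so it lands after them.
write(0, b"CCCC")
assert bytes(disk[:4]) == b"CCCC"
```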

2) The above holds only if no caching/buffering is involved in the middle. Caches can sit on the client (OS/page cache, the librbd cache) or on the storage server (controller/hardware cache, drive cache, ...). Caching gives better performance, but in case of a crash the data in the cache is lost, and a cache does not guarantee the same write order. Client applications can ask to bypass the OS cache (direct flag) or the server cache (sync flag). Clients that do enable caching can issue a "cache flush" at specific points to guarantee when data is persisted and to keep control over persistence order.
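As a concrete sketch of those flags against a local file (O_DIRECT is omitted because it additionally requires aligned buffers, which plain Python does not provide easily): O_SYNC makes each write return only once it reaches stable storage, while a buffered write is only guaranteed durable at the explicit fsync() barrier.

```python
# Toy illustration of sync writes vs. buffered writes + explicit flush.
import os, tempfile

path = os.path.join(tempfile.mkdtemp(), "blk")

# sync-style write: returns only after data reaches stable storage
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
os.write(fd, b"payload")
os.close(fd)

# buffered write plus explicit flush: durability and order guaranteed
# only at the fsync() point, not at each individual write()
fd = os.open(path, os.O_WRONLY)
os.write(fd, b"buffered")
os.fsync(fd)               # the "cache flush" barrier
os.close(fd)
```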

3) Ceph OSDs do not cache writes: there is no server-side cache, so all I/O is effectively treated as sync (sync flag).
If you write a client application that uses librados, the rules in 1) apply.
If you use kernel-mapped rbd devices and write with the direct flag to bypass the page cache, the rules in 1) apply.
If you use librbd with the librbd cache disabled, the rules in 1) apply.

4) Regarding parallel behavior: concurrent I/O executes in parallel across the many OSDs as well as within a single OSD. Within an OSD, however, writes to the same pg are serialized; this is an implementation detail, mainly to ensure the integrity of the pg metadata structures, and is not related to overlapped-I/O ordering.
So yes, overlapped I/O to block storage will execute in parallel.
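A much-simplified sketch of that mapping (the real thing uses the rjenkins string hash, stable_mod, and CRUSH for placement; the hash and pg_num below are placeholders): each object name hashes to a pg, writes within one pg serialize, and writes to different pgs can run in parallel.

```python
# Simplified object -> pg mapping sketch; NOT Ceph's real hashing.
import zlib

PG_NUM = 8  # hypothetical pool pg_num

def pg_of(object_name: str) -> int:
    # placeholder hash; Ceph actually uses ceph_str_hash_rjenkins
    return zlib.crc32(object_name.encode()) % PG_NUM

# Writes to objects that map to the same pg serialize inside the OSD;
# writes to objects in different pgs may proceed concurrently.
pg_a = pg_of("rbd_data.1.0000000000000000")
pg_b = pg_of("rbd_data.1.0000000000000001")
```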

5) I had a quick look at the changes/patches you mention; they do not alter the above. From what I understood, it relates to using librbd with the librbd cache enabled: it enhances/fixes the behavior of cache flushes, perhaps by delaying execution of a flush while a previous I/O or a previous flush is still in flight and not yet completed, so that after a crash the image cannot contain the newer writes without the older ones.
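My reading of that behavior, as a minimal sketch (this is my guess at the semantics, not librbd's actual code): a flush callback is held back until every write that was in flight when the flush arrived has completed, so the flush cannot "overtake" earlier I/O.

```python
# Sketch of "delay a flush while earlier i/o is in flight" (hypothetical
# FlushGate class, not taken from librbd).
import threading

class FlushGate:
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = 0
        self._pending_flushes = []

    def write_start(self):
        with self._lock:
            self._inflight += 1

    def write_finish(self):
        with self._lock:
            self._inflight -= 1
            ready = [] if self._inflight else self._pending_flushes
            if not self._inflight:
                self._pending_flushes = []
        for cb in ready:       # run delayed flushes outside the lock
            cb()

    def flush(self, callback):
        with self._lock:
            if self._inflight:
                self._pending_flushes.append(callback)  # delay the flush
                return
        callback()                                      # nothing in flight

done = []
g = FlushGate()
g.write_start()
g.flush(lambda: done.append("flushed"))  # delayed: a write is in flight
assert done == []
g.write_finish()                         # last write completes -> flush runs
assert done == ["flushed"]
```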

/maged

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io