On Mon, Jul 27, 2015 at 11:23:02AM -0400, Jeff Cody wrote:
> On Sat, Jul 18, 2015 at 01:44:24AM +0900, Hitoshi Mitake wrote:
> > Current sheepdog driver only serializes create requests in oid
> > unit. This mechanism isn't enough for handling requests to
> > overwrapping area spanning multiple oids, so it can result bugs like
> > below:
> > https://bugs.launchpad.net/sheepdog-project/+bug/1456421
> >
> > This patch adds a new serialization mechanism for the problem. The
> > difference from the old one is:
> > 1. serialize entire aiocb if their targetting areas overwrap
> > 2. serialize all requests (read, write, and discard), not only creates
> >
> > This patch also removes the old mechanism because the new one can be
> > an alternative.
> >
Okay, I figured out the problem myself; let me try to explain it for non-sheepdog developers.

A sheepdog volume is thin-provisioned, so an object is created internally on the first write to it. This means a write has to be handled in one of two ways:

1. Write to an unallocated object: create the object, then update the inode. Two requests are generated in this case: create(oid) and update_inode(oid_to_inode_idx).
2. Write to an allocated object: just write(oid).

The current sheepdog driver batches inode updates with a range, update_inode(min_idx, max_idx). The subtle problem lies in how min_idx and max_idx are determined. For a single create request, min_idx == max_idx, so we update exactly one index, as expected. Suppose instead we have two create requests, create(10) and create(20). Then min_idx == 10 and max_idx == 20, and even though we only need to update index 10 and index 20, update_inode(10, 20) actually updates the whole range from 10 to 20.

This would work if no two update_inode() requests ever overlapped, but unfortunately that is not true in some corner cases. The problem arises as follows:

  req 1: update_inode(10, 20)
  req 2: update_inode(15, 22)

req 1 and req 2 might carry different values for indexes [15, 20], which causes problems.

Based on the above analysis, I think the real fix is to fix update_inode(), not to serialize all requests in an overkill way. The fix should be easy: since most update_inode() calls update only one index, we could make update_inode() a single-index updater rather than a range one. That way we don't hurt performance the way the above patch does. (I actually suspect the above patch might not solve the problem, because update_inode() calls can still overlap with the patch applied.)

If everyone agrees with my analysis, I'll post the fix.

Thanks,
Yuan