On Mon, Mar 18, 2013 at 6:17 PM, Dave Chinner <da...@fromorbit.com> wrote:
> On Wed, Mar 13, 2013 at 10:00:51PM -0400, Peter Hurley wrote:
>> On Wed, 2013-03-13 at 14:23 +1100, Dave Chinner wrote:
>> > We don't care about the ordering between multiple concurrent
>> > metadata modifications - what matters is whether the ongoing data IO
>> > around them is ordered correctly.
>>
>> Dave,
>>
>> The point that Michel is making is that there never was any ordering
>> guarantee by rwsem. It's an illusion.
>
> Weasel words.

Whoaaa, calm down.

You initially made one false statement (that the change meant a stream
of readers would starve a writer forever) and one imprecise statement
(that rwsem used to guarantee that readers don't skip ahead of
writers - this may be true in practice for your use case because the
latencies involved are very large compared to scheduling latencies,
but that's a very important qualification that needs to be added
here). That confused me enough that I initially couldn't tell what
your actual concern was, so I pointed out the source of my confusion
and asked you to clarify. It seems unfair to characterize that as
"weasel words" - I'm not trying to be a smartass here, but only to
actually understand your concern.

>> The reason is simple: to even get to the lock the cpu has to be
>> sleep-able. So for every submission that you believe is ordered, is by
>> its very nature __not ordered__, even when used by kernel code.
>>
>> Why? Because any thread on its way to claim the lock (reader or writer)
>> could be pre-empted for some other task, thus delaying the submission of
>> whatever i/o you believed to be ordered.
>
> You think I don't know this? You're arguing fine grained, low level
> behaviour between tasks is unpredictable. I get that. I understand
> that. But I'm not arguing about fine-grained, low level, microsecond
> semantics of the locking order....
>
> What you (and Michael) appear to be failing to see is what happens
> on a macro level when you have read locks being held for periods
> measured in *seconds* (e.g. direct IO gets queued behind a few
> thousand other IOs in the elevator waiting for a request slot),
> and the subsequent effect of inserting an operation that requires a
> write lock into that IO stream.
>
> IOWs, it simply doesn't matter if there's a micro-level race between
> the write lock and a couple of the readers. That's the level you
> guys are arguing at but it simply does not matter in the cases I'm
> describing.
> I'm talking about high level serialisation behaviours
> that might take of *seconds* to play out and the ordering behaviours
> observed at that scale.
>
> That is, I don't care if a couple of threads out of a few thousand
> race with the write lock over few tens to hundreds of microseconds,
> but I most definitely care if a few thousand IOs issued seconds
> after the write lock is queued jump over the write lock. That is a
> gross behavioural change at the macro-level.....

Understood. I accepted your concern and made sure my v2 proposal
doesn't do such macro level reordering.

>> So just to reiterate: there is no 'queue' and no 'barrier'. The
>> guarantees that rwsem makes are;
>> 1. Multiple readers can own the lock.
>> 2. Only a single writer can own the lock.
>> 3. Readers will not starve writers.
>
> You've conveniently ignored the fact that the current implementation
> also provides following guarantee:
>
> 4. new readers will block behind existing writers

In your use case, with large enough queue latencies, yes. Please don't
make it sound like this applies in every use case - it has never
applied for short (<ms) queue latencies, and you might confuse people
by making such unqualified statements.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
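
For readers following the thread, here is a minimal sketch of the
macro-level behaviour being argued about (guarantee 4: readers that
arrive after a queued writer wait behind it). It is a single-threaded
toy model in Python that assumes a strictly FIFO wait queue. It is not
the kernel's rwsem code, and the names ToyRWQueue, request, release and
demo are hypothetical, invented only for this illustration. Because it
has no scheduler, it cannot show the micro-level races between a thread
heading for the lock and a concurrent writer.

```python
from collections import deque


class ToyRWQueue:
    """Toy FIFO model of a reader/writer lock (NOT the kernel rwsem)."""

    def __init__(self):
        self.readers = set()    # names currently holding the lock for read
        self.writer = None      # name currently holding the lock for write
        self.waiting = deque()  # (name, kind) tuples in arrival order

    def request(self, name, kind):
        """Ask for the lock; kind is 'r' or 'w'.

        Returns True if granted at once, False if the caller was queued.
        Assumption: anyone arriving while the queue is non-empty waits,
        so a new reader cannot jump over a queued writer.
        """
        blocked = (
            bool(self.waiting)
            or self.writer is not None
            or (kind == "w" and bool(self.readers))
        )
        if blocked:
            self.waiting.append((name, kind))
            return False
        self._grant(name, kind)
        return True

    def _grant(self, name, kind):
        if kind == "w":
            self.writer = name
        else:
            self.readers.add(name)

    def release(self, name):
        """Drop ownership, then wake waiters from the head of the queue.

        Returns the list of names that were granted the lock.
        """
        if self.writer == name:
            self.writer = None
        else:
            self.readers.discard(name)

        granted = []
        while self.waiting:
            nxt, kind = self.waiting[0]
            if self.writer is not None:
                break  # a writer owns the lock, nobody else may enter
            if kind == "w" and self.readers:
                break  # the queued writer must wait for readers to drain
            self.waiting.popleft()
            self._grant(nxt, kind)
            granted.append(nxt)
        return granted


def demo():
    """Two readers hold the lock, a writer queues, then a late reader."""
    q = ToyRWQueue()
    return [
        q.request("r1", "r"),  # granted immediately
        q.request("r2", "r"),  # granted immediately
        q.request("w1", "w"),  # queued behind the active readers
        q.request("r3", "r"),  # queued behind w1, does not jump ahead
        q.release("r1"),       # r2 still reading, nobody is woken
        q.release("r2"),       # readers drained, w1 gets the lock
        q.release("w1"),       # r3 finally gets in
    ]
```

In this model r3 is only granted the lock after w1 has held and
released it, which is the ordering that matters when read locks are
held for seconds.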