On 12/22/2012 12:42 AM, Michel Lespinasse wrote:
> However, I have a few concerns about the behavior of this, which I think deserve more experimentation (I may try helping with it after the new year).
More experimentation is always good. Different hardware will probably behave differently, etc...
> One thing you mentioned in 0/3 is that the best value varies depending on the number of CPUs contending. This is somewhat surprising to me; I would have guessed/hoped that the (inc.tail - inc.head) multiplicative factor would account for that already.
I had hoped the same, but testing while developing the code showed otherwise. A larger constant scales better to a larger number of contending CPUs, while a smaller constant works faster when only a few CPUs contend on the lock. The graph attached to patch 0/3 shows this effect.
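To make that concrete, the idea is roughly the following (hand-waving pseudocode, not the actual patch code; spin_wait_proportional, spinlock_delay and the busy-wait loop are only illustrative). The delay before re-reading the lock word scales with the number of waiters ahead of us (tail - head), multiplied by the per-lock delay constant under discussion:

/*
 * Rough sketch of proportional backoff, not the patch itself.
 * Assumes the x86 ticket lock layout (lock->tickets.head/tail).
 */
static void spin_wait_proportional(arch_spinlock_t *lock,
				   __ticket_t my_ticket,
				   unsigned int spinlock_delay)
{
	for (;;) {
		__ticket_t head = ACCESS_ONCE(lock->tickets.head);
		unsigned int loops;

		if (head == my_ticket)
			return;		/* our turn, lock is ours */

		/* more waiters ahead of us -> wait proportionally longer */
		loops = (__ticket_t)(my_ticket - head) * spinlock_delay;
		while (loops--)
			cpu_relax();
	}
}

The question above is essentially whether the (my_ticket - head) factor alone should be enough, or whether spinlock_delay itself also has to change with the amount of contention.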
> What I'm getting at is that I would be more confident that the autotune algorithm will work well in all cases if the value only depended on the system parameters such as CPU type and frequency, rather than per-spinlock parameters such as number of waiters and hold time.
The autotune algorithm adjusts the delay factor so that, on average, we poll the spinlock about 2.7 times before acquiring it (i.e. the acquiring access is roughly the 3.7th). That keeps the number of shared cache line touches per lock acquisition roughly constant, which is what provides the scalability. A fixed value, chosen either at compile time or at boot time, cannot provide such a guarantee.
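In rough pseudocode (again not the actual patch code; the target constant, step size and clamping values are only illustrative), the feedback looks something like this:

#define TARGET_ACCESSES_X10	37	/* ~3.7 lock word reads per acquisition */
#define MIN_DELAY		1
#define MAX_DELAY		(1 << 16)

/*
 * Illustrative sketch: accesses_x10 is how many times (times 10) the
 * lock word was read for the last acquisition; nudge the delay so we
 * converge on the target.
 */
static unsigned int tune_delay(unsigned int delay, unsigned int accesses_x10)
{
	unsigned int step = delay / 8;

	if (!step)
		step = 1;	/* always adjust by at least one unit */

	if (accesses_x10 > TARGET_ACCESSES_X10 && delay < MAX_DELAY) {
		/* too many shared cache line touches: back off harder */
		delay += step;
	} else if (accesses_x10 < TARGET_ACCESSES_X10 && delay > MIN_DELAY) {
		/* waiting longer than needed: shorten the delay */
		delay -= step;
	}

	return delay;
}

Because the feedback is driven by the observed number of lock accesses, the delay ends up depending on how contended each lock is, not just on CPU type and frequency.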
> I feel this review is too high-level to be really helpful, so I'll stop until I can find time to experiment :)
I am looking forward to the test results. If anyone manages to break the code, I will fix it...