Zhihui Zhang wrote:
> What if most I/O are asynchronous writes handled by a background
> process (e.g. the soft updates syncer daemon or a special kernel
> daemon)? Then I guess the wait should have something to do with
> memory or buffers, but I do not know how to confirm this. Maybe some
> profiling or instrumentation (too much work?) will help.
Keep a count of outstanding I/O. This is your pool size. The figure of
merit, according to queueing theory, is pool retention time. You can get
this by keeping a running average of how many elements are pending
completion versus the frequency at which you make requests. Doing this
doesn't require kernel hacks for statistics gathering.

The number of tables in a McDonald's is based on pool retention time:
the sum of the line time, the eating time, and the cleanup time. That's
why, no matter how long the line is, there is always a place to sit when
you get your food.

For your application, this comes down to I/O latency. The bigger the
latency, the more outstanding operations you need to have happening
concurrently. If you are actually handling this by serializing through
a daemon, then you may have to add more daemons, or speed up the ones
you have.

The soft updates syncer daemon can actually be sped up, but to do it you
will have to add more slots, so that placing an event into the future
still works as expected (the same distance into the future). You
probably would not benefit from this, though; the main bottleneck is
that once a buffer is handed off, there is an effective write lock on
it. So reducing the write lock overhead means reducing the period of
time during which the lock is active (this may not be possible to do
while still maintaining the benefits of soft updates: you will lose
increasing amounts of write coalescing).

Probably, you will want to tackle the problem outside the kernel, by
getting the operations out of the same contention domain in the first
place. Doing this is a balancing act, since it means you will probably
move them out of the adjacent cache space as well, which means cache
shootdowns. For a disk write cache, this could simply move the stall
from the kernel down to the disk. If you are bottlenecked by physical
I/O bandwidth, then at that point there's really nothing you can do to
save yourself.
If you are driving requests into the queue as fast as you can, there is
also nothing you can do, except try to reduce the pool retention time
itself (faster disks, enabling write caching, use of "presto-serv" type
hardware, etc.).

-- Terry