hi folks.
i have a system where raidframe accesses are severely restricted and show a 2-20x performance loss. after various tests i have confirmed the problem is in raidframe itself, and it appears to be a wakeup() that does not cause the related ltsleep() to wake up until the next tick.

initially i observed this system building raid1 parity on 2 disks limited to about 6MiB/sec, on disks capable of well over 100MiB/sec transfers. systat pinpointed raidframe as the problem due to vastly different "busy" values for the raid volume compared to the underlying component.

when setting up this system originally, i put 3 RAID1 devices on wd0/wd1 for root, swap and home, and after observing the root volume init being very slow, i fired off raidctl -iv for the other 2 raid devices. normally, i'd have expected an adverse effect on the total disk io for wd0/wd1, but in fact, i saw the total IO for these disks rise from ~6MiB/sec to ~17MiB/sec. ie, i got almost 3x the performance by having 3 raid devices active.

after a bunch of testing and asking other developers for input i came to the conclusion that a wakeup() in rf is not actually waking the sleeper until the next tick occurs. i have tested with HZ=100, 512 and 1024 and the total IO/s i see is limited by the HZ value. at HZ=1024 the system seems mostly reasonable, but measurements show it still being over 2x slower than the raw disk in some cases.

i converted the main iodone to use mutex/condvar but it did not change anything. i also added some instrumentation to most of the iodone usage. what i found confirmed my theory that the wakeups weren't occurring (now with cv_signal()). it seems that while there are a bunch of threads woken up and run to schedule a raid IO, the main slowness comes from KernelWakeupFunc(), which is called from biodone2().

the other very interesting thing is that the problem entirely goes away if i "boot -1", ie, avoid SMP.

anyone have any idea what is going on here?

thanks.

.mrg.
ps. i converted one more wakeup/tsleep to mutex/cv, but there's still a fair bit to go:

   http://www.netbsd.org/~mrg/rf_node_mutex.diff

if anyone would like to give this a review.