Hi gang,

I need some advice on the best way to accomplish non-blocking buffered disk IO from my user space application. Unlike some other database systems, I'm trying to outsource as much work to the kernel as possible. I would prefer to avoid having to resort to O_DIRECT and io_submit to fetch the data, since that means reimplementing the page / buffer cache and read-ahead.
The application is read heavy with occasional long-running write jobs. Since I'm not too concerned about performance on the write path, I am able to run that work in threads and block.

Currently I'm mmapping the files, which makes the read path quite simple and is great for disk scans when my data set is in memory. When the data is not cached, performance becomes more unpredictable, especially when I'm doing an indexed read (giant bitmap indexes).

Here's what my IO path looks like:

application <--> fscache (SSD) <--> cephfs <--> ceph cluster

Ultimately what I'd like is a way to do non-blocking scatter-gather IO from disk or page cache into my application. I'd like it to be non-blocking because I can often do something useful while waiting on IO, like uncompressing indexes for another request that is waiting, processing network IO, etc.

With mmap my blocking is unpredictable, and mlock() blocks and only lets me lock a single range, not a vector of page ranges.

If I were doing this in the kernel, life would be simple; there are all sorts of APIs for doing async IO even when my VFS is stacked as in the diagram above. Is there any way for me to take advantage of that in user space, even if it's in units of pages?

It's entirely possible that I'm missing something and there's a good way of doing this that I haven't thought of.

Thanks,
- Milosz
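
To make the mmap limitation concrete, here is a minimal sketch of the closest approximation I'm aware of in user space: hint a vector of ranges with madvise(MADV_WILLNEED), then poll residency with mincore(). The names struct page_range, prefetch_ranges() and ranges_resident() are hypothetical and not taken from my application, and the sketch assumes base is the page-aligned address returned by mmap(). It is not a real answer to the question: MADV_WILLNEED is only a hint, and mincore() only gives a snapshot, so a page reported as resident can still be evicted before it is touched.

```c
#define _DEFAULT_SOURCE            /* exposes madvise() and mincore() on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical type: one byte range inside an existing mmap()ed region. */
struct page_range {
    size_t off;   /* byte offset from the start of the mapping */
    size_t len;   /* length in bytes, must be > 0 */
};

/* Hint the kernel to start read-ahead for every range.
 * Returns 0 on success, -1 on the first madvise() failure. */
static int prefetch_ranges(void *base, const struct page_range *r, size_t n)
{
    long pg = sysconf(_SC_PAGESIZE);
    assert(pg > 0);
    size_t page = (size_t)pg;

    for (size_t i = 0; i < n; i++) {
        size_t start = r[i].off & ~(page - 1);        /* round down to a page */
        size_t len = r[i].off + r[i].len - start;
        if (madvise((char *)base + start, len, MADV_WILLNEED) != 0)
            return -1;
    }
    return 0;
}

/* Snapshot check: returns 1 if every page of every range is resident,
 * 0 if at least one page is not, -1 on error. */
static int ranges_resident(void *base, const struct page_range *r, size_t n)
{
    long pg = sysconf(_SC_PAGESIZE);
    assert(pg > 0);
    size_t page = (size_t)pg;

    for (size_t i = 0; i < n; i++) {
        size_t start = r[i].off & ~(page - 1);
        size_t len = r[i].off + r[i].len - start;
        size_t pages = (len + page - 1) / page;       /* one vec byte per page */
        unsigned char *vec = malloc(pages);
        if (vec == NULL)
            return -1;
        if (mincore((char *)base + start, len, vec) != 0) {
            free(vec);
            return -1;
        }
        for (size_t p = 0; p < pages; p++) {
            if (!(vec[p] & 1)) {                      /* low bit set = resident */
                free(vec);
                return 0;
            }
        }
        free(vec);
    }
    return 1;
}
```

The idea would be to call prefetch_ranges() when a request arrives, go do other work, and only touch the mapping once ranges_resident() returns 1. That still means polling and still leaves a window where the touch can block, which is why I'm asking whether there is a proper interface for this.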