Hey folks,

Just wanted to update the status of this.
During GopherCon, I happened to meet Russ Cox and asked him the same
question: if os.File.Read blocks a goroutine, and the runtime spawns new OS
threads in response, then in a long-running job there should already be
plenty of OS threads, so random read throughput should increase over time
and stabilize at the maximum possible value. But that's not what I saw in
my benchmarks.

His explanation was that GOMAXPROCS in a way acts like a multiplexer. From
the docs: "the GOMAXPROCS variable limits the number of operating system
threads that can execute user-level Go code simultaneously." Which
basically means all reads must first be issued via GOMAXPROCS number of
goroutines before switching over to some OS thread (not really a switch,
but conceptually speaking). This introduces a bottleneck for throughput.

I re-ran my benchmarks with a much higher GOMAXPROCS and was then able to
achieve the maximum throughput. The numbers are here:
https://github.com/dgraph-io/badger-bench/blob/master/randread/maxprocs.txt

To summarize these benchmarks: Linux fio achieves 118K IOPS, and with
GOMAXPROCS=64 or 128, I'm able to achieve 105K IOPS, which is close enough.
Win!

Regarding the point about using io_submit etc. instead of goroutines: I
managed to find a library which does that, but it performed worse than just
using goroutines:
https://github.com/traetox/goaio/issues/3

From what I gather (talking to Russ and Ian), whatever work is going on in
user space, the same work has to happen in kernel space; so there's not
much benefit here.

Overall, with GOMAXPROCS set to a higher value (as I've done in Dgraph
<https://github.com/dgraph-io/dgraph/commit/30237a1429debab73eff38fea2f724914ca38b77>),
one can get the advertised SSD throughput using goroutines. For anyone who
wants to try this themselves, I've put a couple of rough sketches at the
bottom of this mail, below the quoted thread.

Thanks to Ian, Russ, and the Go community for helping solve this problem!

On Sat, May 20, 2017 at 5:31 AM, Ian Lance Taylor <i...@golang.org> wrote:

> On Fri, May 19, 2017 at 3:26 AM, Manish Rai Jain <manishrj...@gmail.com>
> wrote:
>
> >> It's not obvious to me that io_submit would be a win for normal
> >> programs, but if anybody wants to try it out and see that would be
> >> great.
> >
> > Yeah, my hunch is that the cost of thread context switching is going
> > to be a hindrance to achieving the true throughput of SSDs. So, I'd
> > like to try it out. A few guiding pointers would be useful:
> >
> > - This can be done directly via Syscall and Syscall6, is that right?
> > Or should I use Cgo?
>
> You should be able to use syscall.Syscall.
>
> > - I see SYS_IO_SUBMIT in the syscall package. But no aio_context_t or
> > iocbpp structs in the package.
> > - Similarly, other structs for io_getevents etc.
> > - What's the best way to generate them, so syscall.Syscall would
> > accept these?
>
> The simplest way is to get them via cgo. The better way is to add
> them to the x/sys/unix package as described at
> https://github.com/golang/sys/blob/master/unix/README.md .
>
> Ian
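P.S. For anyone who wants to reproduce the effect, here is a minimal sketch
of the benchmark pattern described above: N goroutines doing positioned
random reads against a shared *os.File. The file path, block size, reader
count, and duration are made-up illustrative values, not the actual
badger-bench parameters. Run it with different GOMAXPROCS settings, e.g.
GOMAXPROCS=128 go run randread.go, and compare the reported IOPS.

package main

import (
	"fmt"
	"math/rand"
	"os"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	// Hypothetical pre-filled test file; substitute your own.
	f, err := os.Open("/tmp/testfile")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}
	size := fi.Size()

	const blockSize = 4096 // one SSD page per read
	const numReaders = 64  // try values below and above GOMAXPROCS
	duration := 10 * time.Second
	deadline := time.Now().Add(duration)

	var ops int64
	var wg sync.WaitGroup
	for i := 0; i < numReaders; i++ {
		wg.Add(1)
		go func(seed int64) {
			defer wg.Done()
			rng := rand.New(rand.NewSource(seed))
			buf := make([]byte, blockSize)
			for time.Now().Before(deadline) {
				off := rng.Int63n(size - blockSize)
				// ReadAt is a positioned read (pread), so all
				// goroutines can safely share one *os.File.
				if _, err := f.ReadAt(buf, off); err != nil {
					panic(err)
				}
				atomic.AddInt64(&ops, 1)
			}
		}(int64(i))
	}
	wg.Wait()

	fmt.Printf("%.0f IOPS\n", float64(ops)/duration.Seconds())
}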
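And since the missing structs came up in the thread above, here is a rough
sketch of what driving io_submit directly through syscall.Syscall looks
like, with the kernel ABI structs written out by hand. The layouts are my
own transcription of <linux/aio_abi.h> for little-endian linux/amd64, so
treat them as an assumption and double-check before relying on them. This
is the approach that, per the goaio issue above, didn't beat plain
goroutines anyway.

package main

import (
	"fmt"
	"os"
	"runtime"
	"syscall"
	"unsafe"
)

const iocbCmdPread = 0 // IOCB_CMD_PREAD in <linux/aio_abi.h>

// iocb mirrors struct iocb for little-endian 64-bit Linux.
type iocb struct {
	data      uint64 // aio_data: echoed back in the io_event
	key       uint32 // aio_key
	rwFlags   uint32 // aio_rw_flags
	lioOpcode uint16 // aio_lio_opcode: IOCB_CMD_*
	reqprio   int16  // aio_reqprio
	fildes    uint32 // aio_fildes: file descriptor
	buf       uint64 // aio_buf: destination buffer address
	nbytes    uint64 // aio_nbytes: read length
	offset    int64  // aio_offset: file offset
	reserved2 uint64
	flags     uint32 // aio_flags
	resfd     uint32 // aio_resfd
}

// ioEvent mirrors struct io_event.
type ioEvent struct {
	data uint64 // aio_data from the submitted iocb
	obj  uint64 // address of the completed iocb
	res  int64  // bytes transferred, or a negative errno
	res2 int64
}

func main() {
	// Hypothetical test file. Note: without O_DIRECT the kernel may
	// satisfy the read synchronously from the page cache.
	f, err := os.Open("/tmp/testfile")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// io_setup(128, &ctx): create an AIO context.
	var ctx uintptr // aio_context_t
	if _, _, errno := syscall.Syscall(syscall.SYS_IO_SETUP, 128,
		uintptr(unsafe.Pointer(&ctx)), 0); errno != 0 {
		panic(errno)
	}
	defer syscall.Syscall(syscall.SYS_IO_DESTROY, ctx, 0, 0)

	buf := make([]byte, 4096)
	cb := iocb{
		lioOpcode: iocbCmdPread,
		fildes:    uint32(f.Fd()),
		buf:       uint64(uintptr(unsafe.Pointer(&buf[0]))),
		nbytes:    uint64(len(buf)),
		offset:    0,
	}
	cbs := []*iocb{&cb} // iocbpp: an array of pointers to iocbs

	// io_submit(ctx, 1, iocbpp): queue one read.
	if n, _, errno := syscall.Syscall(syscall.SYS_IO_SUBMIT, ctx, 1,
		uintptr(unsafe.Pointer(&cbs[0]))); errno != 0 || n != 1 {
		panic(errno)
	}

	// io_getevents(ctx, 1, 1, &ev, nil): block until it completes.
	var ev ioEvent
	if n, _, errno := syscall.Syscall6(syscall.SYS_IO_GETEVENTS, ctx, 1, 1,
		uintptr(unsafe.Pointer(&ev)), 0, 0); errno != 0 || n != 1 {
		panic(errno)
	}
	runtime.KeepAlive(buf) // keep the buffer live until the kernel is done
	fmt.Printf("read %d bytes\n", ev.res)
}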