On further thought about GOMAXPROCS and its impact on throughput: a pread(2) on a file blocks the calling OS thread. Go runs one OS thread per core (by default GOMAXPROCS = number of cores), so while an OS thread is blocked, no goroutines can be scheduled on it, and even pure CPU work can't run there. This wastes cores.
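To make that concrete, here's roughly the shape of the read loop I have in mind (a sketch, not my actual benchmark code; the file path and sizes are made up). Every (*os.File).ReadAt below is a pread(2) underneath and parks its OS thread until the kernel returns the data:

package main

import (
	"log"
	"os"
	"runtime"
	"sync"
)

const (
	numJobs   = 16
	numReads  = 10000
	blockSize = 4096           // 4 KB reads, as in fio's --bs=4k
	fileSize  = int64(2) << 30 // assumes a 2 GB test file
)

func main() {
	// Same effect as GOMAXPROCS=16 in the environment: allow more OS
	// threads to run Go code, so a thread parked in pread(2) doesn't
	// starve goroutines that still have CPU work to do.
	runtime.GOMAXPROCS(numJobs)

	f, err := os.Open("/tmp/testfile") // hypothetical pre-made test file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	var wg sync.WaitGroup
	for j := 0; j < numJobs; j++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			buf := make([]byte, blockSize)
			for i := 0; i < numReads; i++ {
				// Strided offsets just to keep the sketch deterministic;
				// the real benchmark picks them at random.
				off := (int64(id*numReads+i) * blockSize) % (fileSize - blockSize)
				// ReadAt issues a blocking pread(2): this goroutine's OS
				// thread is stuck here until the data comes back.
				if _, err := f.ReadAt(buf, off); err != nil {
					log.Fatal(err)
				}
			}
		}(j)
	}
	wg.Wait()
}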
That blocking is probably why increasing GOMAXPROCS improves throughput, and why running any number of goroutines >= GOMAXPROCS has little further impact: the underlying OS threads are already blocked, so extra goroutines can't do much.

If this logic is valid, then a complex system that does many random reads while also performing calculations (like Dgraph) would suffer, even if we set GOMAXPROCS to a factor more than the number of cores.

Ideally, the disk reads would happen via libaio, so the OS threads don't block and all goroutines can keep making progress, increasing the number of read requests that can be issued concurrently. That would also mean one doesn't need to set GOMAXPROCS above the number of cores to achieve higher throughput.

On Wed, May 17, 2017 at 10:38 AM, Manish Rai Jain <manishrj...@gmail.com> wrote:

> So, I fixed the rand and removed the atomics usage (link in my original
> post).
>
> Setting GOMAXPROCS definitely helped a lot. And now it seems to make
> sense, because (the following command in) fio spawns 16 threads, and
> GOMAXPROCS would do the same thing. However, the numbers are still quite
> a bit off.
>
> I realized fio seems to overestimate, and my Go program seems to
> underestimate, so we used sar to determine the IOPS.
>
> $ fio --name=randread --ioengine=psync --iodepth=32 --rw=randread --bs=4k
>   --direct=0 --size=2G --numjobs=16 --runtime=120 --group_reporting
>
> gives around 62K (tested via sar -d 1 -p), while
>
> $ go build . && GOMAXPROCS=16 ./randread --dir ~/diskfio --jobs 16 --num
>   2000000 --mode 1
>
> gives around 44K, via sar. The number of cores on my machine is 4.
>
> Note that this is way better than the earlier 20K with GOMAXPROCS =
> number of cores, but still leaves much to be desired.
>
> On Tue, May 16, 2017 at 11:36 PM, Ian Lance Taylor <i...@golang.org> wrote:
>
>> On Tue, May 16, 2017 at 4:59 AM, Manish Rai Jain <manishrj...@gmail.com>
>> wrote:
>> >
>> > 3 is slower than 2 (of course). But, 2 is never able to achieve the
>> > IOPS that Fio can achieve. I've tried other things, to no luck. What I
>> > notice is that Go and Fio are close to each other as long as number of
>> > goroutines is <= number of cores. Once you exceed cores, Go stays put,
>> > while Fio IOPS keeps on improving, until it reaches SSD thresholds.
>>
>> One thing I notice about your program is that each goroutine is
>> calling rand.Intn and rand.Int63n. Those functions acquire and
>> release a lock, so that single lock is being contested by every
>> goroutine. That's an unfortunate and unnecessary slowdown. Give each
>> goroutine its own source of pseudo-random numbers by using rand.New.
>>
>> You also have a point of contention on the local variable i, which you
>> are manipulating using atomic functions. It would be cheaper to give
>> each goroutine a number of operations to do rather than to compute
>> that dynamically using a contended address.
>>
>> I'll also note that if a program that should be I/O bound shows a
>> behavior change when the number of parallel goroutines exceeds the
>> number of CPUs, then it might be interesting to try setting GOMAXPROCS
>> to be higher. I don't know what effect that would have here, but it's
>> worth checking.
>>
>> Ian
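PS: for anyone reading this thread later, the changes Ian suggested boil down to roughly this (a sketch, not the exact code from my original post; the constants and names here are just illustrative):

package main

import (
	"math/rand"
	"sync"
)

const (
	numJobs    = 16
	totalReads = 2000000        // matches --num 2000000 above
	blockSize  = 4096           // 4 KB reads
	fileSize   = int64(2) << 30 // matches --size=2G above
)

func main() {
	// Give each goroutine a fixed share of the work up front, instead of
	// pulling the next job number from a shared, atomically updated counter.
	readsPerJob := totalReads / numJobs

	var wg sync.WaitGroup
	for j := 0; j < numJobs; j++ {
		wg.Add(1)
		go func(seed int64) {
			defer wg.Done()
			// Per-goroutine source: unlike the global rand.Intn/rand.Int63n,
			// this one is only touched by a single goroutine, so no lock is
			// acquired on every call.
			r := rand.New(rand.NewSource(seed))
			for i := 0; i < readsPerJob; i++ {
				off := r.Int63n(fileSize - blockSize)
				_ = off // the real benchmark preads blockSize bytes at off
			}
		}(int64(j))
	}
	wg.Wait()
}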