> The runtime will spawn a new thread to replace the one that is blocked.
Realized that after writing my last mail. And that actually explains some
of the other crashes we saw about "too many threads": we run tens of
thousands of goroutines to do these reads, one goroutine per read, so
every blocked read ties up an OS thread, and spawning a replacement OS
thread is obviously a lot more expensive than scheduling a goroutine. It
seems this exact problem was already solved for the network via the
netpoller (https://morsmachine.dk/netpoller). Blocking OS threads for
disk reads made sense for HDDs, which could only do ~200 IOPS; for SSDs
we'd need a solution based on async I/O.
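Concretely, the read path we are benchmarking boils down to the pattern
below (a minimal sketch, not our actual code: the file name, sizes and
counts are made up, and error handling on the reads is elided):

    package main

    import (
        "math/rand"
        "os"
        "sync"
    )

    func main() {
        f, err := os.Open("test.dat") // placeholder data file
        if err != nil {
            panic(err)
        }
        defer f.Close()

        var wg sync.WaitGroup
        for i := 0; i < 10000; i++ { // one goroutine per read
            wg.Add(1)
            go func() {
                defer wg.Done()
                buf := make([]byte, 4096)
                // ReadAt is a pread(2) underneath: it blocks this OS
                // thread, and the runtime spawns/wakes another thread so
                // the remaining goroutines can keep running. Enough
                // simultaneously blocked reads can hit the runtime's
                // default limit of 10000 threads, hence the "too many
                // threads" crash.
                f.ReadAt(buf, rand.Int63n(2<<30)) // global rand lock; see Ian's note below
            }()
        }
        wg.Wait()
    }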
On Wed, May 17, 2017 at 2:01 PM, Dave Cheney <d...@cheney.net> wrote:
>
> > So, if an OS thread is blocked, no goroutines can be scheduled on this
> > thread, therefore even pure CPU operations can't be run.
>
> The runtime will spawn a new thread to replace the one that is blocked.
>
> On Wednesday, 17 May 2017 13:05:49 UTC+10, Manish Rai Jain wrote:
>>
>> On further thought about GOMAXPROCS, and its impact on throughput:
>>
>> A file::pread would block the OS thread. Go runs one OS thread per
>> core. So, if an OS thread is blocked, no goroutines can be scheduled
>> on that thread, and even pure CPU operations can't run on it. This
>> would lead to wasted cores.
>>
>> This is probably the reason why increasing GOMAXPROCS improves
>> throughput, and why running any number of goroutines >= GOMAXPROCS has
>> little impact: the underlying OS threads are already blocked, so the
>> goroutines can't do much.
>>
>> If this logic is valid, then a complex system which does many random
>> reads while also performing calculations (like Dgraph) would suffer,
>> even if we set GOMAXPROCS to a factor more than the number of cores.
>>
>> Ideally, the disk reads would happen via libaio, so the OS threads
>> don't block, all goroutines can make progress, and more read requests
>> can be in flight concurrently. That would also mean one doesn't need
>> to set GOMAXPROCS to a value greater than the number of cores to
>> achieve higher throughput.
>>
>> On Wed, May 17, 2017 at 10:38 AM, Manish Rai Jain <manis...@gmail.com>
>> wrote:
>>
>>> So, I fixed the rand and removed the atomics usage (link in my
>>> original post).
>>>
>>> Setting GOMAXPROCS definitely helped a lot. And now it makes sense,
>>> because the fio command below spawns 16 threads, and GOMAXPROCS=16
>>> lets Go match that. However, the numbers are still quite a bit off.
>>>
>>> I realized fio seems to overestimate, and my Go program seems to
>>> underestimate, so we used sar to measure the IOPS.
>>>
>>> $ fio --name=randread --ioengine=psync --iodepth=32 --rw=randread
>>>   --bs=4k --direct=0 --size=2G --numjobs=16 --runtime=120
>>>   --group_reporting
>>> gives around 62K, measured via sar -d 1 -p, while
>>>
>>> $ go build . && GOMAXPROCS=16 ./randread --dir ~/diskfio --jobs 16
>>>   --num 2000000 --mode 1
>>> gives around 44K, via sar. The number of cores on my machine is 4.
>>>
>>> Note that this is way better than the earlier 20K with GOMAXPROCS =
>>> number of cores, but it still leaves much to be desired.
>>>
>>> On Tue, May 16, 2017 at 11:36 PM, Ian Lance Taylor <ia...@golang.org>
>>> wrote:
>>>
>>>> On Tue, May 16, 2017 at 4:59 AM, Manish Rai Jain
>>>> <manis...@gmail.com> wrote:
>>>> >
>>>> > 3 is slower than 2 (of course). But 2 is never able to achieve the
>>>> > IOPS that fio can achieve. I've tried other things, with no luck.
>>>> > What I notice is that Go and fio are close to each other as long
>>>> > as the number of goroutines is <= the number of cores. Once you
>>>> > exceed the cores, Go stays put, while fio's IOPS keep improving
>>>> > until they reach the SSD's limits.
>>>>
>>>> One thing I notice about your program is that each goroutine is
>>>> calling rand.Intn and rand.Int63n. Those functions acquire and
>>>> release a lock, so that single lock is being contended by every
>>>> goroutine. That's an unfortunate and unnecessary slowdown. Give
>>>> each goroutine its own source of pseudo-random numbers by using
>>>> rand.New.
>>>>
>>>> You also have a point of contention on the local variable i, which
>>>> you are manipulating using atomic functions. It would be cheaper to
>>>> give each goroutine a number of operations to do rather than to
>>>> compute that dynamically using a contended address.
>>>>
>>>> I'll also note that if a program that should be I/O bound shows a
>>>> behavior change when the number of parallel goroutines exceeds the
>>>> number of CPUs, then it might be interesting to try setting
>>>> GOMAXPROCS higher. I don't know what effect that would have here,
>>>> but it's worth checking.
>>>>
>>>> Ian
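(For reference, a minimal sketch of the two fixes Ian suggests: a
private rand.Rand per goroutine instead of the locked global source,
and statically partitioned work instead of a shared atomic counter.
The file name is a placeholder and error handling on the reads is
elided.)

    package main

    import (
        "math/rand"
        "os"
        "sync"
    )

    // reader performs numOps random reads using its own rand.Rand, so it
    // never touches the locked global source or a shared counter.
    func reader(f *os.File, seed int64, numOps int, wg *sync.WaitGroup) {
        defer wg.Done()
        r := rand.New(rand.NewSource(seed)) // private source, no lock contention
        buf := make([]byte, 4096)
        for i := 0; i < numOps; i++ {
            f.ReadAt(buf, r.Int63n(2<<30))
        }
    }

    func main() {
        f, err := os.Open("test.dat") // placeholder data file
        if err != nil {
            panic(err)
        }
        defer f.Close()

        const jobs, totalOps = 16, 2000000
        var wg sync.WaitGroup
        for j := 0; j < jobs; j++ {
            wg.Add(1)
            // Divide the work up front: totalOps/jobs reads per goroutine,
            // instead of pulling the next index from a contended atomic.
            go reader(f, int64(j+1), totalOps/jobs, &wg)
        }
        wg.Wait()
    }

Note that rand.Rand is not safe for concurrent use, but that's fine here
because each goroutine owns its own instance.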