On Sat, 18 Mar 2017 03:50:39 -0700 (PDT)
Vitaly Isaev <vitalyisa...@gmail.com> wrote:

[...]
> Assume that application does some heavy lifting with multiple file 
> descriptors (e.g., opening - writing data - syncing - closing), what 
> actually happens to Go runtime? Does it block all the goroutines at
> the time when expensive syscall occures (like syscall.Fsync)? Or only
> the calling goroutine is blocked while the others are still operating?

IIUC, since there's no general mechanism to have the kernel notify
the process of the completion of arbitrary syscalls, when a goroutine
enters a syscall it essentially occupies its underlying OS thread and
waits until the syscall completes.  The scheduler detects that the
goroutine is about to sleep in the syscall and schedules other
goroutines to run, but the underlying OS thread is not freed.
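For illustration, here is a rough sketch (my own, Linux-specific; the
threadcreate profile count is only a crude proxy for how many OS
threads the runtime had to create): each goroutine blocks in a raw
read(2) on an empty pipe, which pins one OS thread per goroutine.

package main

import (
	"fmt"
	"runtime/pprof"
	"syscall"
	"time"
)

func main() {
	fmt.Println("threads created before:", pprof.Lookup("threadcreate").Count())

	for i := 0; i < 20; i++ {
		go func() {
			var p [2]int
			if err := syscall.Pipe(p[:]); err != nil {
				panic(err)
			}
			// read(2) on an empty pipe blocks forever; since this goes
			// through the raw syscall package, the goroutine keeps its
			// OS thread for the whole duration.
			buf := make([]byte, 1)
			syscall.Read(p[0], buf)
		}()
	}

	time.Sleep(time.Second)
	fmt.Println("threads created after: ", pprof.Lookup("threadcreate").Count())
}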

This is in contrast to network I/O, which uses the platform-specific
poller (such as IOCP on Windows, epoll on Linux, kqueue on FreeBSD and
so on): when an I/O operation on a socket is about to block, the
goroutine which performed that syscall is suspended and put on the
wait list, its socket is added to the set the poller monitors, and its
underlying OS thread is freed to serve another runnable goroutine.
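Contrast the previous sketch with this one (again my own illustration):
goroutines blocked reading from idle TCP connections are parked on the
poller's wait list, so the thread count stays small no matter how many
of them there are.

package main

import (
	"fmt"
	"net"
	"runtime/pprof"
	"time"
)

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	var conns []net.Conn
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return
			}
			conns = append(conns, c) // keep the server side open and silent
		}
	}()

	for i := 0; i < 20; i++ {
		go func() {
			c, err := net.Dial("tcp", ln.Addr().String())
			if err != nil {
				panic(err)
			}
			// Blocks waiting for data, but via the poller: the goroutine
			// is parked and its OS thread is released for other work.
			buf := make([]byte, 1)
			c.Read(buf)
		}()
	}

	time.Sleep(time.Second)
	fmt.Println("threads created:", pprof.Lookup("threadcreate").Count())
}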

> So does it make sense to write programs with multiple workers that do
> a lot of user space - kernel space context switching? Does it make
> sense to use multithreading patterns for disk input?

It may or may not.  A syscall-heavy workload might degrade the
goroutine scheduling from M:N to effectively 1:1, with one OS thread
per goroutine.  That might not be a problem in itself (apart from a
big number of OS threads allocated and mostly sleeping), but
concurrent access to the same slow resource such as a rotating medium
is almost always a bad idea: say, your HDD (and the file system on it)
can only provide a certain read bandwidth, so spreading the processing
of the data being read across multiple goroutines is only worthwhile
if that processing is so computationally expensive that a single
goroutine can't keep up with the full bandwidth.  If one goroutine can
keep up with the full bandwidth, having two goroutines read the same
data will make each deal with only half of it, so they will sleep more
than 50% of the time.  Note that reading two files in parallel off a
filesystem located on the same rotating medium will usually lower the
total bandwidth due to the seek times needed to jump around between
the blocks of the different files.
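If the processing really is that CPU-heavy, the usual pattern is to
keep a single goroutine reading the file sequentially and fan the
chunks out to a small pool of workers.  A rough sketch (file name,
chunk size and worker count are made up):

package main

import (
	"io"
	"log"
	"os"
	"sync"
)

func process(b []byte) {
	_ = b // placeholder for the real CPU-bound work
}

func main() {
	f, err := os.Open("input.dat") // hypothetical input file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	chunks := make(chan []byte, 8)

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ { // worker count: tune to your CPUs
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range chunks {
				process(c)
			}
		}()
	}

	// A single reader keeps the disk access sequential.
	for {
		buf := make([]byte, 1<<20)
		n, err := f.Read(buf)
		if n > 0 {
			chunks <- buf[:n]
		}
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
	}
	close(chunks)
	wg.Wait()
}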

SSDs and other kinds of media might have way better performance
characteristics, so it's worth measuring.
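For instance, a quick-and-dirty benchmark along these lines (the path
is a placeholder, and page-cache effects make it only a rough
estimate), run with go test -bench=ParallelRead -cpu=1,2,4,8, shows
whether concurrent readers actually help on your particular medium:

package readbench

import (
	"io"
	"io/ioutil"
	"os"
	"testing"
)

func BenchmarkParallelRead(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			f, err := os.Open("testdata/big.bin") // put a large test file here
			if err != nil {
				b.Error(err)
				return
			}
			// Page-cache effects still apply: drop caches or use a file
			// bigger than RAM to get honest numbers for the medium itself.
			if _, err := io.Copy(ioutil.Discard, f); err != nil {
				b.Error(err)
			}
			f.Close()
		}
	})
}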

IOW, I'd say that trying to parallelize might be a premature
optimization.  It's worth keeping in mind that goroutines serve two
separate purposes: 1) they allow you to write natural sequential
control flow instead of callback-ridden spaghetti code; 2) they allow
performing tasks truly in parallel--if the hardware supports it
(multiple CPUs and/or cores).

Point (2) is tricky because it assumes such goroutines have something
to do independently of each other; if they instead contend on some
shared resource, the parallelization won't really happen.
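A toy example of that failure mode (the numbers are arbitrary): all of
the "work" below happens under a single mutex, so running it from four
goroutines on four cores takes roughly four times as long as running
it from one, i.e. no parallel speedup at all.

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.Mutex
	var total int

	work := func() {
		mu.Lock()
		defer mu.Unlock()
		// All of the work happens under the lock, so only one goroutine
		// makes progress at any moment, no matter how many CPUs there are.
		for i := 0; i < 1000000; i++ {
			total++
		}
	}

	for _, n := range []int{1, 4} {
		start := time.Now()
		var wg sync.WaitGroup
		for i := 0; i < n; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				for j := 0; j < 100; j++ {
					work()
				}
			}()
		}
		wg.Wait()
		fmt.Printf("%d goroutines: %v (total=%d)\n", n, time.Since(start), total)
	}
}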
