On Wednesday, 12 July 2017 04:44:43 UTC+1, Ian Lance Taylor wrote:
>
> On Tue, Jul 11, 2017 at 12:27 PM,  <james...@gmail.com <javascript:>> 
> wrote: 
> > 
> > This caught me out recently; there's definitely a gap in my 
> understanding as 
> > to how select works. Maybe it's caught other people out too? 
> > 
> > For one of my clients, I'd written an application that would try to read 
> > data from a backend. There were multiple instances of this backend (say 
> one 
> > in Oregon, and one in Virginia). 
> > 
> > In this context, it made sense to try reading from all of the backends 
> > concurrently, and return the first successful result. All of the slower 
> > responses could be ignored, as long as the code cleaned up any resources 
> > (goroutines, file descriptors etc). 
> > 
> > Reader, it did not. 
> > 
> > The code which demonstrates the failure is at 
> https://github.com/jabley/fptp 
>
> I think you neglected to say what went wrong.  Or, if you did say it, 
> I missed it. 
>
> When I look at this: 
>
>     select { 
>     case <-ctx.Done(): 
>         // Another Searcher already returned a successful response. 
> Ensure we don't leak things 
>         res.Close() 
>     default: 
>         respc <- searchResponse{res, err} 
>     } 
>
> what I think is: if the context is not completed when you first enter 
> the select statement, then something had better hang around reading 
> from respc.  But, as far as I can tell, nothing does.  So you will 
> collect goroutines that will never go away.  Is that the problem you 
> were seeing? 
>
> Ian 
>

The problem seen was that goroutines were leaked, and file descriptors were 
leaked. Over time, the process got to the limit of max file descriptors 
allowed by the OS, and started returning "too many open files" errors.

The select statement in `waitForSearchCompletion` is trying to read from 
respc. If the response that gets read from the channel is deemed 
successful, that function returns. The calling Search function also 
returns, and the context.CancelFunc is called to stop further responses 
being returned by `search`.

TestClosersAreNotLeaked does not appear to fail if

`case respc <- searchResponse{res, err}:`

is used in `search` instead of

```
default:
        respc <- searchResponse{res, err}
```

Running the tests with -race does not highlight a problem. There is 100% 
test coverage in both the successful and failing variants. But it was only 
the addition of TestClosersAreNotLeaked (and the supporting plumbing code) 
which replicated the problem I was seeing in production.

There does appear to be a race condition between the 2 select statements in 
`search` and `waitForSearchCompletion`.

I don't (yet!) understand what I was doing wrong.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to