On Wednesday, 12 July 2017 04:44:43 UTC+1, Ian Lance Taylor wrote: > > On Tue, Jul 11, 2017 at 12:27 PM, <james...@gmail.com <javascript:>> > wrote: > > > > This caught me out recently; there's definitely a gap in my > understanding as > > to how select works. Maybe it's caught other people out too? > > > > For one of my clients, I'd written an application that would try to read > > data from a backend. There were multiple instances of this backend (say > one > > in Oregon, and one in Virginia). > > > > In this context, it made sense to try reading from all of the backends > > concurrently, and return the first successful result. All of the slower > > responses could be ignored, as long as the code cleaned up any resources > > (goroutines, file descriptors etc). > > > > Reader, it did not. > > > > The code which demonstrates the failure is at > https://github.com/jabley/fptp > > I think you neglected to say what went wrong. Or, if you did say it, > I missed it. > > When I look at this: > > select { > case <-ctx.Done(): > // Another Searcher already returned a successful response. > Ensure we don't leak things > res.Close() > default: > respc <- searchResponse{res, err} > } > > what I think is: if the context is not completed when you first enter > the select statement, then something had better hang around reading > from respc. But, as far as I can tell, nothing does. So you will > collect goroutines that will never go away. Is that the problem you > were seeing? > > Ian >
The problem seen was that goroutines were leaked, and file descriptors were leaked. Over time, the process got to the limit of max file descriptors allowed by the OS, and started returning "too many open files" errors. The select statement in `waitForSearchCompletion` is trying to read from respc. If the response that gets read from the channel is deemed successful, that function returns. The calling Search function also returns, and the context.CancelFunc is called to stop further responses being returned by `search`. TestClosersAreNotLeaked does not appear to fail if `case respc <- searchResponse{res, err}:` is used in `search` instead of ``` default: respc <- searchResponse{res, err} ``` Running the tests with -race does not highlight a problem. There is 100% test coverage in both the successful and failing variants. But it was only the addition of TestClosersAreNotLeaked (and the supporting plumbing code) which replicated the problem I was seeing in production. There does appear to be a race condition between the 2 select statements in `search` and `waitForSearchCompletion`. I don't (yet!) understand what I was doing wrong. -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.