On Wed, Jul 12, 2017 at 12:28 AM, <james.ab...@gmail.com> wrote: > > On Wednesday, 12 July 2017 04:44:43 UTC+1, Ian Lance Taylor wrote: >> >> On Tue, Jul 11, 2017 at 12:27 PM, <james...@gmail.com> wrote: >> > >> > This caught me out recently; there's definitely a gap in my >> > understanding as >> > to how select works. Maybe it's caught other people out too? >> > >> > For one of my clients, I'd written an application that would try to read >> > data from a backend. There were multiple instances of this backend (say >> > one >> > in Oregon, and one in Virginia). >> > >> > In this context, it made sense to try reading from all of the backends >> > concurrently, and return the first successful result. All of the slower >> > responses could be ignored, as long as the code cleaned up any resources >> > (goroutines, file descriptors etc). >> > >> > Reader, it did not. >> > >> > The code which demonstrates the failure is at >> > https://github.com/jabley/fptp >> >> I think you neglected to say what went wrong. Or, if you did say it, >> I missed it. >> >> When I look at this: >> >> select { >> case <-ctx.Done(): >> // Another Searcher already returned a successful response. >> Ensure we don't leak things >> res.Close() >> default: >> respc <- searchResponse{res, err} >> } >> >> what I think is: if the context is not completed when you first enter >> the select statement, then something had better hang around reading >> from respc. But, as far as I can tell, nothing does. So you will >> collect goroutines that will never go away. Is that the problem you >> were seeing? >> >> Ian > > > The problem seen was that goroutines were leaked, and file descriptors were > leaked. Over time, the process got to the limit of max file descriptors > allowed by the OS, and started returning "too many open files" errors. > > The select statement in `waitForSearchCompletion` is trying to read from > respc. If the response that gets read from the channel is deemed successful, > that function returns. The calling Search function also returns, and the > context.CancelFunc is called to stop further responses being returned by > `search`. > > TestClosersAreNotLeaked does not appear to fail if > > `case respc <- searchResponse{res, err}:` > > is used in `search` instead of > > ``` > default: > respc <- searchResponse{res, err} > ``` > > Running the tests with -race does not highlight a problem. There is 100% > test coverage in both the successful and failing variants. But it was only > the addition of TestClosersAreNotLeaked (and the supporting plumbing code) > which replicated the problem I was seeing in production. > > There does appear to be a race condition between the 2 select statements in > `search` and `waitForSearchCompletion`. > > I don't (yet!) understand what I was doing wrong.
I have not looked at your whole program. In the case where you write default: respc <- ... then if the context is not finished when the goroutine reaches that point, the goroutine is going to hang forever waiting for something to read from the channel. That will give you the leaked goroutines and descriptors that you describe. In the case where you write case <-ctx.Done(): ... case respc <- ... then the goroutine will wait until either the context is done or it can write to the channel. When the context is done, the goroutine will go away, and you will not see a leak. If that does not explain your problem, why not? Ian -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.