OK, I think I have it. It's ugly.

Firstly, note that multiple instances of doCall can be running for the same key. This happens when:

1. you invoke DoChan. This inserts a 'c' (call struct) into the map and starts doCall in a goroutine.
2. at this point it's not shared: i.e. you don't call DoChan again with the same key (yet).
3. you invoke ForgetUnshared on this key. This "detaches" it, but doCall carries on running. It has its own local copy of 'c' so it knows where to send the result, even though the map is now empty.
4. you invoke DoChan again with the same key. This inserts a new 'c' into the map and starts a new doCall goroutine.

At this point, you have two instances of doCall running, and the map is pointing at the second one.

This is where it gets ugly.

5. you invoke DoChan yet again with the same key. This turns it into a shared task, with c.dups > 0, len(c.chans) > 1.
6. the first instance of doCall terminates. At this point it unconditionally removes the key from the map - even though it had previously been removed by ForgetUnshared!

    func (g *Group) doCall(c *call, key string, fn func() (interface{}, error)) {
        c.val, c.err = fn()
        c.wg.Done()

        g.mu.Lock()
        delete(g.m, key)   // <<<< NOTE
        for _, ch := range c.chans {
            ch <- Result{c.val, c.err, c.dups > 0}
        }
        g.mu.Unlock()
    }

   So, even though it's the first instance of doCall which is terminating, it's removing the second instance of doCall from the map. This is now also a detached task.
7. In one of the two goroutines, the timeout event occurs. It calls ForgetUnshared, which happily returns true because the key does not exist in the map - and therefore you proceed to cancel the context. But actually a task with this key *is* running; and furthermore, it is a shared task, with 2 channel receivers.
8. Once the sleep has completed in the task function, it notices that the context is cancelled and returns an error.
9. doCall sends the resulting error down multiple channels (those you started in steps 4 and 5 above)
10. The select { case res := <-ch } triggers in the *other* goroutine - the one which didn't have a timeout.
Hence it receives the error, and that's where you panic().

On Thursday, 22 September 2022 at 20:37:07 UTC+1 Brian Candler wrote:

> OK, I see where you're coming from - and I agree, this is a difficult one!
>
> The point you were making is that
>
>     if g.ForgetUnshared(key) {
>         cancel()
>     }
>
> should only invoke cancel() if this result wasn't shared: i.e. there's only one receiver in the c.chans array, and c.dups == 0. So where's the race, given that everything in g is done under a mutex?
>
> What I have discovered so far is: when g.ForgetUnshared(key) returns true and the problem occurs, the key is not present in the map (as opposed to being present with c.dups == 0). But I've not been able to work out why yet.
>
> Incidentally, a minor style observation: you passed in ctx to your go func(...), but not cancel. As far as I can see, both ctx and cancel are local variables which drop immediately out of scope - there's no way they can be modified later outside of the goroutine. So I believe you don't need to pass ctx at all: you can access it via the closure. But if you do pass one "to be on the safe side", then I think the other should be passed as well - otherwise it's confusing why you passed in only one.
>
> In fact, in this case, you could move the ctx/cancel creation inside the go func(...) anyway. The only thing which needs to be outside is the wg.Add(1).
>
> On Thursday, 22 September 2022 at 03:12:47 UTC+1 atomic wrote:
>
>> > Also notice that the random time you pick for cancelTime can be longer than the different random time you sleep inside the goroutine (i.e. the function which you pass to DoChan). Hence the goroutine can return a result, before the cancelTime is reached.
>>
>> Although the goroutine can return a result before cancelTime arrives, the returned result should not be err because I haven't had time to call cancel().
>>
>> On Wednesday, 21 September 2022 at 20:18:30 UTC+8 Brian Candler wrote:
>>
>>> Notice that DoChan starts a goroutine for the task...
>>>
>>>     go g.doCall(c, key, fn)
>>>
>>> ... and then returns immediately.
>>>
>>> Also notice that the random time you pick for cancelTime can be longer than the different random time you sleep inside the goroutine (i.e. the function which you pass to DoChan). Hence the goroutine can return a result, before the cancelTime is reached.
>>>
>>> Try this modification:
>>>
>>> --- main.go.orig 2022-09-21 13:14:10.000000000 +0100
>>> +++ main.go 2022-09-21 13:13:43.000000000 +0100
>>> @@ -144,7 +144,7 @@
>>>      defer wg.Done()
>>>
>>>      ch, _ := g.DoChan(key, func() (interface{}, error) {
>>> -        time.Sleep(randTimeout())
>>> +        time.Sleep(5000 * time.Millisecond)
>>>          if ctx.Err() == context.Canceled {
>>>              return nil, fmt.Errorf("callUUID=[%d] err=[%s]", uuid, ctx.Err())
>>>          }
>>> @@ -152,7 +152,7 @@
>>>      })
>>>
>>>      // randomly choose a timeout to cancel
>>> -    cancelTime := time.After(randTimeout())
>>> +    cancelTime := time.After(10 * time.Millisecond)
>>>      select {
>>>      case <-cancelTime:
>>>          // cancel only if no other goroutines share
>>>
>>> On Wednesday, 21 September 2022 at 10:01:22 UTC+1 atomic wrote:
>>>
>>>> Thanks for your reply, but I still don't understand why time.Sleep is causing my test program to panic.
>>>>
>>>> In fact, this is a real problem in a production environment. My application uses http.Client.Do(), which occasionally fails with: [lookup xxxxx on xxxxx: dial udp xxxxx: operation was canceled]. After looking at the code, I suspect there may be a problem with ForgetUnshared; lookupIPAddr uses ForgetUnshared:
>>>> https://github.com/golang/go/blob/4a4127bccc826ebb6079af3252bc6bfeaec187c4/src/net/lookup.go#L336
>>>>
>>>> On Wednesday, 21 September 2022 at 16:17:35 UTC+8 cuong.m...@gmail.com wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> You use time.Sleep in your program, so the behavior is not predictable.
>>>>> In fact, I see it succeed or panic at random.
>>>>>
>>>>> See https://go-review.googlesource.com/c/sync/+/424114 for a predictable test of ForgetUnshared.
>>>>>
>>>>> On Wednesday, September 21, 2022 at 1:45:24 PM UTC+7 atomic wrote:
>>>>>
>>>>>> hello
>>>>>>
>>>>>> I find that the ForgetUnshared() method in `src/internal/singleflight/singleflight.go` does not always return the result I expect.
>>>>>>
>>>>>> To check this, I wrote a test: I copied the code from src/internal/singleflight/singleflight.go into the main package and wrote a main function to exercise it. If ForgetUnshared() returned correctly, this code should not panic, but in fact it panics every time it runs. Is there something wrong with my understanding of ForgetUnshared()?
>>>>>>
>>>>>> The test code cannot be run in the Go Playground, so I posted a link:
>>>>>> https://gist.github.com/dchaofei/e07547bce17d94c3e05b1b2a7230f62f
>>>>>>
>>>>>> The Go versions I used for testing are 1.16 and 1.19.1.
>>>>>>
>>>>>> result:
>>>>>> ```
>>>>>> $ go run cmd/main.go
>>>>>> panic: callUUID=[9314284969] err=[context canceled] currentUUId=[6980556786]
>>>>>> ```

-- 
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/9f93b205-9106-4794-8a3c-4d5ee75d26ebn%40googlegroups.com.
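
For anyone who wants to replay the sequence from steps 1 to 7 at the top of this thread without goroutines or sleeps, here is a minimal sketch. It is not the real singleflight code: Group, call, start, finish, the guard field and wronglyForgotten are hypothetical names made up for illustration, and only the map bookkeeping is modelled (no channels, no task function). ForgetUnshared follows the behaviour described in this thread. The guard flag shows one possible way to avoid the stale delete, namely removing the map entry only if it still points at the call that is finishing; treat that as an assumption about how it could be fixed, not as a description of the Go source.

```go
package main

import "sync"

// call is a simplified stand-in for the singleflight call struct.
// Only the dups counter is modelled here.
type call struct {
	dups int
}

// Group is a hypothetical, reduced model of singleflight.Group.
// guard selects between the unconditional delete described in the
// thread (false) and an identity-checked delete (true).
type Group struct {
	mu    sync.Mutex
	m     map[string]*call
	guard bool
}

// start models the bookkeeping done by DoChan: join an existing call
// for the key, or register a new one.
func (g *Group) start(key string) *call {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.m == nil {
		g.m = make(map[string]*call)
	}
	if c, ok := g.m[key]; ok {
		c.dups++
		return c
	}
	c := &call{}
	g.m[key] = c
	return c
}

// finish models the tail of doCall, where the key is removed from the map.
func (g *Group) finish(c *call, key string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	// Without the guard, a detached call deletes whatever is stored
	// under key, even if that entry belongs to a newer call.
	if !g.guard || g.m[key] == c {
		delete(g.m, key)
	}
}

// ForgetUnshared reports true if the key is absent or its call has no
// duplicates, as described in the thread.
func (g *Group) ForgetUnshared(key string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	c, ok := g.m[key]
	if !ok {
		return true
	}
	if c.dups == 0 {
		delete(g.m, key)
		return true
	}
	return false
}

// wronglyForgotten replays steps 1 to 7 and reports whether
// ForgetUnshared returns true while a shared call is still in flight.
func wronglyForgotten(guard bool) bool {
	g := &Group{guard: guard}
	first := g.start("k")           // step 1: first call registered
	g.ForgetUnshared("k")           // step 3: first call detached
	second := g.start("k")          // step 4: second call registered
	g.start("k")                    // step 5: second call becomes shared
	g.finish(first, "k")            // step 6: first call terminates
	forgot := g.ForgetUnshared("k") // step 7: timeout path
	return forgot && second.dups > 0
}
```

Calling wronglyForgotten(false) replays the unconditional delete and reports the bad outcome from step 7; wronglyForgotten(true) keeps the second call in the map, so ForgetUnshared sees that it is shared.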