I sent https://go-review.googlesource.com/c/go/+/433315 for fixing the issue.
On Friday, September 23, 2022 at 10:59:03 AM UTC+7 atomic wrote: > No wonder I didn't find related questions, the method names in the two > packages are different: > One is Forget and the other is ForgetUnshared > > 在2022年9月23日星期五 UTC+8 11:05:51<cuong.m...@gmail.com> 写道: > >> Seems to me this commit is not port to the internal singleflight: >> https://github.com/golang/sync/commit/56d357773e8497dfd526f0727e187720d1093757 >> >> On Friday, September 23, 2022 at 9:23:29 AM UTC+7 atomic wrote: >> >>> Thank you so much, so happy, you are amazing. >>> You answered a question that has been bothering me for days, I opened an >>> issue on github, can you submit a pr to fix this? >>> https://github.com/golang/go/issues/55343 >>> >>> 在2022年9月23日星期五 UTC+8 06:23:13<Brian Candler> 写道: >>> >>>> And here's a proof-of-concept fix which seems to do the job: >>>> >>>> --- main.go.orig 2022-09-21 13:14:10.000000000 +0100 >>>> +++ main.go 2022-09-22 23:19:54.000000000 +0100 >>>> @@ -27,6 +27,7 @@ >>>> // not written after the WaitGroup is done. >>>> dups int >>>> chans []chan<- Result >>>> + forgotten bool >>>> } >>>> >>>> // Group represents a class of work and forms a namespace in >>>> @@ -101,7 +102,9 @@ >>>> c.wg.Done() >>>> >>>> g.mu.Lock() >>>> - delete(g.m, key) >>>> + if !c.forgotten { >>>> + delete(g.m, key) >>>> + } >>>> for _, ch := range c.chans { >>>> ch <- Result{c.val, c.err, c.dups > 0} >>>> } >>>> @@ -121,6 +124,7 @@ >>>> return true >>>> } >>>> if c.dups == 0 { >>>> + c.forgotten = true >>>> delete(g.m, key) >>>> return true >>>> } >>>> >>>> On Thursday, 22 September 2022 at 23:16:22 UTC+1 Brian Candler wrote: >>>> >>>>> OK, I think I have it. It's ugly. >>>>> >>>>> Firstly, note that multiple instances of doCall can be running for the >>>>> same key. This happens when: >>>>> >>>>> 1. you invoke DoChan. This inserts a 'c' (call struct) into the map >>>>> and starts doCall in a goroutine. >>>>> 2. at this point it's not shared: i.e. you don't call DoChan again >>>>> with the same key (yet). >>>>> 3. you invoke ForgetUnshared on this key. This "detaches" it, but >>>>> doCall carries on running. It has its own local copy of 'c' so it knows >>>>> where to send the result, even though the map is now empty. >>>>> 4. you invoke DoChan again with the same key. This inserts a new 'c' >>>>> into the map and starts a new doCall goroutine. >>>>> >>>>> At this point, you have two instances of doCall running, and the map >>>>> is pointing at the second one. >>>>> >>>>> This is where it gets ugly. >>>>> >>>>> 5. you invoke DoChan yet again with the same key. This turns it into a >>>>> shared task, with c.dups > 0, len(c.chans) > 1. >>>>> 6. the first instance of doCall terminates. At this point it >>>>> unconditionally removes the key from the map - even though it had >>>>> previously been removed by ForgetUnshared! >>>>> >>>>> func (g *Group) doCall(c *call, key string, fn func() (interface{}, >>>>> error)) { >>>>> c.val, c.err = fn() >>>>> c.wg.Done() >>>>> >>>>> g.mu.Lock() >>>>> * delete(g.m, key) // <<<< NOTE* >>>>> for _, ch := range c.chans { >>>>> ch <- Result{c.val, c.err, c.dups > 0} >>>>> } >>>>> g.mu.Unlock() >>>>> } >>>>> >>>>> So, even though it's the first instance of doCall which is >>>>> terminating, it's removing the second instance of doCall from the map. >>>>> This is now also a detached task. >>>>> >>>>> 7. In one of the two goroutines, the timeout event occurs. It calls >>>>> ForgetUnshared, which happily returns true because the key does not exist >>>>> in the map - and therefore you proceed to cancel the context. >>>>> >>>>> But actually a task with this key *is* running; and furthermore, it is >>>>> a shared task, with 2 channel receivers. >>>>> >>>>> 8. Once the sleep has completed in the task function, it notices that >>>>> the context is cancelled and returns an error. >>>>> >>>>> 9. doCall sends the resulting error down multiple channels (those you >>>>> started in steps 4 and 5 above) >>>>> >>>>> 10. The select { case res := <-ch } triggers in the *other* goroutine >>>>> - the one which didn't have a timeout. Hence it receives the error, and >>>>> that's where you panic(). >>>>> >>>>> On Thursday, 22 September 2022 at 20:37:07 UTC+1 Brian Candler wrote: >>>>> >>>>>> OK, I see where you're coming from - and I agree, this is a difficult >>>>>> one! >>>>>> >>>>>> The point you were making is that >>>>>> >>>>>> if g.ForgetUnshared(key) { >>>>>> cancel() >>>>>> } >>>>>> >>>>>> should only invoke cancel() if this result wasn't shared: i.e. >>>>>> there's only one receiver in the c.chans array, and c.dups == 0. So >>>>>> where's the race, given that everything in g is done under a mutex? >>>>>> >>>>>> What I have discovered so far is: when g.ForgetUnshared(key) returns >>>>>> true and the problem occurs, the key is not present in the map (as >>>>>> opposed >>>>>> to being present with c.dups == 0). But I've not been able to work out >>>>>> why >>>>>> yet. >>>>>> >>>>>> Incidentally, a minor style observation: you passed in ctx to your go >>>>>> func(...), but not cancel. As far as I can see, both ctx and cancel are >>>>>> local variables which drop immediately out of scope - there's no way >>>>>> they >>>>>> can be modified later outside of the goroutine. So I believe you don't >>>>>> need to pass ctx at all: you can access it via the closure. But if you >>>>>> do >>>>>> pass one "to be on the safe side", then I think the other should be >>>>>> passed >>>>>> as well - otherwise it's confusing why you passed in only one. >>>>>> >>>>>> In fact, in this case, you could move the ctx/cancel creation inside >>>>>> the go func(...) anyway. The only thing which needs to be outside is >>>>>> the wg.Add(1). >>>>>> >>>>>> On Thursday, 22 September 2022 at 03:12:47 UTC+1 atomic wrote: >>>>>> >>>>>>> > Also notice that the random time you pick for cancelTime can be >>>>>>> longer than the different random time you sleep inside the goroutine >>>>>>> (i.e. >>>>>>> the function which you pass to DoChan). Hence the goroutine can return >>>>>>> a >>>>>>> result, before the cancelTime is reached. >>>>>>> >>>>>>> Although the goroutine can return a result before cancelTime >>>>>>> arrives, the returned result should not be err because I haven't had >>>>>>> time >>>>>>> to call cancel(). >>>>>>> 在2022年9月21日星期三 UTC+8 20:18:30<Brian Candler> 写道: >>>>>>> >>>>>>>> Notice that DoChan starts a goroutine for the task... >>>>>>>> >>>>>>>> go g.doCall(c, key, fn) >>>>>>>> >>>>>>>> ... and then returns immediately. >>>>>>>> >>>>>>>> Also notice that the random time you pick for cancelTime can be >>>>>>>> longer than the different random time you sleep inside the goroutine >>>>>>>> (i.e. >>>>>>>> the function which you pass to DoChan). Hence the goroutine can >>>>>>>> return a >>>>>>>> result, before the cancelTime is reached. >>>>>>>> >>>>>>>> Try this modification: >>>>>>>> >>>>>>>> --- main.go.orig 2022-09-21 13:14:10.000000000 +0100 >>>>>>>> +++ main.go 2022-09-21 13:13:43.000000000 +0100 >>>>>>>> @@ -144,7 +144,7 @@ >>>>>>>> defer wg.Done() >>>>>>>> >>>>>>>> ch, _ := g.DoChan(key, func() (interface{}, error) { >>>>>>>> - time.Sleep(randTimeout()) >>>>>>>> + time.Sleep(5000 * time.Millisecond) >>>>>>>> if ctx.Err() == context.Canceled { >>>>>>>> return nil, fmt.Errorf("callUUID=[%d] >>>>>>>> err=[%s]", uuid, ctx.Err()) >>>>>>>> } >>>>>>>> @@ -152,7 +152,7 @@ >>>>>>>> }) >>>>>>>> >>>>>>>> // randomly choose a timeout to cancel >>>>>>>> - cancelTime := time.After(randTimeout()) >>>>>>>> + cancelTime := time.After(10 * time.Millisecond) >>>>>>>> select { >>>>>>>> case <-cancelTime: >>>>>>>> // cancel only if no other goroutines share >>>>>>>> >>>>>>>> On Wednesday, 21 September 2022 at 10:01:22 UTC+1 atomic wrote: >>>>>>>> >>>>>>>>> Thanks for your reply, but I still don't understand why time.Sleep >>>>>>>>> is causing my test program to panic. >>>>>>>>> >>>>>>>>> In fact, this is a real online environment problem. My application >>>>>>>>> uses http.Client.Do(), but it occasionally has errors: [lookup >>>>>>>>> xxxxx on xxxxx: dial udp xxxxx: operation was canceled], after >>>>>>>>> looking at >>>>>>>>> the code, I found that it may be There is a problem with >>>>>>>>> ForgetUnshared, >>>>>>>>> lookupIPAddr uses ForgetUnshared: >>>>>>>>> https://github.com/golang/go/blob/4a4127bccc826ebb6079af3252bc6bfeaec187c4/src/net/lookup.go#L336 >>>>>>>>> >>>>>>>>> 在2022年9月21日星期三 UTC+8 16:17:35<cuong.m...@gmail.com> 写道: >>>>>>>>> >>>>>>>>>> Hello, >>>>>>>>>> >>>>>>>>>> You use time.Sleep in your program, so the behavior is not >>>>>>>>>> predictable. In fact, I get it success or panic randomly. >>>>>>>>>> >>>>>>>>>> You can see https://go-review.googlesource.com/c/sync/+/424114 >>>>>>>>>> to see a predictable test of ForgetUnshared . >>>>>>>>>> >>>>>>>>>> On Wednesday, September 21, 2022 at 1:45:24 PM UTC+7 atomic wrote: >>>>>>>>>> >>>>>>>>>>> hello >>>>>>>>>>> >>>>>>>>>>> I find that the `src/internal/singleflight/singleflight.go >>>>>>>>>>> ForgetUnshared()` method returns results that are not always >>>>>>>>>>> expected >>>>>>>>>>> >>>>>>>>>>> For this I wrote a test code, I copied the code in the >>>>>>>>>>> src/internal/singleflight/singleflight.go file to the main package, >>>>>>>>>>> and >>>>>>>>>>> wrote a main function to test it, if ForgetUnshared() returns >>>>>>>>>>> correctly, >>>>>>>>>>> this code It should not panic, but the fact that it will panic >>>>>>>>>>> every time >>>>>>>>>>> it runs, is there something wrong with my understanding of >>>>>>>>>>> ForgetUnshared()? >>>>>>>>>>> >>>>>>>>>>> The test code cannot be run in goplay, so I posted a link: >>>>>>>>>>> https://gist.github.com/dchaofei/e07547bce17d94c3e05b1b2a7230f62f >>>>>>>>>>> >>>>>>>>>>> The go version I use for testing is 1.16, 1.19.1 >>>>>>>>>>> >>>>>>>>>>> result: >>>>>>>>>>> ``` >>>>>>>>>>> $ go run cmd/main.go >>>>>>>>>>> panic: callUUID=[9314284969 <(931)%20428-4969>] err=[context >>>>>>>>>>> canceled] currentUUId=[6980556786] >>>>>>>>>>> ``` >>>>>>>>>>> >>>>>>>>>> -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/8810750d-29a8-4d5a-bff3-003fced16e42n%40googlegroups.com.