I sent https://go-review.googlesource.com/c/go/+/433315 for fixing the 
issue.

On Friday, September 23, 2022 at 10:59:03 AM UTC+7 atomic wrote:

> No wonder I didn't find related questions, the method names in the two 
> packages are different:
> One is Forget and the other is ForgetUnshared
>
> 在2022年9月23日星期五 UTC+8 11:05:51<cuong.m...@gmail.com> 写道:
>
>> Seems to me this commit is not port to the internal singleflight: 
>> https://github.com/golang/sync/commit/56d357773e8497dfd526f0727e187720d1093757
>>
>> On Friday, September 23, 2022 at 9:23:29 AM UTC+7 atomic wrote:
>>
>>> Thank you so much, so happy, you are amazing.
>>> You answered a question that has been bothering me for days, I opened an 
>>> issue on github, can you submit a pr to fix this?
>>> https://github.com/golang/go/issues/55343
>>>
>>> 在2022年9月23日星期五 UTC+8 06:23:13<Brian Candler> 写道:
>>>
>>>> And here's a proof-of-concept fix which seems to do the job:
>>>>
>>>> --- main.go.orig    2022-09-21 13:14:10.000000000 +0100
>>>> +++ main.go    2022-09-22 23:19:54.000000000 +0100
>>>> @@ -27,6 +27,7 @@
>>>>      // not written after the WaitGroup is done.
>>>>      dups  int
>>>>      chans []chan<- Result
>>>> +    forgotten bool
>>>>  }
>>>>
>>>>  // Group represents a class of work and forms a namespace in
>>>> @@ -101,7 +102,9 @@
>>>>      c.wg.Done()
>>>>
>>>>      g.mu.Lock()
>>>> -    delete(g.m, key)
>>>> +    if !c.forgotten {
>>>> +        delete(g.m, key)
>>>> +    }
>>>>      for _, ch := range c.chans {
>>>>          ch <- Result{c.val, c.err, c.dups > 0}
>>>>      }
>>>> @@ -121,6 +124,7 @@
>>>>          return true
>>>>      }
>>>>      if c.dups == 0 {
>>>> +        c.forgotten = true
>>>>          delete(g.m, key)
>>>>          return true
>>>>      }
>>>>
>>>> On Thursday, 22 September 2022 at 23:16:22 UTC+1 Brian Candler wrote:
>>>>
>>>>> OK, I think I have it.  It's ugly.
>>>>>
>>>>> Firstly, note that multiple instances of doCall can be running for the 
>>>>> same key.  This happens when:
>>>>>
>>>>> 1. you invoke DoChan.  This inserts a 'c' (call struct) into the map 
>>>>> and starts doCall in a goroutine.
>>>>> 2. at this point it's not shared: i.e. you don't call DoChan again 
>>>>> with the same key (yet).
>>>>> 3. you invoke ForgetUnshared on this key. This "detaches" it, but 
>>>>> doCall carries on running. It has its own local copy of 'c' so it knows 
>>>>> where to send the result, even though the map is now empty.
>>>>> 4. you invoke DoChan again with the same key.  This inserts a new 'c' 
>>>>> into the map and starts a new doCall goroutine.
>>>>>
>>>>> At this point, you have two instances of doCall running, and the map 
>>>>> is pointing at the second one.
>>>>>
>>>>> This is where it gets ugly.
>>>>>
>>>>> 5. you invoke DoChan yet again with the same key. This turns it into a 
>>>>> shared task, with c.dups > 0, len(c.chans) > 1.
>>>>> 6. the first instance of doCall terminates.  At this point it 
>>>>> unconditionally removes the key from the map - even though it had 
>>>>> previously been removed by ForgetUnshared!
>>>>>
>>>>> func (g *Group) doCall(c *call, key string, fn func() (interface{}, 
>>>>> error)) {
>>>>>         c.val, c.err = fn()
>>>>>         c.wg.Done()
>>>>>
>>>>>         g.mu.Lock()
>>>>> *        delete(g.m, key)    // <<<< NOTE*
>>>>>         for _, ch := range c.chans {
>>>>>                 ch <- Result{c.val, c.err, c.dups > 0}
>>>>>         }
>>>>>         g.mu.Unlock()
>>>>> }
>>>>>
>>>>> So, even though it's the first instance of doCall which is 
>>>>> terminating, it's removing the second instance of doCall from the map.  
>>>>> This is now also a detached task.
>>>>>
>>>>> 7. In one of the two goroutines, the timeout event occurs.  It calls 
>>>>> ForgetUnshared, which happily returns true because the key does not exist 
>>>>> in the map - and therefore you proceed to cancel the context.
>>>>>
>>>>> But actually a task with this key *is* running; and furthermore, it is 
>>>>> a shared task, with 2 channel receivers.
>>>>>
>>>>> 8. Once the sleep has completed in the task function, it notices that 
>>>>> the context is cancelled and returns an error.
>>>>>
>>>>> 9. doCall sends the resulting error down multiple channels (those you 
>>>>> started in steps 4 and 5 above)
>>>>>
>>>>> 10. The select { case res := <-ch } triggers in the *other* goroutine 
>>>>> - the one which didn't have a timeout. Hence it receives the error, and 
>>>>> that's where you panic().
>>>>>
>>>>> On Thursday, 22 September 2022 at 20:37:07 UTC+1 Brian Candler wrote:
>>>>>
>>>>>> OK, I see where you're coming from - and I agree, this is a difficult 
>>>>>> one!
>>>>>>
>>>>>> The point you were making is that
>>>>>>
>>>>>>                                 if g.ForgetUnshared(key) {
>>>>>>                                         cancel()
>>>>>>                                 }
>>>>>>
>>>>>> should only invoke cancel() if this result wasn't shared: i.e. 
>>>>>> there's only one receiver in the c.chans array, and c.dups == 0.  So 
>>>>>> where's the race, given that everything in g is done under a mutex?
>>>>>>
>>>>>> What I have discovered so far is: when g.ForgetUnshared(key) returns 
>>>>>> true and the problem occurs, the key is not present in the map (as 
>>>>>> opposed 
>>>>>> to being present with c.dups == 0).  But I've not been able to work out 
>>>>>> why 
>>>>>> yet.
>>>>>>
>>>>>> Incidentally, a minor style observation: you passed in ctx to your go 
>>>>>> func(...), but not cancel. As far as I can see, both ctx and cancel are 
>>>>>> local variables which drop immediately out of scope - there's no way 
>>>>>> they 
>>>>>> can be modified later outside of the goroutine.  So I believe you don't 
>>>>>> need to pass ctx at all: you can access it via the closure.  But if you 
>>>>>> do 
>>>>>> pass one "to be on the safe side", then I think the other should be 
>>>>>> passed 
>>>>>> as well - otherwise it's confusing why you passed in only one.
>>>>>>
>>>>>> In fact, in this case, you could move the ctx/cancel creation inside 
>>>>>> the go func(...) anyway.  The only thing which needs to be outside is 
>>>>>> the wg.Add(1).
>>>>>>
>>>>>> On Thursday, 22 September 2022 at 03:12:47 UTC+1 atomic wrote:
>>>>>>
>>>>>>> > Also notice that the random time you pick for cancelTime can be 
>>>>>>> longer than the different random time you sleep inside the goroutine 
>>>>>>> (i.e. 
>>>>>>> the function which you pass to DoChan).  Hence the goroutine can return 
>>>>>>> a 
>>>>>>> result, before the cancelTime is reached.
>>>>>>>
>>>>>>> Although the goroutine can return a result before cancelTime 
>>>>>>> arrives, the returned result should not be err because I haven't had 
>>>>>>> time 
>>>>>>> to call cancel(). 
>>>>>>> 在2022年9月21日星期三 UTC+8 20:18:30<Brian Candler> 写道:
>>>>>>>
>>>>>>>> Notice that DoChan starts a goroutine for the task...
>>>>>>>>
>>>>>>>>         go g.doCall(c, key, fn)
>>>>>>>>
>>>>>>>> ... and then returns immediately.
>>>>>>>>
>>>>>>>> Also notice that the random time you pick for cancelTime can be 
>>>>>>>> longer than the different random time you sleep inside the goroutine 
>>>>>>>> (i.e. 
>>>>>>>> the function which you pass to DoChan).  Hence the goroutine can 
>>>>>>>> return a 
>>>>>>>> result, before the cancelTime is reached.
>>>>>>>>
>>>>>>>> Try this modification:
>>>>>>>>
>>>>>>>> --- main.go.orig    2022-09-21 13:14:10.000000000 +0100
>>>>>>>> +++ main.go    2022-09-21 13:13:43.000000000 +0100
>>>>>>>> @@ -144,7 +144,7 @@
>>>>>>>>              defer wg.Done()
>>>>>>>>
>>>>>>>>              ch, _ := g.DoChan(key, func() (interface{}, error) {
>>>>>>>> -                time.Sleep(randTimeout())
>>>>>>>> +                time.Sleep(5000 * time.Millisecond)
>>>>>>>>                  if ctx.Err() == context.Canceled {
>>>>>>>>                      return nil, fmt.Errorf("callUUID=[%d] 
>>>>>>>> err=[%s]", uuid, ctx.Err())
>>>>>>>>                  }
>>>>>>>> @@ -152,7 +152,7 @@
>>>>>>>>              })
>>>>>>>>
>>>>>>>>              // randomly choose a timeout to cancel
>>>>>>>> -            cancelTime := time.After(randTimeout())
>>>>>>>> +            cancelTime := time.After(10 * time.Millisecond)
>>>>>>>>              select {
>>>>>>>>              case <-cancelTime:
>>>>>>>>                  // cancel only if no other goroutines share
>>>>>>>>
>>>>>>>> On Wednesday, 21 September 2022 at 10:01:22 UTC+1 atomic wrote:
>>>>>>>>
>>>>>>>>> Thanks for your reply, but I still don't understand why time.Sleep 
>>>>>>>>> is causing my test program to panic.
>>>>>>>>>
>>>>>>>>> In fact, this is a real online environment problem. My application 
>>>>>>>>> uses http.Client.Do(), but it occasionally has errors: [lookup 
>>>>>>>>> xxxxx on xxxxx: dial udp xxxxx: operation was canceled], after 
>>>>>>>>> looking at 
>>>>>>>>> the code, I found that it may be There is a problem with 
>>>>>>>>> ForgetUnshared, 
>>>>>>>>> lookupIPAddr uses ForgetUnshared: 
>>>>>>>>> https://github.com/golang/go/blob/4a4127bccc826ebb6079af3252bc6bfeaec187c4/src/net/lookup.go#L336
>>>>>>>>>
>>>>>>>>> 在2022年9月21日星期三 UTC+8 16:17:35<cuong.m...@gmail.com> 写道:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> You use time.Sleep in your program, so the behavior is not 
>>>>>>>>>> predictable. In fact, I get it success or panic randomly.
>>>>>>>>>>
>>>>>>>>>> You can see https://go-review.googlesource.com/c/sync/+/424114 
>>>>>>>>>> to see a predictable test of ForgetUnshared .
>>>>>>>>>>
>>>>>>>>>> On Wednesday, September 21, 2022 at 1:45:24 PM UTC+7 atomic wrote:
>>>>>>>>>>
>>>>>>>>>>> hello
>>>>>>>>>>>
>>>>>>>>>>> I find that the `src/internal/singleflight/singleflight.go 
>>>>>>>>>>> ForgetUnshared()` method returns results that are not always 
>>>>>>>>>>> expected
>>>>>>>>>>>
>>>>>>>>>>> For this I wrote a test code, I copied the code in the 
>>>>>>>>>>> src/internal/singleflight/singleflight.go file to the main package, 
>>>>>>>>>>> and 
>>>>>>>>>>> wrote a main function to test it, if ForgetUnshared() returns 
>>>>>>>>>>> correctly, 
>>>>>>>>>>> this code It should not panic, but the fact that it will panic 
>>>>>>>>>>> every time 
>>>>>>>>>>> it runs, is there something wrong with my understanding of 
>>>>>>>>>>> ForgetUnshared()?
>>>>>>>>>>>
>>>>>>>>>>> The test code cannot be run in goplay, so I posted a link: 
>>>>>>>>>>> https://gist.github.com/dchaofei/e07547bce17d94c3e05b1b2a7230f62f
>>>>>>>>>>>
>>>>>>>>>>> The go version I use for testing is 1.16, 1.19.1
>>>>>>>>>>>
>>>>>>>>>>> result:
>>>>>>>>>>> ```
>>>>>>>>>>> $ go run cmd/main.go
>>>>>>>>>>> panic: callUUID=[9314284969 <(931)%20428-4969>] err=[context 
>>>>>>>>>>> canceled] currentUUId=[6980556786]
>>>>>>>>>>> ```
>>>>>>>>>>>
>>>>>>>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/8810750d-29a8-4d5a-bff3-003fced16e42n%40googlegroups.com.

Reply via email to