OK, I think I have it.  It's ugly.

Firstly, note that multiple instances of doCall can be running for the same 
key.  This happens when:

1. you invoke DoChan.  This inserts a 'c' (call struct) into the map and 
starts doCall in a goroutine.
2. at this point it's not shared: i.e. you don't call DoChan again with the 
same key (yet).
3. you invoke ForgetUnshared on this key. This "detaches" it, but doCall 
carries on running. It has its own local copy of 'c' so it knows where to 
send the result, even though the map is now empty.
4. you invoke DoChan again with the same key.  This inserts a new 'c' into 
the map and starts a new doCall goroutine.

At this point, you have two instances of doCall running, and the map is 
pointing at the second one.

This is where it gets ugly.

5. you invoke DoChan yet again with the same key. This turns it into a 
shared task, with c.dups > 0, len(c.chans) > 1.
6. the first instance of doCall terminates.  At this point it 
unconditionally removes the key from the map - even though it had 
previously been removed by ForgetUnshared!

func (g *Group) doCall(c *call, key string, fn func() (interface{}, error)) 
{
        c.val, c.err = fn()
        c.wg.Done()

        g.mu.Lock()
*        delete(g.m, key)    // <<<< NOTE*
        for _, ch := range c.chans {
                ch <- Result{c.val, c.err, c.dups > 0}
        }
        g.mu.Unlock()
}

So, even though it's the first instance of doCall which is terminating, 
it's removing the second instance of doCall from the map.  This is now also 
a detached task.

7. In one of the two goroutines, the timeout event occurs.  It calls 
ForgetUnshared, which happily returns true because the key does not exist 
in the map - and therefore you proceed to cancel the context.

But actually a task with this key *is* running; and furthermore, it is a 
shared task, with 2 channel receivers.

8. Once the sleep has completed in the task function, it notices that the 
context is cancelled and returns an error.

9. doCall sends the resulting error down multiple channels (those you 
started in steps 4 and 5 above)

10. The select { case res := <-ch } triggers in the *other* goroutine - the 
one which didn't have a timeout. Hence it receives the error, and that's 
where you panic().

On Thursday, 22 September 2022 at 20:37:07 UTC+1 Brian Candler wrote:

> OK, I see where you're coming from - and I agree, this is a difficult one!
>
> The point you were making is that
>
>                                 if g.ForgetUnshared(key) {
>                                         cancel()
>                                 }
>
> should only invoke cancel() if this result wasn't shared: i.e. there's 
> only one receiver in the c.chans array, and c.dups == 0.  So where's the 
> race, given that everything in g is done under a mutex?
>
> What I have discovered so far is: when g.ForgetUnshared(key) returns true 
> and the problem occurs, the key is not present in the map (as opposed to 
> being present with c.dups == 0).  But I've not been able to work out why 
> yet.
>
> Incidentally, a minor style observation: you passed in ctx to your go 
> func(...), but not cancel. As far as I can see, both ctx and cancel are 
> local variables which drop immediately out of scope - there's no way they 
> can be modified later outside of the goroutine.  So I believe you don't 
> need to pass ctx at all: you can access it via the closure.  But if you do 
> pass one "to be on the safe side", then I think the other should be passed 
> as well - otherwise it's confusing why you passed in only one.
>
> In fact, in this case, you could move the ctx/cancel creation inside the 
> go func(...) anyway.  The only thing which needs to be outside is 
> the wg.Add(1).
>
> On Thursday, 22 September 2022 at 03:12:47 UTC+1 atomic wrote:
>
>> > Also notice that the random time you pick for cancelTime can be longer 
>> than the different random time you sleep inside the goroutine (i.e. the 
>> function which you pass to DoChan).  Hence the goroutine can return a 
>> result, before the cancelTime is reached.
>>
>> Although the goroutine can return a result before cancelTime arrives, the 
>> returned result should not be err because I haven't had time to call 
>> cancel(). 
>> 在2022年9月21日星期三 UTC+8 20:18:30<Brian Candler> 写道:
>>
>>> Notice that DoChan starts a goroutine for the task...
>>>
>>>         go g.doCall(c, key, fn)
>>>
>>> ... and then returns immediately.
>>>
>>> Also notice that the random time you pick for cancelTime can be longer 
>>> than the different random time you sleep inside the goroutine (i.e. the 
>>> function which you pass to DoChan).  Hence the goroutine can return a 
>>> result, before the cancelTime is reached.
>>>
>>> Try this modification:
>>>
>>> --- main.go.orig    2022-09-21 13:14:10.000000000 +0100
>>> +++ main.go    2022-09-21 13:13:43.000000000 +0100
>>> @@ -144,7 +144,7 @@
>>>              defer wg.Done()
>>>
>>>              ch, _ := g.DoChan(key, func() (interface{}, error) {
>>> -                time.Sleep(randTimeout())
>>> +                time.Sleep(5000 * time.Millisecond)
>>>                  if ctx.Err() == context.Canceled {
>>>                      return nil, fmt.Errorf("callUUID=[%d] err=[%s]", 
>>> uuid, ctx.Err())
>>>                  }
>>> @@ -152,7 +152,7 @@
>>>              })
>>>
>>>              // randomly choose a timeout to cancel
>>> -            cancelTime := time.After(randTimeout())
>>> +            cancelTime := time.After(10 * time.Millisecond)
>>>              select {
>>>              case <-cancelTime:
>>>                  // cancel only if no other goroutines share
>>>
>>> On Wednesday, 21 September 2022 at 10:01:22 UTC+1 atomic wrote:
>>>
>>>> Thanks for your reply, but I still don't understand why time.Sleep is 
>>>> causing my test program to panic.
>>>>
>>>> In fact, this is a real online environment problem. My application uses 
>>>> http.Client.Do(), but it occasionally has errors: [lookup xxxxx on 
>>>> xxxxx: dial udp xxxxx: operation was canceled], after looking at the code, 
>>>> I found that it may be There is a problem with ForgetUnshared, 
>>>> lookupIPAddr 
>>>> uses ForgetUnshared: 
>>>> https://github.com/golang/go/blob/4a4127bccc826ebb6079af3252bc6bfeaec187c4/src/net/lookup.go#L336
>>>>
>>>> 在2022年9月21日星期三 UTC+8 16:17:35<cuong.m...@gmail.com> 写道:
>>>>
>>>>> Hello,
>>>>>
>>>>> You use time.Sleep in your program, so the behavior is not 
>>>>> predictable. In fact, I get it success or panic randomly.
>>>>>
>>>>> You can see https://go-review.googlesource.com/c/sync/+/424114 to see 
>>>>> a predictable test of ForgetUnshared .
>>>>>
>>>>> On Wednesday, September 21, 2022 at 1:45:24 PM UTC+7 atomic wrote:
>>>>>
>>>>>> hello
>>>>>>
>>>>>> I find that the `src/internal/singleflight/singleflight.go 
>>>>>> ForgetUnshared()` method returns results that are not always expected
>>>>>>
>>>>>> For this I wrote a test code, I copied the code in the 
>>>>>> src/internal/singleflight/singleflight.go file to the main package, and 
>>>>>> wrote a main function to test it, if ForgetUnshared() returns correctly, 
>>>>>> this code It should not panic, but the fact that it will panic every 
>>>>>> time 
>>>>>> it runs, is there something wrong with my understanding of 
>>>>>> ForgetUnshared()?
>>>>>>
>>>>>> The test code cannot be run in goplay, so I posted a link: 
>>>>>> https://gist.github.com/dchaofei/e07547bce17d94c3e05b1b2a7230f62f
>>>>>>
>>>>>> The go version I use for testing is 1.16, 1.19.1
>>>>>>
>>>>>> result:
>>>>>> ```
>>>>>> $ go run cmd/main.go
>>>>>> panic: callUUID=[9314284969 <(931)%20428-4969>] err=[context 
>>>>>> canceled] currentUUId=[6980556786]
>>>>>> ```
>>>>>>
>>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/9f93b205-9106-4794-8a3c-4d5ee75d26ebn%40googlegroups.com.

Reply via email to