[go-nuts] Re: panic propagation can be blocked by blocking defer function

'Bryan Mills' via golang-nuts Mon, 02 Apr 2018 12:09:48 -0700

This is indeed a subtle pattern, but it's not clear to me how often it 
comes up in practice, and preventing deferred function calls from blocking 
could result in all kinds of other unwanted side effects.


I'm curious: how did you end up with the `wg.Add` call so far from the 
deferred `Wait`, and what triggers the corresponding `wg.Done`? (It seems 
unusual for the `Add` and `Done` calls not to have the same nesting 
structure as the potential panic and defer.)


On Saturday, March 31, 2018 at 8:28:18 PM UTC-4, Yongjian Xu wrote:
>
> This program will make panic not panicking but silently blocking forever.
>
> func panicfunc() {
>     panic("oops")
> }
>
> func run() {
>     defer func() {
>        select {}
>     }
>     panicfunc()
> }
>
> func main() {
>         // not trigger runtime deadlock.
>         go func() {
>          ticker := time.NewTicker(10*time.Second) 
>          select {
>          case <-ticker.C:
>          }
>     }()
>     run()
> }
>
> The common understanding is that the only way to prevent panic from 
> propagating is to write "recover" function in the defer call chain, but due 
> to that panic allows defers to be executed normally, it seems that 
> accidentally blocking panicking propagation could become fairly trivial and 
> subtle.
>
> The above program is too obvious as an illustration, so it's likely not 
> happening very often or sneaking out of some careful eyes, but in real 
> situation, this might become much less obvious.  e.g. (stripped from our 
> production code)
>
> func (r *Resumer) FromCheckpoint(id string) {
>      r.mu.Lock()
>      defer r.mu.Unlock()
>      ... mutate other fields of r...
>      r.wg.Add(1)
>     * r.resumeStarts[id] = time.Now()  *  
> }
>
> func (r *Resumer) WaitLoaded() {
>       r.wg.Wait()
> }
>
> func (r *Resumer) RecreateFromCheckpoint() {
>       for _, cp := r.checkpoint.Entries() {
>             id := cp.GetId()
>             r.FromCheckpoint(id)
>             task := NewTask(id, cp)
>             go task.Run()
>       }
> }
>
> func (r *Runner) Start() {
>        defer resumer.WaitLoaded()
>
>        ... other logic ...
>        if err := resumer.RecreateFromCheckpoint(); err != nil {
>               os.Exit()
>        }
>     
>        ... create RPC server...
>        ... register HTTP handler...
>        go server.Serve()
> }
>
> We added a new field in Resumer "resumeStarts" but forgot to put code to 
> correctly initiate it, so when the code runs, it should panic, but 
> unfortunately, it doesn't because we have a r.wg.Add(1) statement* 
> *before** the map assignment and the wg.Wait is under a defer path, which 
> will block panic propagation.
>
> The most easiest fix is to just write
>
> func (r *Resumer) FromCheckpoint(id string) {
>      r.mu.Lock()
>      defer r.mu.Unlock()
>      ... mutate other fields of r...
>     * r.resumeStarts[id] = time.Now()*
>      r.wg.Add(1)
> }
>
> Completely arbitrary order without altering the program semantics and 
> correctness but yet, write this way would not block the panic.
>
> Because the symptom is a completely block and there is absolutely no 
> recover call through out the stack, it was *SO NOT OBVIOUS* before we 
> found the issue.
>
> I am wonder if there is anything we can do to improve runtime support on 
> panic propagation to make sure that other than the builtin "recover", 
> nothing could prevent a panic unwinding the stack.
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

[go-nuts] Re: panic propagation can be blocked by blocking defer function

Reply via email to