This is indeed a subtle pattern, but it's not clear to me how often it comes up in practice, and preventing deferred function calls from blocking could result in all kinds of other unwanted side effects.
I'm curious: how did you end up with the `wg.Add` call so far from the deferred `Wait`, and what triggers the corresponding `wg.Done`? (It seems unusual for the `Add` and `Done` calls not to have the same nesting structure as the potential panic and defer.) On Saturday, March 31, 2018 at 8:28:18 PM UTC-4, Yongjian Xu wrote: > > This program will make panic not panicking but silently blocking forever. > > func panicfunc() { > panic("oops") > } > > func run() { > defer func() { > select {} > } > panicfunc() > } > > func main() { > // not trigger runtime deadlock. > go func() { > ticker := time.NewTicker(10*time.Second) > select { > case <-ticker.C: > } > }() > run() > } > > The common understanding is that the only way to prevent panic from > propagating is to write "recover" function in the defer call chain, but due > to that panic allows defers to be executed normally, it seems that > accidentally blocking panicking propagation could become fairly trivial and > subtle. > > The above program is too obvious as an illustration, so it's likely not > happening very often or sneaking out of some careful eyes, but in real > situation, this might become much less obvious. e.g. (stripped from our > production code) > > func (r *Resumer) FromCheckpoint(id string) { > r.mu.Lock() > defer r.mu.Unlock() > ... mutate other fields of r... > r.wg.Add(1) > * r.resumeStarts[id] = time.Now() * > } > > func (r *Resumer) WaitLoaded() { > r.wg.Wait() > } > > func (r *Resumer) RecreateFromCheckpoint() { > for _, cp := r.checkpoint.Entries() { > id := cp.GetId() > r.FromCheckpoint(id) > task := NewTask(id, cp) > go task.Run() > } > } > > func (r *Runner) Start() { > defer resumer.WaitLoaded() > > ... other logic ... > if err := resumer.RecreateFromCheckpoint(); err != nil { > os.Exit() > } > > ... create RPC server... > ... register HTTP handler... > go server.Serve() > } > > We added a new field in Resumer "resumeStarts" but forgot to put code to > correctly initiate it, so when the code runs, it should panic, but > unfortunately, it doesn't because we have a r.wg.Add(1) statement* > *before** the map assignment and the wg.Wait is under a defer path, which > will block panic propagation. > > The most easiest fix is to just write > > func (r *Resumer) FromCheckpoint(id string) { > r.mu.Lock() > defer r.mu.Unlock() > ... mutate other fields of r... > * r.resumeStarts[id] = time.Now()* > r.wg.Add(1) > } > > Completely arbitrary order without altering the program semantics and > correctness but yet, write this way would not block the panic. > > Because the symptom is a completely block and there is absolutely no > recover call through out the stack, it was *SO NOT OBVIOUS* before we > found the issue. > > I am wonder if there is anything we can do to improve runtime support on > panic propagation to make sure that other than the builtin "recover", > nothing could prevent a panic unwinding the stack. > > > -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.