On Tuesday, April 25, 2017 at 6:07:33 PM UTC-7, Dave Cheney wrote:
>
> On Wednesday, 26 April 2017 10:57:58 UTC+10, Chris G wrote:
>>
>> I think those are all excellent things to do. They do not preclude the
>> use of recovering from a panic to assist (emphasis on assist - it is
>> certainly no silver bullet) in achieving fault tolerance.
>>
>> Assuming a web service that needs to be highly available, crashing the
>> entire process due to one misbehaved goroutine is irresponsible. There
>> can be thousands of other active requests in flight that could fail
>> gracefully as well, or succeed at their task.
>>
>> In this scenario, I believe a well-behaved program should
>>
>> - clearly log all information about the fault
>
> panic does that

Yes, and then it crashes the program. In the scenario I described, thousands
of other in-flight requests meet an abrupt end. That could be incredibly
costly, even if it has been planned for.
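To be concrete about the "assist" I mean, here is a minimal sketch of
recovering per request rather than per process. The names are only
illustrative, not anyone's library API:

package main

import (
    "log"
    "net/http"
    "runtime/debug"
)

// recoverRequest is purely illustrative: if one handler panics, log
// everything we can about the fault and fail that single request with a
// 500, instead of letting the whole process (and every other in-flight
// request) die with it.
func recoverRequest(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        defer func() {
            if v := recover(); v != nil {
                // clearly log all information about the fault
                log.Printf("panic serving %s %s: %v\n%s",
                    r.Method, r.URL.Path, v, debug.Stack())
                http.Error(w, "internal server error",
                    http.StatusInternalServerError)
            }
        }()
        next.ServeHTTP(w, r)
    })
}

func main() {
    mux := http.NewServeMux()
    mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        panic("boom") // stand-in for the one misbehaving request
    })
    log.Fatal(http.ListenAndServe(":8080", recoverRequest(mux)))
}

Note that recover only works inside the goroutine that panicked, so this
covers the request goroutine itself; goroutines you start yourself would
need their own deferred recover, which is the "wrap every goroutine"
pattern being debated here.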
>> - remove itself from a load balancer
>
> your load balancer should detect that; it shouldn't wait to be told that
> a backend has failed.

Your load balancer should detect a crashed backend, yes. My point is that
you shouldn't crash a live backend needlessly.

All load balancers that I'm aware of rely on heartbeating the backend.
Flipping a healthcheck endpoint to return an unhealthy state will be
detected just as quickly as a crash would be. The intent is to let in-flight
requests finish while not allowing new ones in (a rough sketch of what I
mean is at the bottom of this message).

>> - alert some monitoring program that it has experienced critical errors
>
> The monitoring program should detect that the process exited; not the
> other way around.

Fair, poor wording on my part.

>> - depending on widespread severity, have a monitoring program alert a
>> human to inspect it
>
> Same; relying on a malfunctioning program to report its failure is like
> asking a sick human to perform their own surgery.

I was assuming that the monitoring program was some separate process or
service (Nagios, or some commercial provider). I'm sorry if there was
miscommunication on that. I don't believe the analogy holds.

>> On Tuesday, April 25, 2017 at 3:32:56 PM UTC-7, Dave Cheney wrote:
>>>
>>> Aside from arguments about using panic/recover to simulate longjmp
>>> inside recursive descent parsers I can think of no valid reason why
>>> recover should be used in production code.
>>>
>>> Imo, the arguments about wrapping all goroutines in a catch-all recover
>>> are solving the wrong problem.
>>>
>>> - if third party code you use panics regularly, maybe don't use it, or
>>> at least validate inputs passed to it to avoid provoking it.
>>> - if your program needs to be available, then rather than trying to
>>> diagnose the program's state internally, use something like daemontools,
>>> upstart, or systemd to restart it if it crashes. Don't forget there are
>>> plenty of other ways to exit a go program abruptly; os.Exit or log.Fatal
>>> are two that come to mind. Prefer only software.
>>> - if your program has to be highly available, then abandon the falsehood
>>> that a single machine can meet these requirements and invest your
>>> engineering effort in making your application run across multiple
>>> machines.
>>>
>>> IMO there is no justification for using recover as a general safety net
>>> in production Go code.
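As promised above, a rough sketch of what I mean by flipping a healthcheck
endpoint to drain a backend. The /healthz path and helper names are just
placeholders for whatever your load balancer actually probes:

package main

import (
    "log"
    "net/http"
    "sync/atomic"
)

// healthy starts at 1; setting it to 0 makes /healthz report 503, so the
// load balancer's next probe takes this backend out of rotation while
// requests already in flight are allowed to finish.
var healthy int32 = 1

// markUnhealthy would be called when the process decides it should drain,
// e.g. after logging repeated faults it can't explain.
func markUnhealthy() {
    atomic.StoreInt32(&healthy, 0)
}

func healthz(w http.ResponseWriter, r *http.Request) {
    if atomic.LoadInt32(&healthy) == 1 {
        w.WriteHeader(http.StatusOK)
        return
    }
    w.WriteHeader(http.StatusServiceUnavailable)
}

func main() {
    http.HandleFunc("/healthz", healthz)
    // ... register the real application handlers here ...
    log.Fatal(http.ListenAndServe(":8080", nil))
}

Once the probes flip, the in-flight work can drain and the process can exit
(or be restarted by its supervisor). That is all I meant by "remove itself
from a load balancer".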