On Tue, May 9, 2017 at 11:12 PM, James Peach <jpe...@apache.org> wrote:
> > > On 9 May 2017, at 0:02, Masakazu Kitajo wrote: > > Hi, >> >> Don't worry guys. I'm working on it. >> >> I merged the PR to have a draining (shedding) logic for HTTP/2 (Double >> GOAWAY frames), and now I'm generalizing the flag so that we can invoke >> draining on both HTTP/1 and HTTP/2. I'm going to add "Connection: close" >> header automatically for HTTP/1. So, "proxy.config.stop.shutdown_timeout" >> will be available for both versions. The PR was step 1, and it will be >> step >> 2. >> > > I think that this should be "proxy.config.restart.shutdown_timeout" or > something like that. It is best to keep configuration variable in smaller > hierarchies to improve discoverability. I definitely think you should add a > new configuration if you want to keep the code that closes connections > while draining. I agree that keeping configuration variable in smaller hierarchies, but I don't see the difference between stop.shutdown_timeout and restart.shutdown_timeout. Sending "Connection: close" and sending two GOAWAY frames are the same thing. They both indicate "no more transactions on this session", which suggest a client to close the connection. > > > "traffic_ctl server stop --drain" is also in my mind. Once we have one >> generalized way for draining in traffic_server, we can invoke it from >> anywhere we want (e.g. from traffic_manager, TS API, metric). Currently it >> is invoked using the flag, but eventually, we should be able to decouple >> draining and shutdown, and should also be able to remove the flag and >> sleep(). >> > > I guess the sleeps will back up every time you get a signal? You mean the sleep() will be called every time? We should avoid that, currently it may happen though. > > > Doing these all at once is heavy and the PR would be big, and it slows >> review and progress, IMO. You might want full featured graceful shutdown >> but this is a reason why I didn't ask to do it. >> > > Ok, thanks for clarifying. > > > >> Thanks, >> Masakazu >> >> >> >> On Tue, May 9, 2017 at 2:40 PM, CrazyCow <zhangzizhong0...@gmail.com> >> wrote: >> >> I don't disagree. The reasons I chose this way are: >>> 1. We are using other stuff in my team instead of traffic_ctl to manage >>> the >>> process and do the upgrade. >>> 2. traffic_ctl --drain can only support HTTP and it can only be used when >>> restarting ATS. That makes it hardly useful in our use case. >>> >>> 2017-05-08 22:16 GMT-07:00 James Peach <jpe...@apache.org>: >>> >>> >>>> >>>> On 8 May 2017, at 21:48, Miles Libbey wrote: >>>> >>>> We'd also like a bit more fine grained control in the process -- we >>>> >>>>> frequently want to perform maintenance on a server (upgrading ATS; >>>>> upgrading the OS; performing hardware changes, etc) after draining but >>>>> before restarting ATS. I suppose this would mean allowing the --drain >>>>> option to apply to traffic_Ctl server stop. >>>>> >>>>> >>>> Yes I agree that draining make sense for stop as well as restart. >>>> >>>> >>>> >>>> miles >>>>> >>>>> On Mon, May 8, 2017 at 8:19 PM, James Peach <jpe...@apache.org> wrote: >>>>> >>>>> This patch adds another separate shutdown mechanism that only works for >>>>>> HTTP/2. I think that we really ought to have a single, well-defined >>>>>> graceful >>>>>> shutdown that works for all protocols. >>>>>> >>>>>> In HTTP/1.1 it works like this: >>>>>> - You (possibly dynamically) set >>>>>> proxy.config.restart.active_client_threshold >>>>>> - You run “traffic_ctl server restart —drain” >>>>>> >>>>>> Then, once client connections have been drained to the threshold, >>>>>> traffic_server restarts. Note that this assumes that you have some >>>>>> additional orchestration that triggers header_rewrite to inject a >>>>>> “Connection: close” header, and tell the GSLB to stop sending new >>>>>> connections. >>>>>> >>>>>> After this HTTP/2 change, there is a new graceful shutdown path >>>>>> - You (at startup only) set proxy.config.stop.shutdown_tim >>>>>> eout >>>>>> - You send a signal to traffic_server >>>>>> >>>>>> This flips a global variable to the “drain” state, then sleeps in the >>>>>> signal >>>>>> handler until the timeout is reached. In HTTP/2 only, any new >>>>>> >>>>> connections >>> >>>> will be accepted and then immediately closed. >>>>>> >>>>>> To rationalize these disparate approaches, I suggest that we go back >>>>>> to >>>>>> the >>>>>> traffic_ctl methodology, and enhance it so that it sends a message to >>>>>> traffic_server that puts it into a known draining state. This should >>>>>> be >>>>>> published in a metric and should be reversible so you can abort the >>>>>> drain. >>>>>> The metric can be observed by HTTP/1.1 and HTTP/2 to take appropriate >>>>>> action. We should also add a new setting >>>>>> “proxy.config.restart.active_client_timeout” (or something like that) >>>>>> >>>>> to >>> >>>> handle the maximum time to wait for traffic to drain. >>>>>> >>>>>> I’m not sure whether it is a good idea to close connections in HTTP/2 >>>>>> while >>>>>> we are in draining state. If there is a desire for this, I would like >>>>>> >>>>> it >>> >>>> to >>>>>> be configurable (defaulting to off). >>>>>> >>>>>> On 8 May 2017, at 18:17, Masakazu Kitajo wrote: >>>>>> >>>>>> Merged #1710. >>>>>> >>>>>>> >>>>>>> -- >>>>>>> You are receiving this because you are subscribed to this thread. >>>>>>> Reply to this email directly or view it on GitHub: >>>>>>> https://github.com/apache/trafficserver/pull/1710#event-1073784726 >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>