On 9 May 2017, at 0:02, Masakazu Kitajo wrote:

Hi,

Don't worry guys. I'm working on it.

I merged the PR to have a draining (shedding) logic for HTTP/2 (Double
GOAWAY frames), and now I'm generalizing the flag so that we can invoke draining on both HTTP/1 and HTTP/2. I'm going to add "Connection: close" header automatically for HTTP/1. So, "proxy.config.stop.shutdown_timeout" will be available for both versions. The PR was step 1, and it will be step
2.

I think that this should be "proxy.config.restart.shutdown_timeout" or something like that. It is best to keep configuration variable in smaller hierarchies to improve discoverability. I definitely think you should add a new configuration if you want to keep the code that closes connections while draining.

"traffic_ctl server stop --drain" is also in my mind. Once we have one
generalized way for draining in traffic_server, we can invoke it from
anywhere we want (e.g. from traffic_manager, TS API, metric). Currently it is invoked using the flag, but eventually, we should be able to decouple
draining and shutdown, and should also be able to remove the flag and
sleep().

I guess the sleeps will back up every time you get a signal?

Doing these all at once is heavy and the PR would be big, and it slows
review and progress, IMO. You might want full featured graceful shutdown
but this is a reason why I didn't ask to do it.

Ok, thanks for clarifying.


Thanks,
Masakazu



On Tue, May 9, 2017 at 2:40 PM, CrazyCow <zhangzizhong0...@gmail.com> wrote:

I don't disagree. The reasons I chose this way are:
1. We are using other stuff in my team instead of traffic_ctl to manage the
process and do the upgrade.
2. traffic_ctl --drain can only support HTTP and it can only be used when
restarting ATS. That makes it hardly useful in our use case.

2017-05-08 22:16 GMT-07:00 James Peach <jpe...@apache.org>:



On 8 May 2017, at 21:48, Miles Libbey wrote:

We'd also like a bit more fine grained control in the process -- we
frequently want to perform maintenance on a server (upgrading ATS;
upgrading the OS; performing hardware changes, etc) after draining but before restarting ATS. I suppose this would mean allowing the --drain
option to apply to traffic_Ctl server stop.


Yes I agree that draining make sense for stop as well as restart.



miles

On Mon, May 8, 2017 at 8:19 PM, James Peach <jpe...@apache.org> wrote:

This patch adds another separate shutdown mechanism that only works for HTTP/2. I think that we really ought to have a single, well-defined
graceful
shutdown that works for all protocols.

In HTTP/1.1 it works like this:
        - You (possibly dynamically) set
proxy.config.restart.active_client_threshold
        - You run “traffic_ctl server restart —drain”

Then, once client connections have been drained to the threshold,
traffic_server restarts. Note that this assumes that you have some
additional orchestration that triggers header_rewrite to inject a
“Connection: close” header, and tell the GSLB to stop sending new
connections.

After this HTTP/2 change, there is a new graceful shutdown path
- You (at startup only) set proxy.config.stop.shutdown_timeout
        - You send a signal to traffic_server

This flips a global variable to the “drain” state, then sleeps in the
signal
handler until the timeout is reached. In HTTP/2 only, any new
connections
will be accepted and then immediately closed.

To rationalize these disparate approaches, I suggest that we go back to
the
traffic_ctl methodology, and enhance it so that it sends a message to traffic_server that puts it into a known draining state. This should be published in a metric and should be reversible so you can abort the
drain.
The metric can be observed by HTTP/1.1 and HTTP/2 to take appropriate
action. We should also add a new setting
“proxy.config.restart.active_client_timeout” (or something like that)
to
handle the maximum time to wait for traffic to drain.

I’m not sure whether it is a good idea to close connections in HTTP/2
while
we are in draining state. If there is a desire for this, I would like
it
to
be configurable (defaulting to off).

On 8 May 2017, at 18:17, Masakazu Kitajo wrote:

Merged #1710.

--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/trafficserver/pull/1710#event-1073784726






Reply via email to