Re: [apache/trafficserver] HTTP2 drain (#1710)

James Peach Mon, 12 Jun 2017 20:56:54 -0700


On 11 Jun 2017, at 4:45, Masakazu Kitajo wrote:

On Sun, Jun 11, 2017 at 1:27 PM, James Peach <jpe...@apache.org>wrote:
On 9 May 2017, at 0:02, Masakazu Kitajo wrote:

Hi,
Don't worry guys. I'm working on it.
I merged the PR to have a draining (shedding) logic for HTTP/2(DoubleGOAWAY frames), and now I'm generalizing the flag so that we caninvokedraining on both HTTP/1 and HTTP/2. I'm going to add "Connection:close"header automatically for HTTP/1. So,"proxy.config.stop.shutdown_timeout"will be available for both versions. The PR was step 1, and it willbe
step
2.
“proxy.config.stop.shutdown_timeout” is still a mis-feature. Whatreallymatters for shutting down is how much traffic you are still serving,whichis expressed by “proxy.config.restart.active_client_threshold”. Ithinkyou might also want to express a maximum time you are willing towait, but
just having a fixed time is not what we should be doing.
I agree the current implementation is not perfect, but that part iswhat we
haven't completed yet. Probably we need to use the both configurations
because we can’t wait forever until traffic goes down, and we canuse
"proxy.config.stop.shutdown_timeout" as
"proxy.config.restart.active_client_timeout”.

What are you planning for proxy.config.restart.active_client_timeout todo?

I think their intention is the same and the difference is whichprocess use
it.

I’d like for “proxy.config.stop.shutdown_timeout” to go away andbe replaced by something like“proxy.config.restart.max_shutdown_wait” which is the maximum timeyou are willing to wait for the clients to drain. If the traffic drainsto the acceptable threshold faster, then you get to restart faster.

Also, while you are draining, you should also turn keep alive off andautomatically turn down the idle session timeouts.

"traffic_ctl server stop --drain" is also in my mind. Once we haveone
generalized way for draining in traffic_server, we can invoke itfromanywhere we want (e.g. from traffic_manager, TS API, metric).Currently itis invoked using the flag, but eventually, we should be able todecoupledraining and shutdown, and should also be able to remove the flagand
sleep().
Doing these all at once is heavy and the PR would be big, and itslowsreview and progress, IMO. You might want full featured gracefulshutdown
but this is a reason why I didn't ask to do it.
I’m fine with (and even prefer) doing this in pieces, but we needto agree
on the direction and that the end result will be what we want.
Ok, probably the step after PR#2106 will be how we invoke gracefulshutdownand which process should wait for / determine the completion ofdraining. These
should be the points we still need to discuss and agree on.
Actually, I basically agree on your last suggestion. I'm not keepingthe
configuration and the flag, but just left them on the PR.


OK, that sounds good :)

Thanks,
Masakazu
On Tue, May 9, 2017 at 2:40 PM, CrazyCow<zhangzizhong0...@gmail.com>
wrote:

I don't disagree. The reasons I chose this way are:
1. We are using other stuff in my team instead of traffic_ctl tomanage
the
process and do the upgrade.
2. traffic_ctl --drain can only support HTTP and it can only beused when
restarting ATS. That makes it hardly useful in our use case.

2017-05-08 22:16 GMT-07:00 James Peach <jpe...@apache.org>:
On 8 May 2017, at 21:48, Miles Libbey wrote:
We'd also like a bit more fine grained control in the process --we
frequently want to perform maintenance on a server (upgradingATS;upgrading the OS; performing hardware changes, etc) afterdraining butbefore restarting ATS. I suppose this would mean allowing the--drain
option to apply to traffic_Ctl server stop.
Yes I agree that draining make sense for stop as well as restart.



miles
On Mon, May 8, 2017 at 8:19 PM, James Peach <jpe...@apache.org>wrote:
This patch adds another separate shutdown mechanism that onlyworks for
HTTP/2. I think that we really ought to have a single,well-defined
graceful
shutdown that works for all protocols.

In HTTP/1.1 it works like this:
        - You (possibly dynamically) set
proxy.config.restart.active_client_threshold
        - You run “traffic_ctl server restart —drain”
Then, once client connections have been drained to thethreshold,traffic_server restarts. Note that this assumes that you havesomeadditional orchestration that triggers header_rewrite to injecta“Connection: close” header, and tell the GSLB to stopsending new
connections.

After this HTTP/2 change, there is a new graceful shutdown path
- You (at startup only) setproxy.config.stop.shutdown_tim
eout
        - You send a signal to traffic_server
This flips a global variable to the “drain” state, thensleeps in the
signal
handler until the timeout is reached. In HTTP/2 only, any new
connections
will be accepted and then immediately closed.
To rationalize these disparate approaches, I suggest that we goback
to
the
traffic_ctl methodology, and enhance it so that it sends amessage totraffic_server that puts it into a known draining state. Thisshould
be
published in a metric and should be reversible so you can abortthe
drain.
The metric can be observed by HTTP/1.1 and HTTP/2 to takeappropriate
action. We should also add a new setting
“proxy.config.restart.active_client_timeout” (or somethinglike that)
to
handle the maximum time to wait for traffic to drain.
I’m not sure whether it is a good idea to close connections inHTTP/2
while
we are in draining state. If there is a desire for this, I wouldlike
it
to
be configurable (defaulting to off).

On 8 May 2017, at 18:17, Masakazu Kitajo wrote:

Merged #1710.
--
You are receiving this because you are subscribed to thisthread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/trafficserver/pull/1710#event-1073784726

Re: [apache/trafficserver] HTTP2 drain (#1710)

Reply via email to