Context

If a problem prevents traffic server from starting, traffic manager retries
with an exponentially increasing sleep time until it reaches the max sleep
time, which is currently hardcoded at 60 seconds. Once that limit is
reached, traffic manager keeps retrying indefinitely.


Problem

1) The max sleep time is hardcoded at 60s.

2) The retry can go on forever.

3) We've seen some scenarios where an entire group of TS instances crashes
at the same time due to issues with external dependencies, such as a 3rd
party server that we may be trying to contact (CKMS). With identical retry
schedules, they then all retry at the same moments.

Proposal

1) Add a configuration field that lets us set the max sleep time instead of
the hardcoded 60s.

2) Add a configuration field that lets us set the maximum number of retries
after we reach the max sleep time.

3) Add a random variance between retries. I propose adding a variance of
between 0 and 1s to every retry.
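
To make the three points above concrete, here is a rough sketch of the
retry schedule they would produce together. It is Python for brevity and
is not the traffic manager code. The names retry_delays, max_sleep_s,
max_retries_at_cap and max_jitter_s are hypothetical, and the 1s starting
delay and the doubling factor are assumptions for illustration only.

```python
import random


def retry_delays(max_sleep_s=60, max_retries_at_cap=3, max_jitter_s=1.0, rng=random):
    """Yield the sleep time in seconds before each restart attempt.

    Hypothetical sketch: the 1s initial delay and the doubling step are
    assumptions, not taken from the current implementation.
    """
    sleep_s = 1
    retries_at_cap = 0
    while True:
        if sleep_s >= max_sleep_s:
            sleep_s = max_sleep_s
            # Proposal 2: stop after a bounded number of retries at the cap.
            if retries_at_cap >= max_retries_at_cap:
                return
            retries_at_cap += 1
        # Proposal 3: add a random variance so that a group of servers
        # that crashed together does not retry at exactly the same time.
        yield sleep_s + rng.uniform(0.0, max_jitter_s)
        # Proposal 1: the cap comes from configuration, not a constant.
        sleep_s = min(sleep_s * 2, max_sleep_s)


if __name__ == "__main__":
    for attempt, delay in enumerate(retry_delays(), start=1):
        print(f"attempt {attempt}: sleep {delay:.2f}s")
```

With the variance set to 0 and the defaults above, the base schedule is
1, 2, 4, 8, 16, 32, 60, 60, 60 seconds, after which the generator stops
and the caller would give up instead of retrying forever.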


Any feedback or concerns would be appreciated.

Best Regards,


Damian

Verizon Media
