Context

If a problem prevents traffic server from starting, traffic manager retries with an exponentially increasing sleep time until it reaches the max sleep time, which is currently hardcoded at 60 seconds. Once that limit is reached, traffic manager keeps retrying indefinitely.
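
To make the current behavior concrete, here is a rough sketch of the sleep schedule described above. It is not the traffic manager code: the function name is made up for illustration, and the starting sleep of 1 second and the doubling factor are assumptions, since only the 60-second cap is stated here.

```python
from itertools import islice
from typing import Iterator

MAX_SLEEP_SECONDS = 60  # the hardcoded cap described above


def retry_sleep_times(initial: int = 1, factor: int = 2) -> Iterator[int]:
    """Yield the sleep (in seconds) before each restart attempt, forever.

    `initial` and `factor` are assumed values for illustration only;
    the name of this function is hypothetical.
    """
    sleep = initial
    while True:  # no retry limit: the sequence never ends
        yield sleep
        # Grow exponentially, but never past the hardcoded cap.
        sleep = min(sleep * factor, MAX_SLEEP_SECONDS)


# Look at the first ten attempts only, since the generator is infinite.
first_ten = list(islice(retry_sleep_times(), 10))
print(first_ten)  # [1, 2, 4, 8, 16, 32, 60, 60, 60, 60]
```

Under these assumptions the cap is hit on the seventh attempt, and every attempt after that sleeps 60 seconds with no upper bound on the number of attempts.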

Problem

1) The max sleep time is hardcoded at 60s.
2) The retries can go on forever.
3) We've seen scenarios where an entire group of TS instances crashes at the same time because of issues with an external dependency, such as a third-party server that we are trying to contact (CKMS).

Proposal

1) Add a configuration field that lets us set the max sleep time instead of the hardcoded 60s.
2) Add a configuration field that lets us set the maximum number of retries after we reach the max sleep time.
3) Add a random variance between retries. My proposal is to add a variance between 0 and 1s to every retry.

Any feedback or concerns would be appreciated.

Best Regards,
Damian
Verizon Media