Thanks, Evan. In our case, the edge's relevant settings are:

origin_max_connections: 0
net.connections_throttle: 500000
parent_proxy.total_connect_attempts: 6
parent_proxy.per_parent_connect_attempts: 1
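
For completeness, here is how I believe those map to the fully qualified 
records in records.config on 7.x (please correct me if I have mis-mapped any 
of them):

CONFIG proxy.config.http.origin_max_connections INT 0
CONFIG proxy.config.net.connections_throttle INT 500000
CONFIG proxy.config.http.parent_proxy.total_connect_attempts INT 6
CONFIG proxy.config.http.parent_proxy.per_parent_connect_attempts INT 1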

Do you think we are subject to the looping condition this commit addresses?
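
Before planning an upgrade, I want to verify that this commit really is 
missing from our 7.1.5 build; from a trafficserver git checkout, something 
like:

  # list the release tags that already contain the fix commit
  git tag --contains 1da4fbcd5fd32ff86888ed401e0c297b22e00c52

If 7.1.5 does not show up in that list, that would be consistent with us 
being affected.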

-Hongfei

> On Sep 30, 2020, at 2:37 PM, Evan Zelkowitz <e...@apache.org> wrote:
> 
> This sounds similar to issues we have had in the past. I think this
> was fixed in 7.x with this commit:
> https://github.com/apache/trafficserver/commit/1da4fbcd5fd32ff86888ed401e0c297b22e00c52
> There have been a few other commits past that one for other tight-loop
> scenarios when dealing with the connection limit in ATS as well, so I'd
> recommend moving to at least 7.1.8.
> 
> On Wed, Sep 30, 2020 at 12:07 PM Hongfei Zhang <hongfei...@gmail.com> wrote:
>> 
>> Hi Folks,
>> 
>> In a 2-tier CDN production environment using ATS, we recently hit a series 
>> of issues that I'd like to ask the community for help with.
>> 
>> The symptoms usually show up on edge caches as:
>>        1) elevated CPU - noticeable, but not 'through the roof'
>>        2) unusually high ingress bandwidth, often orders of magnitude 
>> higher than egress, i.e. the edge cache is pulling a lot of data from the 
>> mids but not sending much to end clients. We often saw 3-4Gbps ingress vs. 
>> 500Mbps egress. The edge cache uses separate network interfaces - the 
>> ingress traffic is mainly on the upstream interface.
>>        3) a huge number of TIME_WAIT sockets on the edge cache, 70-80K, 
>> all of them destined to the mids; see the socket-count sketch after this 
>> list. This number stays at that level until we kill/restart the ATS 
>> process, so it looks like the edge keeps opening and closing sockets to 
>> the mids, as the debug log also indicates.
>>        4) the debug log shows SM 702933367 stuck in a loop, repeating the 
>> following over and over. The regular transaction log, however, has no 
>> entry for the URL this SM is handling, so it feels like the stuck loop 
>> runs in the ‘background’, GETting the same URL without caching it or 
>> sending it to a client.
>> ...
>> [Sep 22 03:54:23.135] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> State Transition: SM_ACTION_ORIGIN_SERVER_OPEN -> 
>> SM_ACTION_ORIGIN_SERVER_OPEN
>> [Sep 22 03:54:23.135] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] open 
>> connection to x.x.x.x 172.17.173.103:80
>> [Sep 22 03:54:23.137] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [HttpSM::main_handler, NET_EVENT_OPEN]
>> [Sep 22 03:54:23.137] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [&HttpSM::state_http_server_open, NET_EVENT_OPEN]
>> [Sep 22 03:54:23.137] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> calling plugin on hook TS_HTTP_SEND_REQUEST_HDR_HOOK at hook 0x2b7e844b49e0
>> [Sep 22 03:54:23.137] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [&HttpSM::state_api_callback, HTTP_API_CONTINUE]
>> [Sep 22 03:54:23.137] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [&HttpSM::state_api_callout, HTTP_API_CONTINUE]
>> [Sep 22 03:54:23.187] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [HttpSM::main_handler, VC_EVENT_WRITE_COMPLETE]
>> [Sep 22 03:54:23.187] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [&HttpSM::state_send_server_request_header, VC_EVENT_WRITE_COMPLETE]
>> [Sep 22 03:54:23.226] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [HttpSM::main_handler, VC_EVENT_READ_READY]
>> [Sep 22 03:54:23.226] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [&HttpSM::state_read_server_response_header, VC_EVENT_READ_READY]
>> [Sep 22 03:54:23.227] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> calling plugin on hook TS_HTTP_READ_RESPONSE_HDR_HOOK at hook 0x2b7e844b4a60
>> [Sep 22 03:54:23.231] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [&HttpSM::state_api_callback, HTTP_API_CONTINUE]
>> [Sep 22 03:54:23.232] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [&HttpSM::state_api_callout, HTTP_API_CONTINUE]
>> [Sep 22 03:54:23.233] Server {0x2b7e51d2d700} DEBUG: (http_trans) 
>> [702933367] [HttpTransact::HandleResponse]
>> [Sep 22 03:54:23.233] Server {0x2b7e51d2d700} DEBUG: (http_seq) [702933367] 
>> [HttpTransact::HandleResponse] Response received
>> [Sep 22 03:54:23.233] Server {0x2b7e51d2d700} DEBUG: (http_trans) 
>> [702933367] [HandleResponse] response_received_time: 1600746863
>> [Sep 22 03:54:23.233] Server {0x2b7e51d2d700} DEBUG: (http_trans) 
>> [702933367] [is_response_valid] No errors in response
>> [Sep 22 03:54:23.234] Server {0x2b7e51d2d700} DEBUG: (http_seq) [702933367] 
>> [HttpTransact::HandleResponse] Response valid
>> [Sep 22 03:54:23.235] Server {0x2b7e51d2d700} DEBUG: (http_hdrs) [702933367] 
>> [initialize_state_variables_from_response]Server is keep-alive.
>> [Sep 22 03:54:23.236] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> State Transition: SM_ACTION_ORIGIN_SERVER_OPEN -> 
>> SM_ACTION_ORIGIN_SERVER_OPEN
>> [Sep 22 03:54:23.240] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] open 
>> connection to x.x.x.x 172.17.173.103:80
>> [Sep 22 03:54:23.241] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [HttpSM::main_handler, NET_EVENT_OPEN]
>> [Sep 22 03:54:23.242] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [&HttpSM::state_http_server_open, NET_EVENT_OPEN]
>> [Sep 22 03:54:23.242] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> calling plugin on hook TS_HTTP_SEND_REQUEST_HDR_HOOK at hook 0x2b7e844b49e0
>> [Sep 22 03:54:23.243] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [&HttpSM::state_api_callback, HTTP_API_CONTINUE]
>> [Sep 22 03:54:23.244] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [&HttpSM::state_api_callout, HTTP_API_CONTINUE]
>> [Sep 22 03:54:23.281] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [HttpSM::main_handler, VC_EVENT_WRITE_COMPLETE]
>> [Sep 22 03:54:23.281] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [&HttpSM::state_send_server_request_header, VC_EVENT_WRITE_COMPLETE]
>> [Sep 22 03:54:23.285] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [HttpSM::main_handler, VC_EVENT_READ_READY]
>> [Sep 22 03:54:23.286] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [&HttpSM::state_read_server_response_header, VC_EVENT_READ_READY]
>> [Sep 22 03:54:23.288] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> calling plugin on hook TS_HTTP_READ_RESPONSE_HDR_HOOK at hook 0x2b7e844b4a60
>> [Sep 22 03:54:23.289] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [&HttpSM::state_api_callback, HTTP_API_CONTINUE]
>> [Sep 22 03:54:23.290] Server {0x2b7e51d2d700} DEBUG: (http) [702933367] 
>> [&HttpSM::state_api_callout, HTTP_API_CONTINUE]
>> [Sep 22 03:54:23.290] Server {0x2b7e51d2d700} DEBUG: (http_trans) 
>> [702933367] [HttpTransact::HandleResponse]
>> [Sep 22 03:54:23.290] Server {0x2b7e51d2d700} DEBUG: (http_seq) [702933367] 
>> [HttpTransact::HandleResponse] Response received
>> [Sep 22 03:54:23.290] Server {0x2b7e51d2d700} DEBUG: (http_trans) 
>> [702933367] [HandleResponse] response_received_time: 1600746863
>> [Sep 22 03:54:23.290] Server {0x2b7e51d2d700} DEBUG: (http_trans) 
>> [702933367] [is_response_valid] No errors in response
>> [Sep 22 03:54:23.292] Server {0x2b7e51d2d700} DEBUG: (http_seq) [702933367] 
>> [HttpTransact::HandleResponse] Response valid
>> [Sep 22 03:54:23.292] Server {0x2b7e51d2d700} DEBUG: (http_hdrs) [702933367] 
>> [initialize_state_variables_from_response]Server is keep-alive.
>> …
>>        5) the transaction log on the mid confirms there were request 
>> storms for the same URL from the edge, but all of them resulted in 
>> crc=ERR_CLIENT_ABORT, cfsc=INTR.
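>> 
>> Re 3) above: to quantify the TIME_WAIT pile-up we count the sockets with 
>> something like the command below; the :80 filter matches our mids' HTTP 
>> port and may differ in other setups:
>> 
>>   # count TIME_WAIT sockets on the edge destined to the mids' HTTP port
>>   ss -tan state time-wait '( dport = :80 )' | wc -l
>> 
>> The debug excerpt in 4) was captured with proxy.config.diags.debug.enabled 
>> set to 1 and proxy.config.diags.debug.tags set to a regex covering the 
>> http tags, e.g. 'http.*'.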
>> 
>> 
>> Our initial investigation confirmed that this traffic storm usually 
>> follows a repeated series of parent proxy (mid-tier cache) loss and 
>> recovery events - recovery usually comes 10 seconds after the initial 
>> mark-down, and the loss/recovery cycle repeats 3 or 4 times in about a 
>> minute.
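>> 
>> The mark-down and retry timing is governed by the parent_proxy records; to 
>> double-check what an edge is actually running with, we dump them live with:
>> 
>>   traffic_ctl config match parent_proxy
>> 
>> We presume the observed 10-second recovery corresponds to 
>> proxy.config.http.parent_proxy.retry_time, though we have not confirmed 
>> that yet.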
>> 
>> Our ATS is 7.1.5-based, with the process running on bare-metal servers 
>> under CentOS 7.7. The only way to recover today is to restart the ATS 
>> process, which is obviously service-impacting.
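>> 
>> 'Restart' here means bouncing traffic_server itself; depending on how the 
>> process is supervised, something like:
>> 
>>   traffic_ctl server restart    # or: systemctl restart trafficserver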
>> 
>> Has this problem been reported before? Any verdict on the root cause? Any 
>> debugging tips or pointers to a resolution?
>> 
>> Thanks,
>> -Hongfei
