Hi all, I am working on a fix for TS-4475, which is a core dump in LogCollationClientSM.cc. I gather that there is not a lot of interest in the Log Collation client, so I am reaching out to the mailing list, especially to folks familiar with the inactivity cop.
Background: The core dump occurs when VC_EVENT_INACTIVITY_TIMEOUT (event code 105) is sent to several of the state machine's event handlers. This event is not handled in the switch(event), so it falls through to the default case and core dumps there:

    default:
      ink_assert(!"unexpcted state");
      return EVENT_CONT;

We have been seeing this inactivity event recently, since we changed the default net inactivity timer from 86400 s to 300 s (in order to use inactivity_cop to help get rid of hanging connections during an overloaded traffic condition). We are using ATS as a log collation client, i.e., we have defined log collation hosts in logs_xml.config.

I have inserted the following code into the various event handlers to "ignore" the time-out event:

    +    case VC_EVENT_INACTIVITY_TIMEOUT:
    +      Note("[LogCollationClientSM] - ignoring VC_EVENT_INACTIVITY_TIMEOUT");
    +      return EVENT_CONT;
         case VC_EVENT_EOS:

This seems to work (generating a NOTE in diags.log every five minutes), but I am concerned about one thing: would NetHandler::manage_active_queue() still try to close the VC even if I ignore the time-out event in the event handler?

Example from diags.log:

    [Jun 24 00:39:35.031] Server {0x2b387ec7c700} NOTE: [LogCollationClientSM] - ignoring VC_EVENT_INACTIVITY_TIMEOUT

Debug from traffic.out:

    [Jun 24 00:39:35.031] Server {0x2b387ec7c700} DEBUG: (inactivity_cop_verbose) vc: 0x2b38c0018d00 now: 1466728775031910904 timeout at : 1466728774 timeout in: 300
    [Jun 24 00:39:35.031] Server {0x2b387ec7c700} DEBUG: (inactivity_cop_verbose) vc: 0x2b38c0018d00 now: 1466728775031910904 timeout at : 1466728774291725549 timeout in: 300000000000
    [Jun 24 00:39:37.024] Server {0x2b387ec7c700} DEBUG: (inactivity_cop_verbose) vc: 0x2b38c0018d00 now: 1466728777024229232 timeout at : 1466729076 timeout in: 300

It seems like the net effect is just to advance the time-out by another ~300 s upon each time-out. Note that the debug message numbers seem to change units at the transition point: "timeout at" and "timeout in" look like seconds in the first and third lines, but like nanoseconds in the second (300000000000 ns = 300 s).
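For anyone who wants to see the shape of the change without the ATS tree at hand, here is a minimal standalone sketch. It is not the real LogCollationClientSM code: the struct ClientSMSketch, its fields, and handle_event() are hypothetical, and the values of EVENT_CONT and VC_EVENT_EOS are placeholders chosen so that it compiles on its own (only 105 for the time-out event comes from the core dump described above). The default case records the problem in a flag where the real handler would hit ink_assert.

```cpp
#include <cassert>
#include <cstdio>

// Placeholder values so the sketch builds outside the ATS tree.
// Only 105 is taken from the report above; the other two are assumptions.
constexpr int EVENT_CONT                  = 1;   // assumed placeholder
constexpr int VC_EVENT_EOS                = 104; // assumed placeholder
constexpr int VC_EVENT_INACTIVITY_TIMEOUT = 105;

// Hypothetical reduction of one state handler in the client state machine.
struct ClientSMSketch {
  int  timeouts_ignored = 0;     // how many time-out events were swallowed
  bool saw_eos          = false; // set when the peer closed the connection
  bool hit_default      = false; // stands in for ink_assert(!"unexpcted state")

  int handle_event(int event) {
    switch (event) {
    case VC_EVENT_INACTIVITY_TIMEOUT:
      // The proposed fix: log and keep the state machine in its current state.
      ++timeouts_ignored;
      std::printf("[LogCollationClientSM] - ignoring VC_EVENT_INACTIVITY_TIMEOUT\n");
      return EVENT_CONT;
    case VC_EVENT_EOS:
      saw_eos = true;
      return EVENT_CONT;
    default:
      // Without the new case above, event 105 would land here.
      hit_default = true;
      return EVENT_CONT;
    }
  }
};
```

Calling handle_event(VC_EVENT_INACTIVITY_TIMEOUT) on a fresh ClientSMSketch returns EVENT_CONT and increments timeouts_ignored without reaching the default case. The sketch only illustrates the dispatch; it says nothing about whether the net handler closes the VC on its own after the time-out.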
Appreciate any recommendations or comments on this solution.

Thanks,
Peter