Hi all,

I am working on a fix for TS-4475, which is a core dump in
LogCollationClientSM.cc. I gather that there is not a lot of interest in the
Log Collation client, so I am reaching out to the mailing list, especially to
folks familiar with the inactivity cop.

Background: The core dump occurs when VC_EVENT_INACTIVITY_TIMEOUT (== 105) is
sent to several of the state machine's event handlers. This event is not
handled in their switch(event) statements, so it falls through to the
following default branch and core dumps -

  default:
    ink_assert(!"unexpcted state");
    return EVENT_CONT;

We have started seeing this inactivity event recently, since we changed the
net inactivity default timeout from 86400 s to 300 s (in order to use the
inactivity cop to help get rid of hanging connections during an overloaded
traffic condition). We are using ATS as a log collation client, i.e., we have
defined log collation hosts in logs_xml.config.

I have inserted the following code into the various event handlers to "ignore"
the timeout event -

+  case VC_EVENT_INACTIVITY_TIMEOUT:
+    Note("[LogCollationClientSM] - ignoring VC_EVENT_INACTIVITY_TIMEOUT");
+    return EVENT_CONT;
  case VC_EVENT_EOS:

This seems to work (it generates a NOTE in diags.log every five minutes), but I
am concerned that NetHandler::manage_active_queue() might still try to close
the VC even though the event handler ignores the timeout event. Is that a risk?

Example from diags.log -

[Jun 24 00:39:35.031] Server {0x2b387ec7c700} NOTE: [LogCollationClientSM] - ignoring VC_EVENT_INACTIVITY_TIMEOUT

Debug from traffic.out -

[Jun 24 00:39:35.031] Server {0x2b387ec7c700} DEBUG: (inactivity_cop_verbose) vc: 0x2b38c0018d00 now: 1466728775031910904 timeout at: 1466728774 timeout in: 300
[Jun 24 00:39:35.031] Server {0x2b387ec7c700} DEBUG: (inactivity_cop_verbose) vc: 0x2b38c0018d00 now: 1466728775031910904 timeout at: 1466728774291725549 timeout in: 300000000000
[Jun 24 00:39:37.024] Server {0x2b387ec7c700} DEBUG: (inactivity_cop_verbose) vc: 0x2b38c0018d00 now: 1466728777024229232 timeout at: 1466729076 timeout in: 300

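It seems like the net effect is just to push the timeout out another ~300 s
each time it fires. Note that the units in the debug messages seem to change at
the transition point: "timeout at" and "timeout in" are printed in seconds
(1466728774, 300) on the first and third lines but in nanoseconds
(1466728774291725549, 300000000000) on the second, while "now" is always in
nanoseconds.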

I would appreciate any recommendations or comments on this solution.

Thanks,
Peter
