Interestingly, the point you make about an overloaded network segment
possibly causing dropped packets is a likely candidate, as the network is
certainly more heavily utilised on that site than at any other. As we are
not in control of any of the platform configurations, network schematics
etc., it is beyond our scope to do anything about it.

I am adding extra trace in four additional events, logging the socket
states and error codes in the OnSessionAvailable, OnDebugAvailable,
OnClientCreate and OnSessionClosed events.

Previously we were not connecting any event handlers to these, so we were
unable to log the state. Analysis by the prime contractor at the customer's
site appeared to suggest that an FD_ACCEPT message was being processed but
with an error code: they reported that, according to a network analysis
trace, the socket initialization started correctly but the log entries
normally written in OnClientConnect were not written.

I looked through the source and can see in the TriggerSessionAvailable
handler the line

    If Error <> 0 then Exit;

This is done before the construction of a 'client socket' object with
which to handle the connection, and also prior to the point where the
OnClientConnect method is called.

So I am guessing an error is being passed in the LParam of the message.
Hopefully, by attaching the OnSessionAvailable event, we might be able to
capture what this error is and then be able to understand why this site has
a particular problem.
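
For reference, Winsock's WSAGETSELECTEVENT and WSAGETSELECTERROR macros
split the LParam of an asynchronous-select message into an event code (low
word) and an error code (high word). Below is a minimal Python sketch of that
decoding, for illustration only. The function name and the sample value are
made up, and this is not taken from the ICS source:

```python
# Winsock network event bits, as defined in winsock2.h.
FD_EVENTS = {
    0x01: "FD_READ",
    0x02: "FD_WRITE",
    0x04: "FD_OOB",
    0x08: "FD_ACCEPT",
    0x10: "FD_CONNECT",
    0x20: "FD_CLOSE",
}


def decode_select_lparam(lparam):
    """Split an async-select LParam into (event name, error code).

    Mirrors WSAGETSELECTEVENT (low word) and WSAGETSELECTERROR (high word).
    """
    event = lparam & 0xFFFF
    error = (lparam >> 16) & 0xFFFF
    return FD_EVENTS.get(event, "unknown(0x%04X)" % event), error


# Hypothetical sample: FD_ACCEPT carrying WSAENETDOWN (10050).
sample = (10050 << 16) | 0x08
name, error = decode_select_lparam(sample)
print(name, error)
```

A non-zero error alongside FD_ACCEPT would be the case that the
"If Error <> 0 then Exit;" line skips without any logging.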

If and when I receive these additional logs I will post any conclusions
here.

Best regards,

Damien.


-----Original Message-----
From: twsocket-boun...@elists.org [mailto:twsocket-boun...@elists.org] On
Behalf Of Francois PIETTE
Sent: Tuesday, August 03, 2010 5:19 PM
To: ICS support mailing
Subject: Re: [twsocket] TWSocketServer OnConnection event

>   I have been asked to investigate a strange issue we are encountering at
> a customer site in Mexico. I am a contractor for a company which supplied
> surveillance and monitoring software based on the ICS component set. The
> software runs fine on other sites with no problems encountered for over 8
> months but on the site in Mexico after a matter of hours or days the
> software (and or server) crashes.

> The servers are all identical HP Blade servers running Windows Server 2003
> vanilla installs. This is true of sites that are functioning and the ones
> in Mexico that are not.


If the software runs fine on several identical systems and fails on a
single system, I would concentrate on what makes that failing site different,
because it has to be different. First check the service pack level. I also
suggest verifying that no malware is intercepting winsock calls. This is done
by malware to capture traffic. Then, I would check whether any suspect LSP is
installed on the system. Also check whether some security products are
interfering with winsock: they frequently intercept winsock calls to block
some kinds of traffic. Those security products could be buggy.

> My analysis of the problem to date suggests that an OnClientConnect is
> firing but the passed Client object is incomplete or invalid. The code for
> the OnClientConnect event does not check the ErrorCode and accepts the
> connection but traffic appears not to flow correctly between client and
> server.

I suggest checking the error code and reporting it in the logfile for
analysis.

> if I run
> NetStat on the server it appears a windows socket object is left in
> FIN-WAIT-1 or FIN-WAIT-2 state. Eventually the system fails as all windows
> socket objects are expended and there is a catastrophic failure of the
> software and/or server.

> the steps that should be taken when an error does occur to ensure that
> the windows sockets are correctly 'cleaned
> up' and released back to the Operating System ?

FIN-WAIT-1 and FIN-WAIT-2 mean the orderly shutdown sequence is occurring
but the remote site does not answer (have a look here:
http://www.tcpipguide.com/free/t_TCPConnectionTermination-2.htm). An orderly
shutdown is a multiple-step sequence between client and server. What is
strange here is that FIN-WAIT-1 and FIN-WAIT-2 are states of the side that
initiates the close, normally the client side, not the server side. So it is
possible that the sockets you see in that state are NOT the ones failing.
Maybe something else is failing (maybe in the same software), causing those
sockets to be in those states and consume all available sockets, which causes
trouble in the software when accepting a new connection, because accepting a
new connection means creating a new socket.

So I see the possibility that some other software or another part of your
software has an issue with /client/ connection close. This results in a lot
of sockets in the FIN-WAIT-1 or FIN-WAIT-2 state, consuming all available
sockets and making new connection acceptance fail.
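
To see how the number of sockets in each state evolves, the output of
"netstat -an" can be tallied at regular intervals. Below is a minimal Python
sketch, assuming the usual Windows column layout (protocol, local address,
remote address, state). The sample text is illustrative only and does not
come from the affected server:

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections per state in `netstat -an` style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows look like: TCP  local  remote  STATE
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            counts[fields[3]] += 1
    return counts


# Illustrative sample only, not output from the affected server.
SAMPLE = """\
  TCP    10.0.0.5:5000    0.0.0.0:0          LISTENING
  TCP    10.0.0.5:5000    10.0.0.9:1034      ESTABLISHED
  TCP    10.0.0.5:1201    10.0.0.7:80        FIN_WAIT_2
  TCP    10.0.0.5:1202    10.0.0.7:80        FIN_WAIT_2
  TCP    10.0.0.5:1203    10.0.0.7:80        FIN_WAIT_1
  UDP    10.0.0.5:161     *:*
"""

for state, n in sorted(count_tcp_states(SAMPLE).items()):
    print(state, n)
```

Comparing the local port of the FIN_WAIT rows with the port the server
listens on should show whether those sockets are accepted connections or
outgoing client connections.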

Why could those client connections have problems with their server not
answering? This could be caused by malware sending forged IP packets to
break existing connections, or a misconfigured security product (firewall)
dropping packets, or simply an overloaded network segment which is dropping
packets because traffic is too high. An overloaded layer 2 switch may simply
drop packets when it is not able to switch the packets fast enough.
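
One rough way to look for packet loss from the server itself is to compare
the TCP counters printed by "netstat -s -p tcp" over time. Below is a minimal
Python sketch, assuming the "Segments Sent" and "Segments Retransmitted"
labels that Windows prints. The helper name and the sample numbers are
made up for illustration:

```python
import re


def retransmit_ratio(stats_text):
    """Return retransmitted/sent segment ratio from `netstat -s -p tcp` text.

    Returns None when either counter is missing or no segments were sent.
    """
    def counter(label):
        match = re.search(r"^\s*%s\s*=\s*(\d+)" % re.escape(label),
                          stats_text, re.MULTILINE)
        return int(match.group(1)) if match else None

    sent = counter("Segments Sent")
    retransmitted = counter("Segments Retransmitted")
    if not sent or retransmitted is None:
        return None
    return retransmitted / sent


# Illustrative numbers only, not measurements from any real site.
SAMPLE = """\
TCP Statistics for IPv4

  Segments Received                   = 120000
  Segments Sent                       = 100000
  Segments Retransmitted              = 2500
"""

print(retransmit_ratio(SAMPLE))
```

A ratio that is clearly higher on the failing site than on the working
sites would be consistent with a network that is dropping packets, although
it does not say where they are dropped.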

--
francois.pie...@overbyte.be
The author of the freeware multi-tier middleware MidWare
The author of the freeware Internet Component Suite (ICS)
http://www.overbyte.be

--
To unsubscribe or change your settings for TWSocket mailing list
please goto http://lists.elists.org/cgi-bin/mailman/listinfo/twsocket
Visit our website at http://www.overbyte.be
