On Jan 30, 2006, at 3:44 PM, Ochs, Duane wrote:
AIX 5.3 TSM 5.3.1.2 This weekend one of my three TSM servers had the DSMSERV process hang. The machine was accessible, the DSMSERV process still existed. It was still accepting connections but not talking to them. ...
Duane - One cause of a problem of this type is a thread failure; some key thread fails, while the rest of the process lives on, but rather crippled. There should in any case be evidence in your Activity Log, typically an ANR9999 message. Where a thread failure has occurred, there will likely be a dsmserv.err file in the server directory giving details.
Does anybody have a method in place or an idea to monitor if the TSM server is actually capable of communication?
The most standardized method is to test the responsiveness of the TSM server's Web admin port (usually, 1580). Various HTTP-based packages can be used to do this. Here is a fragment from execution of an HTTP prober which I wrote, to illustrate:

  http_check: Connected to HTTP server. Now sending data...
  http_check: Request 'GET / HTTP/1.1^M^JHost: ourhost.bu.edu^M^J^M^J' has been sent to HTTP server '1111.222.333.444'. Now awaiting reply...
  http_check: Response took 0.009691 seconds to arrive.
  http_check: Received 2907 bytes of data from HTTP server:
  'HTTP/1.0 200 OK
  Server: ADSM_HTTP/0.1
  Content-type: text/html
  <HEAD>
  <TITLE>
  Server Administration
  </TITLE>
  ...

Or you could run a TSM consolemode perl command, for example, to follow the Activity Log and call out any irregularities.

Richard Sims
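
For anyone wanting to try the port-probe idea, here is a minimal Python sketch of that kind of check. It is not the http_check program quoted above; the function name probe_http, its parameters, and the 10-second default timeout are illustrative assumptions, and port 1580 is just the usual default mentioned in the post. The timeout is what matters for the hang described in the original question: a server that accepts the connection but never answers is reported as a failure instead of blocking the probe.

```python
import http.client
import time


def probe_http(host, port=1580, timeout=10.0, path="/"):
    """Send one HTTP GET to host:port and report whether a 200 reply came back.

    Returns a tuple (ok, elapsed_seconds, detail).
    """
    start = time.monotonic()
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()
        elapsed = time.monotonic() - start
        return resp.status == 200, elapsed, f"HTTP {resp.status}, {len(body)} bytes"
    except (OSError, http.client.HTTPException) as exc:
        # OSError covers refused connections and timeouts (including a server
        # that accepts the connection but never replies); HTTPException covers
        # malformed or truncated replies.
        elapsed = time.monotonic() - start
        return False, elapsed, f"{type(exc).__name__}: {exc}"
    finally:
        conn.close()
```

A scheduled job could call probe_http with the server's hostname and raise an alert whenever the first element of the result is False, or whenever the elapsed time exceeds some locally chosen threshold.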