We too have started seeing this problem within the last 4 days. In our case the symptoms and the solution match what is described in IBM Document Ref #: PK00196. However, that was supposed to have been fixed in the 5.3.1 release, which we are running. Can anyone shed more light on what might be triggering this situation?

AIX 5.2 ML5
TSM 5.3.1.0

Here's a series of errors that cropped up this week for the first time. Any insights would be helpful.

02/27/06 21:59:00 ANR9999D imgroup.c(1180): ThreadId<90> Error 8 retrieving Backup Objects row for object 0.101495737 (SESSION: 2838)
02/27/06 21:59:00 ANR9999D ThreadId<90> issued message 9999 from: <-0x000000010001bf74 outDiagf <-0x00000001003fb114 imIsGroupLeader <-0x0000000100396b9c SmNodeSession <-0x000000010047f854 HandleNodeSession <-0x0000000100485760 smExecuteSession <-0x000000010051c3e4 SessionThread <-0x000000010000e958 StartThread <-0x0900000000286460 _pthread_body (SESSION: 2838)
02/27/06 21:59:00 ANR9999D smnode.c(7353): ThreadId<90> Session 2838: Invalid Group Id 0,101495737 for ADD function (SESSION: 2838)
02/27/06 21:59:00 ANR9999D ThreadId<90> issued message 9999 from: <-0x000000010001bf74 outDiagf <-0x0000000100396bc4 SmNodeSession <-0x000000010047f854 HandleNodeSession <-0x0000000100485760 smExecuteSession <-0x000000010051c3e4 SessionThread <-0x000000010000e958 StartThread <-0x0900000000286460 _pthread_body (SESSION: 2838)
02/28/06 23:24:55 ANR9999D lmlcaud.c(506): ThreadId<75> Error 17 checking filespace data for license audit. (PROCESS: 72)
02/28/06 23:24:55 ANR9999D ThreadId<75> issued message 9999 from: <-0x000000010001bf74 outDiagf <-0x00000001006d8e70 LmLcAuditThread <-0x000000010000e958 StartThread <-0x0900000000286460 _pthread_body (PROCESS: 72)
03/01/06 11:20:55 ANR9999D lmlcaud.c(506): ThreadId<43> Error 17 checking filespace data for license audit. (PROCESS: 79)
03/01/06 11:20:55 ANR9999D ThreadId<43> issued message 9999 from: <-0x000000010001bf74 outDiagf <-0x00000001006d8e70 LmLcAuditThread <-0x000000010000e958 StartThread <-0x0900000000286460 _pthread_body (PROCESS: 79)
03/03/06 03:41:10 ANR9999D lmlcaud.c(506): ThreadId<51> Error 17 checking filespace data for license audit. (PROCESS: 29)
03/03/06 03:41:10 ANR9999D ThreadId<51> issued message 9999 from: <-0x000000010001bf74 outDiagf <-0x00000001006d8e70 LmLcAuditThread <-0x000000010000e958 StartThread <-0x0900000000286460 _pthread_body (PROCESS: 29)

In each case we need to halt and restart the TSM server to free up the locks. Finding slack time to do that is not always easy.


"Ochs, Duane" <[EMAIL PROTECTED]>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU>
01/30/2006 12:44 PM
Please respond to "ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU>

To: ADSM-L@VM.MARIST.EDU
cc:
Subject: [ADSM-L] dsmserv process hung.

AIX 5.3
TSM 5.3.1.2

This weekend one of my three TSM servers had the DSMSERV process hang. The machine was accessible and the DSMSERV process still existed. It was still accepting connections but not talking to them.

In turn, our cross-server backups and volume reconciliation from the other 2 TSM servers hung. One server ended up crashing due to a full recovery log. The other was near that same point.

Looks like the root cause was a full recovery log on the hung server.

I monitor to see if DSMSERV exists, and I monitor for backup and archive failures. I use operational reporting to give me additional information for clients. I even monitor to make sure the client scheduler is running and communicating.

Does anybody have a method in place, or an idea, to monitor whether the TSM server is actually capable of communication?

Duane Ochs
Information Systems - Enterprise Computing
Quad/Graphics Inc.
Sussex, Wisconsin
414-566-2375 phone
414-566-4010 pin# 2375 beeper
[EMAIL PROTECTED]
www.QG.com
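
On Duane's question about checking whether the server can actually communicate: one possible approach, sketched here in Python and not something we have running in production, is to issue a trivial administrative query on a schedule and treat a timeout as "hung". A server that accepts the connection but never answers would then show up as a timeout, which a simple process-exists check misses. The admin ID, password, 60-second limit, and the choice of "query status" as the probe are all assumptions for illustration; the helper names are made up.

```python
import subprocess
import sys


def probe(cmd, timeout_s):
    """Run cmd and classify the outcome.

    Returns "ok" on exit code 0, "hung" if the command does not finish
    within timeout_s seconds, and "error" for anything else.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return "hung"
    except OSError:
        # Command not found, not executable, etc.
        return "error"
    return "ok" if result.returncode == 0 else "error"


# Hypothetical admin credentials; substitute a low-privilege monitoring ID.
TSM_PROBE_CMD = [
    "dsmadmc",
    "-id=monitor",
    "-password=CHANGEME",
    "-dataonly=yes",
    "query status",
]


def check_tsm(timeout_s=60):
    """Probe the TSM server once and report the result on stderr."""
    status = probe(TSM_PROBE_CMD, timeout_s)
    print("TSM probe result: " + status, file=sys.stderr)
    return status
```

Calling check_tsm() from cron every few minutes and alerting on anything other than "ok" would be one way to use it. Since the root cause in both cases seems tied to the recovery log, the same probe could run a query against log utilization instead, assuming your server level supports it.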