(Sorry for the length of this, but this is complicated. Don't read it unless you are getting ANR9999D on reclaims):
ANR9999D is a generic bucket message for anything that doesn't have its own error message number, so you can get ANR9999D for lots of reasons. The original post had this information:

ANR9999D ssrecons.c(2405): ThreadId<50> Expected: Magic=53454652, SrvId=0, SegGroupId=5855761.

Whenever I have gotten ANR9999D on a RECLAIM with BOTH keywords, ssrecons.c AND Magic=, I have found there really are a few backup files that are permanently damaged.

When I get this error, if I run an AUDIT against the tape, it finds the damaged files. (You have to be at 4.1.3 or above for the AUDIT to find it, and if the error is from an offsite reclamation, you should run the AUDIT against the primary pool tape, not the tape that was being reclaimed.)

If the files are damaged, I can run a RESTORE VOLUME to recreate the primary files from the offsite pool. The RESTORE VOLUME will complete OK, but the files are still damaged, and the resulting new tape can't be reclaimed, either. I end up deleting the remaining files on the tape to clear the problem. (Clearly there is a bug in RESTORE VOLUME.) If you try the RESTORE VOLUME, check the results by RUNNING AN AUDIT ON THE RESULTING TAPE. If it still says the files are damaged, those files are toast.

I have also found that MOVE DATA (at least without RECONSTRUCT) will move the files and complete with no complaints, but the files will still be bad, and the resulting tapes will still not reclaim completely. So I assume the problem is only detected when TSM is reconstructing an aggregate.

If this error occurs on reclaim of a primary volume, you will find the volume reclaims down to just a few damaged files. You can Q CONTENT for the volume to see what is damaged, then DELETE with DISCARDDATA. (As ALWAYS - NEVER RUN DELETE with DISCARDDATA until you are DARN SURE YOU KNOW WHAT YOU ARE DOING.)

If the error occurs on reclaim of a copy pool volume, it is harder to decide what to do.
The copy pool volume will also reclaim down to just a couple of files, and the AUDIT will show damaged data on the primary tape, but the primary tape may still have a zillion good files on it. (DON'T PANIC and think that whole volume is bad; it probably isn't.) I usually get the file names from the AUDIT output, and DELETE the copy pool tape, knowing that the problem may surface again when the primary tape reclaims.

In our case, every time I have had this error occur, I have been able to track it back (I think) to a DB restore we did 2 years ago. I also saw a couple of errors like this when we had some intermittent hard disk errors that caused some damage in my disk pool. DB restores and hardware errors are both legitimate reasons for data damage. I have NO REASON to believe there is anything out there CREATING these errors for NEW BACKUPS. I don't think there is any integrity problem, except the bug in RESTORE VOLUME.

In our case, when we did the DB restore 2 years ago we lost about 24 hours of backup data. But reclaim had also been running, and since my primary pool is collocated, a lot of tapes had been touched. We did an AUDIT on EVERY tape that was touched, but that was TSM 3.7, and AUDIT at TSM 3.7 didn't catch the problem (there were apparently significant AUDIT improvements at 4.1).

The ANR9999D errors started surfacing months later when the tapes went through reclaim. We ran RESTORE VOLUME on them to rebuild the data from the offsite pool. RESTORE VOLUME completed OK, and I assumed the problem was fixed. ONLY LATER (like a year later), when THOSE tapes started reclaiming, did the ANR9999D errors surface again, and I realized that RESTORE VOLUME was faulty.

But every time one of these errors surfaces (I have had 5 in the last 6 months) I check the files that won't reclaim against the BACKUPS table, and find that they are small numbers of old backups that were probably caught in our DB debacle. SO again, this is a case of the reclaim catching OLD problems.
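
The triage described above can be written down as a short checklist generator. This is only a sketch under stated assumptions: the function name and the primary volume label PRIM01 are hypothetical, the parameter spellings (FIX=NO, DISCARDDATA=YES) should be checked against the Administrator's Reference for your server level, and the code only builds command text. It runs nothing against a server.

```python
def triage_commands(reclaimed_volume, pool_type, primary_volume=None):
    """Return, in order, the admin commands for the triage described above.

    pool_type is "primary" or "copy". The strings are a checklist only;
    nothing is executed. Parameter spellings are assumptions to verify.
    """
    if pool_type == "primary":
        # Primary pool: audit the tape that failed to reclaim.
        audit_target = reclaimed_volume
    elif pool_type == "copy":
        # Copy (offsite) pool: audit the PRIMARY tape, not the copy tape.
        if primary_volume is None:
            raise ValueError("copy pool case needs the primary volume to audit")
        audit_target = primary_volume
    else:
        raise ValueError("pool_type must be 'primary' or 'copy'")

    steps = [f"AUDIT VOLUME {audit_target} FIX=NO"]
    if pool_type == "primary":
        # After reclaim, what is left on the volume is the damaged remainder.
        steps.append(f"QUERY CONTENT {reclaimed_volume}")
    # Destructive last step: only after the audit output has been reviewed.
    steps.append(f"DELETE VOLUME {reclaimed_volume} DISCARDDATA=YES")
    return steps


# Example: copy pool volume 005086 from the log; PRIM01 is a made-up label.
for command in triage_commands("005086", "copy", primary_volume="PRIM01"):
    print(command)
```

The copy pool branch audits the primary tape because, as noted above, that is where the AUDIT reports the damaged files.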
I have NO REASON to believe there is anything out there CREATING these errors for new files.

I have found NO DOCUMENTATION ANYWHERE that explains this problem. All this is stuff I've done on my own, and from which I have drawn my own conclusions. Use the information if it helps you, but your situation may be different.

Whew. Time for lunch.

************************************************************************
Wanda Prather
The Johns Hopkins Applied Physics Lab
443-778-8769
[EMAIL PROTECTED]

"Intelligence has much less practical application than you'd think" - Scott Adams/Dilbert
************************************************************************

-----Original Message-----
From: Seay, Paul [mailto:[EMAIL PROTECTED]]
Sent: Thursday, June 06, 2002 3:22 AM
To: [EMAIL PROTECTED]
Subject: Re: ANR9999D Error message: Do you how to figure out this message ?

Since we installed 4.2.1.15 server we have seen a lot of these lately. We have a PMR open.

Paul D. Seay, Jr.
Technical Specialist
Naptheon, INC
757-688-8180

-----Original Message-----
From: Adam Rowe [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 31, 2002 8:09 AM
To: [EMAIL PROTECTED]
Subject: Re: ANR9999D Error message: Do you how to figure out this message?

Received this information from the Tivoli Error Message Manager.pdf

Improved ANR9999D Messages
By setting message context reporting to ON, you get additional information when the server issues ANR9999D messages. The additional information can help to identify problem causes. See the SET CONTEXTMESSAGING command in Administrator's Reference. Also see Messages.

Adam Y. Rowe
St. Rita's Medical Center

Zlatko Krastev <[EMAIL PROTECTED]ET>
Sent by: "ADSM: Dist Stor Manager" <[EMAIL PROTECTED].EDU>
05/30/2002 07:35 PM
Please respond to "ADSM: Dist Stor Manager"

To: [EMAIL PROTECTED]
cc:
Subject: Re: ANR9999D Error message: Do you how to figure out this message?

ANR9999D is a generic error message for handling unknown situations. Usually such problems can be resolved only by IBM support because they occur so rarely. The large output is tracing information to help support and developers analyse the root cause. The only hint I can give you is to perform AUDIT DB, but it may show nothing useful.

Zlatko Krastev
IT Consultant

Please respond to "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
Sent by: "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
cc:
Subject: ANR9999D Error message: Do you how to figure out this message?

---------------------------------------
O/S: SunOS 5.6
TSM: Version 4, Release 2, Level 1.15
Tape library: scalar 1000 from ADIC
Tape drive: AIT2 drive
Tape : AIT2 (50GB~100GB)
---------------------------------------

Dear All,

Situation we face

I am wondering where I can get more detailed information about the ANR9999D error message. Actually we have been suffering from so many problems, including robot problems, drive problems and so on.
Also, we have gotten the "ANR8300E Changer Failure" error msg many times, and then TSM just died...T_T But the funny thing is that there is no tape stuck in the drive, even though the library LCD screen says, for example, "Element 1212(drive12) is obstructed". Last night we had the same problem again and I tried to update drive12 with onlin=no; since then all the other drives have been working fine, with no more changer failures since the last reboot. So I think this error msg is misleading and is caused by something else. Please, if you have faced a similar situation before, get back to me.

Questions

As you can see in the log below, I found something going wrong with space reclamation on c3 (copy pool for offsite). As soon as I did "update stg c3 recl=60" from recl=100 (basically no space reclamation), I got the ANR9999D error message. This error msg just never stopped!! I got scared and put it back. Then I tried it again after "set CONTEXTMESSAGING on" to get details. You can see the red marked detailed information about the ANR9999D msg below, but I simply don't know what it means. Can you point me to a good reference or any documents?

Thanks,
Meensun from Tokyo

-------------------------------------------------------------------------------------------------
Activity Log
-------------------------------------------------------------------------------------------------
05/30/02 16:06:35 ANR2017I Administrator AHNME issued command: UPDATE STGPOOL c3 recl=60
05/30/02 16:06:35 ANR2202I Storage pool C3 updated.
05/30/02 16:06:35 ANR0984I Process 36 for SPACE RECLAMATION started in the BACKGROUND at 16:06:35.
05/30/02 16:06:36 ANR1040I Space reclamation started for volume 005086, storage pool C3 (process number 36).
..
..
..
05/30/02 16:06:36 ANR1040I Space reclamation started for volume 005691, storage pool C3 (process number 36).
05/30/02 16:06:36 ANR1040I Space reclamation started for volume 005973, storage pool C3 (process number 36).
05/30/02 16:06:36 ANR9999D ssrecons.c(2391): ThreadId<50> Actual: Magic=80B93C01, SrvId=-219899260, SegGroupId=14301299977853604355, SeqNum=2031456424, converted=T.
05/30/02 16:06:36 ANR9999D ssrecons.c(2405): ThreadId<50> Expected: Magic=53454652, SrvId=0, SegGroupId=5855761. SeqNum=80.
05/30/02 16:06:38 (50) Context report
05/30/02 16:06:38 (50) SsAuxReconSrcThread : ANR9999D calling thread
05/30/02 16:06:38 (50) Generating TM Context Report: (struct=tmTxnDesc) (slots=256)
05/30/02 16:06:38 (50) *** no transactions found ***
05/30/02 16:06:38 (50) Generating Database Transaction Table Context:
05/30/02 16:06:38 (50) Tsn=0:37757505 --> Valid=1, inRollback=0, endNTA=0, State=2, Index=7, LatchCount=0, SavePoint=0, TotLogRecs=0, TotLogBytes=0, UndoLogRecs=0, UndoLogBytes=0, LogReserve=0, PageReserve=0, Elapsed=230 (ms), MinLsn=0.0.0, MaxLsn=0.0.0, LastLsn=0.0.0, UndoNextLsn=0.0.0, logWriter=False, backupTxn=False
05/30/02 16:06:38 (50) Open objects:
05/30/02 16:06:38 (50) name ->SS.Volume.Names<- (sp=0)
05/30/02 16:06:38 (50) *** no transactions found ***
05/30/02 16:06:38 (50) Generating SM Context Report:
05/30/02 16:06:38 (50) *** no sessions found ***
05/30/02 16:06:38 (50) Generating AS Vol Context Report:
05/30/02 16:06:38 (50) No mounted (or mount in progress) volumes.
05/30/02 16:06:38 (50) Generating ssSession Context Report:
05/30/02 16:06:38 (50) No storage service sessions active.
05/30/02 16:06:38 (50) Generating ssOpenSeg Context Report:
05/30/02 16:06:38 (50) Storage Service Segments:
05/30/02 16:06:38 (50) VolId=1250, Start=64757, Offset=0, SessId=612, Locked=False, Deallocated=False
05/30/02 16:06:38 (50) Generating BF Copy Control Context Report:
05/30/02 16:06:38 (50) No global copy control blocks.
05/30/02 16:06:38
05/30/02 16:06:38 (50) End Context report
05/30/02 16:06:38 (50) Context report
05/30/02 16:06:38 (50) SsAuxReconSrcThread : ANR9999D calling thread
05/30/02 16:06:38 (50) Generating TM Context Report: (struct=tmTxnDesc) (slots=256)
05/30/02 16:06:38 (50) *** no transactions found ***
05/30/02 16:06:38 (50) Generating Database Transaction Table Context:
05/30/02 16:06:38 (50) Tsn=0:37757505 --> Valid=1, inRollback=0, endNTA=0, State=2, Index=7, LatchCount=0, SavePoint=0, TotLogRecs=0, TotLogBytes=0, UndoLogRecs=0, UndoLogBytes=0, LogReserve=0, PageReserve=0, Elapsed=254 (ms), MinLsn=0.0.0, MaxLsn=0.0.0, LastLsn=0.0.0, UndoNextLsn=0.0.0, logWriter=False, backupTxn=False
05/30/02 16:06:38 (50) Open objects:
05/30/02 16:06:38 (50) name ->SS.Volume.Names<- (sp=0)
05/30/02 16:06:38 (50) *** no transactions found ***
05/30/02 16:06:38 (50) Generating SM Context Report:
05/30/02 16:06:38 (50) *** no sessions found ***
05/30/02 16:06:38 (50) Generating AS Vol Context Report:
05/30/02 16:06:38 (50) No mounted (or mount in progress) volumes.
05/30/02 16:06:38 (50) Generating ssSession Context Report:
05/30/02 16:06:38 (50) No storage service sessions active.
05/30/02 16:06:39 (50) Generating ssOpenSeg Context Report:
05/30/02 16:06:39 (50) Storage Service Segments:
05/30/02 16:06:39 (50) VolId=1250, Start=64757, Offset=0, SessId=612, Locked=False, Deallocated=False
05/30/02 16:06:39 (50) Generating BF Copy Control Context Report:
05/30/02 16:06:39 (50) No global copy control blocks.
05/30/02 16:06:39
05/30/02 16:06:39 (50) End Context report
05/30/02 16:06:39 ANR2017I Administrator AHNME issued command: UPDATE STGPOOL c3 recl=100
05/30/02 16:06:39 ANR2202I Storage pool C3 updated.

No more ANR9999D messages....since reclamation threshold=100
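
The signature discussed in this thread (ANR9999D from ssrecons.c with a Magic= value) can be picked out of a saved activity log mechanically. The following is a minimal sketch, assuming the log is available as plain text with one entry per line. The function name and the regular expression are illustrative only and are based solely on the two messages quoted above, not on any documented message format.

```python
import re

# Built from the two messages in the log above, for example:
# "ANR9999D ssrecons.c(2405): ThreadId<50> Expected: Magic=53454652, ..."
PATTERN = re.compile(
    r"ANR9999D ssrecons\.c\((?P<line>\d+)\): ThreadId<(?P<thread>\d+)> "
    r"(?P<kind>Actual|Expected): Magic=(?P<magic>[0-9A-Fa-f]+)"
)


def find_magic_mismatches(log_lines):
    """Pair the Actual/Expected Magic values reported by each thread id.

    Returns a list of (thread_id, actual_magic, expected_magic) tuples
    for every pair whose values differ.
    """
    pending = {}      # thread id -> {"Actual": ..., "Expected": ...}
    mismatches = []
    for entry in log_lines:
        match = PATTERN.search(entry)
        if match is None:
            continue  # not one of the two messages of interest
        thread = match.group("thread")
        seen = pending.setdefault(thread, {})
        seen[match.group("kind")] = match.group("magic").upper()
        if "Actual" in seen and "Expected" in seen:
            if seen["Actual"] != seen["Expected"]:
                mismatches.append((thread, seen["Actual"], seen["Expected"]))
            del pending[thread]  # pair consumed, wait for the next one
    return mismatches


sample = [
    "05/30/02 16:06:36 ANR1040I Space reclamation started for volume 005973, "
    "storage pool C3 (process number 36).",
    "05/30/02 16:06:36 ANR9999D ssrecons.c(2391): ThreadId<50> Actual: "
    "Magic=80B93C01, SrvId=-219899260, SegGroupId=14301299977853604355, "
    "SeqNum=2031456424, converted=T.",
    "05/30/02 16:06:36 ANR9999D ssrecons.c(2405): ThreadId<50> Expected: "
    "Magic=53454652, SrvId=0, SegGroupId=5855761. SeqNum=80.",
]
print(find_magic_mismatches(sample))  # [('50', '80B93C01', '53454652')]
```

A hit only says that the two messages appeared together with different Magic values. Which volume is affected still has to be read from the surrounding ANR1040I entries and confirmed with an AUDIT.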