(Sorry for the length of this, but this is complicated.  Don't read it
unless you are getting ANR9999D on reclaims):

ANR9999D is a generic bucket message for anything that doesn't have its own
error message number, so you can get ANR9999D for lots of reasons.

The original post had this information:

  ANR9999D ssrecons.c(2405): ThreadId<50> Expected:
                          Magic=53454652, SrvId=0, SegGroupId=5855761.

Whenever I have gotten ANR9999D on a RECLAIM with BOTH keywords, ssrecons.c
AND Magic=, I have found there really are a few backup files that are
permanently damaged.
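
A rough sketch of that check, for anyone who wants to scan a saved activity
log rather than eyeball it.  This is not from any IBM documentation.  The
function names and the timestamp pattern are assumptions based on the log
layout quoted later in this thread, and wrapped continuation lines are
simply glued back onto the record they follow.

```python
import re

# Assumption: a new activity-log record starts with a stamp like
# "05/30/02   16:06:36"; anything else is a wrapped continuation line.
RECORD_START = re.compile(r"^\d{2}/\d{2}/\d{2}\s+\d{2}:\d{2}:\d{2}\s*")


def join_records(lines):
    """Merge wrapped continuation lines into one string per log record."""
    records = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if RECORD_START.match(line) or not records:
            # Drop the timestamp so the record starts with the message number.
            records.append(RECORD_START.sub("", line).strip())
        else:
            records[-1] += " " + text
    return records


def reclaim_damage_suspects(lines):
    """Return ANR9999D records that mention both ssrecons.c and Magic=."""
    return [
        record
        for record in join_records(lines)
        if record.startswith("ANR9999D")
        and "ssrecons.c" in record
        and "Magic=" in record
    ]
```

A hit from reclaim_damage_suspects() only says the message matches the
pattern described above; the AUDIT is still what tells you which files are
damaged.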

When I get this error, if I run an AUDIT against the tape, it finds the
damaged files. (You have to be at 4.1.3 or above for the AUDIT to find it,
and if the error is from an offsite reclamation, you should run the AUDIT
against the primary pool tape, not the tape that was being reclaimed.)

If the files are damaged, I can run a RESTORE VOLUME to recreate the primary
files from the offsite pool.  The RESTORE VOLUME will complete OK, but the
files are still damaged, and the resulting new tape can't be reclaimed,
either.   I end up deleting the remaining files on the tape to clear the
problem.  (Clearly there is a bug in RESTORE VOLUME.)

If you try the RESTORE VOLUME, check the results by RUNNING AN AUDIT ON THE
RESULTING TAPE.  If it still says the tapes are damaged, those files are
toast.

I have also found that MOVE DATA (at least without RECONSTRUCT) will move
the files and complete with no complaints, but the files will still be bad,
and the resulting tapes will still not reclaim completely.  So I assume the
problem is only detected when TSM is reconstructing an aggregate.

If this error occurs on reclaim of a primary volume, you will find the
volume reclaims down to just a few damaged files.  You can Q CONTENT for the
volume to see what is damaged, then DELETE with DISCARDDATA.  (As ALWAYS -
NEVER RUN DELETE with DISCARDDATA until you are DARN SURE YOU KNOW WHAT YOU
ARE DOING.)
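
A minimal sketch of the read-only part of that procedure, assuming you want
the commands written out for review before typing them at an admin session.
The helper name review_commands is hypothetical.  FIX=NO and DAMAGED=YES are
taken here to be the AUDIT VOLUME and QUERY CONTENT parameters from the
Administrator's Reference; check them against your server level.  The sketch
only builds strings, runs nothing, and deliberately leaves out the DELETE.

```python
def review_commands(volume):
    """Build read-only admin commands for inspecting one volume.

    Assumption: volume names are plain alphanumeric labels such as 005086.
    """
    vol = volume.strip().upper()
    if not vol or not vol.isalnum():
        raise ValueError(f"unexpected volume name: {volume!r}")
    return [
        # Report damaged files without changing anything in the database.
        f"AUDIT VOLUME {vol} FIX=NO",
        # List only the files the server has marked as damaged.
        f"QUERY CONTENT {vol} DAMAGED=YES",
    ]


if __name__ == "__main__":
    for command in review_commands("005086"):
        print(command)
```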

If the error occurs on reclaim of a copy pool volume, it is harder to decide
what to do.  The copy pool volume will also reclaim down to just a couple of
files, the AUDIT will show damaged data on the primary tape, but the primary
tape may still have a zillion good files on it.  (DON'T PANIC and think the
whole volume is bad; it probably isn't.)  I usually get the file names from
the AUDIT output, and DELETE the copy pool tape, knowing that the problem
may surface again when the primary tape reclaims.

In our case, every time I had this error occur, I have been able to track it
back (I think) to a DB restore we did 2 years ago. I also saw a couple of
errors like this when some intermittent hard disk errors caused some
damage in my disk pool.  DB restores and hardware errors are both legitimate
reasons for data damage.  I have NO REASON to believe there is anything out
there CREATING these errors for NEW BACKUPS.  I don't think there is any
integrity problem, except the bug in RESTORE VOLUME.

In our case, when we did the DB restore 2 years ago we lost about 24 hours
of backup data.  But also reclaim had been running, and since my primary
pool is collocated, a lot of tapes had been touched.  We did an AUDIT on
EVERY tape that was touched, but that was TSM 3.7 and AUDIT at TSM 3.7
didn't catch the problem (there were apparently significant AUDIT
improvements at 4.1).  The ANR9999D errors started surfacing months later
when the tapes went through reclaim.  We ran RESTORE VOLUME on them to
rebuild the data from the offsite pool.  RESTORE VOLUME completed OK, and I
assumed the problem was fixed.  ONLY LATER (like a year later) when THOSE
tapes started reclaiming did the ANR9999D errors surface again, and I
realized that RESTORE VOLUME was faulty.  But every time one of these errors
surfaces (I have had 5 in the last 6 months) I check the files that won't
reclaim against the BACKUPS table, and find that they are small numbers of
old backups that were probably caught in our DB debacle.  SO again, this is
a case of the reclaim catching OLD problems.  I have NO REASON to believe
there is anything out there CREATING these errors for new files.

I have found NO DOCUMENTATION ANYWHERE that explains this problem.  All this
is stuff I've done on my own, and from which I have drawn my own
conclusions.   Use the information if it helps you, but your situation may
be different.

Whew.  Time for lunch.
************************************************************************
Wanda Prather
The Johns Hopkins Applied Physics Lab
443-778-8769
[EMAIL PROTECTED]

"Intelligence has much less practical application than you'd think" -
Scott Adams/Dilbert
************************************************************************







-----Original Message-----
From: Seay, Paul [mailto:[EMAIL PROTECTED]]
Sent: Thursday, June 06, 2002 3:22 AM
To: [EMAIL PROTECTED]
Subject: Re: ANR9999D Error message: Do you how to figure out this
message ?


Since we installed 4.2.1.15 server we have seen a lot of these lately.  We
have a PMR open.

Paul D. Seay, Jr.
Technical Specialist
Naptheon, INC
757-688-8180


-----Original Message-----
From: Adam Rowe [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 31, 2002 8:09 AM
To: [EMAIL PROTECTED]
Subject: Re: ANR9999D Error message: Do you how to figure out this message?


Received this information from the Tivoli Error Message Manager.pdf

Improved ANR9999D Messages
By setting message context reporting to ON, you get additional information
when the server issues ANR9999D messages. The additional information can
help to identify problem causes. See the SET CONTEXTMESSAGING command in
Administrator's Reference. Also see Messages.

Adam Y. Rowe
St. Rita's Medical Center





From:           Zlatko Krastev <[EMAIL PROTECTED]ET>
Sent by:        "ADSM: Dist Stor Manager" <[EMAIL PROTECTED].EDU>
Date:           05/30/2002 07:35 PM
Please respond to "ADSM: Dist Stor Manager"
To:             [EMAIL PROTECTED]
cc:
Subject:        Re: ANR9999D Error message: Do you how to figure out this
message?




ANR9999D is a generic error message that covers unknown situations. Usually
such problems can be resolved only by IBM support because they occur so
rarely. The large output is tracing information to help support and
developers analyse the root cause. The only hint I can give you is to
perform AUDIT DB, but it may show nothing useful.

Zlatko Krastev
IT Consultant




Please respond to "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
Sent by:        "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
To:     [EMAIL PROTECTED]
cc:

Subject:        ANR9999D Error message: Do you how to figure out this
message?

---------------------------------------
O/S: SunOS 5.6
TSM:  Version 4, Release 2, Level 1.15
Tape library: scalar 1000 from ADIC
Tape drive: AIT2 drive
Tape : AIT2 (50GB~100GB)
---------------------------------------


Dear All,

Situation we face
I am wondering where I can get more detailed information about the ANR9999D
error message. We have been suffering from many problems, including robot
and drive problems. We have also had the "ANR8300E Changer Failure" error
message many times, after which TSM just died...T_T The funny thing is that
there is no tape stuck in the drive, even though the library LCD screen
says, for example, "Element 1212(drive12) is obstructed". Last night we had
the same problem again, and I updated drive12 to onlin=no. Since then all
the other drives have been working fine, with no more changer failures
since the last reboot. So I think this error message is misleading and the
real cause is something else. If you have faced a similar situation before,
please get back to me.

Questions
As you can see in the log below, I found something going wrong with space
reclamation on c3 (copy pool for offsite). As soon as I did "update stg c3
recl=60" from recl=100 (basically no space reclamation), I got the ANR9999D
error message. This error message just never stopped!! I got scared and put
it back. Then I tried again after "set CONTEXTMESSAGING on" to get details.
You can see the red-marked detailed information about the ANR9999D message
below, but I simply don't know what it means. Can you point me to a good
reference or any documents?

Thanks,
Meensun from Tokyo



----------------------------------------------------------------------------
Activity Log
----------------------------------------------------------------------------

05/30/02   16:06:35      ANR2017I Administrator AHNME issued command: UPDATE
                          STGPOOL c3 recl=60
05/30/02   16:06:35      ANR2202I Storage pool C3 updated.
05/30/02   16:06:35      ANR0984I Process 36 for SPACE RECLAMATION started in
                          the BACKGROUND at 16:06:35.
05/30/02   16:06:36      ANR1040I Space reclamation started for volume 005086,
                          storage pool C3 (process number 36). .. .. ..
05/30/02   16:06:36      ANR1040I Space reclamation started for volume 005691,
                          storage pool C3 (process number 36).
05/30/02   16:06:36      ANR1040I Space reclamation started for volume 005973,
                          storage pool C3 (process number 36).
05/30/02   16:06:36      ANR9999D ssrecons.c(2391): ThreadId<50> Actual:
                          Magic=80B93C01, SrvId=-219899260,
                          SegGroupId=14301299977853604355, SeqNum=2031456424,
                          converted=T.
05/30/02   16:06:36      ANR9999D ssrecons.c(2405): ThreadId<50> Expected:
                          Magic=53454652, SrvId=0, SegGroupId=5855761.
                          SeqNum=80.
05/30/02   16:06:38      (50) Context report
05/30/02   16:06:38      (50) SsAuxReconSrcThread : ANR9999D calling thread
05/30/02   16:06:38      (50) Generating TM Context Report: (struct=tmTxnDesc)
                          (slots=256)
05/30/02   16:06:38      (50)  *** no transactions found ***
05/30/02   16:06:38      (50) Generating Database Transaction Table Context:
05/30/02   16:06:38      (50) Tsn=0:37757505 --> Valid=1, inRollback=0,
                          endNTA=0, State=2, Index=7, LatchCount=0,
                          SavePoint=0, TotLogRecs=0, TotLogBytes=0,
                          UndoLogRecs=0, UndoLogBytes=0, LogReserve=0,
                          PageReserve=0, Elapsed=230 (ms), MinLsn=0.0.0,
                          MaxLsn=0.0.0, LastLsn=0.0.0, UndoNextLsn=0.0.0,
                          logWriter=False, backupTxn=False
05/30/02   16:06:38      (50)  Open objects:
05/30/02   16:06:38      (50)    name ->SS.Volume.Names<- (sp=0)
05/30/02   16:06:38      (50)  *** no transactions found ***
05/30/02   16:06:38      (50) Generating SM Context Report:
05/30/02   16:06:38      (50)  *** no sessions found ***
05/30/02   16:06:38      (50) Generating AS Vol Context Report:
05/30/02   16:06:38      (50)  No mounted (or mount in progress) volumes.
05/30/02   16:06:38      (50) Generating ssSession Context Report:
05/30/02   16:06:38      (50)  No storage service sessions active.
05/30/02   16:06:38      (50) Generating ssOpenSeg Context Report:
05/30/02   16:06:38      (50)  Storage Service Segments:
05/30/02   16:06:38      (50)  VolId=1250, Start=64757, Offset=0, SessId=612,
                          Locked=False, Deallocated=False
05/30/02   16:06:38      (50) Generating BF Copy Control Context Report:
05/30/02   16:06:38      (50)  No global copy control blocks.
05/30/02   16:06:38
05/30/02   16:06:38      (50) End Context report
05/30/02   16:06:38      (50) Context report
05/30/02   16:06:38      (50) SsAuxReconSrcThread : ANR9999D calling thread
05/30/02   16:06:38      (50) Generating TM Context Report: (struct=tmTxnDesc)
                          (slots=256)
05/30/02   16:06:38      (50)  *** no transactions found ***
05/30/02   16:06:38      (50) Generating Database Transaction Table Context:
05/30/02   16:06:38      (50) Tsn=0:37757505 --> Valid=1, inRollback=0,
                          endNTA=0, State=2, Index=7, LatchCount=0,
                          SavePoint=0, TotLogRecs=0, TotLogBytes=0,
                          UndoLogRecs=0, UndoLogBytes=0, LogReserve=0,
                          PageReserve=0, Elapsed=254 (ms), MinLsn=0.0.0,
                          MaxLsn=0.0.0, LastLsn=0.0.0, UndoNextLsn=0.0.0,
                          logWriter=False, backupTxn=False
05/30/02   16:06:38      (50)  Open objects:
05/30/02   16:06:38      (50)    name ->SS.Volume.Names<- (sp=0)
05/30/02   16:06:38      (50)  *** no transactions found ***
05/30/02   16:06:38      (50) Generating SM Context Report:
05/30/02   16:06:38      (50)  *** no sessions found ***
05/30/02   16:06:38      (50) Generating AS Vol Context Report:
05/30/02   16:06:38      (50)  No mounted (or mount in progress) volumes.
05/30/02   16:06:38      (50) Generating ssSession Context Report:
05/30/02   16:06:38      (50)  No storage service sessions active.
05/30/02   16:06:39      (50) Generating ssOpenSeg Context Report:
05/30/02   16:06:39      (50)  Storage Service Segments:
05/30/02   16:06:39      (50)  VolId=1250, Start=64757, Offset=0, SessId=612,
                          Locked=False, Deallocated=False
05/30/02   16:06:39      (50) Generating BF Copy Control Context Report:
05/30/02   16:06:39      (50)  No global copy control blocks.
05/30/02   16:06:39
05/30/02   16:06:39      (50) End Context report
05/30/02   16:06:39      ANR2017I Administrator AHNME issued command: UPDATE
                          STGPOOL c3 recl=100
05/30/02   16:06:39      ANR2202I Storage pool C3 updated.

No more ANR9999D messages....since reclamation threshold=100
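
The Actual and Expected ANR9999D lines in the log above can be compared
field by field.  The sketch below is only an illustration: the function
names are made up here, and nothing in this thread documents what the fields
mean.  It does show that the expected Magic value is four printable ASCII
bytes while the value actually read is not, which fits the idea that the
data read back during reconstruction was not what the server expected.

```python
import re

# key=value pairs as they appear in the message text, e.g. Magic=53454652
FIELD = re.compile(r"(\w+)=(-?\w+)")


def parse_fields(record):
    """Extract key=value pairs from one log record into a dict of strings."""
    return dict(FIELD.findall(record))


def mismatches(actual, expected):
    """Map each expected field that differs to (actual value, expected value)."""
    got, want = parse_fields(actual), parse_fields(expected)
    return {k: (got.get(k), v) for k, v in want.items() if got.get(k) != v}


def magic_as_text(magic_hex):
    """Render a hex magic number as ASCII, with '.' for non-printable bytes."""
    raw = bytes.fromhex(magic_hex)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    actual = ("Actual: Magic=80B93C01, SrvId=-219899260, "
              "SegGroupId=14301299977853604355, SeqNum=2031456424, converted=T.")
    expected = "Expected: Magic=53454652, SrvId=0, SegGroupId=5855761. SeqNum=80."
    for name, (got, want) in mismatches(actual, expected).items():
        print(f"{name}: actual={got} expected={want}")
    print(magic_as_text("53454652"))  # SEFR
```

Every field in the Expected line differs from the Actual line in this log,
so the whole header looks wrong, not just one value.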

