How's your audit situation working out? I've got a similar problem. Here's a sampling of the messages I get:
ANR9999D dfbackup.c(2044): ThreadId<79> Error 145 getting volume name for volId 9
ANR9999D dsalloc.c(1979): ThreadId<57> Error 1 opening bit vector DSKV0000000009 for volid 9.
ANR9999D bfcreate.c(441): ThreadId<57> Bitfile erase prohibited - transaction failed.
ANR9999D dfaudit.c(3816): ThreadId<0> AUDITDB: Storage volume with internal id 9 does not exist, but bitfile 0.213167917 is stored on it - The volume cannot be re-created.
ANR9999D dsaudit.c(3054): ThreadId<0> AUDITDB: Error 1 opening bit vector object DSKV0000000009.
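For anyone who wants to tally messages like these before calling support, here is a minimal Python sketch. It assumes only the layout visible in the sample above (ANR9999D, source file, line number, ThreadId, text); the function names parse_anr9999d and volume_ids are made up for illustration, and the pattern is a guess at the format, not anything documented by Tivoli.

```python
import re
from collections import Counter

# The five messages quoted above, joined into one string.
SAMPLE = (
    "ANR9999D dfbackup.c(2044): ThreadId<79> Error 145 getting volume name for volId 9 "
    "ANR9999D dsalloc.c(1979): ThreadId<57> Error 1 opening bit vector DSKV0000000009 for volid 9. "
    "ANR9999D bfcreate.c(441): ThreadId<57> Bitfile erase prohibited - transaction failed. "
    "ANR9999D dfaudit.c(3816): ThreadId<0> AUDITDB: Storage volume with internal id 9 "
    "does not exist, but bitfile 0.213167917 is stored on it - The volume cannot be re-created. "
    "ANR9999D dsaudit.c(3054): ThreadId<0> AUDITDB: Error 1 opening bit vector object DSKV0000000009."
)

# Assumed layout: ANR9999D <file.c>(<line>): ThreadId<<n>> <free text>
# The text runs until the next ANR9999D or the end of the input.
MSG_RE = re.compile(
    r"ANR9999D\s+(?P<module>\w+\.c)\((?P<line>\d+)\):\s+"
    r"ThreadId<(?P<thread>\d+)>\s+"
    r"(?P<text>.*?)(?=\s+ANR9999D\s|\s*$)",
    re.DOTALL,
)

# Volume references as they appear in the sample: "volId 9", "volid 9", "internal id 9".
VOL_RE = re.compile(r"\b(?:volid|internal id)\s+(\d+)", re.IGNORECASE)


def parse_anr9999d(log_text):
    """Split a run of ANR9999D diagnostics into one dict per message."""
    return [
        {
            "module": m["module"],
            "line": int(m["line"]),
            "thread": int(m["thread"]),
            "text": " ".join(m["text"].split()),  # collapse wrapped lines
        }
        for m in MSG_RE.finditer(log_text)
    ]


def volume_ids(messages):
    """Count how many messages mention each internal volume id."""
    counts = Counter()
    for msg in messages:
        for vol in VOL_RE.findall(msg["text"]):
            counts[int(vol)] += 1
    return counts


if __name__ == "__main__":
    msgs = parse_anr9999d(SAMPLE)
    for msg in msgs:
        print(msg["module"], msg["line"], msg["thread"], msg["text"])
    print(volume_ids(msgs))
```

Run against a saved activity log, this at least shows which internal volume ids keep turning up. It tells you nothing about what the volume was, which is the part that needs the DB internals.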
It started about March. I did an AUD VOL on the stgpool volumes involved and it seemed to go away. Last month it was back with a vengeance. I've been working with Level 2 on one of the undocumented options of DSMSERV AUDITDB. The first time, it ended in a reasonable length of time, but with a message about the LOG being in ROLLFORWARD. That turned out to be a not very informative way of saying the audit didn't run. Since then I've run it with the log set to NORMAL and it dies, even with FIX=YES.
So even if I could bring down a production machine for a week to run a full audit, it may not work. Now what I have is enough internals on the DB to enable me to write a program to extract the identity of DSKV0..09 by working my way thru a large three or four level tree.
My real question, rhetorical of course, is when is TIVOLI going to supply us with a tool kit for identifying and working thru DB problems? Why do I have to write some code to identify an object in the DB? Why are there no functions available to do this?
Wouldn't it be nice if V5R3 or V6R1, whichever comes first, were devoted to the care and feeding of the TSM DB?
My $.02 anyway.
At 09:27 AM 5/27/2003 -0400, you wrote:
I've been plagued by a few problems when deleting accounts. So far, it seems like Win2K or WinXP clients (SYSTEM OBJECTs are involved again!) are prone to this error:
05/27/2003 09:00:56 ANR2017I Administrator XXXXXX issued command: DELETE FILESPACE ZZZZZZ *
05/27/2003 09:00:56 ANR0984I Process 185 for DELETE FILESPACE started in the BACKGROUND at 09:00:56.
05/27/2003 09:00:56 ANR0800I DELETE FILESPACE * (fsId=6) for node ZZZZZZ started as process 185.
05/27/2003 09:00:56 ANR0802I DELETE FILESPACE * (fsId=6) (backup/archive data) for node ZZZZZZ started.
05/27/2003 09:00:57 ANR0104E imutil.c(7761): Error 2 deleting row from table "Expiring.Objects".
05/27/2003 09:00:57 ANR9999D imfsdel.c(1872): ThreadId<25> Error 19 deleting group leader 0 176658713.
I've tried a number of things - renaming the filespace, moving the node data and then auditing the tape, deleting the filespace specifically - but it's really a database 'corruption' and can only be fixed by an audit (per support).
Over the course of the last two weeks, I recovered this database to a test server and ran an audit. Here are the pertinent stats for your reference:
Server: H80, 4 way, 2 GB, AIX 4.3.3
TSM: v5.1.6.4
DB size: 179,544 MB - 74.4% full
Log size: 13,280 MB
Audit command: dsmserv auditdb fix=yes
Audit start: 5/19 09:05
Audit end: 5/25 19:15
Number of database entries: Processed 1050073565 database entries (cumulative).
Elapsed time: 6 days 10 hours 10 minutes
The audit was successful and did allow me to delete the problem node. However, there really should be a way to go after the offending entry and blast it (under adult supervision, of course!). I'm not really going to be able to justify a down time of 7 days just to clean up an account. It's now happened again on another server, so I will have to do this test again to get a good estimate of the down time required to clean that server up.
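As a rough check on the audit figures, here is a short Python sketch that recomputes the elapsed time and the average processing rate from the start and end times and the entry count reported for this run. The year 2003 is inferred from the message dates. The estimate_days helper is hypothetical and assumes audit time scales linearly with the number of database entries on similar hardware, which nothing here confirms, so treat its output as a first guess only.

```python
from datetime import datetime

# Figures reported for the test-server audit (year inferred from the message dates).
AUDIT_START = datetime(2003, 5, 19, 9, 5)
AUDIT_END = datetime(2003, 5, 25, 19, 15)
ENTRIES = 1_050_073_565

SECONDS_PER_DAY = 86_400


def audit_rate(start, end, entries):
    """Average database entries processed per second over the whole audit."""
    return entries / (end - start).total_seconds()


def estimate_days(entries, rate):
    """Hypothetical helper: days needed at a given rate, assuming linear scaling."""
    return entries / rate / SECONDS_PER_DAY


if __name__ == "__main__":
    elapsed = AUDIT_END - AUDIT_START
    rate = audit_rate(AUDIT_START, AUDIT_END, ENTRIES)
    print("elapsed:", elapsed)
    print("entries/sec:", round(rate))
    print("days for the same entry count:", round(estimate_days(ENTRIES, rate), 2))
```

For this run the average works out to roughly 1,890 entries per second. A second server with a different entry count, disk layout, or CPU could land well away from any number derived this way, which is why the test restore is still needed.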
I've pushed these accounts 'aside' by renaming them and changing the contact info, but the clients would really like me to remove the data (legal reasons). Having errors like this makes me wonder what else is going on in the database.
Gretchen Thiele
Princeton University