v6.3.5 hung db2??
Two days ago we upgraded one of our TSM instances to v6.3.5 (from v6.3.4). This is our first v6.3.5 instance. It runs on an AIX server.

Last night at 19:32 it looks like DB2 went into some kind of loop. The instance became unresponsive. Dsmadmc commands hung (didn't error, just hung). The dsmserv process was getting almost no CPU, while db2sysc was running the box at 65-70% but had no disk I/O. I killed dsmserv, but DB2 didn't go down. I tried db2stop but it did nothing. I finally rebooted to get everything up. The actlog shows no nasty errors.

Just wondering if anyone else has had a runaway DB2.

Thanks
Rick

The information contained in this message is intended only for the personal and confidential use of the recipient(s) named above. If the reader of this message is not the intended recipient or an agent responsible for delivering it to the intended recipient, you are hereby notified that you have received this document in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify us immediately, and delete the original message.
FW: v6.3.5 hung db2??
Now this is really weird.

TSM came up after we rebooted. But it threw a bunch of ANR messages, then QUIT LOGGING. It seems to be running: I went onto a server and did an incremental backup, but nothing is logging in the actlog.

02/13/15 10:00:22 ANRD_2891663292 GetDomainByNodeId(pmcache.c:2645) Thread<280>: Node id 626 not found in table Policy.Domain.Members. (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> issued message from: (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x00010001ca7c StdPutText (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x00010001d514 OutDiagToCons (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x000190bc outDiagfExt (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x0001004bf254 GetDomainByNodeId (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x0001004beeec pmOpenDomain (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x0001006ac78c BeginVbTxn (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x0001006a4068 SmNodeSession (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x00010053ca64 SmSchedSession (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x0001005525d8 HandleNodeSession (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x000100549c54 DoNodeSched (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x000100544900 smExecuteSession (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x000100078a7c psSessionThread (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x0001c264 StartThread (SESSION: 125)
02/13/15 10:00:22 ANRD_3095886799 HandleShortCircuitCodes(dbieval.c:1072) Thread<280>: Invalid handle used from tbtbl.c(10153). (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> issued message from: (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x00010001ca7c StdPutText (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x00010001d514 OutDiagToCons (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x000190bc outDiagfExt (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x0001000cbb28 HandleShortCircuitCodes (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x0001000cb0a0 DbiEvalSQLOutcomeX (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x0001000a0a18 TblClose (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x00010019b13c FreeTxnDesc (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x00010019af14 dbiEndTxn (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x0001000458bc DoEndFuncCallbacks (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x000100045d70 tmAbortX (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x0001004bef60 pmOpenDomain (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x0001006ac78c BeginVbTxn (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x0001006a4068 SmNodeSession (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x00010053ca64 SmSchedSession (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x0001005525d8 HandleNodeSession (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x000100549c54 DoNodeSched (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x000100544900 smExecuteSession (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x000100078a7c psSessionThread (SESSION: 125)
02/13/15 10:00:22 ANRD Thread<280> 0x0001c264 StartThread (SESSION: 125)

It then threw this error and STOPPED LOGGING into the actlog.

02/13/15 10:03:24 ANR0103E admattrm.c(806): Error 2332 updating row in table "Global.Attributes".

From: Rhodes, Richard L.
Sent: Friday, February 13, 2015 9:49 AM
To: adsm-l mailing list (ADSM-L@VM.MARIST.EDU)
Subject: v6.3.5 hung db2??
Re: FW: v6.3.5 hung db2??
Hi Rick,

Off-hand I am not sure what the problem is. I think it would be a good idea to open a PMR if you have not already done so.

Best regards,
- Andy

Andrew Raibeck | Tivoli Storage Manager Level 3 Technical Lead | stor...@us.ibm.com

IBM Tivoli Storage Manager links:
Product support: http://www.ibm.com/support/entry/portal/Overview/Software/Tivoli/Tivoli_Storage_Manager
Online documentation: http://www.ibm.com/support/knowledgecenter/SSGSG7/welcome
Product Wiki: https://www.ibm.com/developerworks/community/wikis/home/wiki/Tivoli%20Storage%20Manager

"ADSM: Dist Stor Manager" wrote on 2015-02-13 10:41:55:
> From: "Rhodes, Richard L."
> To: ADSM-L@VM.MARIST.EDU
> Date: 2015-02-13 10:44
> Subject: FW: v6.3.5 hung db2??
> Sent by: "ADSM: Dist Stor Manager"
>
> Now this is really weird.
>
> TSM came up after we rebooted. But it threw a bunch of ANR msgs,
> then QUIT LOGGING. It seems to be running - I go onto a server and
> did a incr bkup, but nothing is logging in the actlog.
Re: FW: v6.3.5 hung db2??
Yea. I opened a Sev 1.

Thanks!
Rick

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Andrew Raibeck
Sent: Friday, February 13, 2015 10:57 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: FW: v6.3.5 hung db2??

Hi Rick,

Off-hand I am not sure what the problem is, I think it would be a good idea to open a PMR if you have not already done so.

Best regards,
- Andy
Re: FW: v6.3.5 hung db2??
Hello,
please keep us posted.

I will have to go from 6.3.4-300 to a higher version because of the NDMP dump > 2TB overwrite problem...

Bye
Rainer

On 13.02.2015 17:05, Rhodes, Richard L. wrote:
> Yea. I opened a Sev 1.
>
> Thanks!
>
> Rick
DEVCLASS=FILE - what am I missing
Up until recently, I have always used DEVCLASS=DISK for disk storage and always preformatted/allocated the disk volumes into multiple chunks to allow for multi-I/O benefits.

When I recently stood up a new server, I decided to try DEVCLASS=FILE for disk-based storage/incoming backups.

I thought I understood that FILE type storage was basically "tape/sequential files on disk" and would act accordingly, and that things like reclamation now applied, so when the file chunks (I defined 50GB file sizes) got below the reclaim value, it would reclaim such files, create new ones and delete the old ones automagically.

Well, last night became a disaster. Backups were failing all over because it couldn't allocate any more files, and it also would not automatically shift to use the "nextpool", which is defined as a tape pool.

So, what am I doing wrong? What assumptions are wrong? Here are the devclass values with the empty values left out:

Device Class Name: TSMFS
Device Access Strategy: Sequential
Storage Pool Count: 1
Device Type: FILE
Format: DRIVE
Est/Max Capacity (MB): 51,200.0
Mount Limit: 40
Directory: /tsmpool

Here is the lone stgpool that uses this devclass:

12:06:21 PM GALAXY : q stg backuppool f=d
Storage Pool Name: BACKUPPOOL
Storage Pool Type: Primary
Device Class Name: TSMFS
Estimated Capacity: 7,106 G
Space Trigger Util: 84.5
Pct Util: 80.9
Pct Migr: 80.9
Pct Logical: 99.2
High Mig Pct: 85
Low Mig Pct: 75
Migration Delay: 0
Migration Continue: Yes
Migration Processes: 1
Reclamation Processes: 1
Next Storage Pool: PRIMARY-ONSITE
Reclaim Storage Pool:
Maximum Size Threshold: No Limit
Access: Read/Write
Description:
Overflow Location:
Cache Migrated Files?:
Collocate?: No
Reclamation Threshold: 59
Offsite Reclamation Limit:
Maximum Scratch Volumes Allowed: 143
Number of Scratch Volumes Used: 137
Delay Period for Volume Reuse: 0 Day(s)
Migration in Progress?: No
Amount Migrated (MB): 0.00
Elapsed Migration Time (seconds): 1,009
Reclamation in Progress?: No
Last Update by (administrator): ZFORRAY
Last Update Date/Time: 02/13/2015 11:44:23
Storage Pool Data Format: Native
Copy Storage Pool(s):
Active Data Pool(s):
Continue Copy on Error?: Yes
CRC Data: No
Reclamation Type: Threshold
Overwrite Data when Deleted:
Deduplicate Data?: No
Processes For Identifying Duplicates:
Duplicate Data Not Stored:
Auto-copy Mode: Client
Contains Data Deduplicated by Client?: No

I calculated the "Max Scratch Volumes" value based on having a ~7.6TB filesystem, so 50GB * 143 = 7.1TB.

This morning when I checked, there were plenty of volumes with <40% utilized. So why didn't reclaim kick in? Or am I totally off on this assumption? I manually performed move data on them and it freed things up.

--
*Zoltan Forray*
TSM Software & Hardware Administrator
BigBro / Hobbit / Xymon Administrator
Virginia Commonwealth University
UCC/Office of Technology Services
zfor...@vcu.edu - 804-828-4807
Don't be a phishing victim - VCU and other reputable organizations will never use email to request that you reply with your password, social security number or confidential personal information. For more details visit http://infosecurity.vcu.edu/phishing.html
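The sizing and threshold arithmetic in the message above can be sketched as follows. This is only an illustration in Python, not TSM code: the function names are hypothetical, units are treated as decimal (1 TB = 1000 GB), and it assumes a volume becomes a reclamation candidate once its reclaimable percentage reaches the pool's Reclamation Threshold. Treating reclaimable space as 100 minus percent utilized is a further simplification that may only hold for a volume that has already been filled.

```python
# Illustrative arithmetic only; these names are hypothetical, not part of TSM.

def whole_volumes(filesystem_gb, volume_gb):
    """Number of whole volumes of volume_gb that fit in filesystem_gb."""
    return int(filesystem_gb // volume_gb)


def is_reclaim_candidate(pct_utilized, reclaim_threshold):
    """Assumption: reclaimable % = 100 - utilized %, and a volume qualifies
    once its reclaimable % reaches the threshold."""
    return (100 - pct_utilized) >= reclaim_threshold


VOLUME_GB = 50          # volume size quoted in the message
MAX_SCRATCH = 143       # Maximum Scratch Volumes Allowed
SCRATCH_USED = 137      # Number of Scratch Volumes Used
RECLAIM_THRESHOLD = 59  # Reclamation Threshold

pool_gb = VOLUME_GB * MAX_SCRATCH
print(pool_gb)                                      # 7150
print(whole_volumes(7600, VOLUME_GB))               # 152
print(MAX_SCRATCH - SCRATCH_USED)                   # 6
print(is_reclaim_candidate(40, RECLAIM_THRESHOLD))  # True
print(is_reclaim_candidate(50, RECLAIM_THRESHOLD))  # False
```

Under these assumptions, a threshold of 59 corresponds to volumes that are at most 41% utilized, so a volume that is 50% utilized would not qualify, and 137 of 143 scratch volumes in use leaves room for only 6 new volumes.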
Re: DEVCLASS=FILE - what am I missing
As the number of vols is damn close to max scratch, I'd say you ran out. Nothing in the actlog? Bung it up and see what happens.

On Fri, 13 Feb 2015 17:15 Zoltan Forray wrote:
> Well, last night became a disaster. Backups failing all over because it
> couldn't allocate any more files and also would not automatically shift to
> use the "nextpool" which is defined as a tape pool.
>
> Maximum Scratch Volumes Allowed: 143
> Number of Scratch Volumes Used: 137
Re: FW: v6.3.5 hung db2??
Hi all,

I'd like to remove my user ID from this group. Can I do that?

Regards,
Diego F.

On Feb 13, 2015 1:54 PM, "Rainer Tammer" wrote:
> Hello,
> please keep us posted.
>
> I will have to go from 6.3.4-300 to a higher version because of the NDMP
> dump > 2TB overwrite problem...
>
> Bye
> Rainer
Re: DEVCLASS=FILE - what am I missing
At 12:12 PM 2/13/2015, Zoltan Forray wrote:
>Well, last night became a disaster. Backups failing all over because it
>couldn't allocate any more files and also would not automatically shift to
>use the "nextpool" which is defined as a tape pool.

Alas, TSM doesn't automatically "roll over" when the ingest pool is FILE. I really wish that it did. Here's the relevant documentation for NEXTSTG for FILE stgpools:

> When there is insufficient space available in the current storage pool,
> the NEXTSTGPOOL parameter for sequential access storage pools does not
> allow data to be stored into the next pool. In this case, the server issues
> a message and the transaction fails.

..Paul

--
Paul Zarnowski                            Ph: 607-255-4757
Assistant Director for Storage Services   Fx: 607-255-8521
IT at Cornell / Infrastructure            Em: p...@cornell.edu
719 Rhodes Hall, Ithaca, NY 14853-3801
Re: DEVCLASS=FILE - what am I missing
Sure there were plenty of errors:

2/12/2015 10:00:23 PM ANR0522W Transaction failed for session 2639 for node RO-CVS (Linux x86-64) - no space available in storage pool BACKUPPOOL and all successor pools.
2/12/2015 10:04:20 PM ANR0522W Transaction failed for session 2648 for node RDO3 (WinNT) - no space available in storage pool BACKUPPOOL and all successor pools.
2/12/2015 10:05:00 PM ANR0522W Transaction failed for session 2636 for node BARRACUDA.RADONC.RDO.MCVH-VCU.EDU (Linux86) - no space available in storage pool BACKUPPOOL and all successor pools.
2/12/2015 10:08:44 PM ANR0522W Transaction failed for session 2653 for node RO-MCCB129B (WinNT) - no space available in storage pool BACKUPPOOL and all successor pools.
2/13/2015 1:13:01 AM ANR0522W Transaction failed for session 2740 for node TSMCIFS2 (WinNT) - no space available in storage pool BACKUPPOOL and all successor pools.

There was plenty of space on the disk. It just hit the max number of volumes. But why didn't it reclaim the volumes it should have, the ones that were less than 50% used? Am I supposed to do something to start reclamation processing for FILE pools? My tapes automatically reclaim themselves.

On Fri, Feb 13, 2015 at 1:16 PM, Steven Langdale wrote:

> As the number of vols is damn close to max scratch, I'd say you ran out.
> Nothing in the actlog? Bung it up and see what happens.
>
> On Fri, 13 Feb 2015 17:15 Zoltan Forray wrote:
>
> > Up until recently, I have always used DEVCLASS=DISK for disk storage and
> > always preformatted/allocated the disk volumes into multiple chunks to
> > allow for multi-I/O benefits.
> >
> > When I recently stood up a new server, I decided to try DEVCLASS=FILE for
> > disk-based storage/incoming backups.
> >
> > I thought I understood that FILE type storage was basically
> > "tape/sequential files on disk" and would act accordingly and things like
> > reclamation now applied so when the file chunks (I defined 50GB file sizes)
> > got below the reclaim value, it would reclaim such files, create new ones
> > and delete the old ones automagically.
> >
> > Well, last night became a disaster. Backups failing all over because it
> > couldn't allocate any more files and also would not automatically shift to
> > use the "nextpool" which is defined as a tape pool.
> >
> > So, what am I doing wrong? What assumptions are wrong? Here are the
> > devclass values with the empty values left out:
> >
> > Device Class Name: TSMFS
> > Device Access Strategy: Sequential
> > Storage Pool Count: 1
> > Device Type: FILE
> > Format: DRIVE
> > Est/Max Capacity (MB): 51,200.0
> > Mount Limit: 40
> > Directory: /tsmpool
> >
> > Here is the lone stgpool that uses this devclass:
> >
> > 12:06:21 PM GALAXY : q stg backuppool f=d
> > Storage Pool Name: BACKUPPOOL
> > Storage Pool Type: Primary
> > Device Class Name: TSMFS
> > Estimated Capacity: 7,106 G
> > Space Trigger Util: 84.5
> > Pct Util: 80.9
> > Pct Migr: 80.9
> > Pct Logical: 99.2
> > High Mig Pct: 85
> > Low Mig Pct: 75
> > Migration Delay: 0
> > Migration Continue: Yes
> > Migration Processes: 1
> > Reclamation Processes: 1
> > Next Storage Pool: PRIMARY-ONSITE
> > Reclaim Storage Pool:
> > Maximum Size Threshold: No Limit
> > Access: Read/Write
> > Description:
> > Overflow Location:
> > Cache Migrated Files?:
> > Collocate?: No
> > Reclamation Threshold: 59
> > Offsite Reclamation Limit:
> > Maximum Scratch Volumes Allowed: 143
> > Number of Scratch Volumes Used: 137
> > Delay Period for Volume Reuse: 0 Day(s)
> > Migration in Progress?: No
> > Amount Migrated (MB): 0.00
> > Elapsed Migration Time (seconds): 1,009
> > Reclamation in Progress?: No
> > Last Update by (administrator): ZFORRAY
> > Last Update Date/Time: 02/13/2015 11:44:23
> > Storage Pool Data Format: Native
> > Copy Storage Pool(s):
> > Active Data Pool(s):
> > Continue Copy on Error?: Yes
> > CRC Data: No
> > Reclamation Type: Threshold
> > Overwrite Data when Deleted:
> > Deduplicate Data?: No
> > Processes For Identifying Duplicates:
> > Duplicate Data Not Stored:
> > Auto-copy Mode: Client
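
For anyone who wants to see how thin the margin in that output was, here is a minimal sketch in Python. The three figures are copied from the devclass and stgpool output quoted above; the function names are made up for illustration, and the rule that every concurrent writer may need a fresh volume of its own is an assumption, not something the server documentation is being quoted on.

```python
# Figures copied from the quoted "q stg backuppool f=d" and devclass output.
MAX_SCRATCH = 143   # Maximum Scratch Volumes Allowed
SCRATCH_USED = 137  # Number of Scratch Volumes Used
MOUNT_LIMIT = 40    # Mount Limit on device class TSMFS


def scratch_headroom(max_scratch: int, used: int) -> int:
    """Scratch volumes the pool may still create before reaching MAXSCRATCH."""
    return max(max_scratch - used, 0)


def headroom_is_short(headroom: int, concurrent_writers: int) -> bool:
    """Assumption: each concurrent writer may need a new volume of its own."""
    return headroom < concurrent_writers


headroom = scratch_headroom(MAX_SCRATCH, SCRATCH_USED)
print(f"scratch volumes left: {headroom}")
print(f"short for {MOUNT_LIMIT} writers: {headroom_is_short(headroom, MOUNT_LIMIT)}")
```

With the numbers from this pool, only a handful of new volumes could be created while the device class allows 40 mounts, which fits Steven's guess that the pool simply ran out of scratch volumes.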
Re: DEVCLASS=FILE - what am I missing
WOW - I didn't realize that. Thanks for pointing that out.

Won't automatically go to nextstgpool, didn't automatically reclaim? So, what is the advantage/benefit of DEVCLASS=FILE? Sounds like time to go back to DEVCLASS=DISK.

On Fri, Feb 13, 2015 at 1:22 PM, Paul Zarnowski wrote:

> Alas, TSM doesn't automatically "roll over" when the ingest pool is FILE.
> I really wish that it did. Here's the relevant documentation for NEXTSTG
> for FILE stgpools:
>
> > When there is insufficient space available in the current storage
> > pool, the NEXTSTGPOOL parameter for sequential access storage pools does
> > not allow data to be stored into the next pool. In this case, the server
> > issues a message and the transaction fails.
>
> ..Paul

--
*Zoltan Forray*
TSM Software & Hardware Administrator
BigBro / Hobbit / Xymon Administrator
Virginia Commonwealth University
UCC/Office of Technology Services
zfor...@vcu.edu - 804-828-4807
Don't be a phishing victim - VCU and other reputable organizations will never use email to request that you reply with your password, social security number or confidential personal information. For more details visit http://infosecurity.vcu.edu/phishing.html
Re: DEVCLASS=FILE - what am I missing
FILE allows deduplication; DISK doesn't.

My impression after some experimenting is that FILE wasn't meant to replace DISK; it was solely meant to replace tape device classes. We didn't need to, so those experiments quietly ended.

On Fri, Feb 13, 2015 at 12:30 PM, Zoltan Forray wrote:

> WOW - I didn't realize that. Thanks for pointing that out.
>
> Won't automatically go to nextstgpool, didn't automatically reclaim? So,
> what is the advantage/benefit of DEVCLASS=FILE? Sounds like time to go
> back to DEVCLASS=DISK
Re: DEVCLASS=FILE - what am I missing
Probably just bad luck...

When I set up FILE pools for customers, I usually have to tweak them a couple of times to get the sizing right; it depends on the load, the number of concurrent sessions, etc. Been there, done that, got the scars.

Assumptions you should change:

• Unlike a disk pool, if there is no space available in a TYPE=FILE pool, backups don't fail over to the NEXT stgpool. WAD. I don't know why it's that way. I think some RFE pressure is indicated; it causes me grief.

• In a seq pool on disk, you need to be much more aggressive about reclaims. If you have reclaim set at 59, you are saying you are willing to live with 59% of your disk space dead/expired and unusable! That means you need to size your disk pool so that 41% is big enough to hold the entire night's backup. I set reclaim on my disk pools to 20%, or 15% if the disk throughput is sufficient to tolerate the I/O.

• Migration from a sequential pool may not be working like you think; read the DEFINE STGPOOL HIGHMIG option definition in the admin ref for your version. I always set MAXSCRATCH to 0 for a sequential file pool and use pre-defined volumes instead of scratch so I have better control over what happens.

• You have mountlimit set to 40 in the devclass; how many concurrent client sessions do you have writing to that pool?

• Also check server option NUMOPENVOLSALLOWED to make sure you can have enough volumes in use at once to do concurrent backups plus reclaims plus backup stgpool plus migration etc etc etc.

• If you are going to fill this pool and empty it out via migration every night, best to force the migration yourself with a MIGRATE STGPOOL command rather than relying on the threshold. And if reclaims don't kick in on their own regularly, set up a RECLAIM STGPOOL schedule to fire daily anyway. Won't hurt.

• The usual problem I see is that people don't have enough volumes defined in the pool to account for all the concurrent sessions, plus some empty volumes to allow for reclaims, plus a high enough NUMOPENVOLSALLOWED. You've defined your volumes at 50G, so you should have enough. One of these other issues is probably your problem.

• While things are working well, check daily to see what is a "normal" value of the number of "empty" volumes in that pool. Then set yourself an alert to let you know when the number of "empty" volumes drops below the "normal" value so you can investigate before disaster sets in.

Good luck!

Wanda Prather
TSM Consultant
ICF International Enterprise and Cybersecurity Systems Division

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Zoltan Forray
Sent: Friday, February 13, 2015 12:13 PM
To: ADSM-L@VM.MARIST.EDU
Subject: [ADSM-L] DEVCLASS=FILE - what am I missing

Up until recently, I have always used DEVCLASS=DISK for disk storage and always preformatted/allocated the disk volumes into multiple chunks to allow for multi-I/O benefits.

When I recently stood up a new server, I decided to try DEVCLASS=FILE for disk-based storage/incoming backups.

I thought I understood that FILE type storage was basically "tape/sequential files on disk" and would act accordingly and things like reclamation now applied so when the file chunks (I defined 50GB file sizes) got below the reclaim value, it would reclaim such files, create new ones and delete the old ones automagically.

Well, last night became a disaster. Backups failing all over because it couldn't allocate any more files and also would not automatically shift to use the "nextpool" which is defined as a tape pool.

So, what am I doing wrong? What assumptions are wrong?
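
The sizing point in the reply above (a reclaim threshold of 59 leaves roughly 41% of the pool to count on) can be checked with a few lines of Python. This is only a sketch that follows that reading of the threshold; the 7,106 GB figure is the Estimated Capacity from the quoted stgpool output, the thresholds 59, 20 and 15 are the ones mentioned in the thread, and `usable_fraction` is a made-up helper name.

```python
ESTIMATED_CAPACITY_GB = 7106  # "Estimated Capacity: 7,106 G" from the pool output


def usable_fraction(reclaim_threshold_pct: float) -> float:
    """Share of the pool to plan on if up to the threshold may be dead space."""
    return (100.0 - reclaim_threshold_pct) / 100.0


# Compare the threshold in use (59) with the more aggressive 20 and 15.
for threshold in (59, 20, 15):
    usable_gb = ESTIMATED_CAPACITY_GB * usable_fraction(threshold)
    print(f"reclaim={threshold}%: plan on roughly {usable_gb:,.0f} GB usable")
```

Under that reading, the pool as configured can only be relied on for a bit under 3 TB of a nightly load, while a threshold of 20 roughly doubles the figure at the cost of more reclamation I/O.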
Re: DEVCLASS=FILE - what am I missing
I front end several of my file device class pools with a disk device pool which will migrate to the file device pool and control the number of filling volumes. Volumes that are in filling status will not reclaim, but you can manually move the data.

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Nick Laflamme
Sent: Friday, February 13, 2015 10:37 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: DEVCLASS=FILE - what am I missing

FILE allows deduplication; DISK doesn't.

My impression after some experimenting is that FILE wasn't meant to replace DISK; it was solely meant to replace tape device classes. We didn't need to, so those experiments quietly ended.
Re: DEVCLASS=FILE - what am I missing
Thanks for the detailed explanations / experiences / suggestions. I greatly appreciate them and will store them away in case I ever try this again.

Yes, we do have lots of clients backing up at once - we easily hit 40 simultaneous sessions, thus the reason for the high number.

NUMOPENVOLSALLOWED is set to 10 for this server.

I had not planned to empty it every night. This server doesn't have that many incoming backups. It has been running for a month without needing to force migration to tape.

On Fri, Feb 13, 2015 at 1:39 PM, Prather, Wanda wrote:

> • You have mountlimit set to 40 in the devclass; how many concurrent
> client sessions do you have writing to that pool?
>
> • Also check server option NUMOPENVOLSALLOWED to make sure you can
> have enough volumes in use at once to do concurrent backups plus reclaims
> plus backup stgpool plus migration etc etc etc.
>
> • If you are going to fill this pool and empty it out via migration
> every night, best to force the migration yourself with a MIGRATE STGPOOL
> command rather than relying on the threshold. And if reclaims don't kick
> in on their own regularly, set up a RECLAIM STGPOOL schedule to fire daily
> anyway. Won't hurt.
Re: DEVCLASS=FILE - what am I missing
Thanks for all the replies. Pretty much confirms that FILE isn't for me. We don't do dedupe and there are a lot of manual/monitoring processes involved (I have enough to do with the 8 TSM servers I manage - don't need more).

Now to migrate 7TB of disk to tape so I can switch back to using DISK devclass.

On Fri, Feb 13, 2015 at 1:43 PM, Gee, Norman wrote:

> I front end several of my file device class pool with a disk device pool
> which will migrate to the file device pool and control the number of
> filling volumes. Volume that are in filling status will not reclaim, but
> you can manually move the data.

--
*Zoltan Forray*
Re: FW: v6.3.5 hung db2??
Working with some good support folks!

Looks like we hit this:
http://www-01.ibm.com/support/docview.wss?crawler=1&uid=swg1IT06126

v6.3.5 and v7.1.0 introduced a bug in the rc.dsmserv startup script. The result is that db2 was running on limited memory - 32 MB in our case. This was the default value in /etc/security/limits.

Lvl 2 had me change the /etc/security/limits default to unlimited memory. Lvl 1 found the above APAR, and I fixed the rc.dsmserv script per its instructions.

So it looks like our problems were caused by very low db2 memory. I believe it was restricted to 32 MB!

Rick

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Rainer Tammer
Sent: Friday, February 13, 2015 11:53 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: FW: v6.3.5 hung db2??

Hello,
please keep us posted. I will have to go from 6.3.4-300 to a higher version because of the NDMP dump > 2TB overwrite problem...

Bye
Rainer

On 13.02.2015 17:05, Rhodes, Richard L. wrote:
> Yea. I opened a Sev 1.
>
> Thanks!
>
> Rick
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Andrew Raibeck
> Sent: Friday, February 13, 2015 10:57 AM
> To: ADSM-L@VM.MARIST.EDU
> Subject: Re: FW: v6.3.5 hung db2??
>
> Hi Rick,
>
> Off-hand I am not sure what the problem is, I think it would be a good idea
> to open a PMR if you have not already done so.
>
> Best regards,
>
> - Andy
>
> Andrew Raibeck | Tivoli Storage Manager Level 3 Technical Lead | stor...@us.ibm.com
>
> IBM Tivoli Storage Manager links:
> Product support:
> http://www.ibm.com/support/entry/portal/Overview/Software/Tivoli/Tivoli_Storage_Manager
> Online documentation:
> http://www.ibm.com/support/knowledgecenter/SSGSG7/welcome
> Product Wiki:
> https://www.ibm.com/developerworks/community/wikis/home/wiki/Tivoli%20Storage%20Manager
>
> "ADSM: Dist Stor Manager" wrote on 2015-02-13 10:41:55:
>
>> From: "Rhodes, Richard L."
>> To: ADSM-L@VM.MARIST.EDU
>> Date: 2015-02-13 10:44
>> Subject: FW: v6.3.5 hung db2??
>> Sent by: "ADSM: Dist Stor Manager"
>>
>> Now this is really weird.
>>
>> TSM came up after we rebooted. But it threw a bunch of ANR msgs,
>> then QUIT LOGGING. It seems to be running - I go onto a server and
>> did a incr bkup, but nothing is logging in the actlog.
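
A quick way to spot this kind of problem before starting the server is to look at the resource limits a process inherits. The sketch below uses Python's standard `resource` module. It is only an illustration under assumptions: the post does not say which entry in /etc/security/limits was at 32 MB, so checking the data-segment limit (`RLIMIT_DATA`) is a guess, and the script reports the limits of the shell it is run from, not those of an already running db2 process. `limit_is_too_low` is a made-up helper name.

```python
import resource

# The 32 MB figure comes from the post; treating it as the alarm level is an assumption.
MIN_BYTES = 32 * 1024 * 1024


def limit_is_too_low(soft_limit: int, minimum_bytes: int) -> bool:
    """True when a soft limit is finite and at or below the given minimum."""
    if soft_limit == resource.RLIM_INFINITY:
        return False
    return soft_limit <= minimum_bytes


# Assumption: the data-segment limit is the one that matters here.
soft, hard = resource.getrlimit(resource.RLIMIT_DATA)
if limit_is_too_low(soft, MIN_BYTES):
    print(f"data segment soft limit is only {soft} bytes")
else:
    print("data segment soft limit is above 32 MB or unlimited")
```

Run it as the instance owner, from the same kind of session that launches the startup script, so the inherited limits are comparable.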
Re: FW: v6.3.5 hung db2??
Rick,

Thank you for letting us know about this. It would be interesting to know if related messages were captured in the db2diag.log when this started to manifest itself.

Best,
Ruth
U of I, Urbana, IL

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Rhodes, Richard L.
Sent: Friday, February 13, 2015 1:38 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] FW: v6.3.5 hung db2??

Working with some good support folks!

Looks like we hit this:
http://www-01.ibm.com/support/docview.wss?crawler=1&uid=swg1IT06126

The v6.3.5 and v7.1.0 caused a bug in the rc.dsmserv startup script. The result is that db2 was running on limited memory - 32MB in our case. This was the default value in /etc/security/limits.

Lvl 2 had me change /etc/security/limits default to unlimited memory. Lvl 1 had this above APAR and I fixed the rc.dsmserv script per the instructions.

So it looks like our problems were caused by very low db2 memory. I believe it was restricted to 32mb!

Rick