I've seen that but generally once the log has hit 80% (of a 13 GB log) and TSM is imposing a 2 millisecond delay in transactions in order to try to keep the log from filling and crashing TSM. I only have seen it a few times a year across various very active environments but one thing that seems to be common with me (and you) is that there is a long running and slow running client session present in conjunction with expiration. Notice your SERVERGRAPH client... about 54K sessions old and only 63 MB sent & 150 received. Not sure how many sessions a day you have but the pattern I've seen is that if a slow running client has been running across a couple of expiration processes, something will get pinned (but not a deadlock condition) and eventually the server will have to be halted to clear the situation. I haven't bothered to call in a problem because I don't have the time nor the environments available to try to make this recreatable on demand.
I personally think that the long running client blocks the commit from expiration and ~if~ it had a long enough time to run it would eventually release its pin of the log but in the systems I work with, we can't exist/wait that long. I'll also point out that when we have caught this in its early stage and run a DB backup... it would complete and have no impact on clearing the log. I've seen this when running both log mode normal and roll forward. (Also if caught early enough, I've been able to cancel the client successfully.) Dwight -----Original Message----- From: ADSM: Dist Stor Manager [mailto:ads...@vm.marist.edu] On Behalf Of Fred Johanson Sent: Monday, November 23, 2009 2:27 PM To: ADSM-L@VM.MARIST.EDU Subject: [ADSM-L] Log pinned A different problem than the one a few weeks gone by. DB backup finished, leaving log usage at 0.1 % used. No sessions running, so I started an expiration process. After 30 minutes, this is what I saw: tsm: GLASSHOUSE>q pr Process Process Description Status Number -------- -------------------- ------------------------------------------------- 873 Backup Storage Pool Primary Pool SERVDISKPOOL, Copy Pool OFFSITEPOOL, Files Backed Up: 43179, Bytes Backed Up: 145,892,524,032, Unreadable Files: 0, Unreadable Bytes: 0. Current Physical File (bytes): 2,147,143,680 Current output volume: C31199. 874 Backup Storage Pool Primary Pool SERVDISKPOOL, Copy Pool OFFSITEPOOL, Files Backed Up: 12475, Bytes Backed Up: 137,712,177,152, Unreadable Files: 0, Unreadable Bytes: 0. Current Physical File (bytes): 1,384,419,328 Current output volume: C32082. 875 Backup Storage Pool Primary Pool SERVDISKPOOL, Copy Pool OFFSITEPOOL, Files Backed Up: 48409, Bytes Backed Up: 160,964,038,656, Unreadable Files: 0, Unreadable Bytes: 0. Current Physical File (bytes): 3,980,283,904 Current output volume: C32083. 878 Expiration Examined 207408 objects, deleting 207376 backup objects, 0 archive objects, 0 DB backup volumes, 0 recovery plan files; 0 errors encountered. tsm: GLASSHOUSE>q sess Sess Comm. Sess Wait Bytes Bytes Sess Platform Client Name Number Method State Time Sent Recvd Type ------- ------ ------ ------ ------- ------- ----- -------- ------------------- 388,208 Tcp/Ip Run 0 S 63.7 M 150 Admin Linux86 SERVERGRAPH 442,970 Tcp/Ip Run 0 S 134.3 K 2.2 K Admin AIX FRED 443,069 Tcp/Ip IdleW 8.9 M 38.4 K 1.2 K Admin AIX AILIAS 445,050 Tcp/Ip IdleW 15 S 4.3 K 405 Node SUN SOL- JON ARIS tsm: GLASSHOUSE>q log Available Assigned Maximum Maximum Page Total Used Pct Max. Space Capacity Extension Reduction Size Usable Pages Util Pct (MB) (MB) (MB) (MB) (bytes) Pages Util --------- -------- --------- --------- ------- --------- --------- ----- ----- 12,788 12,788 0 6,996 4,096 3,273,216 1,482,016 45.3 92.1 I cancelled the expiration and the runaway log stopped running. My thinking is that the expiration is feverishly writing its transaction to the log. Since there are no other sessions accessing the log, there are no interrupts to trigger a commit. So the log keeps filling, triggering and retriggering DB backups. If my thinking is correct, this is a design feature in the pre-DB2 b-tree DB, not easily remedied. Just something to constantly monitor. Anybody have any thoughts on this? Fred Johanson TSM Administrator University of Chicago 773-702-8464