Based on your description, a couple of things are going on.

The following message:

   Named pipe error connecting to server WaitOnPipe failed.
   NpOpen: call failed with return code:121 pipe name //./pipe.jnl

indicates that a backup session is attempting to connect to the journal daemon while another journal based backup session is in progress. This can happen if multiple backup client processes attempt to perform a journal based backup at the same time, or if the ResourceUtilization option is set higher than 2 and produces multiple backup sessions.

The level of client you are running will only wait about 2 minutes for a connection to the journal daemon to become free and will then time out. A testflag was implemented in the 5.1.6.2 level fixtest to allow a client to specify how long it will wait for a connection to the journal daemon to become free (that is, for the currently running jbb session to finish). You might also consider reducing the ResourceUtilization setting to 2 or less. Multi session journal based backup isn't currently supported and is a known requirement for a future release (APAR IC36361 is currently open against this problem).

I have also recently discovered a problem in which a valid journal gets invalidated any time a journal based backup starts but doesn't complete (due to a session drop, the client being terminated by the user, etc.). The result is that the next backup will not be journal based (it will be a normal full incremental), and journal based backup won't be available until a full incremental backup completes and re-validates the journal. APAR IC37908 has been opened against this problem and should be fixed in the 5.22 level client.

It is reasonable for the journal daemon process to use a large amount of memory while processing a large journal query, which involves building a sorted list of objects to send to the client, but the memory should eventually be released when the journal based backup completes.
I have noticed that very large journal queries and journal based backups can create prolonged delays in the journal daemon, and I am looking at ways of making these queries more efficient, both in terms of memory utilization and in terms of processing time.

Hope this helps answer your questions ....

Regards,

Pete

Pete Tanenhaus
Tivoli Storage Solutions Software Development
email: [EMAIL PROTECTED]
tieline: 320.8778, external: 607.754.4213

"Those who refuse to challenge authority are condemned to conform to it"

---------------------- Forwarded by Pete Tanenhaus/San Jose/IBM on 11/07/2003 10:11 AM ---------------------------

Please respond to "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
Sent by: "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
cc:
Subject: Hanging TSM backup invalidates journal?

Dear TSMville,

Hanging Backup Invalidates TSM Journal

o - TSM Client 5.1.5.0, Journaling engine backing up c. 12,000,000 files with a daily addition/update of around 80,000. When it works = 40 minutes. When it doesn't = 13 hours...
o - TSM Server 5.1.6.2 WinNT

A couple of nights ago, a journal backup hung and just kinda stayed around on the TSM server in IdleW without anyone noticing. The next day's backup began and, I'm guessing from here on, it couldn't get access to the TSM journal, so it reverted to a looooong normal incremental backup. I subsequently spotted this, killed off the two IdleW sessions and kicked off a new backup on the journal client. However, it failed to do a journal backup and started a normal incremental again...

Looking in the dsmerror.log, I spy a 'NpOpen: Named pipe error connecting to server WaitOnPipe failed. NpOpen: call failed with return code:121 pipe name //./pipe.jnl'. I understand that this named pipe is opened at the start of a journal backup as the b/a client attempts to connect to the journal daemon - the return code 121 suggests that the connect failed, and possibly that the tsmjbbd.exe process wasn't up and running.
I look at Task Manager, and it is, but consuming a 'healthy' 263,632K of memory. Observing its behaviour, I see it is still doing some work under 'I/O Other' in Task Manager's useful extra columns, but nothing under 'I/O Writes' or 'I/O Reads'. Is this suspect?

I'm guessing that the journal became invalidated somewhere down the line during the hung backup, or that the subsequent attempt at a backup failed because the old TSM backup maybe still has a lock on it? The tsmjbbd.exe process is still present, and there is nothing from these dates in the jbberror.log. Any ideas what may be going on here? I seem to be able to get around 6 or 7 days of JBB backups before it starts to break and I have to hand-hold it to get it up again...

In terms of automatically monitoring this, sticking a Tivoli process monitor on tsmjbbd.exe to make sure the process is running is only useful to a point (i.e. it wouldn't have spotted the above), so it looks as though I'm going to have to trawl the stdout of our backup logs to make sure that 'using journal for x$' is present. Any ideas where else I should be looking - perhaps in the (what we've called) jbberror.log for 'Journal will be restarted for FS x'?

So, questions are:

o - any ideas what might be behind the above? A dead/alive tsmjbbd.exe, and if so, how?
o - tsmjbbd.exe - how big should it be in 'healthy' usage? Is 263MB a bit excessive?
o - any ideas about the best way to monitor (preferably using Tivoli e.g. ITM, logfile adapters etc) jbb backups?

Quite a lot there - sorry!
Rgds,

David McClelland
Global Management Systems
Reuters
85 Fleet Street
London EC4P 4AJ
E-mail [EMAIL PROTECTED]
Reuters Messaging [EMAIL PROTECTED]
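
On the monitoring question in the message above (checking the backup output for 'using journal for x$', and jbberror.log for 'Journal will be restarted for FS x'), the log trawl could be sketched roughly as below. This is only an illustration in Python, not anything shipped with TSM: the function and class names are hypothetical, and the marker strings are copied from the wording quoted in the post, so they should be confirmed against real log output before relying on them.

```python
import re
from dataclasses import dataclass, field

# Marker strings as quoted in the post; the exact wording in real logs
# is an assumption and should be verified before use.
JOURNAL_USED = re.compile(r"using journal for\s+(\S+)", re.IGNORECASE)
JOURNAL_RESTARTED = re.compile(
    r"journal will be restarted for fs\s+(\S+)", re.IGNORECASE
)
PIPE_ERROR = re.compile(r"NpOpen:.*?return code:\s*(\d+)")


def _norm(name):
    """Strip surrounding quotes/punctuation and lowercase a filesystem name."""
    return name.strip("'\".,").lower()


@dataclass
class JournalCheck:
    journaled: set = field(default_factory=set)    # filesystems backed up via journal
    restarted: set = field(default_factory=set)    # filesystems whose journal restarted
    pipe_errors: list = field(default_factory=list)  # NpOpen return codes seen

    def missing(self, expected):
        """Expected filesystems with no 'using journal for' line."""
        return sorted({_norm(e) for e in expected} - self.journaled)


def check_logs(backup_log, jbb_error_log="", client_error_log=""):
    """Scan log text (already read into strings) for journal markers."""
    result = JournalCheck()
    for line in backup_log.splitlines():
        m = JOURNAL_USED.search(line)
        if m:
            result.journaled.add(_norm(m.group(1)))
    for line in jbb_error_log.splitlines():
        m = JOURNAL_RESTARTED.search(line)
        if m:
            result.restarted.add(_norm(m.group(1)))
    for line in client_error_log.splitlines():
        m = PIPE_ERROR.search(line)
        if m:
            result.pipe_errors.append(int(m.group(1)))
    return result
```

A scheduled job could read the previous night's backup output, jbberror.log and dsmerror.log into strings, call check_logs(), and raise an alert if missing([...]) is non-empty for the filesystems expected to be journaled, or if any pipe errors or journal restarts were recorded. Unlike a process monitor on tsmjbbd.exe, this would flag the case described above, where the daemon is running but the backup silently falls back to a normal incremental.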