Hanging TSM backup invalidates journal?

Pete Tanenhaus Fri, 07 Nov 2003 08:43:04 -0800

Based on your description, a couple things are going on.

The following message:


>>> Named pipe error
>>> connecting to server WaitOnPipe failed. > NpOpen: call failed with
>>> return code:121 pipe name //./pipe.jnl'.

indicates that a backup session is attempting to connect to the journal
daemon while another journal based backup session is in progress.

This can happen if multiple backup client processes attempt to perform a
journal based backup at the same time, or if the ResourceUtilization
option setting is higher than 2 and produces multiple backup sessions.

The level of client you are running will only wait about 2 minutes for a
connection to the journal daemon to become free and will then time out.
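
As a rough illustration of spotting this condition automatically, the
sketch below pulls the return code and pipe name out of dsmerror.log
text. The pattern is built only from the message wording quoted in this
thread, so the exact wording at other client levels is an assumption, and
the function name is hypothetical. (On Windows, system error 121 is
ERROR_SEM_TIMEOUT, which would fit a wait that timed out, assuming the
client reports the raw system error code.)

```python
import re

# Pattern derived from the message text quoted in this thread; other
# client levels may word the message differently (assumption).
NPOPEN_FAILURE = re.compile(
    r"NpOpen: call failed with\s+return code:\s*(\d+)\s+pipe name\s+(\S+)"
)


def find_pipe_failures(log_text):
    """Return a list of (return_code, pipe_name) for each NpOpen failure."""
    failures = []
    for match in NPOPEN_FAILURE.finditer(log_text):
        code = int(match.group(1))
        # Drop the trailing quote and full stop that follow the pipe name.
        pipe = match.group(2).rstrip("'.")
        failures.append((code, pipe))
    return failures
```

Feeding it the contents of dsmerror.log after each scheduled backup and
alerting on any entry with code 121 would flag a backup that could not
reach the journal daemon.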

A testflag was implemented in the 5.1.6.2 level fixtest to allow a client
to specify how long it will wait for a connection to the journal daemon
to become free (that is, for the currently running jbb session to
finish).

You might also consider reducing the ResourceUtilization setting to 2 or
less.
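
A minimal sketch of checking a client options file for this setting
follows. It assumes the usual options-file layout (one "NAME value" pair
per line, with '*' starting a comment line) and only recognises the
option spelled out in full; the function names are hypothetical.

```python
def resource_utilization(options_text):
    """Return the ResourceUtilization value in options_text, or None."""
    value = None
    for raw in options_text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and lines starting with '*' carry no options.
        if not line or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0].lower() == "resourceutilization":
            try:
                value = int(parts[1])  # a later entry overrides an earlier one
            except ValueError:
                pass  # leave malformed values for a human to inspect
    return value


def allows_multiple_sessions(options_text):
    """True if the setting is higher than 2, the threshold named above."""
    value = resource_utilization(options_text)
    return value is not None and value > 2
```

Running allows_multiple_sessions() over the text of the options file on
each journal based backup client would show which ones could produce
more than one backup session.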

Multi-session journal based backup isn't currently supported and is a
known requirement for a future release (APAR IC36361 is currently open
against this problem).

I have also recently discovered a problem in which a valid journal gets
invalidated anytime a journal based backup starts but doesn't complete
(due to a session drop, client terminated by the user, etc.).

The result of this is that the next backup will not be journal based (it
will be a normal full incremental), and journal based backup won't be
available until a full backup completes and re-validates the journal.

APAR IC37908 has been opened against this problem and should be fixed in
the 5.22 level client.

It is reasonable for the journal daemon process to utilize a large amount
of memory while processing a large journal query, which involves building
a sorted list of objects to send to the client, but the memory should
eventually be released when the journal based backup completes.

I have noticed that very large journal queries and journal based backups
can create prolonged delays in the journal daemon, and I am looking at
ways of making these queries more efficient, both in terms of memory
utilization and in terms of processing time.

Hope this helps answer your questions ....

Regards, Pete


Pete Tanenhaus
Tivoli Storage Solutions Software Development
email: [EMAIL PROTECTED]
tieline: 320.8778, external: 607.754.4213

"Those who refuse to challenge authority are condemned to conform to it"

---------------------- Forwarded by Pete Tanenhaus/San Jose/IBM on 11/07/2003 10:11 AM ---------------------------
Please respond to "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
Sent by:        "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
To:     [EMAIL PROTECTED]
cc:
Subject:        Hanging TSM backup invalidates journal?



Dear TSMville,

Hanging Backup Invalidates TSM Journal

o - TSM Client 5.1.5.0, Journaling engine backing up c. 12,000,000 files
with a daily addition/update of around 80,000. When it works = 40
minutes. When it doesn't = 13 hours...
o - TSM Server 5.1.6.2 WinNT

A couple of nights ago, a journal backup hung and just kinda stayed
around on the TSM server in IdleW without anyone noticing. The next
day's backup began and, I'm guessing from hereon, it couldn't get access
to the TSM journal, so it reverted to a looooong normal incremental
backup. I subsequently spotted this, killed off the two IdleW sessions
and kicked off a new backup on the journal client. However, it failed to
do a journal backup and started a normal incremental again...

Looking in the dsmerror.log, I spy a 'NpOpen: Named pipe error
connecting to server WaitOnPipe failed. > NpOpen: call failed with
return code:121 pipe name //./pipe.jnl'.

I understand that this named pipe is opened up at the initiation of a
journal backup as the b/a client attempts to connect to the journal
daemon - the return code 121 suggests that the connect failed, and
possibly the tsmjbbd.exe process wasn't up and running. I look at task
manager, and it is, but consuming a 'healthy' 263,632K of memory.
Observing its behaviour, I see it is still doing some work under 'I/O
Other' in Task Manager's useful extra columns, but nothing in the 'I/O
Writes' or 'Reads' columns. Is this suspect...?

I'm guessing that the journal became invalidated somewhere down the line
during the hung backup, or that the subsequent attempt at a backup
failed as maybe the old TSM backup still has a lock on it? The
tsmjbbd.exe is still present, and there is nothing from these dates in
the jbberror.log.

Any ideas what may be going on here? I seem to be able to get around 6
or 7 days of JBB backups before it starts to break and I have to
hand-hold it to get it up again... In terms of automatically monitoring
this, sticking a Tivoli process monitor to make sure the tsmjbbd.exe
process is running is only useful to a point (i.e. it wouldn't have
spotted the above), so it looks as though I'm going to have to trawl the
stdout of our backup logs to make sure that 'using journal for x$' is
present. Any ideas where else I should be looking - perhaps in the (what
we've called) jbberror.log for 'Journal will be restarted for FS x'?
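
One way to sketch the log check described above: given the captured
stdout of a backup run and the list of filesystems expected to be journal
based, report any for which the 'using journal for' line is missing. The
marker text is taken from this post and may differ between client levels;
the function name and the filesystem names in the usage note are
hypothetical.

```python
def filesystems_without_journal(log_text, filesystems,
                                marker="using journal for"):
    """Return the filesystems for which no journal marker line was found.

    The match is case-insensitive and looks for '<marker> <filesystem>'
    anywhere in the log text.
    """
    lowered = log_text.lower()
    missing = []
    for fs in filesystems:
        needle = f"{marker} {fs}".lower()
        if needle not in lowered:
            missing.append(fs)
    return missing
```

For example, calling it with the night's log text and ["x$", "y$"]
returns the subset that fell back to a normal incremental, which could
then be raised as an alert by whatever monitoring tool is in use.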

So, questions are:

o - any ideas what might be behind the above? A dead/alive tsmjbbd.exe,
and if so, how?
o - tsmjbbd.exe - how big should it be in 'healthy' usage? Is 263MB a
bit excessive?
o - any ideas about the best way to monitor (preferably using Tivoli
e.g. ITM, logfile adapters etc) jbb backups?

Quite a lot there - sorry!

Rgds,

David McClelland
Global Management Systems
Reuters
85 Fleet Street
London EC4P 4AJ
E-mail  [EMAIL PROTECTED]
Reuters Messaging       [EMAIL PROTECTED]




-------------------------------------------------------------- --
Visit our Internet site at http://www.reuters.com

Get closer to the financial markets with Reuters Messaging - for more
information and to register, visit http://www.reuters.com/messaging

Any views expressed in this message are those of  the  individual
sender,  except  where  the sender specifically states them to be
the views of Reuters Ltd.
