Pete, Thanks for your responses...
>> The error you are seeing in the journal daemon is probably caused
>> because the journal db has exceeded the supported maximum of 2 gig.

I was watching the journal files, and the big one never went above 1.6GB... This was during the initial full backup. The jbberror.log entries accompanying the termination of the service go something like these:

09/23/2003 20:06:37 jnlDbCntrl(): Error updating the journal for fs 'G:', dbUpdEntry() rc = -1, last error = 27
09/23/2003 20:06:38 JbbMonitorThread(): DB Access thread, tid 3100 ended with return code 215.
09/23/2003 20:07:39 NpOpen: Named pipe error connecting to server WaitOnPipe failed.
NpOpen: call failed with return code:121 pipe name \\.\pipe\jnl
09/23/2003 20:07:39 NpListeningThreadCleanUp(): NpOpen(): Error -190

Still looks as though I'm seeing your error as described below, though...

>> That having been said, the real problem to look at is why the journal
>> grew so large.

Agreed! Although, as I say, it didn't seem to go above the 2GB limit.

>> Keep in mind that each journal entry represents the most recent change
>> for a file/directory, and that journal entries are unique, meaning
>> there can only be one entry for each object on the file system.

Okay, well, this was the first full backup of a 9 million file filesystem, so would this cause a big journal file? If so, does it follow that in practice we're best to do a normal 'unjournalled' initial backup of a filesystem, so that we get all of the initial hit out of the way (and don't come a cropper (is that only an English term?) with a large journal file), and *then* do another incremental with journaling enabled so we get the journaling engine initialised?

>> Are you running virus scan software and if so what type and version ?
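The uniqueness rule quoted above (one journal entry per file system object, with the newest change winning) is easy to model. A toy sketch, not TSM code — `apply_change` and the dict-based journal are illustrative assumptions only — showing why even repeated scans of the same files leave the journal with at most one entry per object:

```python
# Toy model (not TSM code) of the journal semantics described above:
# one entry per file system object, newer changes overwrite older ones.

def apply_change(journal, path, change):
    """Record a change; only the most recent change per object is kept."""
    journal[path] = change  # overwrites any earlier entry for the same path

journal = {}

# A virus scan (or first full traversal) that touches every file adds one
# entry per object -- the journal's size is bounded by the object count,
# not by the number of change events.
for i in range(1000):
    apply_change(journal, f"g:/file_data/file{i}", "scanned")
    apply_change(journal, f"g:/file_data/file{i}", "scanned again")

print(len(journal))  # 1000 entries, one per object, despite 2000 events
```

By this model a 9-million-file traversal leaves up to 9 million live entries, which is how a first full backup (or an AV sweep) can push the journal db toward its size ceiling even though each file appears only once.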
>> (example: Norton Anti-Virus Corporate Edition Version 8.00)
>> Some virus protection software touches every file processed during virus
>> scan processing, and this in turn floods the journal with change
>> notifications and grows the journal.

Okay, I'm running Sophos Antivirus 3.69. My include/exclude list means I'm only backing up one filepath on a drive (e.g. g:\file_data\...\*), but I guess that the journal engine records all changes, regardless of include/exclude list specification.

I'd be very interested in having a look at the journal proofing utility - please feel free to point me at it/mail it off-list if necessary.

Pete - thanks for all your help so far...

Rgds,

David McClelland
Management Systems Integrator
Global Management Systems
Reuters
85 Fleet Street
London EC4P 4AJ
E-mail - [EMAIL PROTECTED]
Reuters Messaging - [EMAIL PROTECTED]

-----Original Message-----
From: Pete Tanenhaus [mailto:[EMAIL PROTECTED]]
Sent: 24 September 2003 14:26
To: [EMAIL PROTECTED]
Subject:

I'll try to answer/address your questions as best I can.

>>> My TSM client is a file server, on its first full incremental backup
>>> (with journaling turned on) stowed away nearly 9 million files on
>>> the TSM server - a perfect candidate for the TSM journaling engine I
>>> thought. However, the tsmjbbd.exe process bombed just before the end
>>> with a 'DB Access Critical Thread Return code 215' type error, although
>>> the backup continued.

The error you are seeing in the journal daemon is probably caused because the journal db has exceeded the supported maximum of 2 gig. If you look in your journal error log (jbberror.log) you'll probably see the following message:

Error updating the journal for fs 'C:', dbUpdEntry() rc = 27

There is a bug in the journal service which causes the process to shut down when this error occurs; apar IC37040 has been opened and the fix will be included in an upcoming fixtest.
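Until the IC37040 fixtest arrives, one stopgap is to watch the journal database file size from outside and act before it reaches the 2 gig ceiling described above. A minimal sketch in Python — the journal db path and the 80% warning threshold are assumptions for illustration, not TSM defaults:

```python
import os

TWO_GIG = 2 * 1024 ** 3  # supported maximum for the journal db

def journal_db_status(path, limit=TWO_GIG, warn_at=0.8):
    """Classify the journal db file size as 'ok', 'warn', or 'over'."""
    size = os.path.getsize(path)
    if size >= limit:
        return "over"
    if size >= warn_at * limit:
        return "warn"
    return "ok"

# Hypothetical journal db location -- adjust for your own journal
# daemon configuration before using this.
# status = journal_db_status(r"c:\tsmjournal\journal.jdb")
```

Run from a scheduled task, this gives early warning so the daemon can be recycled (or the change flood investigated) before the db update error kills the service; note that recycling the daemon invalidates the journal unless the PreserveDbOnExit flag mentioned later in the thread is set.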
That having been said, the real problem to look at is why the journal grew so large.

Keep in mind that each journal entry represents the most recent change for a file/directory, and that journal entries are unique, meaning there can only be one entry for each object on the file system.

Are you running virus scan software, and if so, what type and version? (example: Norton Anti-Virus Corporate Edition Version 8.00)

Some virus protection software touches every file processed during virus scan processing, and this in turn floods the journal with change notifications and grows the journal. There are circumventions from at least one of the virus protection vendors (Symantec) for this problem.

>>> Now, 9 million files, at an average of maybe 500 bytes per TSM database
>>> entry equals roughly 4.5GB. Was TSM trying to send the *whole* 4.5GB
>>> inventory for this node to the dsmc.exe process on the client? Needless
>>> to say, at 2GB (I believe the limit that Win2K places on a single
>>> process) the TSM client had had enough and ended with an 'ANS1030E
>>> System ran out of memory. Process ended'.
>>> So, what shall I do - is MEMORYEFFICIENTBACKUP YES my only get out
>>> of jail card here, and exactly what does this do differently? Is my
>>> understanding above what is actually happening?

Keep in mind that a full progressive incremental backup must be done (one that results in the Last Backup Complete Date being updated on the server) before backups will be journal based. Once the initial backup has been completed and the journal is validated, the next backup should be journal based, so you may want to use MEMORYEFFICIENTBACKUP for the initial backup at least.

Journal Based Backup should use much less memory, since the only objects inspected are those obtained from the journal.

>>> Now, 9 million files, at an average of maybe 500 bytes per TSM database
>>> entry equals roughly 4.5GB.
>>> Was TSM trying to send the *whole* 4.5GB inventory for this node to the
>>> dsmc.exe process on the client? Needless to say, at 2GB (I believe the
>>> limit that Win2K places on a single process) the TSM client had had
>>> enough and ended with an 'ANS1030E System ran out of memory. Process
>>> ended'.
>>> So, what shall I do - is MEMORYEFFICIENTBACKUP YES my only get out
>>> of jail card here, and exactly what does this do differently? Is my
>>> understanding above what is actually happening?
>>> I'd be most grateful to hear of anyone else's positive or negative
>>> experiences of using the Journaling Engine, as it seems just so *ideal*
>>> for some of our file servers, yet my experiences so far suggest it might
>>> not be as easy and robust as I would ideally like it to be (i.e.
>>> cancelled backups forcing restart of journal, process bombing out midway
>>> through backup etc.), especially as a full or normal incremental backup
>>> can run into days to complete...

Aborting a backup doesn't cause the journal process to be restarted or the journal to be invalidated, but certain other circumstances can. The most likely cause is when the file system is flooded with a large amount of change activity which either fills up the journal or can't be processed fast enough by the journal file system monitor.

The process should never shut down when these problems occur (again, there is an apar open against it shutting down when the journal grows larger than 2 gig), but the journal has to be invalidated, which means that backups can't be journal based until another full incremental is performed.

Another thing to keep in mind is that journals are always invalidated when the journal daemon process is recycled unless the PreserveDbOnExit flag is specified.

All this having been said, Journal Based Backup is only a viable solution for environments in which the amount of file system activity is light to moderate, and the activity is somewhat well distributed.
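For reference, the PreserveDbOnExit flag lives in the journal daemon's configuration file. A hedged sketch — the file name (tsmjbbd.ini) and the stanza and setting names below are recalled from the journal service documentation of that era and should be verified against the client level you are running:

```ini
; tsmjbbd.ini -- sketch only; verify stanza and setting names against
; the documentation for your backup-archive client level
[JournalSettings]
Errorlog=jbberror.log

[JournaledFileSystemSettings]
JournaledFileSystems=G:
PreserveDbOnExit=1
```

With PreserveDbOnExit set, recycling the journal daemon should no longer force the "another full incremental" penalty described above; the MEMORYEFFICIENTBACKUP YES option discussed earlier goes in the client options file (dsm.opt), not here.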
Running applications which touch every file (or a very large percentage of files) on the file system, or which flood the file system with changes in a very short period of time (such as copying a very large directory tree), will make journaling unusable.

I have developed a file system monitoring/profiling tool which can be useful in determining if journaling is viable for a particular file system, and I am more than willing to provide it to anyone who is interested.

Hope this helps ...

Pete Tanenhaus
Tivoli Storage Solutions Software Development
email: [EMAIL PROTECTED]
tieline: 320.8778, external: 607.754.4213

"Those who refuse to challenge authority are condemned to conform to it"

----------------------------------------------------------------

Visit our Internet site at http://www.reuters.com

Get closer to the financial markets with Reuters Messaging - for more information and to register, visit http://www.reuters.com/messaging

Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Reuters Ltd.
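Pete's profiling tool isn't included in the post, but the idea of measuring whether change activity is "light to moderate" can be sketched with a simple snapshot-and-diff of file modification times. This is a rough stand-in, not Pete's tool; a polling approach like this misses churn that happens between snapshots, which a real change-notification monitor would catch:

```python
import os

def snapshot_mtimes(root):
    """Map each file path under root to its last-modified time."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtimes[path] = os.path.getmtime(path)
            except OSError:
                pass  # file vanished between the walk and the stat
    return mtimes

def count_changes(before, after):
    """Count creates, deletes, and modifications between two snapshots."""
    created = len(after.keys() - before.keys())
    deleted = len(before.keys() - after.keys())
    modified = sum(
        1 for p in after.keys() & before.keys() if after[p] != before[p]
    )
    return created + deleted + modified
```

Take a snapshot, wait a fixed interval, take another, and divide `count_changes` by the interval to get a change rate; a sustained high rate (or periodic floods, such as an AV sweep) suggests journaling will not be viable for that file system.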