Response to questions:

>>> Okay, well, this was the first full backup of a 9 million file
>>> filesystem, so would this cause a big journal file?

No. Journal entries are generated as the result of changes to the file system; doing a backup (journal based or not) never increases the size of the journal (if anything it decreases it, as I will explain).

The journal daemon monitors the file systems specified in the jbb config file for changes. If a valid journal exists, the journal is updated to reflect the changes; if it isn't valid, the changes are discarded.

When a backup begins, the b/a client connects to the journal daemon (assuming it is running). Once a connection is established, it queries the state of all file systems being journaled. If a valid journal exists for a file system being backed up, the backup will be journal based; if one doesn't exist, the backup will be a normal incremental.

Journals are always in the invalid state when they come online (when the journal daemon starts) unless a previously valid journal was brought offline with the PreserveDbOnExit configuration setting. Once a full incremental backup completes (and this backup MUST result in the Last Backup Date on the server being updated), the b/a client notifies the journal daemon and the journal is marked as valid for the node and server the backup was done with.

A journal is considered valid for a particular file system, a particular node, and a particular TSM server. Subsequent backups by the same node to the same server will be journal based provided that the journal remains valid (backups by a different node and/or to a different server won't invalidate the journal, but those backups won't be journal based).

When objects are processed (backed up or expired) during a journal based backup, the b/a client notifies the journal daemon to remove the journal entries which have been processed. Note that removing entries from the journal doesn't shrink the amount of disk space the journal database uses; it just marks the space occupied by each entry as free.

>>> Okay, I'm running Sophos Antivirus 3.69. My include/exclude list means
>>> I'm only backing up one filepath on a drive (e.g. g:\file_data\...\*),
>>> but I guess that the journal engine records all changes, regardless of
>>> include/exclude list specification.

I'm not familiar with this product, but you can quickly verify whether it is touching files using the filemon utility I am sending you. The easiest method is to have filemon monitor an individual directory with a small number of files, and then do a virus scan on that directory (I assume your virus scan software allows individual files/dirs to be scanned). This will show immediately whether the virus scan is touching files; if it is, you will need to work with the virus protection software vendor to get it fixed (I will do what I can to help).

>>> I'd be very interested in having a look at the journal profiling utility
>>> - please feel free to point me at it/mail it off-list if necessary.

Apparently the list doesn't allow very big attachments, so I will send it to you directly.

Hope this helps, and please post or email me directly if you have further questions.
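P.S. The journaled file systems and the PreserveDbOnExit setting discussed above both live in the journal daemon's configuration file, tsmjbbd.ini. A minimal sketch follows; PreserveDbOnExit and jbberror.log come from this thread, but the stanza layout and the other option names are from memory of the shipped sample file, so verify them against your client's copy:

    [JournalSettings]
    ; where the daemon writes errors (the log quoted later in this thread)
    Errorlog=jbberror.log

    [JournaledFileSystemSettings]
    ; file systems the daemon monitors for changes
    JournaledFileSystems=G:
    ; directory holding the journal databases
    JournalDir=c:\tsmjournal
    ; keep a valid journal across a clean daemon shutdown instead of
    ; invalidating it when the service is recycled
    PreserveDbOnExit=1

With PreserveDbOnExit=1, a journal that was valid when the daemon stopped comes back online valid, so the next backup can remain journal based.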
Pete Tanenhaus
Tivoli Storage Solutions Software Development
email: [EMAIL PROTECTED]
tieline: 320.8778, external: 607.754.4213

"Those who refuse to challenge authority are condemned to conform to it"

---------------------- Forwarded by Pete Tanenhaus/San Jose/IBM on 09/24/2003 11:43 AM ---------------------------

Please respond to "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
Sent by: "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
cc:
Subject: Re: More TSM journaling stuff

Pete,

Thanks for your responses...

>> The error you are seeing in the journal daemon is probably caused because the journal db has exceeded the supported
>> maximum of 2 gig.

I was watching the journal files, and the big one never went above 1.6GB... This was during the initial full backup. The jbberror.log entries accompanying the termination of the service go something like this:

09/23/2003 20:06:37 jnlDbCntrl(): Error updating the journal for fs 'G:', dbUpdEntry() rc = -1, last error = 27
09/23/2003 20:06:38 JbbMonitorThread(): DB Access thread, tid 3100 ended with return code 215.
09/23/2003 20:07:39 NpOpen: Named pipe error connecting to server WaitOnPipe failed.
NpOpen: call failed with return code:121 pipe name \\.\pipe\jnl
09/23/2003 20:07:39 NpListeningThreadCleanUp(): NpOpen(): Error -190

Still looks as though I'm seeing your error as described below, though...

>> That having been said, the real problem to look at is why the journal grew so large.

Agreed! Although, as I say, it didn't seem to go above the 2GB limit.

>> Keep in mind that each journal entry represents the most recent change for a file/directory, and that journal
>> entries are unique, meaning that there can only be one entry for each object on the file system.

Okay, well, this was the first full backup of a 9 million file filesystem, so would this cause a big journal file? If so, does it follow that in practice we're best to do a normal 'unjournalled' initial backup of a filesystem, so that we get all of the initial hit out of the way (and don't come a cropper (is that only an English term?) with a large journal file), and *then* do another incremental with journaling enabled so we get the journaling engine initialised?

>> Are you running virus scan software and if so what type and version ?
>> (example: Norton Anti-Virus Corporate Edition Version 8.00)
>> Some virus protection software touches every file processed during virus scan processing,
>> and this in turn floods the journal with change notifications and grows the journal.

Okay, I'm running Sophos Antivirus 3.69. My include/exclude list means I'm only backing up one filepath on a drive (e.g. g:\file_data\...\*), but I guess that the journal engine records all changes, regardless of include/exclude list specification.

I'd be very interested in having a look at the journal profiling utility - please feel free to point me at it/mail it off-list if necessary.

Pete - thanks for all your help so far...

Rgds,

David McClelland
Management Systems Integrator
Global Management Systems
Reuters
85 Fleet Street
London EC4P 4AJ
E-mail - [EMAIL PROTECTED]
Reuters Messaging - [EMAIL PROTECTED]

-----Original Message-----
From: Pete Tanenhaus [mailto:[EMAIL PROTECTED]]
Sent: 24 September 2003 14:26
To: [EMAIL PROTECTED]
Subject:

I'll try to answer/address your questions as best I can.
>>> My TSM client is a file server, which on its first full incremental backup
>>> (with journaling turned on) stowed away nearly 9 million files on
>>> the TSM server - a perfect candidate for the TSM journaling engine, I
>>> thought. However, the tsmjbbd.exe process bombed just before the end
>>> with a 'DB Access Critical Thread Return code 215' type error, although
>>> the backup continued.

The error you are seeing in the journal daemon is probably caused because the journal db has exceeded the supported maximum of 2 gig. If you look in your journal error log (jbberror.log) you'll probably see the following message:

Error updating the journal for fs 'C:', dbUpdEntry() rc = 27

There is a bug in the journal service which causes the process to shut down when this error occurs; apar IC37040 has been opened and the fix will be included in an upcoming fixtest.

That having been said, the real problem to look at is why the journal grew so large. Keep in mind that each journal entry represents the most recent change for a file/directory, and that journal entries are unique, meaning that there can only be one entry for each object on the file system.

Are you running virus scan software, and if so what type and version? (example: Norton Anti-Virus Corporate Edition Version 8.00) Some virus protection software touches every file processed during virus scan processing, and this in turn floods the journal with change notifications and grows the journal. There are circumventions from at least one of the virus protection vendors (Symantec) for this problem.

>>> Now, 9 million files, at an average of maybe 500 bytes per TSM database entry,
>>> equals roughly 4.5GB. Was TSM trying to send the *whole* 4.5GB inventory
>>> for this node to the dsmc.exe process on the client? Needless to say, at
>>> 2GB (I believe the limit that Win2K places on a single process) the
>>> TSM client had had enough and ended with an 'ANS1030E System ran out
>>> of memory. Process ended'.
>>> So, what shall I do - is MEMORYEFFICIENTBACKUP YES my only get out
>>> of jail card here, and exactly what does this do differently? Is my
>>> understanding above what is actually happening?

Keep in mind that a full progressive incremental backup must be done (one that results in the Last Backup Complete Date being updated on the server) before backups will be journal based. Once the initial backup has been completed and the journal is validated, the next backup should be journal based. So you may want to use MEMORYEFFICIENTBACKUP for the initial backup at least; it makes the client process the backup one directory at a time instead of holding the whole inventory in memory at once, at some cost in elapsed time. Journal Based Backup should use much less memory, since the only objects inspected are those obtained from the journal.
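As a concrete sketch of how that might look in dsm.opt (MEMORYEFFICIENTBACKUP is the option discussed above; the include/exclude lines simply reuse your g:\file_data example and are not a tested configuration):

    * Process the backup one directory at a time instead of holding
    * the whole server inventory in client memory; slower, but it
    * avoids the ANS1030E out-of-memory failure.
    MEMORYEFFICIENTBACKUP YES

    * Back up only the g:\file_data tree. The client reads the
    * include/exclude list bottom-up, so the include below overrides
    * the blanket exclude for that subtree.
    EXCLUDE g:\...\*
    INCLUDE g:\file_data\...\*

Note that, as you guessed above, the journal monitor still records changes for the whole drive regardless of this list; include/exclude only limits what the backup sends.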
>>> I'd be most grateful to hear of anyone else's positive or negative
>>> experiences of using the Journaling Engine, as it seems just so *ideal*
>>> for some of our file servers, yet my experiences so far suggest it might
>>> not be as easy and robust as I would ideally like it to be (i.e.
>>> cancelled backups forcing restart of journal, process bombing out midway
>>> through backup etc.), especially as a full or normal incremental backup
>>> can run into days to complete..

Aborting a backup doesn't cause the journal process to be restarted or the journal to be invalidated, but certain other circumstances can. The most likely cause is when the file system is flooded with a large amount of change activity which either fills up the journal or can't be processed fast enough by the journal file system monitor. The process should never shut down when these problems occur (again, there is an apar open against it shutting down when the journal grows larger than 2 gig), but the journal has to be invalidated, which means that backups can't be journal based until another full incremental is performed.

Another thing to keep in mind is that journals are always invalidated when the journal daemon process is recycled unless the PreserveDbOnExit flag is specified.

All this having been said, Journal Based Backup is only a viable solution for environments in which the amount of file system activity is light to moderate, and in which the activity is somewhat well distributed. Running applications which touch every file (or a very large percentage of files) on the file system, or which flood the file system with changes in a very short period of time (such as copying a very large directory tree), will make journaling unusable.

I have developed a file system monitoring/profiling tool which can be useful in determining if journaling is viable for a particular file system, and I am more than willing to provide it to anyone who is interested.

Hope this helps ...

Pete Tanenhaus
Tivoli Storage Solutions Software Development
email: [EMAIL PROTECTED]
tieline: 320.8778, external: 607.754.4213

"Those who refuse to challenge authority are condemned to conform to it"
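P.S. For anyone who wants to gauge change activity before getting hold of the tool, the kind of Win32 change-notification mechanism a journal monitor relies on can be driven from a few lines of C. A rough sketch (my own illustration, not the profiling tool itself; the G:\ default is just an example):

    /* changecount.c - minimal file system change-activity counter.
     * Counts Win32 change notifications under a directory tree so you
     * can judge whether activity is light enough for journal based
     * backup. Build with MSVC: cl changecount.c
     */
    #include <windows.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const char *path = (argc > 1) ? argv[1] : "G:\\";
        DWORD buf[16384];          /* 64KB, DWORD-aligned for the API */
        DWORD bytes, total = 0, floods = 0;
        FILE_NOTIFY_INFORMATION *fni;
        HANDLE dir;

        /* FILE_FLAG_BACKUP_SEMANTICS is required to open a directory. */
        dir = CreateFileA(path, FILE_LIST_DIRECTORY,
                          FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                          NULL, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL);
        if (dir == INVALID_HANDLE_VALUE) {
            fprintf(stderr, "CreateFile(%s) failed: %lu\n", path, GetLastError());
            return 1;
        }

        printf("Counting change notifications under %s (Ctrl-C to stop)\n", path);
        for (;;) {
            /* Blocks until something changes anywhere in the subtree. */
            if (!ReadDirectoryChangesW(dir, buf, sizeof(buf), TRUE,
                    FILE_NOTIFY_CHANGE_FILE_NAME | FILE_NOTIFY_CHANGE_DIR_NAME |
                    FILE_NOTIFY_CHANGE_ATTRIBUTES | FILE_NOTIFY_CHANGE_SIZE |
                    FILE_NOTIFY_CHANGE_LAST_WRITE | FILE_NOTIFY_CHANGE_SECURITY,
                    &bytes, NULL, NULL))
                break;

            if (bytes == 0) {
                /* Buffer overflowed: too many changes at once - the same
                 * "flood" condition that invalidates a journal. */
                floods++;
            } else {
                /* Walk the packed FILE_NOTIFY_INFORMATION records. */
                fni = (FILE_NOTIFY_INFORMATION *)buf;
                for (;;) {
                    total++;
                    if (fni->NextEntryOffset == 0)
                        break;
                    fni = (FILE_NOTIFY_INFORMATION *)
                          ((BYTE *)fni + fni->NextEntryOffset);
                }
            }
            printf("\rnotifications: %lu  overflows: %lu", total, floods);
            fflush(stdout);
        }
        CloseHandle(dir);
        return 0;
    }

Point it at a small test directory while a virus scan runs (much as suggested earlier with filemon) and the counter will show immediately whether the scanner is touching files.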