Hello, I forgot to mention something very IMPORTANT: I discovered that in *all* such cases (restored files with a larger size), if we don't perform a full restore but instead restore a SINGLE file, it is restored OK, with the *correct* size and content. It is also OK if we restore just the directory it is in (together with the other files in it).
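In case anyone wants to run the same kind of spot check, below is a minimal, illustrative sketch that walks an original directory and its restored copy and reports missing files and size mismatches. The paths are placeholders, and files that legitimately changed after the backup will of course also show up as mismatches:

#!/usr/bin/env python
# Minimal sketch (placeholder paths): compare an original directory tree against
# its restored copy, reporting files missing from the restore and files whose
# sizes differ. Files that changed since the backup will also be flagged.
import os
import sys

def compare_trees(original_root, restored_root):
    missing, size_mismatch = [], []
    for dirpath, _dirnames, filenames in os.walk(original_root):
        for name in filenames:
            orig_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(orig_path, original_root)
            rest_path = os.path.join(restored_root, rel_path)
            if not os.path.lexists(rest_path):
                missing.append(rel_path)
            elif os.path.isfile(orig_path) and os.path.isfile(rest_path):
                orig_size = os.path.getsize(orig_path)
                rest_size = os.path.getsize(rest_path)
                if orig_size != rest_size:
                    size_mismatch.append((rel_path, orig_size, rest_size))
    return missing, size_mismatch

if __name__ == "__main__":
    # e.g. compare_restore.py /home /home/bacula/res/b3/home
    missing, mismatched = compare_trees(sys.argv[1], sys.argv[2])
    print("files missing from restore: %d" % len(missing))
    for rel, orig_size, rest_size in mismatched:
        print("size mismatch: %s original=%d restored=%d" % (rel, orig_size, rest_size))

(Restored one at a time, the problem files come back with the correct size and content; in a full restore they do not.)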
Which proves it is not a problem with the FS, kernel, Xen, LVM, hardware, etc., but a problem with Bacula.

Regards

Monday, July 23, 2007, 9:57:40 PM:

DS> Hello,
DS>
DS> I've filed this as a bug, but while Kern couldn't reproduce it he gave
DS> up. So let us find here what the problem could be. There are actually
DS> two problems, and they could be linked.
DS>
DS> Here is the history:
DS>
DS> Initially we were using 2.0.3. After running backups for several weeks I
DS> wanted to restore a file and was surprised that I couldn't restore it. It
DS> was listed in the catalog, I could select it and run a restore job, but
DS> the file didn't come up. Investigating what happened, I ran a full restore
DS> job and was surprised that several files were missing from that directory
DS> (the one containing the file). Error messages similar to the one in my
DS> first post here were also present. In addition, there was a big difference
DS> between the number of marked files and the number of files actually
DS> restored (and certainly not hard links, sockets or anything else that is
DS> ignored by Bacula -- in one of the tests the whole /home/ directory was
DS> missing).
DS>
DS> After that we started a week of tests (full/diff/inc backups, restores,
DS> etc.). Every time, similar errors happened, but at random places/files.
DS> Sometimes there are errors, sometimes not. We haven't run enough tests to
DS> say exactly when this happens. But IT HAPPENS, and as a result we don't
DS> have a reliable backup. I know a lot of people run backups without testing
DS> restores, and that's why (if this is not related to our specific setup)
DS> these problems would only show up in an emergency, which actually doesn't
DS> happen often. Anyway, here are the hardware and setup details:
DS>
DS> *** Bacula: 2.1.28 on all servers. Yesterday we cleaned everything (Bacula
DS> DB and volumes) and installed the latest beta *2.1.28* everywhere (note
DS> this is not a problem of the beta, as we already saw it with 2.0.3).
DS> 2.1.28 fixed two other problems we discovered with 2.0.3, but this one is
DS> still there. The Director and most of the servers are 64 bit; two of the
DS> servers are 32 bit.
DS> *** OS: Linux CentOS 4.5
DS> *** MySQL: 5.0.37
DS> *** Servers (all are almost identical): Supermicro PDSME - Intel E7230
DS> (Mukilteo) chipset, Intel Pentium D 930 Dual Core 3.0GHz, 3ware IDE RAID
DS> controller Escalade 9550SX. Each server has 4 disks in RAID 1+0; only the
DS> Bacula server has many disks in RAID 5.
DS> *** Some servers are plain CentOS, some have Xen with virtual servers. The
DS> Bacula server itself also has Xen, but Bacula runs in Dom0 and no other
DS> virtual machines are running on it at this time.
DS> *** The servers with Xen also have LVM.
DS> *** We run concurrent jobs (and I guess this is where the Bacula problem
DS> is).
DS> *** GZIP compression is enabled.
DS> *** We save volumes on hard disk; their size is set to 4480MB.
DS>
DS> --- How to get an error:
DS>
DS> As we initially discovered the error after several weeks of backups, we
DS> guessed that it could be caused by a wrong setting of Volume Retention (or
DS> some other retention time) on our side, with some files being purged.
DS> We started everything from zero again, and after 3 days (it happened that
DS> the first backup was Full, the next Differential and the last Incremental)
DS> we performed a test and the error happened again! So we were sure it is
DS> not caused by an accidental purge of some files.
DS> After that we could get the error even after just a full backup, trying
DS> to restore immediately after it finished.
DS>
DS> Yesterday we cleaned everything again and compiled (from SRPMs) the latest
DS> 2.1.28. We ran a full backup again (again with all jobs concurrent) and
DS> the errors described here happen when we try to restore files from every
DS> job (except one which has just 150 files).
DS>
DS> So the problems are two:
DS>
DS> - sometimes some files are restored with a larger size, while the first
DS> part of the file matches the original exactly (these are not log files or
DS> otherwise dynamic files). This happens in very rare cases (~one case per
DS> 5 jobs).
DS>
DS> - sometimes not all files are restored; tens of thousands are missing,
DS> for example:
DS> Files Expected: 190,718
DS> Files Restored: 166,097
DS> This happens more often (~one case per 2 jobs).
DS>
DS> Note that once the error happens we can reproduce it on every restore, at
DS> the same place, for the same file, and with the same number of missing
DS> files (i.e. this is not a problem of the restore itself; it is most likely
DS> a problem of the volumes).
DS>
DS> Our future tests:
DS> 1. We will do the same (concurrent jobs) but without using GZIP.
DS> 2. If it happens again we will set the maximum number of jobs to 1 so
DS> that every job runs alone, because as far as I remember we didn't get
DS> errors when we ran just one full backup job. It always happens when we
DS> run several at once (but I am not 100% sure, which is why we will test
DS> this).
DS> 3. If it still happens we will run with a normal kernel (to exclude any
DS> Xen influence).
DS> 4. Last, we will try without LVM (which would be harder).
DS>
DS> Regards
DS> P.S. Sorry for my English :)

DS> Monday, July 23, 2007, 9:03:45 PM:

RN>> Doytchin Spiridonov wrote:
>>> Hello,
>>>
>>> trying to identify a bug in Bacula and/or our system setup.
>>>
>>> Has anyone seen errors like this on restore:
>>>
>>> Error: attribs.c:410 File size of restored file
>>> /home/bacula/res/b3/usr/src/redhat/RPMS/i686/glibc-2.2.5-44.i686.rpm
>>> not correct. Original 3826291, restored 10620921.
>>>
>>> - the file is not a log file or any other file that changed during the
>>> backup (in which case an error like the one above would be expected)
>>>
>>> - the wrong file size is always larger than the original; if we cut the
>>> file down to the first N bytes, where N is the correct file size, the
>>> original and restored files match; we noted that the appended data is
>>> part of another file from the backup, not garbage. Note that this other
>>> file (part of which has been appended to the file with the wrong size)
>>> is itself restored correctly, so the only problem is that Bacula decides
>>> on a wrong file size and reads past the file's end (this seems to be
>>> some internal buffer of Bacula, as the data is stored in the volumes
>>> using GZIP, and simply reading further in the volume would break
>>> everything and the appended data would be garbage, not unzipped data).

RN>> This has been brought up several times within the last week, but never
RN>> with this kind of explanation and examination. I wonder if some of the
RN>> others who have experienced it (I do not know their names -- hopefully
RN>> they can chime in) can do the same analysis for us. This is potentially
RN>> serious, it seems, if it is a widespread problem.
RN>> I think if the others can verify it, this should also be copied to
RN>> bacula-devel. I think I will try a large restore of my own today to see
RN>> what happens.
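For anyone who wants to repeat the byte-level comparison described above (the restored file is larger than the original but identical to it for the first N bytes, where N is the original size), here is a rough, illustrative sketch; the paths are placeholders:

#!/usr/bin/env python
# Rough sketch (placeholder paths): check that the restored file starts with the
# exact bytes of the original file and only differs by an appended tail.
import os
import sys

CHUNK = 1024 * 1024

def check_prefix(original_path, restored_path):
    orig_size = os.path.getsize(original_path)
    rest_size = os.path.getsize(restored_path)
    with open(original_path, "rb") as orig, open(restored_path, "rb") as rest:
        remaining = orig_size
        while remaining > 0:
            n = min(CHUNK, remaining)
            if orig.read(n) != rest.read(n):
                return False, orig_size, rest_size
            remaining -= n
    return True, orig_size, rest_size

if __name__ == "__main__":
    # e.g. check_prefix.py /path/to/original.rpm /path/to/restored.rpm
    ok, orig_size, rest_size = check_prefix(sys.argv[1], sys.argv[2])
    print("original %d bytes, restored %d bytes (%d extra)" %
          (orig_size, rest_size, rest_size - orig_size))
    print("first %d bytes match: %s" % (orig_size, ok))

The appended tail can then be inspected separately to see which other file from the backup it came from.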
RN>> Please give the rest of the details of your setup, however -- you don't
RN>> even include the Bacula version, and that is a very basic piece of
RN>> information. Operating system (presumably Red Hat Linux, judging by the
RN>> file you backed up, but who knows), architecture... all would be useful.