Hi,

27.09.2007 01:17,, Ross Boylan wrote::
> I've been having really slow backups (13 hours) when I backup a large
> mail spool.  I've attached a run report.  There are about 1.4M files
> with a compressed size of 4G.  I get much better throughput (e.g.,
> 2,000KB/s vs 86KB/s for this job!) with other jobs.

2MB/s is still not especially fast for a backup to disk, I think. So 
your storage disk might also be a factor here.

> First, does it sound as if something is wrong?  I suspect the number of
> files is the key thing, and the mail  spool has lots of little files
> (it's used by Cyrus).  Is this just life when you have lots of little
> files?
> 
> Second, how can I figure out what the problem is?  I do have some
> suspicions, but first some basics:
> ------------------------------------------------
> everything is running on the same box
> 3GHz P4 with one SATA drive as the main drive and 4 older drives, one of
> which is the backup target.
> No noticeable CPU load or disk activity during the backup.  I was
> compressing, but that doesn't show up noticeably for CPU use.

How much memory, and how is the memory usage during backups?

> Debian GNU/Linux 2.6.18 with postgresql 8.1, bacula 2.2.13.
> Disk is managed by evms, using LVM.
> The partion being backed up is ext3, and the backup is going to disk (a
> different physical disk, IDE) using Reiser.

That's definitely a good thing.

> I am not using snapshotting because that feature is broken right now
> (nothiing to  do with bacula).  I shut down the cyrus server during the
> backup (desspite some errors in the log around my attempted shutdown, it
> seemed to have worked).
> 
> My suspicion is that the TCP/IP transactions are all getting delayed
> (maybe to batch for sending) in a way that usually isn't noticeable, but
> is noticeable when doing lots of quick exchanges locally.

I don't know anything about issues with TCP delays, and I know Bacula 
installations running smoothly on all sorts of hardware and different 
OSes.

I rather suspect the catalog to be the bottle-neck.

Verifying this might be as easy as running vmstat while the job is 
backed up and seeing if there is lots of iowait happening - this does 
not necessarily show as hard disk activity.

Are your database and the mail spool on the same disk? This might 
explain the slowness you encounter.

In this case, I'd suggest to upgrade to Bacula 2.2.4. For two reasons, 
actually: There is a serious bug that will hit you one day, and which 
is fixed in the current version. Second, the new batch inserts feature 
would gain lots of speed if the database throughput really is the 
bottle neck for you.

>  Not only are
> my bacula components using TCP (I think), but I'm communicating with
> postgres by TCP (I couldn't get authentication working properly with
> unix domain sockets).
> 
> While populating the cyrus server I also encountered very slow
> transaction speeds.  I think the TCP problem was the cause, though I
> don't have definite confirmation.  I ran multiple jobs in parallel to
> populate the cyrus server to get around the slowness of the individual
> parts (I think that at least rules out things like db contention or disk
> contention as culprits in that case).

As I don't know about the TCP delay I can't comment on this...

> Unfortunately, AFAIK the tcp delay is not tuneable on Linux; it is with
> BSD.
> 
> Here are some relevant parts of bacula-dir.conf:
> 
> JobDefs {
>   Name = "CornDefaults"
>   Type = Backup
>   Level = Incremental
>   Client = corn-fd
>   Storage = File2Storage
>   Messages = Standard
>   Pool = Default
>   Full Backup Pool = Full
>   Differential Backup Pool = Differential
>   Incremental Backup Pool = Incremental
>   Priority = 10
>   Write Bootstrap = "/usr/local/var/spool/bacula/%n.bsr"
> }
> 
> ######## Cyrus
> ## really this needs more care: use snapshot, dump db to ascii

As far as I know, it's sufficient to dump cyrus' database. Given that 
dump and a backup of your mail files, a correct cyrus database can be 
easily regenerated. Snapshots would be a good thing, perhaps, but 
you'd still have to explicitly dump the database as there is no 
guarantee that the disk files of the database are always in a 
consistent state.

> Job {
>   Name = Cyrus
>   JobDefs = CornDefaults
>   FileSet = "Cyrus Set"
>   Priority = 20
>   Schedule = "MonthlyCycle"
>   Run Before Job = "super cyrus-prep"
>   # restart cyrus regardless of success or failure
>   RunScript {
>     RunsWhen = After
>     RunsOnFailure = yes
>     Command = "super cyrus-resume"
>   }
> }
> 
> FileSet {
>   Name = "Cyrus Set"
>   Include {
>     Options {
>       WildFile = "*/cyrus.squat" # about 1/4 of space in spool
>       WildDir = "*/junk"
>       WildDir = "*/spam"
>       WildDir = "*/trash"
>       WildFile = "/var/lib/cyrus/proc/*"
>       ignorecase = yes
>       Exclude = yes
>     }
>     Options {
>       Compression = GZIP
>       Signature = SHA1

Compression and SHA1 both need a bit of CPU; If the CPU load does not 
increase during backup, you have the bottle neck somewhere else. The 
network stack, in my experience, is less critical than the catalog 
database.

>     }
>     # these are on separate volumes
>     File = /var/spool/cyrus
>     File = /var/lib/cyrus
> 
>     File = /etc/cyrus.conf
>     File = /etc/imapd.conf
>   }
> }
> 
> 
> 
> 
> ------------------------------------------------------------------------
> 
> Betreff:

A German locale for your mail client? I'm astonished :-)

> Bacula: Backup OK of corn-fd Full
> Von:
> (Bacula) [EMAIL PROTECTED]
> Datum:
> Sun, 02 Sep 2007 14:58:46 -0700
> An:
> [EMAIL PROTECTED]
> 
> An:
> [EMAIL PROTECTED]
> 
> 
> 02-Sep 01:44 corn-dir: BeforeJob: run command "super cyrus-prep"
> 02-Sep 01:44 corn-dir: BeforeJob: Stopping Cyrus IMAPd: cyrmaster.
> 02-Sep 01:44 corn-dir: BeforeJob: fatal error: must run as the Cyrus user
> 02-Sep 01:44 corn-dir: Start Backup JobId 118, Job=Cyrus.2007-09-02_01.05.02
> 02-Sep 01:44 corn-sd: Volume "Full2007-09-02_1065" previously written, moving 
> to end of data.
> 02-Sep 02:44 corn-sd: User defined maximum volume capacity 681,574,400 
> exceeded on device "File2Storage" (/usr/local/var/spool/bacula).
> 02-Sep 02:47 corn-sd: End of medium on Volume "Full2007-09-02_1065" 
> Bytes=681,557,381 Blocks=10,565 at 02-Sep-2007 02:47.
> 02-Sep 02:47 corn-dir: Created new Volume "Full2007-09-02_1066" in catalog.
...
> 02-Sep 14:30 corn-sd: New volume "Full2007-09-02_1071" mounted on device 
> "File2Storage" (/usr/local/var/spool/bacula) at 02-Sep-2007 14:30.
> 02-Sep 14:56 corn-fd: ClientAfterJob: run command "super cyrus-resume"
> 02-Sep 14:56 corn-fd: ClientAfterJob: Starting Cyrus IMAPd: cyrmaster.
> 02-Sep 14:56 corn-sd: Job write elapsed time = 13:11:44, Transfer rate = 
> 90.52 K bytes/second
> 02-Sep 14:58 corn-dir: Bacula 2.0.3 (06Mar07): 02-Sep-2007 14:58:46
>   JobId:                  118
>   Job:                    Cyrus.2007-09-02_01.05.02
>   Backup Level:           Full
>   Client:                 "corn-fd" 2.0.3 (06Mar07) 
> i486-pc-linux-gnu,debian,4.0
>   FileSet:                "Cyrus Set" 2007-08-10 11:53:54
>   Pool:                   "Full" (From Job FullPool override)
>   Storage:                "File2Storage" (From Job resource)
>   Scheduled time:         02-Sep-2007 01:05:01
>   Start time:             02-Sep-2007 01:44:28
>   End time:               02-Sep-2007 14:58:46
>   Elapsed time:           13 hours 14 mins 18 secs
>   Priority:               20
>   FD Files Written:       1,393,569
>   SD Files Written:       1,393,569
>   FD Bytes Written:       4,101,078,152 (4.101 GB)
>   SD Bytes Written:       4,300,355,440 (4.300 GB)
>   Rate:                   86.1 KB/s
>   Software Compression:   64.8 %
>   VSS:                    no
>   Encryption:             no
>   Volume name(s):         
> Full2007-09-02_1065|Full2007-09-02_1066|Full2007-09-02_1067|Full2007-09-02_1068|Full2007-09-02_1069|Full2007-09-02_1070|Full2007-09-02_1071
>   Volume Session Id:      3
>   Volume Session Time:    1188652643
>   Last Volume Bytes:      306,193,035 (306.1 MB)
>   Non-fatal FD errors:    0
>   SD Errors:              0
>   FD termination status:  OK
>   SD termination status:  OK
>   Termination:            Backup OK
> 

I'm really unsure about TCP problems, but the situation more or less 
looks like the catalog backend would be your problem. Could you try to 
have the catalog db on another machine?

Arno

-- 
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Reply via email to