OK, I'm still running into some trouble here following the previous
suggestions. I now have multiple jobs being kicked off at the same time;
to do this I had to create a Job & FileSet pair for each directory, as
shown below. However, now that both jobs are in the queue, they are still
being run sequentially, NOT interleaved (I set up spooling to see what
was going on, and only one job writes to the spool at a time):
2011-07-04 09:30 loki-dir JobId 747: Start Backup JobId 747, Job=JOB-loki_.2011-07-04_09.30.00_03
2011-07-04 09:30 loki-dir JobId 747: Using Device "LTO4"
2011-07-04 09:30 loki-dir JobId 748: Start Backup JobId 748, Job=JOB-loki_opt_mldonkey.2011-07-04_09.30.00_04
2011-07-04 09:30 loki-sd JobId 747: Recycled volume "AA0028" on device "LTO4" (/dev/nst0), all previous data lost.
2011-07-04 09:30 loki-dir JobId 747: Max Volume jobs=1 exceeded. Marking Volume "AA0028" as Used.
2011-07-04 09:30 loki-sd JobId 747: Spooling data ...
2011-07-04 09:30 loki-fd JobId 747: /opt/vmware is a different filesystem. Will not descend from / into it.
2011-07-04 09:30 loki-fd JobId 747: /opt/smbshare is a different filesystem. Will not descend from / into it.
2011-07-04 09:30 loki-fd JobId 747: /opt/mldonkey is a different filesystem. Will not descend from / into it.
Also, the 'Max Volume jobs=1' message above is strange, as I do NOT have
that set anywhere in my configs. Is that a hard-coded default? Does anyone
have a working example of multiple jobs & filesets going to the same
client at the same time that I can look at?
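One thing I plan to try (a guess on my part, not something I've verified): the
volume itself may have a Max Volume Jobs value recorded in the catalog from when
it was first labeled, even if the Pool no longer sets one. Something like the
following in bconsole should show it and let me clear it:

*llist volume=AA0028
*update volume=AA0028

with "Maximum Volume Jobs" set back to 0 (which I understand means unlimited),
and "Maximum Volume Jobs = 0" stated explicitly in the Pool resource so that
newly labeled volumes pick it up.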
---
Job {
  Name = "JOB-loki_"
  Accurate = yes
  Client = loki-fd
  FileSet = "FS-loki_"
  Level = Full
  Messages = Standard
  Pool = BackupSetAA
  Priority = 10
  Schedule = "SCHD-loki"
  SpoolData = yes
  Storage = LTO4
  Type = Backup
  Write Bootstrap = "/opt/bacula/var/bacula/working/%c-%i-%t.bsr"
}
Job {
  Name = "JOB-loki_opt_mldonkey"
  Accurate = yes
  Client = loki-fd
  FileSet = "FS-loki_opt_mldonkey"
  Level = Full
  Messages = Standard
  Pool = BackupSetAA
  Priority = 10
  Schedule = "SCHD-loki"
  SpoolData = yes
  Storage = LTO4
  Type = Backup
  Write Bootstrap = "/opt/bacula/var/bacula/working/%c-%i-%t.bsr"
}
FileSet {
  Name = "FS-loki_"
  Ignore FileSet Changes = yes
  Include {
    Options {
      accurate=mcs5
      checkfilechanges=yes
      hardlinks=yes
      noatime=yes
      onefs=yes
      recurse=yes
      signature=MD5
      sparse=yes
      verify=pns5
    }
    File = /
    File = /boot
  }
}
FileSet {
  Name = "FS-loki_opt_mldonkey"
  Ignore FileSet Changes = yes
  Include {
    Options {
      accurate=mcs5
      checkfilechanges=yes
      hardlinks=yes
      noatime=yes
      onefs=yes
      recurse=yes
      signature=MD5
      sparse=yes
      verify=pns5
    }
    File = /opt/mldonkey
  }
}
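For completeness, here is my (untested) understanding of where concurrency has
to be raised before two jobs will actually interleave onto one drive; treat it
as a sketch, the value 10 is just a placeholder and the comments stand in for
what is already in my configs:

# bacula-dir.conf
Director {
  # ... existing settings ...
  Maximum Concurrent Jobs = 10
}
Client {
  Name = loki-fd
  # ... existing settings ...
  Maximum Concurrent Jobs = 10
}
Storage {
  Name = LTO4
  # ... existing settings ...
  Maximum Concurrent Jobs = 10
}
# and, I believe, each Job resource has its own Maximum Concurrent Jobs
# (defaulting to 1), though that should only matter if the same Job can
# overlap itself.

# bacula-sd.conf
Storage {
  # ... existing settings ...
  Maximum Concurrent Jobs = 10
}

# bacula-fd.conf on loki
FileDaemon {
  # ... existing settings ...
  Maximum Concurrent Jobs = 10
}

If any one of the Director-side values is left at its default of 1 (which I
understand is the default), the jobs queue up and run one after another, which
matches what I'm seeing.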
----------------------------
On 2011-06-28 19:45, Steve Costaras wrote:
Yes, in this case the 'client' is the backup server, as I had a free
slot for the tape drives and, due to the size, didn't want to carry this
over the network.
If I split this up into separate jobs, say one job per mount point (I have
~30 mount points at this time), that may work. However, I may be doing
something wrong, as the jobs are run sequentially, not concurrently,
though as you mentioned I may be missing some setting in another file
to accomplish that.
Improved hashing would help, though frankly the biggest item would be
getting rid of the 'double backup' (once to spool and then again to tape);
a rolling window of spooling, or something like that, would be a /much/
bigger win.
Right now I try to do full backups every 6 months, or when a large
ingress happens and the delta change is greater than ~1/5 the total size
of storage. My goal would be to get full backups down to say ~7
days on a single LTO4, or about 4 days with two LTO4 drives
(and similar for a complete restore).
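As a sanity check on those targets (my own back-of-the-envelope math, so
please correct me): LTO-4 native speed is roughly 120 MB/s, so 100 TB at full
streaming rate is about 100e12 / 120e6 = ~833,000 seconds, call it 9-10 days
on one drive or around 5 days on two. So ~7 days on a single LTO4 only works
if compression helps, and ~4 days on two drives means keeping both drives
streaming nearly continuously.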
The only real issue with multiple jobs, as opposed to multi-streaming
in a single job, would be the restore process having to restore from
each fileset separately, instead of having a single index for the
entire system and letting Bacula figure out what jobs/filesets are
needed. Or is there a way to accomplish this that I'm not seeing?
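(One thing I haven't tried yet: I gather bconsole's restore can be pointed at
a list of JobIds, either on the command line or via the "enter list of comma
separated JobIds" option in the restore menu, and will merge them into a
single file tree, something along the lines of

*restore jobid=101,102,103

followed by "mark *" and "done" in the tree, where the JobIds here are just
made-up examples. If that works it would at least avoid walking through each
fileset's restore by hand, though I don't know how well it scales to ~30 jobs
per backup cycle.)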
On 2011-06-28 19:04, Bob Hetzel wrote:
Steve,
You should be able to run multiple file daemons on the storage device, but
a better idea might be to run the backups (and restores) off the clients,
as many in parallel as your system can handle. Look into concurrency.
If you split up the fileset into separate jobs you can have them go to
separate tape drives at the same time (look up the concurrency settings in
the manual; you'll have to set them in multiple places).
Years ago I ran into a similar issue... we had a SAN with e-mail stored on
it. The backup was done off a snapshot of the SAN mounted on the computer
driving the tape drives. Had this been the most powerful computer in the
room that would have been great but unfortunately it was not up to the task
of both taking data off the SAN and processing the tape drives as fast as
they could go. I never got around to moving the whole scheme back to strictly
client-based backups (i.e. from the mail servers directly instead of from
the backup server mounting them), but if I had it would have been better.
The downside to that is that your system then becomes more complex and you
have to make sure you don't back up anything twice as well as make sure you
aren't missing the backup of anything important.
In the next version of Bacula (in the last week Kern said he'd have a beta
out in the next few weeks, so hang on to your hat!), one of the improvements
is supposed to be a more efficient hashing algorithm, to boot. It sounds like
that will give a substantial increase in performance, but that alone
probably will not solve your problem. I think you're going to have to try a
lot of different configurations and test which ones work best for your
design parameters (i.e. questions like "How long can I go w/o a full
backup?" and "How long can I stand a complete disaster recovery restore
taking?").
From: "Steve Costaras"<stev...@chaven.com>
Subject: [Bacula-users] Performance options for single large (100TB)
server backup?
To:bacula-users@lists.sourceforge.net
I have been using Bacula for over a year now and it has been providing
'passable' service, though I think since day one I have been stretching it to
its limits, or I need a paradigm shift in how I am configuring it.
Basically, I have a single server with direct-attached disk (~128TB / 112 drives)
and tape drives (LTO4). Its main function is a centralized file server & archival
server. It has several mount points (~20, all ZFS) to break down some structures based on
file size and intended use, basically spawning a new mountpoint for anything > a
couple TB or 100,000 files. Some file systems are up to 30TB in size, others are only a
handful of GB, with ~4,000,000 files anywhere from 4KiB up to 32GiB in size.
Data change is about 1-2TiB/month, which is not that big of an issue. The
problem is when I need to do full backups and restores (restores mainly every
1-2 years when I have to do a forklift replacement of drives). The bottlenecks
that I see are:
- The file daemon is single threaded, which limits backup performance. Is there
a way to start more than one stream at the same time for a single machine's
backup? Right now I have all the file systems for a single client in the same
fileset.
- Tied in with the above, accurate backups cut into performance even more when
doing all the md5/sha1 calcs. Splitting this, along with the above, across
multiple threads would really help.
- How to stream a single job to multiple tape drives. I couldn't figure this
out, so only one tape drive is being used.
- Spooling to disk first and then to tape is a killer. If multiple streams could
happen at once that might mitigate it, or some type of continuous spooling
(see the sketch after this list). How do others do this?
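To be concrete about the spooling side, my understanding (from the docs, not
from testing) is that it is controlled per Device in bacula-sd.conf, plus
Spool Data = yes in the Job, roughly like this sketch, where the path and
sizes are placeholders of mine:

Device {
  Name = LTO4
  # ... existing tape drive settings ...
  Spool Directory = /var/spool/bacula     # placeholder path
  Maximum Spool Size = 500G               # total spool space this device may use
  Maximum Job Spool Size = 100G           # despool to tape each time a job's spool reaches this
}

My hope is that a smaller Maximum Job Spool Size at least bounds the disk usage
and gets data onto tape sooner; what I can't tell from the documentation is
whether a job can keep spooling new data while its earlier chunk is being
despooled, which is what a true rolling window would need.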
At this point I'm starting to look at Arkeia & NetBackup, both of which provide
multistreaming and tape drive pooling, but I would rather stick with (and send my
$$ to) open source if I could, as opposed to closed systems.
I'm at a point where I can't do a 20-30 day full backup, and 'virtual fulls' are
not an answer. There's no way I can tie up tape drives for the hundreds of
tapes at 2.5 hours per tape, assuming zero processing overhead. I have plenty of
CPU on the system and plenty of disk subsystem speed; I just can't seem to get at
it through Bacula.
So what options are available, or how are others backing up huge single servers?