Hi Arno,
Thanks for your response. I'll try to trim things down a bit so as not to
clog those Internet pipes...
On Wed, Aug 19, 2009 at 15:35, Arno Lehmann <a...@its-lehmann.de> wrote:
> Hi,
>
> 14.08.2009 23:16, K. M. Peterson wrote:
> > Hi everyone,
> >
> > ...
>
> > We have a Network Appliance filer (previously produced under their
> > "StoreVault" division), with both CIFS and NFS exports. Backing up NFS
> > mounts on our backup server is kind of slow - 3-5MB/sec.
>
> Hmm... this seems *really* slow... does your file set contain many
> small files?
Well, yes, probably. Like many environments, we don't really know for sure,
but much of our work revolves around data collection, and software engineers
aren't always interested in understanding why 2 million files in a filesystem
aren't conducive to operational efficiency. That's just life, unfortunately.
>
>
> > Not having the
> > hardware to put in to mount and back up the CIFS shares,
>
> Any linux/unix machine should be good for that - you just need samba
> installed and I don't see why the shares shouldn't be available
> CIFS-mounted. In other words, your backup server itself would probably
> be able to mount the CIFS shares. Can you tell us what's the problem
> at your site?
Yes. I am, perhaps simply out of ignorance, concerned about whether Bacula
accessing CIFS shares captures all available filesystem metadata
(permissions/ownership, and all of the other myriad NTFS bits). We are
primarily concerned with data protection in the event of large-scale (hate
to use the term "catastrophic") failure. We have very few requests for
recovery, so the process in place seems most efficient should we have to
bring an entire filesystem back.
But if this turns out to be a moot point, I'd happily run some tests to see
how efficient we can make the process.
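If it does turn out to be moot, the test I have in mind would be something
like the following - the share name, credentials and mount point are all
made up, and I'd start read-only:

# Hypothetical test on the backup server: mount one CIFS share read-only
# and compare what Bacula sees there with what it sees over NFS.
mkdir -p /mnt/filer/projects
mount -t cifs //storevault/projects /mnt/filer/projects \
    -o ro,username=backup,password=secret,iocharset=utf8
# Then point a throwaway FileSet at /mnt/filer/projects and check both the
# throughput and the ownership/permission data that lands in the catalog.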
>
>
> ...
>
> > However, backing up the whole thing as one backup job is problematic.
> > It takes a long time, it's opaque, and it's the 600 lb (272kg) gorilla
> > in the backup workflow. And a restore is going to be even more painful
> > from a single backup job of the root of the device.
>
> One of the big problems with dump-based backups, IMO... I'm sure
> others disagree here.
Well, again, I suspect it may actually be easier to recover an entire
filesystem from a dump file... and I would rather reap the speed benefit on
the backup end and pay the penalty on the recover side.
>
>
> > I should point out that I have scripts currently to run through a list
> > of CIFS shares, set up the rsh jobs and pipes, and generate a report of
> > what got backed up and when and how. It's still one job, though, even
> > though each share is a separate "file" in Bacula. It's a problem
> > because these jobs create snapshots when they are submitted, and so
> > there are snapshots sitting around for the entirety of the job, and I'm
> > never sure whether they are going to be cleaned up properly if the job
> > gets canceled. And if it does get canceled, I have to re-run everything
> > again. Painful.
> >
> > This isn't the real question, though I'd love it if someone has
> > something I haven't thought of.
>
> Well, just two suggestions: Use FIFOs to pass data from the client to
> Bacula, and initiate the snapshot only before you start reading the
> actual volume. Thus you save disk space and don't have the snapshots
> sitting around longer than necessary.
Sorry, I should have been clearer: that's what I'm doing. The issue is that
it's easy to initiate a snapshot if each filesystem is a separate job, but my
attempts to script something that detects when Bacula actually wants to start
reading a FIFO, and only then kicks off the snapshot and data stream, have
come to naught. As far as /proc and anything else I can "see" are concerned,
the FIFO isn't "open" until both sides of the pipe are set up; I can't figure
out how to detect that Bacula has opened one side while the other side isn't
yet open (and has data queued). Again, it looks like the best way is to
generate jobs on a filesystem-by-filesystem basis, but...
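For what it's worth, if I do end up generating one job per filesystem, the
per-job script becomes simple: create the snapshot, then start the dump into
the FIFO in the background, so the write blocks until Bacula opens the read
side. A sketch (the FIFO would be listed in that job's FileSet with
readfifo = yes; the rsh/snapshot/dump details are placeholders for whatever
the filer actually accepts):

#!/bin/sh
# run_before.sh VOLUME - sketch of a per-job ClientRunBeforeJob script.
# Paths, the rsh invocations and the dump syntax are all placeholders.
VOL=$1
FIFO=/var/bacula/fifo/$VOL

mkfifo -m 0600 "$FIFO" 2>/dev/null    # harmless if it already exists

# Create the snapshot only now, when this particular job is about to run.
rsh filer snap create "$VOL" bacula

# Start the dump in the background.  The redirect into the FIFO blocks until
# the FD opens the read side, so nothing streams before Bacula is ready, and
# the snapshot is deleted as soon as the dump finishes.
( rsh filer dump 0f - "/vol/$VOL/.snapshot/bacula" > "$FIFO"
  rsh filer snap delete "$VOL" bacula ) &

exit 0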
>
>
> > The real question is a more general
> > one: I need to figure out a way to dynamically create jobs. I really
> > want one job per filesystem - but what's the consensus of the best way
> > to do this? Should I just write job definitions and fileset definitions
> > to a file that's included in my director's config, then issue a reload?
>
> Yup, that's today's way to do things. There is no API to create jobs
> dynamically.
>
I was hoping that there was a way to do this with a command in bconsole or
something...
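So, script it is. For the archives, the shape I have in mind is a generator
that writes one Job/FileSet pair per filesystem into a file pulled in with an
@include from bacula-dir.conf and then asks the director to reload. The
resource names, the list_volumes helper and the "NetAppDefaults" JobDefs are
all invented here:

#!/bin/sh
# gen_jobs.sh - sketch: emit one Job + FileSet per filesystem, then reload.
# Assumes bacula-dir.conf contains the line  @/etc/bacula/netapp-jobs.conf
# list_volumes is a placeholder for however the volume list is obtained;
# "NetAppDefaults" is a JobDefs holding the common Client/Storage/Pool/etc.
OUT=/etc/bacula/netapp-jobs.conf
: > "$OUT"

for VOL in $(list_volumes); do
  cat >> "$OUT" <<EOF
Job {
  Name = "netapp-$VOL"
  JobDefs = "NetAppDefaults"
  FileSet = "netapp-$VOL-fs"
  ClientRunBeforeJob = "/etc/bacula/run_before.sh $VOL"
}
FileSet {
  Name = "netapp-$VOL-fs"
  Include {
    Options {
      signature = MD5
      readfifo = yes
    }
    File = /var/bacula/fifo/$VOL
  }
}
EOF
done

echo reload | bconsole >/dev/null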
>
> > Is there an API that I've missed? Is there something in 3.x that is
> > going to make this better?
>
> A first step towards your goal is being developed now - an API to create
> file sets by browsing a client. In the end, I would be surprised if
> writing that file set to the configuration, and dynamic job creation,
> were not eventually added on top of that.
>
> > I want something that is as transparent as
> > possible, and that can be set up so that when a new share/export gets
> > created on the thing the backups get updated. I can run something in
> > cron, or RunBeforeJob, but it just seems wrong. (By the way, it would
> > be cool to have a plugin that would take the output from 'tar' or
> > 'dump', and feed it to Bacula as if it were coming from an FD so Bacula
> > would store the files and metadata... but I digress.)
>
> Well, your digression is not too far off, I think. You can do the
> above - not with dump, which isn't portable enough for that purpose,
> but with tar:
> - Untar to a temporary directory
> - Back up that directory
> - skip the leading paths, so you wouldn't capture
> /tmp/bactmp/clientXY/etc/services but only /etc/services
> - Modify the catalog information so that this backup looks like it was
> done on the proper client and at the time the original tar was created.
>
Oh boy, that's a bit much even for me. Sorry :-)
I say "dump" rather than "tar" because the NetApp backup application is
based on [Solaris] dump. I guess what I was asking was whether it might be
feasible to have a pipe to take a tar/dump datastream as input and output a
bacula-fd datastream out (or two datastreams - one for filedata and the
other for attribute data).
>
> For the original problem, I would probably simply handle it as a
> management issue: Just instruct your admins to tell the backup admin
> to add a job if they create new volumes.
>
> If you really do this very often, a shell script which creates the
> necessary job resource with a default fileset and schedule and adds
> some run script to create a snapshot, mount that, and destroy it after
> the backup would be worth the effort.
>
> If you can get a list of the shares, a script which synchronizes this
> list with the jobs in Bacula wouldn't be too hard to create, I think.
>
You're right, and that seems the best way to go. I like, as we say here, to
have a belt *and* suspenders. Again, the issue was whether there was a more
reasonable way to create jobs dynamically. Given that there really isn't one
right now, I'll go with the write-config-files-and-reload process.
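Concretely, the belt-and-suspenders version I'm picturing regenerates that
include file from the filer's own export list and only reloads (and tells me
about it) when something actually changed. Another sketch - the export
listing, the gen_one_job helper and the mail address are all made up:

#!/bin/sh
# sync_jobs.sh - sketch: rebuild the generated job file from the filer's
# current export list; reload the director only when the file changes.
# "rsh filer exportfs", gen_one_job and the mail address are placeholders.
CUR=/etc/bacula/netapp-jobs.conf
NEW=/etc/bacula/netapp-jobs.conf.new

rsh filer exportfs | awk '{print $1}' | sed 's|^/vol/||' | sort |
while read VOL; do
  gen_one_job "$VOL"          # emits the same Job/FileSet pair as gen_jobs.sh
done > "$NEW"

if cmp -s "$CUR" "$NEW"; then
  rm -f "$NEW"                # nothing changed; leave the director alone
else
  diff -u "$CUR" "$NEW" | mail -s "NetApp backup jobs changed" backup-admin
  mv "$NEW" "$CUR"
  echo reload | bconsole >/dev/null
fi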
>
> > I know I can dynamically define a fileset. But, again, what I need is a
> > more granular way to break down a large /job/. I can figure out how to
> > kludge it - and I've shown the current NetApp backup system to a few
> > people who've concluded I should get some therapy - but I'm at the
> > point where I think I need to ask for directions.
>
> Without going much more into detail I really can't suggest a solution,
> I fear. The problem sounds interesting, though. I'm sure that I would
> like to work on it :-) (And you can get my work through Bacula
> Systems, too...)
I'd love to... but I don't think I necessarily have the budget right now....
>
> > So, the question here is: is there a better way to plan for a likely
> > inability to back up a large-ish filesystem in one job without resorting
> > to having to enumerate all of the n level directories and break the task
> > up into multiple jobs?
>
> Not now... but, again, the solution is, as far as I know, in the
> developers' queue: continue stopped jobs.
>
Let me ask a question then: if I have a job that dynamically builds the list
of files for its fileset, that fileset is effectively "snapshotted" when the
job is submitted, isn't it? Does it make sense to have a script submit the
same job repeatedly, each time with a different (dynamically generated) list
of files in its fileset? I might then have 20 jobs queued with the same
"name", otherwise differing only in the files part of the fileset (this
clearly would only work in this fashion for full backups, but that's kind of
the problem right now...). Or is there something rather patently wrong with
that?
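In other words - assuming I've read the console documentation correctly and
the run command really does accept a fileset= override - something like
this, reusing the invented names from the sketches above:

# Sketch: queue one run of the same generic job per generated FileSet.
# "netapp-generic" and the FileSets must already be known to the director
# (i.e. written to the config and reloaded before this runs).
for VOL in $(list_volumes); do
  echo "run job=netapp-generic fileset=netapp-$VOL-fs level=Full yes" |
    bconsole >/dev/null
done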
Again, I feel the need to work within the architecture of Bacula as much as
possible. Don't like being way out there at the edge of the caravan,
especially as it's always moving (as with all technology projects).
>
> I found OpenVPN to be a good solution - often, a broken network
> connection resulted in OpenVPN re-creating its tunnel transparently to
> the applications that used that tunnel. In other words, you only see a
> moment with a really slow connection and some dropped packets, but TCP
> handles that quite well.
>
>
Ah, that's another area of our architecture that I'm not in a position to
change just to make the backups work... but it does occur to me that perhaps
I could look at some timeout values to keep running jobs from being dropped.
Thanks for that hint.
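(For the record, the knob I plan to poke first is Heartbeat Interval on the
FD and SD, assuming I'm reading the manual correctly - roughly this excerpt
in bacula-fd.conf, with the name and paths made up:)

# Excerpt sketch only: the point is the keep-alive, which should stop idle
# control connections from being dropped across the VPN during long jobs.
FileDaemon {
  Name = backupserver-fd
  WorkingDirectory = /var/bacula/working
  Pid Directory = /var/run
  Heartbeat Interval = 60
}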
And everything else...
_KMP