Re: [Bacula-users] Re: [Bacula-devel] Combined Tape & Disk Storage Solution

Kern Sibbald Fri, 14 Oct 2005 11:55:14 -0700

On Wednesday 05 October 2005 17:59, David Boyes wrote:
> > > Hmm. Does the second job transfer the data from the FD
> >
> > again? If so,
> >
> > > then that doesn't (IMHO) quite do what I want to do here. I really
> > > want to transfer the data only once (the only guarantee we have of
> > > getting the same data on all the copies) and create the
> >
> > replicas on the server side.
> >
> > Yes, it starts a second job.  The disadvantage of this is
> > that the data is not 100% identical if anything is changing
> > on the FD.  The advantage is that it avoids a whole bunch of
> > complications that I have not logically resolved concerning
> > having two backups of the same thing in the same job.
>
> Hmm. I don't think that would pass our auditors. If there's a significant
> chance that the copies are not identical (and it sounds like this approach
> pretty much guarantees that the copies will not be identical), I don't
> think it would be sufficient or useful for this purpose. It does however
> make implementation easier, as you said.


Well, it may not pass your auditors, I cannot argue with that, but I can say 
that if it doesn't pass, they would probably be horrified to learn that the 
copies that they are currently getting are not true instantaneous "snapshots" 
of the system (except the Win32 VSS backups), and so the minor differences 
produced by this procedure are probably insignificant compared to the 
inconsistencies within a single backup. 

>
> > > (As a side issue, I'm beginning to wonder if overall we need a more
> > > generalized job manager. This is sort of sounding like we need
> > > something like JCL, and then this could all be handled in a
> >
> > more systematic way.
> >
> > > That's a much bigger project, though.)
> >
> > Perhaps if I were starting to design Bacula with the
> > knowledge I have today, I would have a different structure.
> > However, I have to live with the current code, and at the
> > current time, I am, unfortunately, the only one who
> > understands it and who is continuously working on the
> > project.
>
> Don't you feel lucky and needed? 8-)

No, I would feel better knowing there was already someone ready to take my 
place.  

>
> >  Making any major design changes is not something I
> > can handle without a team of programmers.  By myself, I can
> > continue the same path I have taken over the years -- slowly
> > evolve it to provide all the functionality we want.
>
> As I said, it's a MUCH bigger project. Not on the radar for today or
> tomorrow, just musing a bit on something I was thinking about. What we've
> got works; it's more thinking about where future simplifications might go.

OK, that's fair.

>
> > This could be a way to do it, but it doesn't fit in with the
> > current Bacula scheme.  Any restore can have Volumes from
> > multiple pools (typically not from a single job).  Many users
> > separate their Volumes into Full, Diff, Inc pools.
> >
> > So, IMO, unless I am missing something you are saying, a Pool
> > is not a good way to separate multiple copies.  I do have a
> > database column designed to indicate what copy a particular
> > Volume record is from (I also have a stripe database column).
> >  Since they are not yet fully implemented, they are not yet
> > stored in the DB to conserve space, but this info is passed
> > from the SD to the DIR.
>
> I probably didn't explain it well. I don't think the approach conflicts at
> all with current usage -- the primary pools can still be anything the user
> designates. Copy pools are horizontal in nature, in *addition* to the
> existing primary pool structure -- basically they provide a way of grouping
> volumes so that copies of volumes in the primary pool are selected from a
> designated set of volumes. So, example:
>
> Pool A (primary Full) --> Pool B (copypool 1 for pool A) --> Pool C
> (copypool 2 for Pool A) --> etc.
> Pool D (primary Diff) --> Pool E (copypool 1 for pool D) --> Pool F
> (copypool 2 for Pool D) --> etc.
>
> Let's say that Pool A is a pool containing volumes A1, A2, and A3.  Pool B
> is a different pool, containing volumes B1, B2, and B3. Pool C contains
> volumes C1, C2 and C3, and so forth.
>
> In a backup job, data is written to a volume selected from the primary
> pool, say A2. If copypools are defined for the primary pool, the same data
> is written to volumes selected from the designated copypool(s), say B1 and
> C3. The idea of a SD-mux would allow this to be implemented w/o changing a
> lot of the SD code -- jobs talk to the SD-mux, and the SD-mux would look at
> the definition of the primary pool, and then establish N sessions with the
> SDs manageing the primary and copypools, 1 per pool. The SD-mux accepts the
> write from the FD, and returns a "write complete" when all the end SDs
> acknowledge the write complete. The end SDs use the same volume selection
> method they do now, selecting a volume from the appropriate pool using the
> same logic used today. If multiple jobs are active, that's fine -- the
> SD-mux doesn't care, and the end SD will not try to select the same volume
> in use for another job, eg. Job 2 will get either A1 or A3 from Pool A,
> since A2 is already in use for another job and the SD for Pool A already
> knows that. Same logic occurs for the copypools.
>
> The above should handle the consistency issue for the volumes neatly. As
> you say, the problem is associating multiple volume residence records with
> a file record in the database. What I was trying to suggest was that the
> volume residence field (probably not the right name, but I'm talking about
> the entry in the file record that indicates what volume the file is on)
> could become a list of (1...n) volume records instead. Same data, just that
> there can be more than 1 volume record associated with a file, defaulting
> to the current 1 volume per file. In our above example, the database would
> reflect one file record and three volume records for A2, B1, and C3 -- all
> with the same data.
>
> On a restore, you could examine the file record in the database, which
> would tell you what volumes have copies of this file based on the list
> above. You can then sort the list of files to be restored by volume
> (minimizing mounts), check to see if any of the volumes in your list are
> already mounted, and proceed from there, removing files from the restore
> list as you successfully restore them. If you're unable to restore from one
> volume in the list for a file, then try the next volume in the list from
> the next copypool in the list for the primary pool. (Note that the
> parallelism cited above would also apply to multiple restore jobs, allowing
> multiple jobs to restore the same file at the same time (subject to # of
> copypools and hardware availability -- mounted volumes in use in another
> job are already considered busy/unavailable by the volume selection
> algorithm)).
>
> Using the example above, if the volume A2 containing the file to be
> restored is missing or broken, the restore process looks at the copypool
> definition for the primary pool, decides that volume B1 in Pool B is the
> next likely candidate, retrieves the volume info for B1, and initiates a
> mount for B1, repeating the restore process on B1. If that volume is also
> broken/gone, then we try volume C3 from Pool C, usw until we hit the end of
> the copypool chain. If we still haven't successfully restored the file,
> then we return an error.
>
> Does that explain it a bit better? If not, please tell me where I'm not
> making sense.

I need to think about it a lot more before I can answer.  The current 
mechanism that I have defined is a "copy" field in the JobMedia record so 
that Bacula can write as many copies of a file as it wants, and it can then 
choose what "copy" should be used for the restore.  All the details are not 
yet worked out.

I think Migration comes first, which is a sort of Move this job from one Pool 
to another.  Then I'll think about how we do multiple copies ...  For the 
moment the "clone" mechanism should work for most cases.

>
> (Also as an aside, if we do implement pool migration, then each of these
> pools (primary and copypools) can be treated as a separate migration chain
> with individual volume and pool migration thresholds, and the program logic
> necessary to implement migration is identical for each chain.  Job spooling
> also becomes a special case of migration from a disk pool to a tape pool,
> and we get multiple copies of the output tapes for free in the process if
> we do copypools with identical migration thresholds.)

Concerning pool migration: yes, I am in complete agreement with what you say. 
The scheme I will be implementing is based on your input and what I was able 
to glean about the functionality the IBM product (Tivoli, if I remember the 
name right).  Obviously, it will all be adapted to the Bacula way of doing 
things rather than trying to copy what someone else has done.

>
> > > With the approach above, just taking the volumes in and out of the
> > > changers does the job for you. No new wheels needed.
> >
> > Yes, this would work for big shops where everything is in the
> > changer, but for the other 99.9% of us who either don't have
> > changers or who are obligated to remove Volumes from the
> > changers, it would leave the problem of deciding what Volume
> > to take, and how to tell Bacula in a user friendly way that
> > certain Volumes may be offsite.
>
> One conceptual way around that difference is to treat a manual tape drive
> as a 1 slot changer with 1 drive, and the changer script would become a
> small program that says "insert volume XXX. Type 1 when ready" or something
> like that to a designated location (/dev/console or bconsole or something
> like that). Then everyone *has* changers, and the conceptual problem is
> diminished.
>
> In my earlier note, along with the above idea, I assumed that removing the
> volumes physically would be accompanied by updating the InChanger flag to
> indicate that the volume is not physically available. If the InChanger flag
> was 0 for a specific volume, the volume is marked unavailable, and the
> process I described above would automatically fall back to the next
> copypool volume in the chain until we find one that *is* available.
>
> BTW, this is the function I was talking about in implementing a DR manager
> -- determine which volumes were used in a backup sequence, generate a list
> to remove, update the appropriate DB fields for those volumes, and
> (optionally) eject them from the changer if stored in one. The DR manager
> would have to track movement from location to location (possibly express
> that as moving from changer to changer, or adding a location field to the
> volume record if there isn't one already), but I think that combines two
> things: the discipline to *use* such a tool in a consistent manner, and
> matching that to a business process, neither of which Bacula can force
> someone to do.

I think that once Python scripting has matured a bit, a DR manager as you want 
and describe will be rather "easy" or perhaps I should say "straight 
forward".   All the hooks you need will be there ...


-- 
Best regards,

Kern

  (">
  /\
  V_V


-------------------------------------------------------
This SF.Net email is sponsored by:
Power Architecture Resource Center: Free content, downloads, discussions,
and more. http://solutions.newsforge.com/ibmarch.tmpl
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Re: [Bacula-users] Re: [Bacula-devel] Combined Tape & Disk Storage Solution

Reply via email to