> What I have implemented already (it passes regression
> testing, so all existing features still work despite the new code):
> - Separation of read/write descriptors in the Storage daemon.
> - Separation of the read/write Storage names in the Director that
>   are sent to the Storage daemon (both a read and a write storage
>   device can now be sent). 

Neat.

> - Implementation of a skeleton of Migration/Copy job code in 
> the Director.
> - Implementation of the following new Pool Directives:
>     Migration Time = <duration>
>     Migration High Bytes = <size>
>     Migration Low Bytes = <size>
>     Next Pool = <Pool-res-name>
>   (nothing is done with them yet, but all the Catalog variables
>    exist already in 1.38.x).
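
For concreteness, a Pool resource using these new directives might look
something like the sketch below.  The directive names come from the
list above; the values, the pool name, and the Next Pool target are
made up purely for illustration.

  Pool {
    Name = Default
    Pool Type = Backup
    # New directives (hypothetical values):
    Migration Time = 30 days        # duration after which migration triggers
    Migration High Bytes = 200 GB   # migrate when pool usage exceeds this
    Migration Low Bytes = 100 GB    # presumably the stop threshold
    Next Pool = MigrationPool       # destination for migrated data
  }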

And these values apply to all volumes in the pool, right? Did you want
to treat the values as applying to the entire pool (e.g. number of
volumes * size of volume), or to individual volumes? Operationally, I
think you'll be more concerned with the entire-pool case, using the
volume occupancy information to select which volumes are eligible for
migration.

> How does it work?  Much like a Verify job.
> You define a Migrate or Copy job much the same as you do a
> Verify job, except that you specify a Migration Job (the target)
> rather than a Verify Job name (i.e. you tell it which job you
> want migrated).  The source ("from") Storage daemon is defined
> in the specified target Migration Job.  The source Pool is
> specified in the target Job's Pool's Next Pool directive; if that
> is not specified, it is taken from the Pool specified in the
> current job.
> 
> You then schedule this Migration job using a schedule.  When
> it runs, it checks whether either the Migration Time has been
> exceeded (it always is if it is zero) or the Migration High Bytes
> value is exceeded in the target's Pool.  If either of those is
> true, the job starts and migrates the last target job run (this
> needs to be improved) by reading that job, much like a restore,
> and writing it to the destination pool.  For a Migration, the old
> job is then deleted from the catalog (perhaps the Volume will be
> removed -- another Feature Request); in the case of a Copy, the
> old Job information is left unchanged.
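
If I'm reading the description correctly, a Migration job definition
might end up looking roughly like the sketch below.  Note that
"Type = Migrate" and "Migration Job =" are my guesses at the eventual
directive names, and the resource names are invented -- this is only a
sketch of the general shape, not the actual syntax.

  Job {
    Name = "migrate-client1"
    Type = Migrate                    # or Copy
    Migration Job = "client1-backup"  # target job whose data is migrated
    Pool = Default                    # fallback source Pool if the target
                                      # Job's Pool has no Next Pool
    Storage = FileStorage             # write side; the read Storage is
                                      # taken from the target job
    Schedule = "WeeklyMigration"
    Messages = Standard
  }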

Hmm. I really think migration should be volume-oriented, not
job-oriented. You really want to use this to clear entire volumes, not
to go job by job.

I'd suggest that the migration code check whether the pool exceeds the
"Migration High Bytes" value, and then select volumes for migration
that are not in use by another job, starting with the ones with the
least space in use (it's faster to clear an almost-empty volume, which
minimizes the time the volume is unavailable for appends). The
migration code should then move ALL the jobs off the volume to volumes
in the next pool, and release the original volume as available for
use. The migration code could then continue with the next volume if
pool utilization is still above the threshold, or stop if it is below
the threshold. If there are absolutely no volumes available in the
source pool for whatever reason, dip into your scratch pool and log
the event.
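
In catalog terms, the candidate selection could be little more than the
query sketched below.  The Media/Pool column names are from memory and
may need adjusting, and the "not in use by another job" test would
still have to be made against the list of running jobs.

  SELECT Media.MediaId, Media.VolumeName, Media.VolBytes
    FROM Media, Pool
   WHERE Media.PoolId = Pool.PoolId
     AND Pool.Name = 'Default'
     AND Media.VolStatus IN ('Full', 'Used')
   ORDER BY Media.VolBytes ASC;  -- least-full volumes first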

That way, you don't have to tie anything to the job itself, and you can
run regularly scheduled jobs that just "do the right thing" at regular
intervals. It also lends itself easily to later adding a trigger-based
process that would fire automatically if a threshold in a pool was
exceeded. 


> - You need a different Migration job for each job to be 
> migrated.  This is a bit annoying but is mitigated by JobDefs.

See above. IMHO, this really isn't job management, it's volume
management. All we're doing is rearranging where the server stores the
data, not changing the characteristics of an individual job.



> - I haven't worked out exactly what to keep in the catalog 
> for Migration jobs (a job that runs but does nothing should 
> probably be recorded, a job that runs and migrates data 
> should probably be labeled as a Backup job to simplify the 
> restore code ...).

Start/stop times, amount of data migrated, number of volumes processed,
pool occupancy at start, pool occupancy at end. 

I think labeling it as a Backup job would be confusing. It's not really
a backup; it's a migration job. 

> - The last 20% of the programming effort requires 80% of the work :-)

And most of the grief. 

> - I'm thinking about adding an interactive migration console
> command, similar to the restore command, except that the
> files selected will be written to the output. This is a way
> to "migrate" multiple jobs (i.e. the current state of the
> system) or, in other words, to do a "virtual Full backup" or a
> "consolidation".
> To be consistent, this command would not allow selection of
> individual files, i.e. it will take all files from the
> specified jobs.
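
(A purely hypothetical sketch of what that might look like at the
console, just to picture it -- the prompts are invented, loosely
modeled on the restore command's job selection:)

  *migrate
  [interactive JobId selection, as in the restore command]
  Selected JobIds: 123,130
  OK to run? (yes/mod/no): yes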

There are two cases, I think: migration and data movement.  I'd add two
commands: MOVE DATA and MIGRATE DATA. MOVE DATA just moves data from
volume A to volume B, using the standard Bacula append rules, updating
the database as it goes along so that the active location for a file in
the database is now recorded as volume B rather than volume A. It takes
two specific volume names as arguments.
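
Hypothetical console syntax, just to make that concrete (the keyword
names are made up):

  *move data fromvolume=Vol0001 tovolume=Vol0050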

MIGRATE DATA would take a pool name as input, and perform the equivalent
of a migrate job. The input pool name is required, and the default
behavior would be to use the nextpool attribute. If an output pool is
specified, the nextpool attribute in the database is ignored, and the
output pool is used as specified. 
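
Again with made-up keywords, the two forms would be something like:

  *migrate data pool=Default
      (destination taken from Default's Next Pool)
  *migrate data pool=Default outputpool=MigrationPool
      (explicit output pool, Next Pool ignored)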

> - An Archive feature can also be implemented from this -- it is
> simply a Copy with no record being put in the catalog and the
> output Volume types being Archive rather than Backup.  (Note,
> my concept of Archive is that no record of them remains in
> the catalog -- this may be a subject of discussion ...)

I'd say that you absolutely *want* records of where you archive stuff.
You could add a COPY DATA command that would behave like the MOVE DATA
command I described above, and give it an option as to whether to
record the copy in the database or not (the default being yes). There
you probably want to allow either volume names or individual job
numbers to be specified for copying to the new volume.
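
Purely illustrative syntax again:

  *copy data volume=Vol0001 tovolume=ArchVol001 record=yes
  *copy data jobid=123,130 tovolume=ArchVol001 record=no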

Coupled with my feature request for multiple copypool processing, I
think this removes the last few barriers to calling Bacula
enterprise-grade -- this is a really seriously cool step forward. 

-- db

> 
> Comments?
> 
> --
> Best regards,
> 
> Kern
> 
>   (">
>   /\
>   V_V
> 

