Good morning, 

I know Bacula Enterprise provides deduplication plugins, but sadly we
can't afford it. No problem: we will try to create an open source
deduplication plugin for the Bacula file daemon. I would use rdiff (part
of librsync) for signature generation and delta patching. 
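
For reference, the rdiff cycle I have in mind could be sketched like this.
It is only a sketch: the helper names and the file names under workdir are
made up for illustration, and it only builds the command lines (they could
be handed to subprocess.run) instead of running rdiff.

```python
from typing import List


def signature_cmd(basis: str, sig_out: str) -> List[str]:
    """rdiff signature BASIS SIGNATURE: summarise the blocks of the old file."""
    return ["rdiff", "signature", basis, sig_out]


def delta_cmd(sig_in: str, new_file: str, delta_out: str) -> List[str]:
    """rdiff delta SIGNATURE NEWFILE DELTA: needs the old signature, not the old file."""
    return ["rdiff", "delta", sig_in, new_file, delta_out]


def patch_cmd(basis: str, delta_in: str, rebuilt_out: str) -> List[str]:
    """rdiff patch BASIS DELTA NEWFILE: needs the old file itself plus the delta."""
    return ["rdiff", "patch", basis, delta_in, rebuilt_out]


def rdiff_cycle(old: str, new: str, workdir: str) -> List[List[str]]:
    """Return the three commands, in order, for one signature/delta/patch round."""
    sig = f"{workdir}/old.sig"                # hypothetical file names
    delta = f"{workdir}/old-to-new.delta"
    rebuilt = f"{workdir}/rebuilt"
    return [
        signature_cmd(old, sig),
        delta_cmd(sig, new, delta),
        patch_cmd(old, delta, rebuilt),
    ]
```

The point that matters for the plugin is that creating a delta only needs
the previous signature, while applying it needs the previous file content.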

I would love to create a Bacula plugin that deduplicates content at the
fd level. This way, even if the fd sends the backup encrypted to the sd,
deduplication gives the best results, because it takes place before the
files are encrypted. Deduplication would only be applied to large files,
let's say larger than 10GB. 

If you don't mind, I would like to share my ideas with you, in order to
at least know whether all of this is a feasible approach. 

My idea is basically : 

- WHEN DOING A BACKUP : 

++ Check the backup level we are running. I suppose this is done by
asking getBaculaValue() for bVarLevel. 

++ In startBackupFile(), get the file size. I suppose it gives me the
file size info (or at least the name, so that I can do a stat() myself
in some manner). 

+++ If it's a full level and the file is bigger than 10GB, generate the
file signature and write it to a new (previously nonexistent) file whose
name is derived from ORIGINAL_FILE's name. Then store both that
signature file and the whole ORIGINAL_FILE (the one the signature was
generated from) on the Bacula tapes. Do I need to tell Bacula to re-read
the directory so that it can back up the generated signature file? It
did not exist until now, when we generated the file that contains
ORIGINAL_FILE's signature. 

+++ If it's an inc level and a previous signature of ORIGINAL_FILE
exists (I would know because it has a known name derived from
ORIGINAL_FILE's name), create a patch from the previous signature plus
the new state of the file. Then generate the signature of the file in
its new state. Finally, store that new signature plus the patch on the
Bacula tapes, and return bRC_Skip for ORIGINAL_FILE (because we are
going to back up a delta patch and a signature instead). If I return
bRC_Skip here, would the fd skip this file but still see the signature
and the delta patch generated before returning bRC_Skip? Or should I ask
the fd, in some manner, to re-read the directory? 
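
To make the decision logic above concrete, here is a minimal sketch. It is
not Bacula plugin code: the level strings, the Plan structure and the
-SIGNATURE / -PATCH suffixes are hypothetical, 10GB is taken as 10 * 1024^3
bytes, and the fallback for an incremental without a previous signature is
my own assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

THRESHOLD_BYTES = 10 * 1024**3   # "larger than 10GB"; binary units assumed
SIG_SUFFIX = "-SIGNATURE"        # hypothetical naming convention
PATCH_SUFFIX = "-PATCH"          # hypothetical naming convention


@dataclass
class Plan:
    store: List[str]                 # paths that should end up on tape
    skip_original: bool              # True when bRC_Skip would be returned
    delta_from_sig: Optional[str]    # previous signature used for the delta
    new_signature: Optional[str]     # signature file to (re)generate


def plan_backup(level: str, path: str, size: int, has_prev_sig: bool) -> Plan:
    sig = path + SIG_SUFFIX
    if size <= THRESHOLD_BYTES:
        # Small file: leave it to the normal backup, no signature at all.
        return Plan([path], False, None, None)
    if level == "incremental" and has_prev_sig:
        # The delta must be created from the previous signature BEFORE the
        # signature file is regenerated for the new state of the file.
        return Plan([path + PATCH_SUFFIX, sig], True, sig, sig)
    # Full level, or (assumption) no previous signature available:
    # store the whole file plus a freshly generated signature.
    return Plan([path, sig], False, None, sig)
```

For example, plan_backup("incremental", "/data/big.img", 20 * 1024**3,
True) asks for the patch and the new signature to be stored and for the
original file to be skipped.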

As you would assume, in the incremental backups I'm not storing the
file under the name it has in the filesystem. It should work more or
less the following way: 

In a full level backup : 

++ BEFORE THE BACKUP  : 

_BACKED SERVER'S FS <----> BACULA "VIRTUAL TAPE" CONTENT_ 

ORIGINAL_FILE            <--->  

++ AFTER THE BACKUP : 

_BACKED SERVER'S FS <----> BACULA "VIRTUAL TAPE" CONTENT_ 

ORIGINAL_FILE + SIGNATURE FILE  <--->  ORIGINAL_FILE + SIGNATURE FILE 

In the next incremental level backup : 

++ BEFORE THE BACKUP  : 

_BACKED SERVER'S FS <----> BACULA "VIRTUAL TAPE" CONTENT_ 

NEW_STATE_ORIGINAL_FILE + SIGNATURE FILE GENERATED BY THE LAST FULL BACKUP
<--->  _FROM THE FULL BACKUP_ (ORIGINAL_FILE + SIGNATURE FILE) 

++ AFTER THE BACKUP :   

_BACKED SERVER'S FS <----> BACULA "VIRTUAL TAPE" CONTENT_ 

NEW_STATE_ORIGINAL_FILE + SIGNATURE FILE OF NEW_STATE_ORIGINAL_FILE
<--->  _FROM THE FULL BACKUP_ (ORIGINAL_FILE + SIGNATURE FILE) + PATCH
FILE + SIGNATURE FILE OF NEW_STATE_ORIGINAL_FILE 

- WHEN RESTORING A BACKUP : 

If the names of the restored files are (for example...)
ORIGINAL_FILE-SIGNATURE- or ORIGINAL_FILE-PATCH (I assume I could check
this in startRestoreFile(), because the name of the file being restored
is accessible there), that would mean we have backed up deltas of
ORIGINAL_FILE in the incremental backups. 

So, let's write this path to a plain text file so that later, in a post
restore job (or even in the bEventEndRestoreJob event of the API?), the
patches in that path can be applied to the ORIGINAL_FILE, whose name is
obtained from the names of the patch files themselves. Finally, once the
patching is done, remove the signature files and patch files, obviously
leaving ORIGINAL_FILE in its latest state as of the restore date. 
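
The post-restore step could look roughly like the sketch below. Again it
is only an illustration under assumptions: the -SIGNATURE / -PATCH markers
are the example names from above, the ".rebuilt" temporary name is
invented, patches are applied in the order in which they were recorded,
and the commands are only built, not executed. A normal file whose name
happens to contain one of the markers would be misclassified, so a real
naming scheme would need something more robust.

```python
from typing import Dict, List, Tuple

SIG_MARK = "-SIGNATURE"   # example marker, hypothetical
PATCH_MARK = "-PATCH"     # example marker, hypothetical


def classify(restored: str) -> Tuple[str, str]:
    """Return (kind, original_path); kind is 'signature', 'patch' or 'file'."""
    for kind, mark in (("signature", SIG_MARK), ("patch", PATCH_MARK)):
        idx = restored.rfind(mark)
        if idx > 0:
            # Anything after the marker (e.g. a trailing suffix) is ignored.
            return kind, restored[:idx]
    return "file", restored


def post_restore_plan(restored_paths: List[str]) -> Dict[str, Dict[str, List[str]]]:
    """Group restored paths by the original file they belong to."""
    plan: Dict[str, Dict[str, List[str]]] = {}
    for path in restored_paths:
        kind, original = classify(path)
        entry = plan.setdefault(original, {"patches": [], "cleanup": []})
        if kind == "patch":
            entry["patches"].append(path)
            entry["cleanup"].append(path)
        elif kind == "signature":
            entry["cleanup"].append(path)
    return plan


def patch_steps(original: str, patches: List[str]) -> List[List[str]]:
    """Commands that bring `original` up to date, one patch at a time."""
    tmp = original + ".rebuilt"   # rdiff writes to a new file, not in place
    steps: List[List[str]] = []
    for patch in patches:
        steps.append(["rdiff", "patch", original, patch, tmp])
        steps.append(["mv", tmp, original])
    return steps
```

Note that every patch needs the previous state of ORIGINAL_FILE as its
basis, so the copy of ORIGINAL_FILE from the full backup has to be
restored before any patch can be applied.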

So, at this point, I would be very very thankful :) :) :) if some
experienced developer could give me some ideas, or tell me if something
looks wrong or should be achieved in some other manner or with other
plugin API functions. 

Thank you :) :) 

Cheers!!
_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel