Hi Christopher!
Thanks a lot for your time! I'm answering below in blue so the replies are
easier to tell apart.
On 2022-03-03 15:03, webmaster wrote:
> CAUTION: This email was sent from outside the organization. Do not click
> links or open attachments unless you recognize the sender and know the
> content is safe.
>
> Hello
>
> I was reading this and had a thought about deduplication:
>
> SORRY, I MEANT DELTA ENCODING... NOT BYTE-LEVEL DEDUPLICATION IN STORAGE...
>
> The ZFS filesystem has built-in deduplication (and compression) support,
>
> so when creating a new backup volume you could
> create a virtual ZFS pool/filesystem,
> write all backed-up files to the ZFS pool,
> which automatically deduplicates them.
>
> WE RUN ZFS AS THE FILESYSTEM ON OUR FILE STORAGE SERVERS...
>
> You would then write the virtual ZFS filesystem to your Bacula volume.
>
> I'm not sure how well this would work in practice, but it seems like a
> "simple" way to implement basic deduplication.
>
> YES, ZFS IS NICE... BUT WHAT WE WANT IS FOR THE FD TO SEND US, AND FOR US TO
> STORE, AS LITTLE DATA AS POSSIBLE...
>
> Christopher Tyerman
>
> CHEERS!!!
>
> -------- Original message --------
> From: egoitz--- via Bacula-devel <bacula-devel@lists.sourceforge.net>
> Date: 03/03/2022 12:36 (GMT+00:00)
> To: Radosław Korzeniewski <rados...@korzeniewski.net>
> Cc: bacula-devel@lists.sourceforge.net
> Subject: Re: [Bacula-devel] Open source Bacula plugin for deduplication
>
> Hello Radoslaw,
>
> I will answer below in green, just to make it easier to tell apart what
> each of us has said... :)
>
> On 2022-03-03 12:46, Radosław Korzeniewski wrote:
>
> Hello,
>
> On Thu, 3 Mar 2022 at 12:09, egoitz--- via Bacula-devel
> <bacula-devel@lists.sourceforge.net> wrote:
>
> Good morning,
>
> I know Bacula Enterprise provides deduplication plugins, but sadly we can't
> afford it. No problem: we will try to create an open source deduplication
> plugin for the Bacula File Daemon. I would use rdiff (part of librsync) for
> delta patching and signature generation.
> What signatures does rdiff use?
>
> IT IS DOCUMENTED EXACTLY HERE:
> https://librsync.github.io/page_formats.html
>
> THE SIGNATURE MAKES IT POSSIBLE TO GENERATE DELTA PATCHES WITHOUT NEEDING
> BOTH THE OLD AND THE NEW VERSION OF A FILE... AND SO WE AVOID DOUBLING THE
> SPACE USED OR REQUIRED FOR BACKING UP...
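For reference, the weak half of an rsync-style block signature is a rolling checksum that can be slid along the new file one byte at a time, which is what lets a delta be computed from the old file's signature alone. The C sketch below shows the classic rsync form of that checksum (two 16-bit running sums). It is an illustration, not librsync's code: librsync's own rollsum differs in details (for example, it adds a constant to each byte), pairs the weak sum with a strong hash, and newer format versions offer a different rolling hash, so the exact layout should be taken from the formats page above. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rsync-style weak checksum state for a window of `len` bytes.
 * a = sum of the bytes, b = sum of the running values of a (rsync's form).
 * Both use 32-bit unsigned arithmetic; only the low 16 bits are reported. */
typedef struct {
    uint32_t a, b;
    size_t len;
} rollsum_t;

/* Compute the checksum of buf[0..len) from scratch. */
static void rollsum_init(rollsum_t *rs, const unsigned char *buf, size_t len)
{
    assert(rs != NULL && buf != NULL && len > 0);
    rs->a = 0;
    rs->b = 0;
    rs->len = len;
    for (size_t i = 0; i < len; i++) {
        rs->a += buf[i];
        rs->b += rs->a;      /* b ends up as sum of (len - i) * buf[i] */
    }
}

/* Slide the window one byte to the right in O(1):
 * `out` leaves at the front, `in` enters at the back. */
static void rollsum_rotate(rollsum_t *rs, unsigned char out, unsigned char in)
{
    rs->a = rs->a - out + in;
    rs->b = rs->b - (uint32_t)rs->len * out + rs->a;
}

/* 32-bit weak sum: high half from b, low half from a (each mod 2^16). */
static uint32_t rollsum_digest(const rollsum_t *rs)
{
    return ((rs->b & 0xffffu) << 16) | (rs->a & 0xffffu);
}
```

rollsum_rotate() costs a constant amount of work per byte, whereas recomputing with rollsum_init() at every offset would cost a full window each time; that difference is what makes scanning a 150GB file for matching blocks practical.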
>
> I would love to create a Bacula plugin for deduplicating content at the FD
> level. This way, even if the backup is sent encrypted from the FD to the SD,
> deduplication would still give the best results, because it takes place
> before the files are encrypted.
> Yes, with proper encryption you always get different bits for the same data
> block, which makes deduplication totally useless. :)
>
> I THINK SO TOO... YES...
>
> Deduplication would only be applied to files larger than, let's say,
> 10GB.
> ???
>
> I designed Bacula deduplication to handle blocks (files) larger than 1K
> because the indexing overhead for smaller blocks was too high. The larger the
> block, the lower the chance of getting a good deduplication ratio. So it is a
> trade-off: small blocks give a good deduplication ratio but higher indexing
> overhead; larger blocks give a weaker deduplication ratio but lower indexing
> overhead. So it handled block sizes from 1K up to 64K (the default Bacula
> block size, though it could be extended to any size).
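To put rough numbers on that trade-off: a block index needs one entry per block, so the entry count scales inversely with block size. The sketch below assumes fixed-size blocks and a caller-supplied size per entry (for instance a hypothetical 40 bytes, i.e. a 32-byte hash plus an 8-byte location); it is back-of-the-envelope arithmetic, not a description of how Bacula's deduplication engine actually lays out its index.

```c
#include <assert.h>
#include <stdint.h>

/* Number of index entries needed to cover `data_bytes` with fixed-size
 * blocks of `block_bytes` (a trailing partial block still needs an entry). */
static uint64_t index_entries(uint64_t data_bytes, uint64_t block_bytes)
{
    assert(block_bytes > 0);
    return (data_bytes + block_bytes - 1) / block_bytes;
}

/* Total index size, given an assumed size per entry (hash + location). */
static uint64_t index_bytes(uint64_t data_bytes, uint64_t block_bytes,
                            uint64_t entry_bytes)
{
    return index_entries(data_bytes, block_bytes) * entry_bytes;
}
```

For 1 GiB of data, 1K blocks need 1,048,576 entries (40 MiB of index at 40 bytes each), while 64K blocks need 16,384 entries (640 KiB): a 64-fold difference in index size, paid for with coarser matching.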
>
> I UNDERSTAND WHAT YOU SAY, BUT THE PROBLEM WE ARE FACING IS THE FOLLOWING.
> IMAGINE A MACHINE WITH A SQL SERVER AND 150GB OF DATABASES. OUR PROBLEM IS
> HAVING TO COPY ALL OF THAT IN EVERY DAILY INCREMENTAL. WE DON'T REALLY MIND
> COPYING 5GB OF "WASTED" SPACE PER DAY, EVEN IF IT ISN'T STRICTLY NECESSARY
> (JUST TO GIVE YOU AN IDEA)... BUT OBVIOUSLY 100GB OR 200GB PER DAY IS A
> DIFFERENT MATTER...
>
> REALLY, I WAS THINKING OF APPLYING THIS DEDUPLICATION ONLY TO IMPORTANT
> FILES... I HOPE YOU UNDERSTAND ME NOW. :)
>
> If you don't mind, I would like to share my ideas with you, at least to find
> out whether "all this" is a viable approach.
>
> My idea is basically:
>
> - WHEN DOING A BACKUP:
>
> ++ Check the backup level we are running, I suppose by asking
> getBaculaValue() for bVarLevel.
> Deduplication should be totally transparent to the backup level. You want to
> deduplicate data especially for the largest, Full level backups, right?
>
> WELL... REALLY, OUR PROBLEM IS THE ONE I DESCRIBED JUST BEFORE. WE DON'T
> REALLY MIND COPYING A BIG FILE ONCE A MONTH, BUT WE WANT TO AVOID COPYING IT
> (AT LEAST THE WHOLE FILE) IN INCREMENTAL BACKUPS. BESIDES, WHEN RESTORING
> (LEAVING VIRTUAL BACKUPS ASIDE), YOU RESTORE A FULL PLUS THE INCREMENTALS.
> SO WE WOULD RESTORE THE FULL ORIGINAL_FILE PLUS THE PATCHES, AND APPLY THEM
> TO ORIGINAL_FILE AT THE END OF THE RESTORE JOB.
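The per-file decision described so far (size threshold, Full vs. Incremental, whether an earlier signature exists) can be summarised as a small function. This is a hypothetical sketch: the enum and function names are invented, and it assumes the level obtained through getBaculaValue() with bVarLevel is reported as Bacula's usual level letter ('F', 'D' or 'I'). It also assumes, beyond the proposal, that an Incremental with no earlier signature starts a new chain. A Differential would need the signature from the last Full rather than the newest one; the sketch leaves picking the right basis to the caller through have_basis_sig.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-file actions for the proposed delta scheme. */
typedef enum {
    ACT_NORMAL,        /* let Bacula back the file up as usual */
    ACT_FULL_WITH_SIG, /* whole file + freshly generated signature */
    ACT_DELTA_WITH_SIG /* patch against basis signature + new signature */
} delta_action;

/* level: assumed to be 'F', 'D' or 'I' as reported via bVarLevel.
 * have_basis_sig: a usable earlier signature of this file exists. */
static delta_action choose_action(int level, uint64_t size,
                                  uint64_t threshold, bool have_basis_sig)
{
    assert(level == 'F' || level == 'D' || level == 'I');
    if (size < threshold)
        return ACT_NORMAL;          /* small files: not worth the bookkeeping */
    if (level == 'F' || !have_basis_sig)
        return ACT_FULL_WITH_SIG;   /* no basis to diff against: start a chain */
    return ACT_DELTA_WITH_SIG;      /* Incremental/Differential with a basis */
}
```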
>
> ++ In startBackupFile(), get the file size. I suppose it gives me the file
> size (or at least the name, and then I can call stat() on it somehow).
> No. The standard "Bacula Command Plugin API" expects the plugin to return
> the stat info of the file to back up.
>
> OK, NO PROBLEM... IF I CAN GET THE FILENAME AND PATH SOMEHOW, I CAN ALWAYS
> CALL STAT().
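A minimal POSIX sketch of that size check, assuming the plugin has the file's full path; the function name is hypothetical. It also skips anything that is not a regular file, since directories, devices and FIFOs should not go through the delta path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: true if `path` is a regular file of at least
 * `threshold` bytes. stat() failures are reported and treated as "no". */
static bool is_large_regular_file(const char *path, uint64_t threshold)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0) {
        perror(path);
        return false;
    }
    if (!S_ISREG(st.st_mode))
        return false;                 /* directories, devices, FIFOs, ... */
    return (uint64_t)st.st_size >= threshold;
}
```

On 32-bit builds with glibc, compiling with -D_FILE_OFFSET_BITS=64 lets st_size represent files larger than 2 GiB, which matters for a 10GB threshold.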
>
> +++ If it's a Full level backup and the file is bigger than 10GB, generate the
> file signature and store that new (previously nonexistent) signature, written
> to a file whose name is derived from ORIGINAL_FILE's name by a known
> convention, together with the whole ORIGINAL_FILE (the one the signature was
> generated from), on the Bacula tapes. Do I need to tell Bacula to re-read the
> directory so that it backs up the generated signature files? They didn't
> exist until we generated the file containing ORIGINAL_FILE's signature.
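One way to make the "known nomenclature" concrete is a helper that derives the side-file names from ORIGINAL_FILE's name. The scheme here (ORIGINAL_FILE-SIGNATURE-<jobid> and ORIGINAL_FILE-PATCH-<jobid>) and the function name are hypothetical. Appending the JobId is an assumption added so that each job's patch gets a distinct name: if every incremental reused the same patch name, a restore that selects the most recent version of each path would presumably bring back only the last patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical naming scheme for the side files of ORIGINAL_FILE:
 *   "<original>-SIGNATURE-<jobid>" and "<original>-PATCH-<jobid>".
 * Writes the name into `out` (capacity `outlen`).
 * Returns 0 on success, -1 on an unknown kind or if `out` is too small. */
static int make_side_name(char *out, size_t outlen, const char *original,
                          const char *kind, unsigned long jobid)
{
    assert(out != NULL && original != NULL && kind != NULL);
    if (strcmp(kind, "SIGNATURE") != 0 && strcmp(kind, "PATCH") != 0)
        return -1;
    int n = snprintf(out, outlen, "%s-%s-%lu", original, kind, jobid);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```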
> Why do you call it a "deduplication plugin"? What you describe above is the
> functionality of the Delta plugin, which supports so-called "block level
> incremental" backups. That is _NOT_ deduplication. A "block level incremental"
> backs up only the blocks inside a single file that changed between backups.
> It does not deduplicate the backup stream in any sense: for two identical
> files that change in the same way, the Delta plugin will back up the data
> twice, leaving the duplication in place.
>
> YES MATE, YOU ARE RIGHT. WHAT I NEED IS TO AVOID UPLOADING BIG FILES WITH
> VERY FEW CHANGES TO THE BACKUP EVERY DAY, NOT TO AVOID WRITING TWO IDENTICAL
> FILES TO THE BACKUP.
>
> For something like the Delta plugin, which uses exactly the procedure and
> library you describe above, you should use the "Options Plugin API".
>
> I SEE. I'LL READ ABOUT IT...
>
> +++ If it's an Incremental level backup and a previous signature of
> ORIGINAL_FILE exists (I would know because its name follows the known
> convention based on ORIGINAL_FILE's name), create a patch from the previous
> signature plus the new state of the file. Then generate the signature of the
> file in its new state. Store that new signature plus the patch on the Bacula
> tapes. Finally, return bRC_Skip for ORIGINAL_FILE (because we are going to
> copy a delta patch and a signature instead). If I return bRC_Skip here,
> would the FD skip this file but still see the signature and delta patch
> generated before returning bRC_Skip? Or should I somehow ask the FD to
> re-read the directory?
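The incremental step (previous signature plus new file state gives a patch, then a fresh signature) can be illustrated with a deliberately simplified, self-contained C sketch. Unlike librsync, it only compares blocks at fixed offsets of the new data and trusts a 64-bit FNV-1a hash instead of pairing a rolling weak checksum with a cryptographic strong hash, so an insertion that shifts the data defeats it. It is meant to show why the old file is not needed to build the patch, not to replace rdiff. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One signature entry per block of the OLD data (hypothetical format). */
typedef struct { uint64_t hash; size_t len; } block_sig;

/* One delta instruction: copy a block of the old file, or emit literal
 * bytes stored in the delta's literal pool. */
typedef struct {
    int is_copy;   /* 1: copy old block `index`; 0: literal from pool */
    size_t index;  /* old block index (copy ops) */
    size_t off;    /* offset into the literal pool (literal ops) */
    size_t len;    /* number of output bytes this op produces */
} delta_op;

/* 64-bit FNV-1a: stands in for a real strong hash such as BLAKE2. */
static uint64_t fnv1a(const unsigned char *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Signature of the old data; `sig` needs ceil(old_len / blk) entries. */
static size_t make_signature(const unsigned char *old, size_t old_len,
                             size_t blk, block_sig *sig)
{
    size_t n = 0;
    assert(blk > 0);
    for (size_t off = 0; off < old_len; off += blk, n++) {
        size_t len = old_len - off < blk ? old_len - off : blk;
        sig[n].hash = fnv1a(old + off, len);
        sig[n].len = len;
    }
    return n;
}

/* Delta from the signature and the NEW data only (old data not needed).
 * `ops` needs ceil(cur_len / blk) entries, `pool` up to cur_len bytes. */
static size_t make_delta(const block_sig *sig, size_t nsig,
                         const unsigned char *cur, size_t cur_len, size_t blk,
                         delta_op *ops, unsigned char *pool)
{
    size_t n = 0, pool_used = 0;
    assert(blk > 0);
    for (size_t off = 0; off < cur_len; off += blk, n++) {
        size_t len = cur_len - off < blk ? cur_len - off : blk;
        uint64_t h = fnv1a(cur + off, len);
        ops[n] = (delta_op){ .is_copy = 0, .index = 0, .off = pool_used, .len = len };
        for (size_t i = 0; i < nsig; i++) {
            if (sig[i].hash == h && sig[i].len == len) {
                ops[n].is_copy = 1;   /* block already exists in old file */
                ops[n].index = i;
                break;
            }
        }
        if (!ops[n].is_copy) {        /* no match: ship the bytes literally */
            memcpy(pool + pool_used, cur + off, len);
            pool_used += len;
        }
    }
    return n;
}

/* Rebuild the new data from the old data plus the delta; returns its length. */
static size_t apply_patch(const unsigned char *old, size_t blk,
                          const delta_op *ops, size_t nops,
                          const unsigned char *pool, unsigned char *out)
{
    size_t pos = 0;
    for (size_t i = 0; i < nops; i++) {
        const unsigned char *src = ops[i].is_copy ? old + ops[i].index * blk
                                                  : pool + ops[i].off;
        memcpy(out + pos, src, ops[i].len);
        pos += ops[i].len;
    }
    return pos;
}
```

make_delta() reads only the signature and the new data; apply_patch() is the only step that needs the old file, which corresponds to the restore-time patching in the proposal.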
> It sounds like an exact step-by-step description of the Delta plugin.
>
> So, now I understand why you want to handle files > 10G only. :)
>
> THAT'S IT :) :)
>
> As you would expect, in the incremental backups I'm not storing the file
> as it is in the filesystem. It would work more or less like this:
>
> In a Full level backup:
>
> ++ BEFORE THE BACKUP:
>
> _BACKED-UP SERVER'S FS <----> BACULA "VIRTUAL TAPE" CONTENT_
>
> ORIGINAL_FILE <--->
>
> ++ AFTER THE BACKUP:
>
> _BACKED-UP SERVER'S FS <----> BACULA "VIRTUAL TAPE" CONTENT_
>
> ORIGINAL_FILE + SIGNATURE FILE <---> ORIGINAL_FILE + SIGNATURE FILE
>
> In the next Incremental level backup:
>
> ++ BEFORE THE BACKUP:
>
> _BACKED-UP SERVER'S FS <----> BACULA "VIRTUAL TAPE" CONTENT_
>
> NEW_STATE_ORIGINAL_FILE + SIGNATURE FILE GENERATED ON THE LAST FULL DAY <--->
> _FROM THE FULL BACKUP_ (ORIGINAL_FILE + SIGNATURE FILE)
>
> ++ AFTER THE BACKUP:
>
> _BACKED-UP SERVER'S FS <----> BACULA "VIRTUAL TAPE" CONTENT_
>
> NEW_STATE_ORIGINAL_FILE + SIGNATURE FILE OF NEW_STATE_ORIGINAL_FILE <--->
> _FROM THE FULL BACKUP_ (ORIGINAL_FILE + SIGNATURE FILE) + PATCH FILE +
> SIGNATURE FILE OF NEW_STATE_ORIGINAL_FILE
>
> - WHEN RESTORING A BACKUP:
>
> If a restored file's name follows the convention (for example
> ORIGINAL_FILE-SIGNATURE- or ORIGINAL_FILE-PATCH), that means we have backed
> up deltas of ORIGINAL_FILE in the incremental backups. (I assume I could see
> the name of the file to be restored in startRestoreFile(), since the
> filename is accessible there.)
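On the restore side, the check in startRestoreFile() amounts to parsing the restored name back into ORIGINAL_FILE plus a kind. This sketch assumes a hypothetical <original>-SIGNATURE-<jobid> / <original>-PATCH-<jobid> convention (the trailing JobId is an assumption, not part of the proposal as written); the names and types are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { SIDE_NONE, SIDE_SIGNATURE, SIDE_PATCH } side_kind;

/* Hypothetical convention: "<original>-SIGNATURE-<jobid>" / "<original>-PATCH-<jobid>".
 * On a match, copies <original> into `orig` (capacity `origlen`), stores the
 * JobId in *jobid and returns the kind; otherwise returns SIDE_NONE. */
static side_kind parse_side_name(const char *name, char *orig, size_t origlen,
                                 unsigned long *jobid)
{
    static const struct { const char *tag; side_kind kind; } tags[] = {
        { "-SIGNATURE-", SIDE_SIGNATURE },
        { "-PATCH-", SIDE_PATCH },
    };

    assert(name != NULL && orig != NULL && jobid != NULL);
    for (size_t t = 0; t < sizeof tags / sizeof tags[0]; t++) {
        /* Use the LAST occurrence, so a directory named "x-PATCH-1" is harmless. */
        const char *hit = NULL;
        for (const char *p = name; (p = strstr(p, tags[t].tag)) != NULL; p++)
            hit = p;
        if (hit == NULL)
            continue;
        const char *digits = hit + strlen(tags[t].tag);
        if (*digits < '0' || *digits > '9')
            continue;
        char *end;
        unsigned long id = strtoul(digits, &end, 10);
        if (*end != '\0')
            continue;                 /* trailing junk: not one of ours */
        size_t n = (size_t)(hit - name);
        if (n == 0 || n >= origlen)
            continue;
        memcpy(orig, name, n);
        orig[n] = '\0';
        *jobid = id;
        return tags[t].kind;
    }
    return SIDE_NONE;
}
```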
>
> In that case, let's write this path to a plain text file so that later, in
> a post-restore job (or maybe in the API's bEventEndRestoreJob event?), the
> patches in that path can be applied to ORIGINAL_FILE, whose name is derived
> from the patch files' own names. Finally, once patching is done, remove the
> signature and patch files, leaving ORIGINAL_FILE in its state as of the
> restored date.
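Because each incremental patch is computed against the state left by the previous job, the post-restore step has to apply the patches for a given ORIGINAL_FILE oldest-first on top of the Full copy. A minimal sketch of that ordering, assuming (hypothetically) that each restored patch can be tagged with the JobId that produced it and that JobIds increase over time; patch_entry and order_patches() are invented names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record for one restored patch of a given ORIGINAL_FILE. */
typedef struct {
    unsigned long jobid;  /* JobId of the backup that produced the patch */
    const char *path;     /* where the patch file was restored */
} patch_entry;

static int by_jobid(const void *a, const void *b)
{
    unsigned long x = ((const patch_entry *)a)->jobid;
    unsigned long y = ((const patch_entry *)b)->jobid;
    return (x > y) - (x < y);   /* avoids overflow of x - y */
}

/* Sort patches oldest-first. Each patch was computed against the state left by
 * the previous job, so they must be applied in this order on top of the Full. */
static void order_patches(patch_entry *patches, size_t n)
{
    assert(patches != NULL);
    qsort(patches, n, sizeof patches[0], by_jobid);
}
```

The caller would then apply each patch in turn to the restored ORIGINAL_FILE (for example with `rdiff patch BASIS DELTA NEWFILE`), writing to a temporary file and renaming it over the original after each step.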
>
> So, at this point, I would be very, very thankful :) :) :) if some
> experienced developer could give me some ideas, or point out anything that
> is wrong or should be done differently or with other plugin API functions...
> IMVHO, a Delta plugin is best handled with the "Options Plugin API" (as the
> current Delta plugin is) rather than the "Command Plugin API", as most of the
> backup functionality will be provided by Bacula itself.
>
> I WILL READ ABOUT THIS TOO....
>
> best regards
>
> BTW, I think the Delta plugin available in BEE is fairly cheap compared to
> full deduplication options.
>
> I HAVE ASKED ROB MORRISON FOR A PRICE :) :)
>
> CHEERS!!!
> --
> Radosław Korzeniewski
> rados...@korzeniewski.net
_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel