Hi Bob, 

Thanks a lot in advance for your time and help. It's really appreciated,
mate! I'm answering inline below, in green bold :) :) ...

On 2022-03-08 16:42, Bob Hetzel wrote:

> Hi, 
> 
> I'm not a developer, but I have a lot of familiarity with Microsoft SQL 
> Server. I'm not sure if you meant Microsoft or not. In Microsoft SQL Server 
> you normally back up the DB using a SQL call. That can be set to compress 
> the backup, so a 150 GB DB file will typically only produce backup files of 
> 5 GB to 15 GB in size. Additionally, SQL Server supports log and 
> differential backups, so you would not need to do a full backup every time. 
> 
> I KNOW, BUT AS FAR AS I KNOW IT'S POSSIBLE TO BACK UP SQL SERVER WITH VSS 
> WITHOUT DOING DUMPS FIRST. I THINK IT IS... I WOULD DO THE DUMP TOO. YES, I 
> KNEW YOU CAN DO INCREMENTAL BACKUPS AND SO ON... 
> 
> IT WAS JUST... AS WE TRY TO BACK UP THE WHOLE FILESYSTEM... A WAY OF SAVING 
> SOME SPACE IN THE BACKUP... 
> 
> So you could basically just run DB engine backups, then skip the in-use DB 
> files and back up the backup files instead.    
> 
> I AGREE WITH YOU... 
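> 
> JUST TO MAKE THE TWO-STEP IDEA CONCRETE, A MINIMAL SKETCH COULD BE A 
> PRE-BACKUP SCRIPT LIKE THE ONE BELOW (SERVER NAME, DATABASE AND PATHS ARE 
> ONLY PLACEHOLDERS): 
> 
>     import datetime, subprocess
> 
>     SERVER = r".\SQLEXPRESS"            # placeholder instance name
>     DATABASE = "MyBigDb"                # placeholder database
>     TARGET = r"D:\sqlbackups\MyBigDb"   # folder Bacula backs up instead of the live .mdf/.ldf
> 
>     full = (f"BACKUP DATABASE [{DATABASE}] TO DISK = N'{TARGET}_full.bak' "
>             "WITH COMPRESSION, INIT;")
>     diff = (f"BACKUP DATABASE [{DATABASE}] TO DISK = N'{TARGET}_diff.bak' "
>             "WITH DIFFERENTIAL, COMPRESSION, INIT;")
> 
>     # For example: full on Sundays, differential the rest of the week.
>     stmt = full if datetime.date.today().weekday() == 6 else diff
>     subprocess.run(["sqlcmd", "-S", SERVER, "-E", "-Q", stmt], check=True)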
> 
> I realize this is basically a 2-step backup rather than a simple 1-step one, 
> so it definitely has some drawbacks, but I figured I'd mention it as an 
> option off-list.  
> 
> TRUE... 
> 
> The way enterprise datacenters have changed over the last 20 years, hardly 
> any backups now use filesystem agent-based scans. Almost everything is a VM, 
> so I would not expect a lot of usage for your enhancement, because backups in 
> most cases are of the entire VM, using the changed block tracking that was 
> mentioned in this thread.   
> 
> WELL, YES... YOU COULD USE CBT IN VMWARE, OR SOME OTHER WAYS IN 
> XCP-NG/XENSERVER, AT THE VIRTUAL DISK LEVEL, BUT YES... TODAY THIS METHOD HAS 
> BECOME MORE AND MORE POPULAR. EVEN SO... I WOULD SAY I PREFER DOING THE 
> BACKUP FROM "INSIDE" THE MACHINE... INSTEAD OF USING THE VIRTUAL DISK 
> BLOCKS... OR ELSE USE BOTH: A WHOLE BACKUP FROM THE VIRTUAL DISK SIDE, PLUS 
> ANOTHER BACKUP AT FILE LEVEL INSIDE THE VM, WITH FOR INSTANCE ONLY THE 
> DATABASE DUMPS... 
> 
> I could go into more detail on any of this if you have questions about any of 
> the things I've mentioned.   
> 
> THANK YOU SO MUCH, BOB. I TAKE NOTE AND WILL TELL YOU IF I NEED TO SHARE 
> SOME PLAN WITH YOU AGAIN... 
> 
> CHEERS!! 
> 
> Bob
> 
> Begin forwarded message:
> 
>> From: egoitz--- via Bacula-devel <bacula-devel@lists.sourceforge.net>
>> Date: March 3, 2022 at 6:33:25 AM CST
>> To: Radosław Korzeniewski <rados...@korzeniewski.net>
>> Cc: bacula-devel@lists.sourceforge.net
>> Subject: Re: [Bacula-devel] Open source Bacula plugin for deduplication
>> Reply-To: ego...@ramattack.net
> 
> Hello Radoslaw, 
> 
> I will answer below, in green for instance... just to distinguish better 
> what each of us has said... :)
> 
> On 2022-03-03 12:46, Radosław Korzeniewski wrote: 
> 
> Hello, 
> 
> On Thu, 3 Mar 2022 at 12:09, egoitz--- via Bacula-devel 
> <bacula-devel@lists.sourceforge.net> wrote: 
> 
> Good morning, 
> 
> I know Bacula Enterprise provides deduplication plugins, but sadly we can't 
> afford them. No problem, we will try to create an open source deduplication 
> plugin for the Bacula file daemon. I would use rdiff (part of librsync) for 
> delta patching and signature generation. 
> What signatures is rdiff using?  
> 
> BASICALLY, IT IS DOCUMENTED EXACTLY HERE: 
> https://librsync.github.io/page_formats.html 
> 
> THE POINT IS TO BE ABLE TO GENERATE DELTA PATCHES WITHOUT NEEDING BOTH THE 
> OLD AND THE NEW VERSION OF A FILE... AND SO AVOID DOUBLING THE SPACE USED OR 
> REQUIRED FOR BACKING UP... 
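> 
> JUST AS AN ILLUSTRATION OF WHAT THE SIGNATURES ARE FOR, A MINIMAL SKETCH OF 
> THE rdiff ROUND TRIP COULD BE THE FOLLOWING (FILE NAMES ARE PLACEHOLDERS; 
> THE REAL PLUGIN WOULD OF COURSE BE WRITTEN IN C): 
> 
>     import subprocess
> 
>     def run(*cmd):
>         subprocess.run(cmd, check=True)
> 
>     # Full backup day: keep ORIGINAL_FILE's signature next to the file.
>     run("rdiff", "signature", "ORIGINAL_FILE", "ORIGINAL_FILE.sig")
> 
>     # Incremental day: the stored signature plus the file's current state are
>     # enough to build a small patch; the old copy of the file is never needed.
>     run("rdiff", "delta", "ORIGINAL_FILE.sig", "ORIGINAL_FILE", "ORIGINAL_FILE.patch")
> 
>     # Restore: rebuild the newer state from the copy restored from the full
>     # backup plus the patch.
>     run("rdiff", "patch", "ORIGINAL_FILE", "ORIGINAL_FILE.patch", "ORIGINAL_FILE.restored")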
> 
> I would love to create a Bacula plugin for deduplicating content at the FD 
> level. This way, even if the backup is sent encrypted from the FD to the SD, 
> the deduplication could obtain the best results, as it takes place while the 
> files are not yet encrypted. 
> Yes, with proper encryption you would always get different bits for the same 
> data block, making deduplication totally useless. :)  
> 
> I THINK THAT TOO... YES... 
> 
> The deduplication would only be applied to files larger than, let's say, 
> 10 GB. 
> ??? 
> 
> I designed Bacula deduplication to handle blocks (files) larger than 1 KB 
> because the indexing overhead for such small blocks was too high. The larger 
> the block you use, the lower the chance of getting a good deduplication 
> ratio. So it is a trade-off: small blocks == good deduplication ratio but 
> higher indexing overhead; larger blocks == weak deduplication ratio but lower 
> indexing overhead. So it was handling block sizes from 1 KB up to 64 KB (the 
> default Bacula block size, but it could be extended to any size). 
> 
> I UNDERSTAND WHAT YOU SAY, BUT THE PROBLEM WE ARE FACING IS THE FOLLOWING. 
> IMAGINE A MACHINE WITH A SQL SERVER AND 150 GB OF DATABASES. OUR PROBLEM IS 
> HAVING TO INCREMENTALLY COPY ALL OF THAT EACH DAY. WE DON'T REALLY MIND 
> COPYING 5 GB OF "WASTED" SPACE PER DAY, EVEN WHEN NOT NECESSARY (JUST SO YOU 
> SEE WHAT I MEAN)... BUT OBVIOUSLY 100 GB OR 200 GB PER DAY ARE A DIFFERENT 
> MATTER... 
> 
> I WAS REALLY THINKING OF APPLYING THIS DEDUPLICATION ONLY TO THE IMPORTANT 
> FILES... HOPE YOU CAN UNDERSTAND ME NOW... :) 
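> 
> JUST FOR A ROUGH FEEL OF THE INDEXING TRADE-OFF YOU DESCRIBE, ON A 150 GB 
> DATA SET (SIZES ARE ONLY EXAMPLES): 
> 
>     # Rough feel for the indexing trade-off on a 150 GB data set.
>     dataset = 150 * 2**30
>     for block in (1 * 2**10, 64 * 2**10):
>         print(f"{block // 2**10:>3} KB blocks -> {dataset // block:,} entries to index")
>     # 1 KB blocks need ~157 million index entries, 64 KB blocks ~2.5 million,
>     # whereas delta-handling only the few files above 10 GB keeps the
>     # bookkeeping tiny.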
> 
> If you don't mind, I would like to share my ideas with you, in order to at 
> least know whether all this is a feasible approach. 
> 
> My idea is basically : 
> 
> - WHEN DOING A BACKUP : 
> 
> ++ Check the backup level we are running. I suppose by asking 
> getBaculaValue() for bVarLevel. 
> Deduplication should be totally transparent to the backup level. You want to 
> deduplicate data, especially for the largest full-level backups, right? 
> 
> WELL... REALLY, THE PROBLEM FOR US IS WHAT I DESCRIBED JUST BEFORE, SO... WE 
> DON'T REALLY MIND COPYING A BIG FILE ONCE A MONTH, BUT WE WANT TO AVOID 
> COPYING IT (AT LEAST THE WHOLE FILE) IN INCREMENTAL BACKUPS. BESIDES, WHEN 
> RESTORING (AND NOT IN VIRTUAL BACKUPS), YOU RESTORE A FULL PLUS 
> INCREMENTALS. SO THIS WAY, WE WOULD RESTORE THE FULL ORIGINAL_FILE PLUS THE 
> PATCHES, AND WE WOULD APPLY THEM TO ORIGINAL_FILE AT THE END OF THE RESTORE 
> JOB. 
> 
> ++ In startBackupFile() I suppose it gives me the file size info (or at 
> least gives me the name, and I'll do a stat() in some manner), so get the 
> file size there. 
> No. The standard "Bacula Command Plugin API" expects that a plugin will 
> return the file stat info to back up.  
> 
> OK, NO PROBLEM... IF I GET THE FILENAME AND PATH IN SOME MANNER, I CAN 
> ALWAYS DO A STAT(). 
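> 
> SOMETHING AS SIMPLE AS THIS (PATH AND THRESHOLD ARE ONLY EXAMPLES): 
> 
>     import os
> 
>     THRESHOLD = 10 * 2**30   # 10 GB, just an example cut-off
>     needs_delta_handling = os.stat("/path/to/ORIGINAL_FILE").st_size > THRESHOLD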
> 
> +++ If it's a full level and the file is bigger than 10 GB, obtain the file 
> signature and store that new (previously non-existing) signature (written to 
> a file with a known nomenclature based on ORIGINAL_FILE's name), plus the 
> whole ORIGINAL_FILE (the one we generated the signature from), on the Bacula 
> tapes. Would I need to tell Bacula to re-read the directory to be able to 
> back up the generated signature files? They didn't exist until just now, 
> when we generated the file containing ORIGINAL_FILE's signature. 
> Why do you call it a "deduplication plugin"? What you describe above is the 
> functionality of the Delta plugin, which supports so-called "block level 
> incremental" - which is _NOT_ deduplication. This "block level incremental" 
> tries to back up the blocks inside a single file which changed between 
> backups. It does not deduplicate the backup stream in any sense. For two 
> identical files which change in the same way, the Delta plugin will back up 
> the data twice, leaving the duplication in place. 
> 
> YES MATE, YOU ARE RIGHT. WHAT I NEED IS TO AVOID UPLOADING BIG FILES WITH 
> VERY LITTLE CHANGES TO THE BACKUP EACH DAY, NOT TO AVOID WRITING TWO EQUAL 
> FILES IN THE BACKUP. 
> 
> In the case of the Delta plugin, which uses the exact procedure and library 
> you describe above, you should use the "Options Plugin API". 
> 
> I SEE. I'LL READ ABOUT IT... 
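> 
> TO MAKE MY FULL-LEVEL STEP ABOVE MORE CONCRETE, A MINIMAL SKETCH OUTSIDE OF 
> BACULA COULD LOOK LIKE THIS (THE REAL PLUGIN WOULD DO IT IN C; THE NAMES AND 
> THE THRESHOLD ARE ONLY EXAMPLES): 
> 
>     import os, subprocess
> 
>     THRESHOLD = 10 * 2**30   # only files above 10 GB get the delta treatment
> 
>     def full_level(path):
>         """Return the list of paths to hand to Bacula for this file on a full backup."""
>         if os.stat(path).st_size <= THRESHOLD:
>             return [path]                 # small file: back it up as usual
>         sig = path + ".rdiff-sig"         # known nomenclature derived from the file name
>         subprocess.run(["rdiff", "signature", path, sig], check=True)
>         return [path, sig]                # the whole file plus its fresh signature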
> 
> +++ If it's an incremental level and a previous signature of ORIGINAL_FILE 
> exists (I would know because the signatures have a known nomenclature based 
> on ORIGINAL_FILE's name), create a patch from the previous signature plus 
> the new state of the file. Then obtain the file signature of the new state 
> again. Finally, store that new signature plus the patch on the Bacula tapes, 
> and return bRC_Skip for ORIGINAL_FILE (because we are going to copy a delta 
> patch and a signature instead). If I return bRC_Skip here... would the FD 
> skip this file, but still see the signatures and delta patches generated 
> before returning bRC_Skip? Or should I ask the FD, in some manner, to 
> re-read the directory? 
> It sounds like an exact step-by-step description of the Delta plugin. 
> 
> So, now I understand why you want to handle only files > 10 GB. :)  
> 
> THAT'S IT :) :) 
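> 
> AND THE INCREMENTAL STEP, SKETCHED THE SAME WAY (THIS IS THE bRC_Skip CASE; 
> THE NAMES FOLLOW THE SAME EXAMPLE NOMENCLATURE AS ABOVE): 
> 
>     import os, subprocess
> 
>     def incremental_level(path):
>         """Return the list of paths to hand to Bacula for this file on an incremental."""
>         sig = path + ".rdiff-sig"
>         if not os.path.exists(sig):
>             return [path]                 # no previous signature: copy the whole file
>         patch = path + ".rdiff-patch"
>         subprocess.run(["rdiff", "delta", sig, path, patch], check=True)   # small patch
>         subprocess.run(["rdiff", "signature", path, sig], check=True)      # signature of the new state
>         return [patch, sig]               # ORIGINAL_FILE itself gets bRC_Skip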
> 
> As you would assume, in the incremental backups I'm not storing the file 
> exactly as it is in the filesystem. It should work more or less the 
> following way: 
> 
> In a full level backup: 
> 
> ++ Before the backup: 
> 
>    Backed server's FS:            ORIGINAL_FILE 
>    Bacula "virtual tape" content: (nothing yet) 
> 
> ++ After the backup: 
> 
>    Backed server's FS:            ORIGINAL_FILE + signature file 
>    Bacula "virtual tape" content: ORIGINAL_FILE + signature file 
> 
> In the next incremental level backup: 
> 
> ++ Before the backup: 
> 
>    Backed server's FS:            NEW_STATE_ORIGINAL_FILE + signature file 
>                                   generated on the last full day 
>    Bacula "virtual tape" content: (from the full backup) ORIGINAL_FILE + 
>                                   signature file 
> 
> ++ After the backup: 
> 
>    Backed server's FS:            NEW_STATE_ORIGINAL_FILE + signature file of 
>                                   NEW_STATE_ORIGINAL_FILE 
>    Bacula "virtual tape" content: (from the full backup) ORIGINAL_FILE + 
>                                   signature file, plus the patch file + 
>                                   signature file of NEW_STATE_ORIGINAL_FILE 
> 
> - WHEN RESTORING A BACKUP : 
> 
> If the restored files' nomenclature is (for example) ORIGINAL_FILE-SIGNATURE 
> or ORIGINAL_FILE-PATCH, that would mean we have backed up deltas of 
> ORIGINAL_FILE in the incremental backups (I assume I could see this in the 
> filename to be restored in startRestoreFile(), because it has the filename 
> accessible). 
> 
> So, let's write that path to a plain text file, so that later, in a 
> post-restore job (or maybe even in the bEventEndRestoreJob event of the 
> API?), we can apply the patches in that path to the ORIGINAL_FILE obtained 
> from the name of the patch files themselves. Finally, once the patching is 
> done, remove the signature files and patch files, obviously leaving 
> ORIGINAL_FILE at its state on the restored date. 
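> 
> A MINIMAL SKETCH OF THAT POST-RESTORE STEP COULD BE THE FOLLOWING (IT 
> ASSUMES EACH PATCH NAME CARRIES A SORTABLE SUFFIX, FOR INSTANCE A JOB 
> TIMESTAMP, WHICH IS AN EXTRA DETAIL NOT DESCRIBED ABOVE): 
> 
>     import glob, os, subprocess
> 
>     def apply_patches(original):
>         """Bring a file restored from the full backup up to the restored date."""
>         for patch in sorted(glob.glob(original + ".rdiff-patch.*")):
>             patched = original + ".tmp"
>             subprocess.run(["rdiff", "patch", original, patch, patched], check=True)
>             os.replace(patched, original)   # now at the state of that incremental
>             os.remove(patch)
>         for sig in glob.glob(original + ".rdiff-sig*"):
>             os.remove(sig)                  # signatures are not needed after a restore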
> 
> So, at this point, I would be very, very thankful :) :) :) if some 
> experienced developer could give me some ideas, or tell me if something is 
> wrong or should be achieved in some other manner or with other plugin API 
> functions... 
> IMVHO, the Delta plugin is best handled with the "Options Plugin API" (as the 
> current Delta plugin is) and not the "Command Plugin API", as most of the 
> backup functionality will be provided by Bacula itself. 
> 
> I WILL READ ABOUT THIS TOO.... 
> 
> best regards 
> 
> BTW, I think the Delta plugin available in BEE is fairly cheap compared to 
> the full deduplication options.  
> 
> I HAVE ASKED ROB MORRISON FOR THE PRICE :) :) 
> 
> CHEERS!!! 
> 
> -- 
> Radosław Korzeniewski
> rados...@korzeniewski.net
 _______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel