Hi Eric!! 

Sorry for the late reply. My answers are inline, in bold green, for
clarity.

On 2022-03-08 09:56, Eric Bollengier wrote:

> Hello,
> 
> On 3/7/22 12:44, egoitz--- via Bacula-devel wrote: 
> 
>> Hi!,
>> 
>> After digging through the plugin-building documentation and checking the
>> provided examples, I still have some doubts I have not been able to clear
>> up on my own. I describe them below in case you can give me a hand :). I
>> would be very thankful if you could help me clarify these doubts :)
> 
> I am very pleased to see someone interested in developing Bacula plugins,
> especially around data compression.
> 
> SOME TIME AGO I ALSO WROTE A VERY SMALL PATCH FOR BACULA TO SPEED UP THE
> BVFS_CACHE ON POSTGRESQL (AT LEAST ON THE POSTGRESQL VERSION WE WERE USING
> AT THAT TIME). I AM VERY HAPPY AND PROUD OF HAVING DONE THAT, AND OF THIS
> NEW ONE :) .
>
> MY PLAN IS TO START WITH DATA COMPRESSION, BUT I WOULD ALSO LOVE TO GET TO
> KNOW THE BACULA FD PLUGIN API BETTER, IN ORDER TO WRITE OTHER CUSTOM
> PLUGINS FOR US. THE DELTA ONE IS IMPORTANT FOR US, BUT SO IS LEARNING HOW
> THE API WORKS. YOU KNOW, KNOWLEDGE ENDS UP BEING FLEXIBILITY... AND ALSO
> BECAUSE I REALLY WANTED TO DO IT :) :) :) :) LOL
> 
>> - I'm trying to create an open source version of the delta
>> encoding plugin using the bacula-fd plugin API. While working on it I
>> have seen that Bacula's source is aware of deltas and of delta file
>> sequencing.
> 
> Yes, Bacula can manage "delta" or "patches" for a given file. The first
> Full backup should take the entire file, the delta_seq will be 0. 
> 
> YEP, I SUPPOSED SO FROM READING THE CODE...
>
> I ASSUME NOW THAT IT'S JUST FOR RECORD-KEEPING AND FOR HELPING DEVELOPERS
> :) :) :)
>
> BASICALLY, I WANTED TO KNOW WHETHER YOU WERE JUST STORING THE DELTA
> SEQUENCE, BECAUSE DELTA ENCODING IS USEFUL IF YOU NEED IT, OR WHETHER YOU
> WERE STORING THAT DELTA DATA TO USE IT LATER WITH SOME OTHER CODE THAT I
> DON'T HAVE, OR WHATEVER....
> 
> During the first Incr or the Differential, the plugin can generate a
> "patch" (can be done with "diff", or "xdelta", or rsync, or whatever). 
> 
> OK, YES, I SUPPOSE I WILL USE RDIFF DIRECTLY, THE BINARY BUILT AND SHIPPED
> WITH LIBRSYNC BY DEFAULT. IT SEEMS TO RUN FINE EVEN ON WINDOWS WITH THE
> MSYS2.ORG PACKAGE, SO....
> 
> This new file is based on the original file, and the delta_seq will
> be set to 1 automatically (Accurate mode should be turned on).
> 
> YEP, I HAVE READ ABOUT THIS BEFORE TOO...
> 
> The plugin
> must set a variable in the save_pkt to indicate that it has saved a
> patch,
> 
> I ASSUME, YOU ARE TALKING ABOUT THE PRESENCE OF THE FO_DELTA FLAG IN 
> SP->FLAGS? 
> 
> and at the restore time, bacula must send back the version of
> the file included in the Full, then in Diff (if any) and all Incrementals. 
> 
> YES, THIS PART IS CLEAR TOO...
> 
> On the plugin side, the restore code will be called for each part of
> the file.
>
> OK, SO THE INITIAL FILE COPIED USING DELTA ENCODING, PLUS ALL ITS PATCHES,
> WILL BE RESTORED AS TOTALLY NORMAL FILES. FOR INSTANCE, WHEN BACULA IS GOING
> TO RESTORE A FILE THAT WAS BACKED UP USING DELTA ENCODING, BACULA WILL:
>
> - RESTORE THE INITIAL FILE
> - RESTORE THE FIRST PATCH
> - RESTORE THE SECOND PATCH
> - AND SO ON...?
> 
>> I have seen, for instance, that even a .bvfs command exists for
>> showing the deltas of a file id. But what I have not found is any Bacula
>> code that generates those delta files (patch generation, signatures,
>> etc...). I assume that the non-fd part of Bacula acts just as a
>> delta file holder: it keeps the files and stores the patch sequence,
>> nothing more. Bacula keeps records of deltas in the database (and in file
>> storage), but only the fd works with them (probably with a library like
>> librsync in the delta plugin), in the sense of applying patches to an
>> original file or generating deltas at backup time. Am I wrong? I ask
>> just to understand the nice work done and what is already written and
>> free in Bacula's source for this purpose.
> 
> I think that you got the concept, the "delta" in Bacula means that you
> must restore all parts of a file, and not just the last copy.
> 
> With a program such as VMware, CBT helps the backup software to generate
> "patches" as well for example. 
> 
> YEP, I HAVE READ ABOUT CBT TOO... AND I HAVE EVEN DONE SOME WORK ON BACKING
> UP XCP-NG USING VM DELTAS :) (IN XCP-NG, WITHOUT CBT :) )
> 
>> - By the way, I have one question about virtual files. It is not
>> very clear to me (perhaps the problem is mine) how to work with
>> them. I understand the concept, but I have not seen a clear example of how,
>> for instance, you create a virtual file during the backup, how you see it
>> in bvfs and, finally, what you get after restoring. On page 36/146 of the
>> Bacula 11 for developers PDF, you say "This will create a virtual file.",
>> but really you are setting in the structure:
>> 
>> sp->type = FT_REG;
>> sp->statp.st_mode = 0700 | S_IFREG;
>> 
>> FT_REG and S_IFREG are both for regular files... what exactly causes a
>> virtual file to be created? Perhaps an st_size of -1?
> 
> A virtual file is generated by plugins, they don't have to exist on disk.
> The name can be anything, it can also point to an existing file.
> 
> The plugin code will be executed to restore the "virtual file", the result
> can be a real file on disk, or a virtual machine on Proxmox for example.
> 
> In this example, we have a regular file, but it's a virtual file that may
> or may not exist on the filesystem. 
> 
> BUT ONE THING IS NOT CLEAR TO ME: HOW DOES BACULA DISTINGUISH WHETHER IT IS
> STORING A VIRTUAL FILE OR A NORMAL ONE? WHICH FIELD OR PARAMETER SHOULD YOU
> SET IN THE SAVE_PKT STRUCTURE, AND HOW? AT FIRST I THOUGHT IT COULD BE A SIZE
> OF -1 IN THE SAVE_PKT STRUCTURE, OR SOME KIND OF FLAG... HOW DOES BACULA KNOW
> WHETHER THE STRUCTURES DESCRIBE A VIRTUAL OR A NORMAL FILE?
>
> OK... I GOT THIS OTHER PART... SO I SUPPOSE IT DOES NOT APPLY TO DELTA
> ENCODING, BECAUSE THERE BACULA HAS ALREADY DEFINED VARIABLES LIKE THE DELTA
> SEQUENCE FOR HANDLING DELTA ENCODING BACKUP AND RESTORE... BUT, FOR
> EXAMPLE, IF YOU WANTED TO GROUP FIVE FILES TO BE RESTORED TOGETHER, YOU
> COULD DECLARE THEM AS ONE VIRTUAL FILE, AND WHEN YOU RESTORE THAT VIRTUAL
> FILE, THE FIVE FILES WOULD APPEAR ON DISK....
>
> ERIC, COULD YOU PLEASE TELL ME HOW YOU INDICATE IN THE STRUCTURE THAT YOU ARE
> DEALING WITH A VIRTUAL FILE?
> 
>> Are they relevant for what I'm trying to do? It seems Bacula handles
>> delta sequencing, so... perhaps for this purpose I don't need
>> "virtual files"?
> 
> In your case, it will be virtual files that point to regular files.
> 
> BUT YOU SAID BACULA WOULD CALL, FOR INSTANCE, ENDBACKUPFILE() FOR EACH FILE
> BACKED UP WITH DELTA ENCODING. I MEAN, WOULD IT RESTORE THE WAY DESCRIBED
> SOME LINES ABOVE? BECAUSE IF IT DOES IT THIS WAY:
>
> - RESTORE INITIAL FILE
> - RESTORE FIRST PATCH
> - RESTORE SECOND PATCH
> - AND SO ON...?
>
> I ASSUME THERE IS NO POINT IN USING VIRTUAL FILES HERE, AM I WRONG? HAVE I
> MISUNDERSTOOD SOMETHING?
> 
>> - I'm planning to implement delta encoding by checking the previous day's
>> file signature produced by librsync. Instead of looking at the filesystem,
>> it would be nice if I could look up that signature in the last
>> backup (yesterday's backup). Would it be possible, in some manner,
>> when I see a file passed in EventHandleBackupFile(), to check whether
>> yesterday's signature exists in yesterday's backup, and then read
>> that signature from the backup itself? I mean, instead of having
>> to leave the signature on the filesystem of the server being backed up.
> 
> HERE COMES AN INTERESTING PART :) 
> 
> You can store information in the save_pkt structure and the plugin can check
> the last version of that information with the accurate mode. 
> 
> I WILL CHECK THAT THE SIGNATURE FILE IS THE EXPECTED ONE WITH A SHA256 SUM.
> I'M NOT GOING TO COPY THE WHOLE SIGNATURE FILE...
>
> OK, SO IF I UNDERSTAND YOU PROPERLY, YOU MEAN:
>
> IF I'M GOING TO BACK UP SOMETHING PREVIOUSLY BACKED UP USING DELTA ENCODING,
> OR I'M BACKING UP A PATCH FILE GENERATED BY DELTA ENCODING, OR THE SIGNATURE
> FILE OF A DELTA COPY ITSELF, YOU SAY:
>
> - FIND THE SIGNATURE FILE OF THAT DELTA COPY GROUP AND CALCULATE A SHA256 OF
> THE SIGNATURE FILE PRESENT ON DISK
> - COPY THE CALCULATED SHA256 TO A NEW VARIABLE IN THE SAVE_PKT STRUCTURE
> - WHEN A FUNCTION USING SAVE_PKT IS CALLED, IT CHECKS WHETHER THE NEW FIELD
> (THE SHA256 OF THE SIGNATURE FILE PRESENT ON DISK FOR A DELTA COPY GROUP)
> HAS THE SAME VALUE AS THE SHA256 STORED IN A DATABASE FOR THAT SIGNATURE
> FILE; IF SO, IT GOES ON WITH THE PATCH, OTHERWISE IT RETURNS A bRC_Error?
> 
> In general, you can use a couple of bytes with this technique. 
> 
> I DON'T UNDERSTAND THIS LAST SENTENCE, SORRY :)
> 
> I don't think you have enough space to store a file signature, you will
> have to use another way to store it (a local file, a database record, ...)
> 
> I WILL USE A SHA256, FOR INSTANCE... IT WOULD BE SMALLER AND IT WOULD PROVIDE
> THE SAME GUARANTEE THAT THE SIGNATURE FILE HAS NOT BEEN MODIFIED
> UNEXPECTEDLY... (YOU KNOW, FOR GENERATING THE PATCH PROPERLY)...
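
The digest check proposed above could look roughly like the following sketch. It assumes the two 32-byte SHA-256 digests have already been obtained elsewhere (one recorded at the previous backup, one computed from the signature file on disk now); computing the hash itself is left to whatever crypto library the plugin links against. `signature_unchanged()` and `digest_to_hex()` are hypothetical helper names, not part of Bacula or librsync.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define DIGEST_LEN 32  /* a SHA-256 digest is 32 bytes */

/* Render a digest as 64 lowercase hex characters plus a terminator,
 * e.g. for storing it in a text field or a log line. */
void digest_to_hex(const unsigned char digest[DIGEST_LEN],
                   char out[2 * DIGEST_LEN + 1])
{
    assert(digest != NULL && out != NULL);
    for (size_t i = 0; i < DIGEST_LEN; i++)
        snprintf(out + 2 * i, 3, "%02x", digest[i]);
}

/* True when the digest recorded at the previous backup matches the
 * digest of the signature file found on disk now. */
bool signature_unchanged(const unsigned char recorded[DIGEST_LEN],
                         const unsigned char current[DIGEST_LEN])
{
    assert(recorded != NULL && current != NULL);
    unsigned char diff = 0;
    for (size_t i = 0; i < DIGEST_LEN; i++)
        diff |= recorded[i] ^ current[i];   /* accumulate any differing bit */
    return diff == 0;
}
```

If `signature_unchanged()` returns false, the signature on disk is not the one the last backup was based on, so a patch generated from it could not be applied reliably; falling back to a full copy of the file would be one way to handle that case.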
> 
>> - The last one :) . For restoring, and from the code I have seen (for
>> instance in insert_missing_delta()), I assume Bacula detects that we are
>> restoring a delta-encoded file. Then I assume Bacula restores, apart from
>> the initial file itself, the patches needed to reach the day we want to
>> restore to. Am I wrong?
> 
> This is correct, Bacula will send back to the plugin the data that was
> produced. It is up to the plugin to reassemble the data. If one delta piece
> is missing, the restore will stop at the last correct one.
> 
> I SEE... SO I ASSUME THAT WHEN RESTORING IT SAYS:
>
> - FILEA -> NORMAL FILE...
> - FILEB -> NORMAL FILE...
> - FILEC -> HEY! THIS HAS A DELTA_SEQ OF 0, SO IT WAS COPIED USING DELTA
> ENCODING!! FIRST OF ALL, LET'S RESTORE THIS FILE, AND LATER WE WILL
> RESTORE THE PATCHES OF THIS DELTA GROUP ONE BY ONE. IS THIS CORRECT?
>
> IF IT WORKS THIS WAY... SURE... ONCE THE PATCH HAS BEEN RESTORED TO DISK, I
> COULD (IN ENDBACKUPFILE(), FOR INSTANCE) APPLY IT TO THE ALREADY
> RESTORED FILE THAT CORRESPONDS TO DELTA_SEQ 0...
>
> AFTER APPLYING ALL THE PATCHES, I WILL REGENERATE THE SIGNATURE OF THE
> DELTA-COPIED FILE....
> 
>> Perhaps later, in a post-restore job, I could run a shell script
>> that looks for patches still pending to be applied to a parent file. I
>> suppose I could then apply them and the backup would finally be restored.
>> Is there a more elegant way you could recommend?
> 
> Normally, the plugin will receive patches one after the other, you can
> re-open the file on disk and apply the patch. It is also possible to
> store everything on disk and call a script to do the work at the end,
> it depends. 
> 
> I SEE... SO IT IS THE IDEA I DESCRIBED ABOVE... OK OK....
> 
> Good luck, 
> 
> THANK YOU SO MUCH FOR YOUR KIND HELP, ERIC. I WOULD REALLY LOVE TO GET TO
> KNOW THE BACULA PLUGIN API... :)
> 
> CHEERS!!! 
> 
> Best Regards,
> Eric
_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel
