Hi Eric!!
Sorry for answering so late. My answers are inline below, in bold green,
for clarity.
On 2022-03-08 09:56, Eric Bollengier wrote:
> Hello,
>
> On 3/7/22 12:44, egoitz--- via Bacula-devel wrote:
>
>> Hi!,
>>
>> After digging into the plugin development documentation and checking the
>> provided examples, I still have some doubts that I have not been able to
>> clarify on my own. I describe them below in case you could give me a hand :).
>> I would be very thankful if you could help me clarify them :)
>
> I am very pleased to see someone interested in developing Bacula plugins,
> especially around data compression.
>
> SOME TIME AGO I ALSO WROTE A VERY SMALL PATCH FOR BACULA THAT SPED UP THE
> BVFS_CACHE ON POSTGRESQL (AT LEAST ON THE POSTGRESQL VERSION WE WERE USING
> AT THAT MOMENT). VERY HAPPY AND PROUD OF DOING THAT AND THIS NEW ONE :) .
>
> MY PLAN IS TO START WITH DATA COMPRESSION, BUT I WOULD ALSO LOVE TO KNOW THE
> BACULA FD PLUGIN API BETTER, IN ORDER TO WRITE SOME OTHER CUSTOM PLUGINS FOR
> US. THE DELTA ONE IS IMPORTANT FOR US, BUT IT IS ALSO ABOUT LEARNING HOW THE
> API WORKS. YOU KNOW, KNOWLEDGE ENDS UP BEING FLEXIBILITY... AND BECAUSE I
> REALLY WANTED TO DO IT TOO :) :) :) :) LOL
>
>> - I'm working on an open source version of the delta encoding plugin
>> using the bacula-fd plugin API. While working on it I have seen that
>> Bacula's source is aware of deltas and of delta file sequencing.
>
> Yes, Bacula can manage "delta" or "patches" for a given file. The first
> Full backup should take the entire file, the delta_seq will be 0.
>
> YEP, I SUPPOSED SO FROM READING THE CODE...
>
> I ASSUME NOW IT'S JUST FOR RECORD KEEPING AND FOR HELPING DEVELOPERS :) :) :)
>
> BASICALLY, I WANTED TO KNOW WHETHER YOU WERE JUST STORING THE DELTA SEQUENCE
> SO THAT DELTA ENCODING IS AVAILABLE FOR WHOEVER NEEDS IT, OR WHETHER YOU WERE
> STORING THAT DELTA DATA TO USE IT LATER WITH SOME OTHER CODE THAT I DON'T
> HAVE, OR WHATEVER....
>
> During the first Incr or the Differential, the plugin can generate a
> "patch" (can be done with "diff", or "xdelta", or rsync, or whatever).
>
> OK, YES, I SUPPOSE I WILL USE RDIFF DIRECTLY, THE BINARY BUILT AND PROVIDED
> WITH LIBRSYNC BY DEFAULT. IT SEEMS TO RUN PRETTY WELL EVEN ON WINDOWS WITH
> THE MSYS2.ORG PACKAGE, SO....
>
> This new file is based on the original file, and the delta_seq will
> be set to 1 automatically (Accurate mode should be turned on).
>
> YEP, I HAVE READ ABOUT THIS BEFORE TOO...
>
> The plugin
> must set a variable in the save_pkt to indicate that it has saved a
> patch,
>
> I ASSUME, YOU ARE TALKING ABOUT THE PRESENCE OF THE FO_DELTA FLAG IN
> SP->FLAGS?
>
> and at the restore time, bacula must send back the version of
> the file included in the Full, then in Diff (if any) and all Incrementals.
>
> YES, THIS PART IS CLEAR TOO...
>
> On the plugin side, the restore code will be called for each part of
> the file.
>
> OK, SO THE INITIAL FILE COPIED USING DELTA, PLUS ALL ITS PATCHES, WILL EACH
> GET RESTORED AS TOTALLY NORMAL FILES. FOR INSTANCE, WHEN BACULA RESTORES A
> FILE THAT WAS BACKED UP USING DELTA ENCODING, BACULA WILL:
> - RESTORE THE INITIAL FILE
> - RESTORE THE FIRST PATCH
> - RESTORE THE SECOND PATCH
> - AND SO ON...?
>
>> I have seen, for instance, that there is even a .bvfs command for
>> showing the deltas of a file id. What I have not found is any code where
>> Bacula itself generates those delta files (patch generation, signatures,
>> etc...). I assume that the non-fd part of Bacula acts just as a holder
>> for the delta files: it keeps the files and stores the patch sequence,
>> nothing more. Bacula keeps records of deltas in the database (and in file
>> storage), but only the fd works with them (probably with a library like
>> librsync in the delta plugin), in the sense of applying patches over an
>> original file or generating deltas during backup. Am I wrong? I am just
>> trying to understand the nice work done and what is already written and
>> free in Bacula's source for this purpose.
>
> I think that you got the concept, the "delta" in Bacula means that you
> must restore all parts of a file, and not just the last copy.
>
> With a program such as VMware, CBT helps the backup software to generate
> "patches" as well for example.
>
> YEP, I HAVE READ ABOUT CBT TOO... AND I HAVE EVEN DONE SOME WORK ON BACKING
> UP XCP-NG USING VM DELTAS :) (THERE, IN XCP-NG, WITHOUT CBT :) )
>
>> - By the way, I have one question about virtual files. It is not very
>> clear to me (perhaps because I don't understand them) how to work with
>> them. I understand the concept, but I have not seen a clear example of how
>> you create a virtual file during the backup, how you see it in bvfs and,
>> finally, what you get after restoring. On page 36/146 of the Bacula 11
>> for developers PDF, you say "This will create a virtual file.", but what
>> you actually put in the structure is:
>>
>> sp->type = FT_REG;
>> sp->statp.st_mode = 0700 | S_IFREG;
>>
>> FT_REG and S_IFREG are both for regular files... so what exactly causes a
>> virtual file to be created? Perhaps an st_size of -1?
>
> Virtual files are generated by plugins; they don't have to exist on disk.
> The name can be anything, it can also point to an existing file.
>
> The plugin code will be executed to restore the "virtual file", the result
> can be a real file on disk, or a virtual machine on Proxmox for example.
>
> In this example, we have a regular file, but it's a virtual file that may
> or may not exist on the filesystem.
>
> BUT ONE THING IS NOT CLEAR TO ME: HOW DOES BACULA DISTINGUISH WHETHER IT IS
> STORING A VIRTUAL FILE OR A NORMAL ONE? WHICH FIELD OR PARAMETER SHOULD YOU
> SET IN THE SAVE_PKT STRUCTURE, AND HOW? AT FIRST I THOUGHT IT COULD BE A SIZE
> OF -1 IN THE SAVE_PKT STRUCTURE, OR SOME KIND OF FLAG... HOW DOES BACULA KNOW
> FROM THE STRUCTURES THAT IT IS DEALING WITH A VIRTUAL OR A NORMAL FILE?
>
> OK... I GOT THIS OTHER PART. I SUPPOSE IT DOES NOT APPLY TO DELTA ENCODING,
> BECAUSE THERE BACULA HAS ALREADY DEFINED VARIABLES LIKE THE DELTA SEQUENCE
> FOR HANDLING DELTA ENCODING BACKUP AND RESTORE. BUT IF, FOR EXAMPLE, YOU
> WANTED TO GROUP FIVE FILES TO BE RESTORED TOGETHER, YOU COULD PRESENT THEM AS
> ONE VIRTUAL FILE, AND WHEN YOU RESTORE THAT VIRTUAL FILE THE FIVE FILES WILL
> APPEAR ON THE DISK....
>
> ERIC, COULD YOU PLEASE TELL ME HOW YOU INDICATE IN THE STRUCTURE THAT YOU ARE
> DEALING WITH A VIRTUAL FILE?
>
>> Are they relevant for what I'm trying to do? It seems Bacula handles
>> delta sequencing, so... perhaps I don't need "virtual files" for this
>> purpose?
>
> In your case, it will be virtual files that point to regular files.
>
> BUT YOU SAID BACULA WOULD CALL, FOR INSTANCE, ENDBACKUPFILE() FOR EACH FILE
> BACKED UP WITH DELTA ENCODING. I MEAN, WOULD IT RESTORE THE WAY DESCRIBED A
> FEW LINES ABOVE? BECAUSE IF IT WORKS THIS WAY:
>
>> - RESTORE INITIAL FILE
>> - RESTORE FIRST PATCH
>> - RESTORE SECOND PATCH
>> - AND SO ON...?
>
> I ASSUME THERE IS NO POINT IN USING VIRTUAL FILES HERE. AM I WRONG? HAVE I
> MISUNDERSTOOD SOMETHING?
>
>> - I'm planning to implement delta encoding by checking the file signature
>> generated by librsync on the previous day. Instead of looking for it on
>> the filesystem, it would be nice if I could read that signature from the
>> last backup (yesterday's backup). Would it be possible, when I see a file
>> passed in EventHandleBackupFile(), to check whether yesterday's signature
>> exists in yesterday's backup and then read it from the backup itself? I
>> mean, instead of having to leave the signature on the filesystem of the
>> server being backed up.
>
> HERE COMES AN INTERESTING PART :)
>
> You can store information in the save_pkt structure and the plugin can check
> the last version of that information with the accurate mode.
>
> I WILL CHECK THAT THE SIGNATURE FILE IS THE EXPECTED ONE WITH A SHA256 SUM.
> I'M NOT GOING TO COPY THE WHOLE SIGNATURE FILE...
>
> OK, SO IF I UNDERSTAND YOU PROPERLY, YOU MEAN:
>
> IF I'M GOING TO BACK UP SOMETHING PREVIOUSLY BACKED UP USING DELTA ENCODING,
> OR A PATCH FILE GENERATED BY DELTA ENCODING, OR THE SIGNATURE FILE OF A DELTA
> COPY ITSELF, YOU SAY:
>
> - FIND THE SIGNATURE FILE OF THAT DELTA COPY GROUP AND CALCULATE THE SHA256
> OF THE SIGNATURE FILE PRESENT ON DISK
> - COPY THE CALCULATED SHA256 TO A NEW VARIABLE IN THE SAVE_PKT STRUCTURE
> - WHEN A FUNCTION USING SAVE_PKT GETS CALLED, IT COMPARES THAT NEW FIELD (THE
> SHA256 OF THE SIGNATURE FILE PRESENT ON DISK FOR A DELTA COPY GROUP) WITH THE
> SHA256 STORED IN A DATABASE FOR THE SAME SIGNATURE FILE; IF THEY MATCH IT GOES
> ON WITH THE PATCH, OTHERWISE THE FUNCTION RETURNS A bRC_Error?
>
> In general, you can use a couple of bytes with this technique.
>
> SORRY, I DON'T UNDERSTAND THIS LAST SENTENCE :)
>
> I don't think you have enough space to store a file signature, you will
> have to use another way to store it (a local file, a database record, ...)
>
> I WILL USE A SHA256, FOR INSTANCE... IT WOULD BE SMALLER AND WOULD GIVE THE
> SAME GUARANTEE THAT THE SIGNATURE FILE HAS NOT BEEN MODIFIED UNEXPECTEDLY
> (YOU KNOW, SO THAT THE PATCH IS GENERATED PROPERLY)...
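The digest check described here could look roughly like the sketch below. It assumes the two SHA-256 digests (32 bytes each) have already been computed elsewhere, for example by a crypto library. The names `signature_unchanged()` and `digest_to_hex()` are hypothetical, and where the recorded digest is kept (save_pkt, a local file, a database record) is left open, as in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DIGEST_LEN 32  /* a SHA-256 digest is 32 bytes long */

/* Write the digest as 64 lowercase hex characters plus a terminating NUL,
 * a convenient form for a text field or a log line. */
void digest_to_hex(const unsigned char digest[DIGEST_LEN],
                   char out[2 * DIGEST_LEN + 1])
{
    assert(digest != NULL && out != NULL);
    for (size_t i = 0; i < DIGEST_LEN; i++) {
        snprintf(out + 2 * i, 3, "%02x", (unsigned int)digest[i]);
    }
}

/* Return 1 when the digest recorded at the previous backup equals the digest
 * of the signature file found on disk now, 0 otherwise. */
int signature_unchanged(const unsigned char recorded[DIGEST_LEN],
                        const unsigned char current[DIGEST_LEN])
{
    assert(recorded != NULL && current != NULL);
    return memcmp(recorded, current, DIGEST_LEN) == 0;
}
```

If `signature_unchanged()` returns 0, one conservative choice would be to skip the delta and back up the whole file again, since a patch built from an unexpected signature might not apply at restore time.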
>
>> - The last one :) . For restoring, and from the code I have seen (for
>> instance in insert_missing_delta()), I assume Bacula detects that we are
>> restoring a delta compressed file. Then I assume Bacula restores, apart
>> from the initial file itself, the patches needed to reach the day we want
>> to restore to. Am I wrong?
>
> This is correct, Bacula will send back to the plugin the data that was
> produced. It is up to the plugin to reassemble the data. If one delta piece
> is missing, the restore will stop at the last correct one.
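To make the "stop at the last correct one" rule concrete, here is a small sketch of how a plugin might work out how far a chain of parts can be applied. It assumes the parts arrive in order and that delta_seq starts at 0 for the full copy and grows by exactly 1 per patch; `last_restorable_seq()` is a hypothetical helper, not a Bacula function.

```c
#include <assert.h>
#include <stddef.h>

/* Given the delta_seq values of the parts received, in arrival order,
 * return the highest sequence number that can safely be applied: the chain
 * must start at 0 (the full copy) and grow by exactly 1 at each step.
 * Returns -1 if the full copy itself is missing. */
int last_restorable_seq(const int *seqs, size_t count)
{
    assert(seqs != NULL || count == 0);
    int expected = 0;
    for (size_t i = 0; i < count; i++) {
        if (seqs[i] != expected) {
            break;  /* gap in the chain: stop at the last good part */
        }
        expected++;
    }
    return expected - 1;
}
```

For example, with parts 0, 1, 2 and 4 the helper returns 2, because patch 3 is missing and patch 4 cannot be applied on top of the result of patch 2.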
>
> I SEE... SO I ASSUME THAT WHEN RESTORING IT SAYS:
>
> - FILEA -> NORMAL FILE...
> - FILEB -> NORMAL FILE...
> - FILEC -> HEY! THIS HAS A DELTA_SEQ OF 0, SO IT WAS COPIED USING DELTA
> ENCODING!! FIRST OF ALL, LET'S RESTORE THIS FILE, AND LATER WE WILL RESTORE
> EACH PATCH OF THIS DELTA GROUP ONE BY ONE. IS THIS CORRECT?
>
> IF IT WORKS THIS WAY... SURE... ONCE THE PATCH HAS BEEN RESTORED TO DISK, I
> COULD (IN ENDRESTOREFILE(), FOR INSTANCE) APPLY IT TO THE ALREADY RESTORED
> FILE THAT CORRESPONDS TO DELTA_SEQ 0...
>
> AFTER APPLYING ALL THE PATCHES, I WILL REGENERATE THE SIGNATURE OF THE DELTA
> COPIED FILE....
>
>> Perhaps later, in a post-restore job, I could run a shell script
>> that looks for patches still pending to be applied to a parent file. I
>> suppose I could then apply them and the backup would finally be restored.
>> Is there some more elegant way you could advise?
>
> Normally, the plugin will receive patches one after the other, you can
> re-open the file on disk and apply the patch. It is also possible to
> store everything on disk and call a script to do the work at the end,
> it depends.
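Since rdiff was mentioned as the tool of choice, applying one received patch could be prepared as in the sketch below. It only builds the argument vector for `rdiff patch BASIS DELTA NEWFILE`; actually running it (for example with fork() and execvp()) is left out. `build_rdiff_patch_argv()` and the file names in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fill argv for "rdiff patch BASIS DELTA NEWFILE".
 * argv must have room for 6 entries; the last one is set to NULL so the
 * array can be handed to execvp() (after a cast to char *const *). */
void build_rdiff_patch_argv(const char *basis, const char *delta,
                            const char *output, const char *argv[6])
{
    assert(basis != NULL && delta != NULL && output != NULL && argv != NULL);
    argv[0] = "rdiff";
    argv[1] = "patch";
    argv[2] = basis;   /* file restored so far */
    argv[3] = delta;   /* patch just received from Bacula */
    argv[4] = output;  /* patched result, written to a new file */
    argv[5] = NULL;
}
```

A plugin could call this once per patch, for instance with "/restore/file", "/restore/file.patch.1" and "/restore/file.new", then rename the output over the basis file before handling the next patch. Passing an argument vector instead of a shell command string avoids quoting problems with unusual file names.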
>
> I SEE... SO, THE IDEA I DESCRIBED ABOVE... OK OK....
>
> Good luck,
>
> THANK YOU SO MUCH FOR YOUR KIND HELP, ERIC. I WOULD REALLY LOVE TO GET TO
> KNOW THE BACULA PLUGIN API... :)
>
> CHEERS!!!
>
> Best Regards,
> Eric