Re: xpatch command-line interface (was: Binary Patch (xpatch))

Nathan Hartman Sun, 15 Dec 2024 13:40:45 -0800

On Fri, Dec 13, 2024 at 5:01 PM Nathan Hartman <hartman.nat...@gmail.com> wrote:
>
> On Fri, Dec 13, 2024 at 9:06 AM Timofei Zhakov <t...@chemodax.net> wrote:
> >
> > (snip)
> >
> >>
> >> I think this makes sense. That's what I wrote in a (rough draft) 'svn
> >> help patch' text for this:
> >>
> >> "  An xpatch file can be produced with the 'svn diff --xpatch' command.\n"
> >> "  This patch format is specific to Subversion and can represent all\n"
> >> "  types of working copy changes. It is applied with three-way merging\n"
> >> "  and conflict resolution.\n"
> >>
> >> > However, I'm changing my opinion about the CLI of xpatches' 
> >> > application/merging onto a WC. I realised that `svn merge --xpatch` is 
> >> > not the best and a little bit confusing for the users. In my opinion, 
> >> > the reason for that is that we should not rely the format detection onto 
> >> > the users. They should not even realise that they are going to apply an 
> >> > xpatch instead of a plaintext patch. So, I think that it's much simpler 
> >> > for the users to always invoke `svn patch`, for all kinds of patches, 
> >> > even if it would be a completely new format XYZ.
> >>
> >> This is my thinking as well. 'svn patch' has worked for unidiff and
> >> git's extended unidiff without needing the user to disambiguate, so
> >> from a user's perspective, it makes sense that it should work for
> >> xpatches also.
> >>
> >> > I also thought about how this would be implemented in the GUI clients, 
> >> > and didn't find anything better than automatic detection of the patch 
> >> > format, and utilising either unidif or xpatch applier.
> >> >
> >> > Finalising the points above, I'd suggest to do the xpatch application 
> >> > through the `svn patch patchfile.xpatch` command. I think this would be 
> >> > best for the users.
> >> >
> >> > However, this approach might cause a few technical problems; For 
> >> > example, how to detect the format?
> >>
> >> One issue is that currently, svn_diff_open_patch_file() opens a file
> >> (expected to be unidiff or git unidiff) and populates a
> >> svn_patch_file_t. This is passed to svn_diff_parse_next_patch(), which
> >> runs a state machine that automatically switches to git unidiff format
> >> if it sees lines only found in those. It's one implementation for both
> >> (somewhat related) formats. That's nice, but a XML-formatted patch
> >> would be too different to handle with the same parser.
> >>
> >> My initial idea was to insert a new initial state in the patch parsing
> >> state machine (transitions[] array in libsvn_diff/parse-diff.c) to
> >> look for an indication of xpatch format. Unfortunately, that would
> >> look for an exact match. What if there's an extra whitespace in the
> >> XML tag? Also, unfortunately, if it is detected, quite a lot of code
> >> churn would need to be done to divert processing to a different
> >> parser. I think it's ugly.
> >>
> >> My next thought was higher up in the call stack, in apply_patch() (in
> >> libsvn_client/patch.c). Here, svn_diff_open_patch_file() opens the
> >> file and populates a svn_patch_file_t. There could be a call added
> >> immediately after, to a function that probes the file for indication
> >> of xpatch. This would read the beginning of the file and look for the
> >> xpatch indication. If it's there, then the svn_patch_file_t gets
> >> transmuted into a new struct, such as svn_xpatch_file_t, and
> >> processing is diverted elsewhere, with the file still open. Otherwise,
> >> it's not xpatch, the file is rewound back to the beginning and the
> >> existing loop is allowed to run. Actually, we don't even need to
> >> rewind the file because one of the first things
> >> svn_diff_parse_next_patch() does is to fseek to
> >> patch_file->next_patch_offset. So in the unidiff case, we only need to
> >> make sure that this is 0. AFAICT this is always the case right now,
> >> since svn_diff_open_patch_file() already initializes it to 0, but I
> >> think the probing function should explicitly do it to future-proof the
> >> code.
> >>
> >> > I think that the xpatch files should begin with a line that describes 
> >> > it. For example, there is a standard tag in XML called "Processing 
> >> > Instruction" [1]. It specifies that below an XML stream is going. 
> >> > Usually it begins with a line like this:
> >> >
> >> > [[[
> >> > <?xml version="1.0" encoding="UTF-8" ?>
> >> > ]]]
> >> >
> >> > However, the wiki-page at [1], explains that it is allowed to use any 
> >> > tag as for the processing instruction. I think we may begin each xpatch 
> >> > with a line presented below:
> >> >
> >> > [[[
> >> > <?xml-patch encoding="UTF-8" version="42" ?>
> >> > ...or...
> >> > <?xpatch encoding="UTF-8" version="42" ?>
> >> > ]]]
> >> >
> >> > Here the 'encoding' attribute specifies the encoding used for the tags, 
> >> > and the 'version' will let us know whether we support it or not. We can 
> >> > add any other attribute, as needed.
> >> >
> >> > So to detect if it is an xpatch or not we can just compare the first 
> >> > line. Isn't it?
> >>
> >> I don't know; hopefully someone else here knows? Meanwhile, I'll try
> >> to research this.
> >>
> >> > The problem is that a unidiff patch can begin with it. For example, this 
> >> > is a good practice to write a log-message before the patch itself, and 
> >> > it should not be interpreted by the client as a content. This means that 
> >> > each plain and unidiff patch, beginning with this line, as a 
> >> > log-message, will be rejected by the client.
> >>
> >> I agree that beginning a log-message with this kind of text would be
> >> highly unusual but...
> >>
> >> > Probably we can deal with that and interpret those patches as 'xpatch', 
> >> > since it's very unusual to begin a log-message with this line, but we 
> >> > should take it in mind.
> >> >
> >> > Additionally, we may add an argument to the `svn patch` command, so 
> >> > users will be able to specify the kind explicitly, for example, `svn 
> >> > patch patchfile.patch --format unidiff` will apply it as a unidiff 
> >> > patch, even if it begins with the xpatch marker.
> >>
> >> ...an optional flag to explicitly specify the format could provide a
> >> workaround for that edge case. Also, it might be useful when used with
> >> scripting, if the script author knows that a patch must be a certain
> >> format and they want to reject all other formats.
> >>
> >> Now, unidiff can start with a log message, but can xpatch do that? I
> >> think the xml processing instruction must be first in the file. So,
> >> perhaps the log can be provided with some sort of tags within the xml
> >> structure? Maybe we provide an empty <log>...</log> that users can
> >> optionally fill in with their $EDITOR?
> >>
> >> > Thoughts?
> >>
> >> That's it for now... I'll try to find a definitive answer about the
> >> xml processing instruction.
> >>
> >> >
> >> > Thanks to Nathan for writing the "help" docstrings. They add much more 
> >> > sense to the feature in general. Are you going to describe xpatch in the 
> >> > `svn patch` command?
> >>
> >> That was the one I wrote already, but it's incomplete. I'll write the
> >> one for 'svn diff --xpatch' soon.
> >>
> >> > [1] https://en.wikipedia.org/wiki/Processing_Instruction
> >> >
> >> > --
> >> > Timofei Zhakov
> >>
> >> Cheers,
> >> Nathan
> >
> >
> > Hi Nathan,
> >
> > The thing is that xpatch format is going to be completely different from 
> > unidiff patch files, either plain Unix format or Git extended patches. The 
> > unidiff patches include only changes in each text/(sometimes binary) files, 
> > while xpatches should save the entire work of the svn_diff_tree_processor_t 
> > drivers.
> >
> > In other words, all unidiff patches and Git extended patches are both doing 
> > the same thing. They both are made to read the next patch (svn_patch_t) 
> > from svn_patch_file_t struct. Each patch file can be opened from an 
> > absolute path to it using svn_diff_open_patch_file(), and then closed using 
> > svn_diff_close_patch_file().
> >
> > The xpatch applyer may be implemented using an XML push-parser. It may work 
> > out the sequence like this: 
> > xpatch-file->svn_stream_t->expat->svn_diff_tree_processor_t->apply-processor.
> >
> > Note: since we need to actually "push" the content of the file to an XML 
> > parser, an svn_stream_copy3() function has to be invoked in order to wrap a 
> > file into a stream.
> >
> > Following from this, I think patch and xpatch are completely different, on 
> > the side of their implementations. I think the format detection has to be 
> > done before the client starts applying it. For example, cmdline client can 
> > open the patch file, when `svn patch` has been invoked, detect the format 
> > using a function such svn_client_patch_detect_format(apr_file_t 
> > *patch_file, svn_client_patch_kind_t *kind), then it will apply either 
> > patch or xpatch.
> >
> > However, I noticed a little problem in the implementation of patching; The 
> > svn_client_patch() function accesses the patchfile by an absolute path 
> > instead of a handle to a file. This follows that the file will be opened 
> > twice. Is it possible to modify it, so the svn_client_patch() function will 
> > accept apr_file_t or a svn_stream_t instead of an absolute path to this 
> > file?
> >
> > --
> > Timofei Zhakov
>
>
> I looked at the patches in the other thread [1] but so far I only had
> time to give them a cursory look. So some of my thoughts below might
> be totally off. Nevertheless... my initial thoughts:
>
> Yes, the patch API probably will need a new revision.
>
> However, before making the new revision, we should study the merge API
> because xpatches will trigger a merge behind the scenes, which in turn
> can trigger conflict resolution. If run from a GUI, the GUI will need
> to hook into the interactive conflict resolution.
>
> Also, we briefly spoke about a possible (optional) cmdline switch to
> tell the patch API the type of patch file. So the patch API needs a
> way to pass that information.
>
> Finally, I think it is better if the patch API continues to take an
> absolute path to the patch file, if possible.
>
> But, how to handle:
> * The totally different patch file formats (unidiff vs xpatch)
> * The need for different parsers and different code paths
> * Avoid opening the file twice (once to detect format, second time to
>   actually parse it)
>
> An idea that I tried to describe in my previous email -- I will try to
> explain it better here, but tell me if I'm missing something:


Some corrections to what I wrote before...

> First, apply_patches opens the file, but instead of using a struct
> svn_patch_file_t, it uses a plain FILE pointer, or perhaps a special
> struct.

Not a plain FILE pointer. We don't do that around here. :-) So, an
apr_file_t * file.

> Then it calls a function that probes the contents of the patch file,
> until it recognizes something that indicates if this is unidiff or
> xpatch. (If the user told us what kind of file, we skip this step
> because we already know the file type.)
>
> If unidiff, it will create a struct svn_patch_file_t, store the FILE
> pointer there, ensure next_patch_offset is 0, and call the unidiff
> code, which is the loop that currently lives at apply_patch() which
> calls svn_diff_parse_next_patch().
>
> If xpatch, it will create a different struct that doesn't exist yet.
> Earlier I suggested a name like svn_xpatch_file_t but you can choose
> any suitable name. We need to convert a plain FILE pointer to
> svn_stream_t -- I am not sure if a function to do this exists already,
> so I need to look into this further; for some reason, I can't find the
> definition right now -- and then call the xpatch code.

It is defined in libsvn_subr/stream.c. I couldn't find it before
because I searched *.h files. Looks like it's in a source file for
data hiding of the struct's contents.

And, yes, there is a way to make a stream from a apr_file_t:

/* Helper function that creates a stream from an APR file. */
static svn_stream_t *
make_stream_from_apr_file(apr_file_t *file,
                          svn_boolean_t disown,
                          svn_boolean_t supports_seek,
                          svn_boolean_t truncate_on_seek,
                          apr_pool_t *pool)

So I think something similar to the idea I suggested before could
work, if you agree that it's a good idea. :-)

> So, the file is only opened once, but we can call different parsers.
>
> Maybe the rest of apply_patch should be refactored to a new function,
> so apply_patch() would only decide the format and direct execution to
> one of the two alternatives.
>
> I have to run for a little while but I'll study the code more in-depth
> and try to provide better feedback soon.
>
> Meanwhile, just wanted to articulate those thoughts a little better...
>
> Hope this helps (let me know!),
> Nathan

Hope this helps...
Nathan

Re: xpatch command-line interface (was: Binary Patch (xpatch))

Reply via email to