Re: xpatch command-line interface (was: Binary Patch (xpatch))

Timofei Zhakov Fri, 13 Dec 2024 06:06:46 -0800

(snip)


> I think this makes sense. That's what I wrote in a (rough draft) 'svn
> help patch' text for this:
>
> "  An xpatch file can be produced with the 'svn diff --xpatch' command.\n"
> "  This patch format is specific to Subversion and can represent all\n"
> "  types of working copy changes. It is applied with three-way merging\n"
> "  and conflict resolution.\n"
>
> > However, I'm changing my opinion about the CLI of xpatches'
> application/merging onto a WC. I realised that `svn merge --xpatch` is not
> the best and a little bit confusing for the users. In my opinion, the
> reason for that is that we should not rely the format detection onto the
> users. They should not even realise that they are going to apply an xpatch
> instead of a plaintext patch. So, I think that it's much simpler for the
> users to always invoke `svn patch`, for all kinds of patches, even if it
> would be a completely new format XYZ.
>
> This is my thinking as well. 'svn patch' has worked for unidiff and
> git's extended unidiff without needing the user to disambiguate, so
> from a user's perspective, it makes sense that it should work for
> xpatches also.
>
> > I also thought about how this would be implemented in the GUI clients,
> and didn't find anything better than automatic detection of the patch
> format, and utilising either unidif or xpatch applier.
> >
> > Finalising the points above, I'd suggest to do the xpatch application
> through the `svn patch patchfile.xpatch` command. I think this would be
> best for the users.
> >
> > However, this approach might cause a few technical problems; For
> example, how to detect the format?
>
> One issue is that currently, svn_diff_open_patch_file() opens a file
> (expected to be unidiff or git unidiff) and populates a
> svn_patch_file_t. This is passed to svn_diff_parse_next_patch(), which
> runs a state machine that automatically switches to git unidiff format
> if it sees lines only found in those. It's one implementation for both
> (somewhat related) formats. That's nice, but a XML-formatted patch
> would be too different to handle with the same parser.
>
> My initial idea was to insert a new initial state in the patch parsing
> state machine (transitions[] array in libsvn_diff/parse-diff.c) to
> look for an indication of xpatch format. Unfortunately, that would
> look for an exact match. What if there's an extra whitespace in the
> XML tag? Also, unfortunately, if it is detected, quite a lot of code
> churn would need to be done to divert processing to a different
> parser. I think it's ugly.
>
> My next thought was higher up in the call stack, in apply_patch() (in
> libsvn_client/patch.c). Here, svn_diff_open_patch_file() opens the
> file and populates a svn_patch_file_t. There could be a call added
> immediately after, to a function that probes the file for indication
> of xpatch. This would read the beginning of the file and look for the
> xpatch indication. If it's there, then the svn_patch_file_t gets
> transmuted into a new struct, such as svn_xpatch_file_t, and
> processing is diverted elsewhere, with the file still open. Otherwise,
> it's not xpatch, the file is rewound back to the beginning and the
> existing loop is allowed to run. Actually, we don't even need to
> rewind the file because one of the first things
> svn_diff_parse_next_patch() does is to fseek to
> patch_file->next_patch_offset. So in the unidiff case, we only need to
> make sure that this is 0. AFAICT this is always the case right now,
> since svn_diff_open_patch_file() already initializes it to 0, but I
> think the probing function should explicitly do it to future-proof the
> code.
>
> > I think that the xpatch files should begin with a line that describes
> it. For example, there is a standard tag in XML called "Processing
> Instruction" [1]. It specifies that below an XML stream is going. Usually
> it begins with a line like this:
> >
> > [[[
> > <?xml version="1.0" encoding="UTF-8" ?>
> > ]]]
> >
> > However, the wiki-page at [1], explains that it is allowed to use any
> tag as for the processing instruction. I think we may begin each xpatch
> with a line presented below:
> >
> > [[[
> > <?xml-patch encoding="UTF-8" version="42" ?>
> > ...or...
> > <?xpatch encoding="UTF-8" version="42" ?>
> > ]]]
> >
> > Here the 'encoding' attribute specifies the encoding used for the tags,
> and the 'version' will let us know whether we support it or not. We can add
> any other attribute, as needed.
> >
> > So to detect if it is an xpatch or not we can just compare the first
> line. Isn't it?
>
> I don't know; hopefully someone else here knows? Meanwhile, I'll try
> to research this.
>
> > The problem is that a unidiff patch can begin with it. For example, this
> is a good practice to write a log-message before the patch itself, and it
> should not be interpreted by the client as a content. This means that each
> plain and unidiff patch, beginning with this line, as a log-message, will
> be rejected by the client.
>
> I agree that beginning a log-message with this kind of text would be
> highly unusual but...
>
> > Probably we can deal with that and interpret those patches as 'xpatch',
> since it's very unusual to begin a log-message with this line, but we
> should take it in mind.
> >
> > Additionally, we may add an argument to the `svn patch` command, so
> users will be able to specify the kind explicitly, for example, `svn patch
> patchfile.patch --format unidiff` will apply it as a unidiff patch, even if
> it begins with the xpatch marker.
>
> ...an optional flag to explicitly specify the format could provide a
> workaround for that edge case. Also, it might be useful when used with
> scripting, if the script author knows that a patch must be a certain
> format and they want to reject all other formats.
>
> Now, unidiff can start with a log message, but can xpatch do that? I
> think the xml processing instruction must be first in the file. So,
> perhaps the log can be provided with some sort of tags within the xml
> structure? Maybe we provide an empty <log>...</log> that users can
> optionally fill in with their $EDITOR?
>
> > Thoughts?
>
> That's it for now... I'll try to find a definitive answer about the
> xml processing instruction.
>
> >
> > Thanks to Nathan for writing the "help" docstrings. They add much more
> sense to the feature in general. Are you going to describe xpatch in the
> `svn patch` command?
>
> That was the one I wrote already, but it's incomplete. I'll write the
> one for 'svn diff --xpatch' soon.
>
> > [1] https://en.wikipedia.org/wiki/Processing_Instruction
> >
> > --
> > Timofei Zhakov
>
> Cheers,
> Nathan
>

Hi Nathan,

The thing is that xpatch format is going to be completely different from
unidiff patch files, either plain Unix format or Git extended patches. The
unidiff patches include only changes in each text/(sometimes binary) files,
while xpatches should save the entire work of the svn_diff_tree_processor_t
drivers.

In other words, all unidiff patches and Git extended patches are both doing
the same thing. They both are made to read the next patch (svn_patch_t)
from svn_patch_file_t struct. Each patch file can be opened from an
absolute path to it using svn_diff_open_patch_file(), and then closed using
svn_diff_close_patch_file().

The xpatch applyer may be implemented using an XML push-parser. It may work
out the sequence like this:
xpatch-file->svn_stream_t->expat->svn_diff_tree_processor_t->apply-processor.

Note: since we need to actually "push" the content of the file to an XML
parser, an svn_stream_copy3() function has to be invoked in order to wrap a
file into a stream.

Following from this, I think patch and xpatch are completely different, on
the side of their implementations. I think the format detection has to be
done before the client starts applying it. For example, cmdline client can
open the patch file, when `svn patch` has been invoked, detect the format
using a function such svn_client_patch_detect_format(apr_file_t
*patch_file, svn_client_patch_kind_t *kind), then it will apply either
patch or xpatch.

However, I noticed a little problem in the implementation of patching; The
svn_client_patch() function accesses the patchfile by an absolute path
instead of a handle to a file. This follows that the file will be opened
twice. Is it possible to modify it, so the svn_client_patch() function will
accept apr_file_t or a svn_stream_t instead of an absolute path to this
file?

-- 
Timofei Zhakov

Re: xpatch command-line interface (was: Binary Patch (xpatch))

Reply via email to