On Fri, Dec 13, 2024 at 5:01 PM Nathan Hartman <hartman.nat...@gmail.com> wrote: > > On Fri, Dec 13, 2024 at 9:06 AM Timofei Zhakov <t...@chemodax.net> wrote: > > > > (snip) > > > >> > >> I think this makes sense. That's what I wrote in a (rough draft) 'svn > >> help patch' text for this: > >> > >> " An xpatch file can be produced with the 'svn diff --xpatch' command.\n" > >> " This patch format is specific to Subversion and can represent all\n" > >> " types of working copy changes. It is applied with three-way merging\n" > >> " and conflict resolution.\n" > >> > >> > However, I'm changing my opinion about the CLI of xpatches' > >> > application/merging onto a WC. I realised that `svn merge --xpatch` is > >> > not the best and a little bit confusing for the users. In my opinion, > >> > the reason for that is that we should not rely the format detection onto > >> > the users. They should not even realise that they are going to apply an > >> > xpatch instead of a plaintext patch. So, I think that it's much simpler > >> > for the users to always invoke `svn patch`, for all kinds of patches, > >> > even if it would be a completely new format XYZ. > >> > >> This is my thinking as well. 'svn patch' has worked for unidiff and > >> git's extended unidiff without needing the user to disambiguate, so > >> from a user's perspective, it makes sense that it should work for > >> xpatches also. > >> > >> > I also thought about how this would be implemented in the GUI clients, > >> > and didn't find anything better than automatic detection of the patch > >> > format, and utilising either unidif or xpatch applier. > >> > > >> > Finalising the points above, I'd suggest to do the xpatch application > >> > through the `svn patch patchfile.xpatch` command. I think this would be > >> > best for the users. > >> > > >> > However, this approach might cause a few technical problems; For > >> > example, how to detect the format? > >> > >> One issue is that currently, svn_diff_open_patch_file() opens a file > >> (expected to be unidiff or git unidiff) and populates a > >> svn_patch_file_t. This is passed to svn_diff_parse_next_patch(), which > >> runs a state machine that automatically switches to git unidiff format > >> if it sees lines only found in those. It's one implementation for both > >> (somewhat related) formats. That's nice, but a XML-formatted patch > >> would be too different to handle with the same parser. > >> > >> My initial idea was to insert a new initial state in the patch parsing > >> state machine (transitions[] array in libsvn_diff/parse-diff.c) to > >> look for an indication of xpatch format. Unfortunately, that would > >> look for an exact match. What if there's an extra whitespace in the > >> XML tag? Also, unfortunately, if it is detected, quite a lot of code > >> churn would need to be done to divert processing to a different > >> parser. I think it's ugly. > >> > >> My next thought was higher up in the call stack, in apply_patch() (in > >> libsvn_client/patch.c). Here, svn_diff_open_patch_file() opens the > >> file and populates a svn_patch_file_t. There could be a call added > >> immediately after, to a function that probes the file for indication > >> of xpatch. This would read the beginning of the file and look for the > >> xpatch indication. If it's there, then the svn_patch_file_t gets > >> transmuted into a new struct, such as svn_xpatch_file_t, and > >> processing is diverted elsewhere, with the file still open. Otherwise, > >> it's not xpatch, the file is rewound back to the beginning and the > >> existing loop is allowed to run. Actually, we don't even need to > >> rewind the file because one of the first things > >> svn_diff_parse_next_patch() does is to fseek to > >> patch_file->next_patch_offset. So in the unidiff case, we only need to > >> make sure that this is 0. AFAICT this is always the case right now, > >> since svn_diff_open_patch_file() already initializes it to 0, but I > >> think the probing function should explicitly do it to future-proof the > >> code. > >> > >> > I think that the xpatch files should begin with a line that describes > >> > it. For example, there is a standard tag in XML called "Processing > >> > Instruction" [1]. It specifies that below an XML stream is going. > >> > Usually it begins with a line like this: > >> > > >> > [[[ > >> > <?xml version="1.0" encoding="UTF-8" ?> > >> > ]]] > >> > > >> > However, the wiki-page at [1], explains that it is allowed to use any > >> > tag as for the processing instruction. I think we may begin each xpatch > >> > with a line presented below: > >> > > >> > [[[ > >> > <?xml-patch encoding="UTF-8" version="42" ?> > >> > ...or... > >> > <?xpatch encoding="UTF-8" version="42" ?> > >> > ]]] > >> > > >> > Here the 'encoding' attribute specifies the encoding used for the tags, > >> > and the 'version' will let us know whether we support it or not. We can > >> > add any other attribute, as needed. > >> > > >> > So to detect if it is an xpatch or not we can just compare the first > >> > line. Isn't it? > >> > >> I don't know; hopefully someone else here knows? Meanwhile, I'll try > >> to research this. > >> > >> > The problem is that a unidiff patch can begin with it. For example, this > >> > is a good practice to write a log-message before the patch itself, and > >> > it should not be interpreted by the client as a content. This means that > >> > each plain and unidiff patch, beginning with this line, as a > >> > log-message, will be rejected by the client. > >> > >> I agree that beginning a log-message with this kind of text would be > >> highly unusual but... > >> > >> > Probably we can deal with that and interpret those patches as 'xpatch', > >> > since it's very unusual to begin a log-message with this line, but we > >> > should take it in mind. > >> > > >> > Additionally, we may add an argument to the `svn patch` command, so > >> > users will be able to specify the kind explicitly, for example, `svn > >> > patch patchfile.patch --format unidiff` will apply it as a unidiff > >> > patch, even if it begins with the xpatch marker. > >> > >> ...an optional flag to explicitly specify the format could provide a > >> workaround for that edge case. Also, it might be useful when used with > >> scripting, if the script author knows that a patch must be a certain > >> format and they want to reject all other formats. > >> > >> Now, unidiff can start with a log message, but can xpatch do that? I > >> think the xml processing instruction must be first in the file. So, > >> perhaps the log can be provided with some sort of tags within the xml > >> structure? Maybe we provide an empty <log>...</log> that users can > >> optionally fill in with their $EDITOR? > >> > >> > Thoughts? > >> > >> That's it for now... I'll try to find a definitive answer about the > >> xml processing instruction. > >> > >> > > >> > Thanks to Nathan for writing the "help" docstrings. They add much more > >> > sense to the feature in general. Are you going to describe xpatch in the > >> > `svn patch` command? > >> > >> That was the one I wrote already, but it's incomplete. I'll write the > >> one for 'svn diff --xpatch' soon. > >> > >> > [1] https://en.wikipedia.org/wiki/Processing_Instruction > >> > > >> > -- > >> > Timofei Zhakov > >> > >> Cheers, > >> Nathan > > > > > > Hi Nathan, > > > > The thing is that xpatch format is going to be completely different from > > unidiff patch files, either plain Unix format or Git extended patches. The > > unidiff patches include only changes in each text/(sometimes binary) files, > > while xpatches should save the entire work of the svn_diff_tree_processor_t > > drivers. > > > > In other words, all unidiff patches and Git extended patches are both doing > > the same thing. They both are made to read the next patch (svn_patch_t) > > from svn_patch_file_t struct. Each patch file can be opened from an > > absolute path to it using svn_diff_open_patch_file(), and then closed using > > svn_diff_close_patch_file(). > > > > The xpatch applyer may be implemented using an XML push-parser. It may work > > out the sequence like this: > > xpatch-file->svn_stream_t->expat->svn_diff_tree_processor_t->apply-processor. > > > > Note: since we need to actually "push" the content of the file to an XML > > parser, an svn_stream_copy3() function has to be invoked in order to wrap a > > file into a stream. > > > > Following from this, I think patch and xpatch are completely different, on > > the side of their implementations. I think the format detection has to be > > done before the client starts applying it. For example, cmdline client can > > open the patch file, when `svn patch` has been invoked, detect the format > > using a function such svn_client_patch_detect_format(apr_file_t > > *patch_file, svn_client_patch_kind_t *kind), then it will apply either > > patch or xpatch. > > > > However, I noticed a little problem in the implementation of patching; The > > svn_client_patch() function accesses the patchfile by an absolute path > > instead of a handle to a file. This follows that the file will be opened > > twice. Is it possible to modify it, so the svn_client_patch() function will > > accept apr_file_t or a svn_stream_t instead of an absolute path to this > > file? > > > > -- > > Timofei Zhakov > > > I looked at the patches in the other thread [1] but so far I only had > time to give them a cursory look. So some of my thoughts below might > be totally off. Nevertheless... my initial thoughts: > > Yes, the patch API probably will need a new revision. > > However, before making the new revision, we should study the merge API > because xpatches will trigger a merge behind the scenes, which in turn > can trigger conflict resolution. If run from a GUI, the GUI will need > to hook into the interactive conflict resolution. > > Also, we briefly spoke about a possible (optional) cmdline switch to > tell the patch API the type of patch file. So the patch API needs a > way to pass that information. > > Finally, I think it is better if the patch API continues to take an > absolute path to the patch file, if possible. > > But, how to handle: > * The totally different patch file formats (unidiff vs xpatch) > * The need for different parsers and different code paths > * Avoid opening the file twice (once to detect format, second time to > actually parse it) > > An idea that I tried to describe in my previous email -- I will try to > explain it better here, but tell me if I'm missing something:
Some corrections to what I wrote before... > First, apply_patches opens the file, but instead of using a struct > svn_patch_file_t, it uses a plain FILE pointer, or perhaps a special > struct. Not a plain FILE pointer. We don't do that around here. :-) So, an apr_file_t * file. > Then it calls a function that probes the contents of the patch file, > until it recognizes something that indicates if this is unidiff or > xpatch. (If the user told us what kind of file, we skip this step > because we already know the file type.) > > If unidiff, it will create a struct svn_patch_file_t, store the FILE > pointer there, ensure next_patch_offset is 0, and call the unidiff > code, which is the loop that currently lives at apply_patch() which > calls svn_diff_parse_next_patch(). > > If xpatch, it will create a different struct that doesn't exist yet. > Earlier I suggested a name like svn_xpatch_file_t but you can choose > any suitable name. We need to convert a plain FILE pointer to > svn_stream_t -- I am not sure if a function to do this exists already, > so I need to look into this further; for some reason, I can't find the > definition right now -- and then call the xpatch code. It is defined in libsvn_subr/stream.c. I couldn't find it before because I searched *.h files. Looks like it's in a source file for data hiding of the struct's contents. And, yes, there is a way to make a stream from a apr_file_t: /* Helper function that creates a stream from an APR file. */ static svn_stream_t * make_stream_from_apr_file(apr_file_t *file, svn_boolean_t disown, svn_boolean_t supports_seek, svn_boolean_t truncate_on_seek, apr_pool_t *pool) So I think something similar to the idea I suggested before could work, if you agree that it's a good idea. :-) > So, the file is only opened once, but we can call different parsers. > > Maybe the rest of apply_patch should be refactored to a new function, > so apply_patch() would only decide the format and direct execution to > one of the two alternatives. > > I have to run for a little while but I'll study the code more in-depth > and try to provide better feedback soon. > > Meanwhile, just wanted to articulate those thoughts a little better... > > Hope this helps (let me know!), > Nathan Hope this helps... Nathan