Re: Coding Style Review and Automation

Edwin Torok Wed, 12 Feb 2025 04:32:01 -0800

On 11 Feb 2025, at 18:54, Marek Marczykowski-Górecki 
<marma...@invisiblethingslab.com> wrote:
> 
> On Tue, Feb 11, 2025 at 03:06:08PM +0100, Roger Pau Monné wrote:
>> On Tue, Feb 11, 2025 at 11:19:23AM +0100, Jan Beulich wrote:
>>> On 11.02.2025 10:10, Luca Fancellu wrote:
>>>>>> 3) The size of the patch after applying clang-format is huge. Really. 
>>>>>> Something
>>>>>> like 9 MB. Even if everyone agrees that the approach is good and we can 
>>>>>> proceed
>>>>>> with it, it is highly unlikely anyone will be able to review it. 
>>>>>> Considering
>>>>>> that new patches are being added to the upstream during such a review, 
>>>>>> it may
>>>>>> also lead to new code style violations or require a new review of that 
>>>>>> huge
>>>>>> patch.
>>>>> 
>>>>> I think this approach is difficult.  It would likely introduce a lot
>>>>> of noise when using `git blame` (I know, it's just one extra jump,
>>>>> but...), plus would likely break every patch that we currently have
>>>>> in-flight.
>>>> 
>>>> I think we already discussed this in the past and having some churn was 
>>>> accepted,
>>>> also about breaking existing patches, every change merged has the 
>>>> potential to do
>>>> that, this one is more likely but it’s the game I guess?
>>> 
>>> That's easy to say if you have just a few patches in flight, yet I'm worried
>>> about this when considering the hundreds of mine that are awaiting review.
>> 
>> There are also downstreams (including distros) with varying length of
>> patch queues on top of Xen.  Arguably they have to rebase the queue
>> every time they update, but a wide change in coding style will likely
>> be fairly disruptive to them.
>> 
>> Don't take this as a reason to reject clang-format.  As mentioned
>> elsewhere I think the format supported by clang-format would need to
>> be fairly similar to the current Xen one (up to the point that chunks
>> of code using the new and the old style could live together).  Then we
>> would enforce it only for newly added chunks of code initially IMO.
> 
> While clang-format surely will force some changes (the earlier 9MB patch
> is a data point here...)


Here is an example of how reformatting the OCaml C stubs would look like based 
on an earlier attempt:
https://github.com/edwintorok/xen/commit/7ad754fcfb490954f7f01f788893f5c4b77fdc9a
https://github.com/edwintorok/xen/commit/41155c78cc95fd66fe2ed0d1634b0e59fcc3a3b2

In this case switching those files to Linux coding style results in a smaller 
diff, and might help reduce that 9MB a little bit.
Those files didn’t follow either the Xen or Linux coding style previously, 
seems to have been a mix of styles,
which is very confusing because it is not documented anywhere what style they 
are supposed to be, so everyone is left guessing as to
how to best preserve the style of existing code.


> , I generally expect it should match fairly
> close to the current code style, and so the rebase shouldn't be _that_
> painful. In some cases git likely will handle large part of the work
> already.


There are a few tools that might help with this.

This would reformat only diffs/commits:
https://github.com/llvm/llvm-project/blob/main/clang/tools/clang-format/git-clang-format

These are git merge drivers for clang-format that can help automatically solve 
most conflicts that a patch rebase would create:
https://github.com/llvm/llvm-project/blob/c174cc48401292e2eb9317128f56fd22af2f4848/libcxx/utils/clang-format-merge-driver.sh#L4
https://github.com/emilio/clang-format-merge
As you can see the implementation is very simple: you use clang-format on all 3 
sides during a conflict, and then attempt to merge again. If that has solved 
the conflict there is nothing more to do, if it hasn’t you get the usual 
conflict markers to solve it, but this time on the reformatted code.

If you are looking for something more automated to handle your work-in-progress 
patches, then there are 2 other tools that can help:

Jujutsu’s ‘fix’ command which goes through all (or subset) of your local and 
mutable commits (not part of upstream repo) and reformats each change 
individually (which is useful if your base is already well-formatted, and 
you’ve made a lot of changes to split/reorder/rebase your patches and didn’t 
use clang-format during development): https://jj-vcs.github.io/jj/latest/
(JJ can be used in ‘git’ colocation/compatibility mode, and I’ve actually 
switched to using it on a daily basis, because it enables a mega-merge 
workflow, where you can create a merge commit of all your in-flight branches 
that gets automatically rebased whenever one of your branches gets rebased, and 
with jj’s way of deferring conflict resolution you don’t even need to 
immediately fix the conflicts, which is very useful for “parallelising” 
patches, i.e. splitting out commits that are independent.
And ‘jj fix’ can keep up with formatting changes during that workflow too)

If JJ isn’t your cup of tea, then there are other tools that can help too, e.g. 
‘git-branchless’ has a ‘git test run’ command that can be used to run a 
formatter on all commits, see 
https://github.com/arxanas/git-branchless/discussions/803 (using ‘rustfmt’ as 
an example, but it should work in similarly with ‘clang-format’)

Also if you are worried about git blame then you can create a list of commits 
to ignore that were due to formatting, here is an example of how that looked 
like in the XAPI project:
https://github.com/xapi-project/xen-api/blob/master/.git-blame-ignore-revs

We went through a similar process in the XAPI project when we introduced 
`ocamlformat`, and had similar concerns over back-portability / in-flight work 
/ etc.
But at least for me the equivalent to the above tools for ‘ocamlformat’ made it 
a fairly smooth process (https://ocaml.org/p/merge-fmt/0.3), and even if the 
coding style may not suit everyone’s tastes, it is at least *consistent*,
 which is more important than what the coding style actually is (e.g. it 
prevents many mistakes where incorrect indentation or understanding of 
precedence leads to incorrect code that looks deceptively correct, but when run 
through the formatter the bug is immediately obvious)
It also allows the reviews to focus on the contents of the change, rather than 
nitpicking on style.

Reviewing the initial changes can indeed be difficult, but if the patch 
submitter specifies the exact version and command they used to create the 
re-formatting commit in the commit message itself, then reviewers can rerun 
that command on the parent commit,
and verify that they get the same (or equivalent) outcome.
(This requires the tool to be deterministic of course and always produce the 
same output given the same inputs).



Best regards,
—Edwin

> 
> It surely is a cost of introducing such a change, but IMO it's a cost
> worth paying.
> 
> -- 
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab

Re: Coding Style Review and Automation

Reply via email to