Hi all,
Having a build fail because of a lint check or similar style requirement is
frustrating. It adds time to the process of getting a patch finished and
merged because you have to check back to see the build failed, switch
contexts from whatever you'd been working on since, and go fix it. And as a
new or infrequent contributor, it's particularly challenging because you
have to figure out which additional dependencies (and specific versions, as
in the case of clang-format) to install and how to run this extra tool on
your machine.

It doesn't have to be this way. With GitHub Actions, we can run workflows
that fix style and other violations and push the fix in a commit back to
the branch. To demonstrate this, I have a PR (
https://github.com/apache/arrow/pull/6411) that updates the generated R man
pages whenever there's a change to the inline docstrings. The existing CI
for R package checks fail if there's a mismatch between code and docs, and
it's all too easy to update the inline docs and forget to regenerate the
man pages from them. I could envision doing the same kind of GHA workflow
for the C++ linting--the current CI job already prints the diff that should
be applied to fix the lint failures, so why not just fix it?--and even to
implement Python black, if we decided we wanted to do that.

On the PR discussion, it seems that there's some interest in doing this but
also some concern. Some thoughts:

* It only runs on your fork (apache/arrow is excluded) and it does not run
on master
* I've added to the commit message "[automated commit]" so that it's clear
that you personally didn't add the commit.
* If anyone were strongly opposed, we could add their fork to the list of
excluded repositories.

It was also pointed out to me that we have precommit hooks that we could
extend to do this kind of work instead of GitHub Actions. I tried them out
and found them to be largely undocumented and requiring additional
out-of-band environment setup; ultimately I disabled them because I kept
getting a popup telling me to go to some website and get another JDK, all
just when editing a .R file in the r/ directory. If I as a regular
contributor found them burdensome to set up, I'm not going to recommend
them to a new contributor.

Style guides and linting are important for large projects like Arrow, but
we don't want to add unnecessary friction to the dev process, particularly
for new contributors--it's challenging enough without it. From my
experience, the best way to enforce standards and processes is to automate
them so that contributors don't need to think about doing the right thing
because the right thing is the easy thing.

What are your thoughts? If anyone objects to using GitHub Actions in this
way, would you be satisfied with blacklisting your fork (i.e. you don't
want it running on your branches but you don't mind if others do)?

Thanks,
Neal

Reply via email to