Hi all,

Having a build fail because of a lint check or similar style requirement is frustrating. It adds time to the process of getting a patch finished and merged because you have to check back to see the build failed, switch contexts from whatever you'd been working on since, and go fix it. And as a new or infrequent contributor, it's particularly challenging because you have to figure out which additional dependencies (and specific versions, as in the case of clang-format) to install and how to run this extra tool on your machine.

It doesn't have to be this way. With GitHub Actions, we can run workflows that fix style and other violations and push the fix in a commit back to the branch. To demonstrate this, I have a PR (https://github.com/apache/arrow/pull/6411) that updates the generated R man pages whenever there's a change to the inline docstrings. The existing CI for R package checks fails if there's a mismatch between code and docs, and it's all too easy to update the inline docs and forget to regenerate the man pages from them.

I could envision doing the same kind of GHA workflow for the C++ linting--the current CI job already prints the diff that should be applied to fix the lint failures, so why not just fix it?--and even to implement Python black, if we decided we wanted to do that.

On the PR discussion, it seems that there's some interest in doing this but also some concern. Some thoughts:

* It only runs on your fork (apache/arrow is excluded) and it does not run on master.
* I've added "[automated commit]" to the commit message so that it's clear that you personally didn't add the commit.
* If anyone were strongly opposed, we could add their fork to the list of excluded repositories.

It was also pointed out to me that we have precommit hooks that we could extend to do this kind of work instead of GitHub Actions. I tried them out and found them to be largely undocumented and requiring additional out-of-band environment setup; ultimately I disabled them because I kept getting a popup telling me to go to some website and get another JDK, all just when editing a .R file in the r/ directory. If I as a regular contributor found them burdensome to set up, I'm not going to recommend them to a new contributor.

Style guides and linting are important for large projects like Arrow, but we don't want to add unnecessary friction to the dev process, particularly for new contributors--it's challenging enough without it.

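For concreteness, the fork and branch rules listed above could be sketched roughly as follows. This is only an illustration in Python, not the actual workflow in the PR: the function names, the `EXCLUDED_FORKS` set and its example entry, and the choice to put the tag at the start of the commit message are all assumptions made for the example.

```python
# Hypothetical sketch of the gating rules; not the actual workflow in the PR.

UPSTREAM_REPO = "apache/arrow"        # the upstream repo is always excluded
EXCLUDED_FORKS = {"someuser/arrow"}   # hypothetical opt-out list of forks
COMMIT_TAG = "[automated commit]"     # marker taken from the proposal


def should_autofix(repository: str, ref: str, excluded=EXCLUDED_FORKS) -> bool:
    """Return True if an automated fix may be pushed to this branch."""
    if repository == UPSTREAM_REPO:
        return False  # never push automated commits to apache/arrow
    if repository in excluded:
        return False  # the fork's owner asked to opt out
    if ref == "refs/heads/master":
        return False  # never push automated commits to master
    return True


def commit_message(summary: str) -> str:
    """Tag the message so it is clear the contributor did not author it.

    Placing the tag first is an assumption; the proposal only says the
    tag is added to the message.
    """
    return f"{COMMIT_TAG} {summary}"
```

Under these assumptions, a feature branch on a contributor's fork would get the automated fix, while anything on apache/arrow, on master, or on an opted-out fork would be left alone.
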
From my experience, the best way to enforce standards and processes is to automate them so that contributors don't need to think about doing the right thing because the right thing is the easy thing.

What are your thoughts? If anyone objects to using GitHub Actions in this way, would you be satisfied with blacklisting your fork (i.e. you don't want it running on your branches but you don't mind if others do)?

Thanks,
Neal
