Pre-commit (when you run checks on your local change) works only on the files that you changed (you can make it a bit more complex with options like "run always" etc.). Pre-commit takes care of running those checks only on the changed files, and it automatically splits the list of changed files into "batches" - as many as you have processors on your development machine. Usually the checks are fast enough that for an average-sized change they complete in under 2-3 seconds, whereas running them on all files takes ~10-12 minutes.

So pre-commit is fantastic to set up as the actual "git pre-commit hook" - and this is mainly what pre-commit is about. You install it once, and then it automatically manages the set of checks that you run (the .pre-commit-config.yaml is part of the repo, and when you get a new version, pre-commit will automatically manage virtualenvs, node and whatever else your checks need).

Those same "fast" pre-commit checks are then always run in the PR for all files (just to be sure). So from the static-checks point of view, it is perfect for us: locally you run pre-commits just for your changes, so you get immediate feedback even before you make a commit.

What's more, some of those checks (like the license check) will actually fix files for you when you try to commit them. I do not remember a time when I had to manually add a license header to a file, because pre-commit does it for me when I attempt to commit the file. The commit fails at that point, and I can run "git add -p ." to see what pre-commit changed and choose to add/discard/commit the changes. This all includes black formatting, automatically generating some documentation, and a lot, lot, lot more.
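To illustrate the "check that fixes files for you" behaviour described above, here is a minimal sketch of such a fixer in Python. It is not the license check Airflow actually uses: the header text and the function names (with_header, fix_files) are made up for this example. The only things it relies on are that a hook receives the file names to check, and that a hook which rewrites a file and returns non-zero makes the commit fail so you can review the change.

```python
from pathlib import Path
from typing import Iterable

# Hypothetical header text; a real project would use its full license notice.
LICENSE_HEADER = "# Licensed under the Apache License, Version 2.0\n"


def with_header(source: str, header: str = LICENSE_HEADER) -> str:
    """Return source with the header prepended, unless it is already there."""
    if source.startswith(header):
        return source
    if source.startswith("#!"):
        # Keep a shebang as the very first line and put the header after it.
        first_line, _, rest = source.partition("\n")
        if rest.startswith(header):
            return source
        return f"{first_line}\n{header}{rest}"
    return header + source


def fix_files(paths: Iterable[str]) -> int:
    """Rewrite files that miss the header; return 1 if any file was changed."""
    changed = False
    for name in paths:
        path = Path(name)
        original = path.read_text(encoding="utf-8")
        fixed = with_header(original)
        if fixed != original:
            path.write_text(fixed, encoding="utf-8")
            print(f"Added license header to {name}")
            changed = True
    # Non-zero means "I modified something": the commit stops for review.
    return 1 if changed else 0
```

A script wrapping this would call fix_files(sys.argv[1:]) and exit with its return value; running it a second time on the same files changes nothing and returns 0.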
But those are just static checks. On top of that, we run actual Python tests on multiple databases/Python versions/Kubernetes versions, about 50 combinations of those - and this is the part that really takes a lot of time. Those have nothing to do with the pre-commits: they are the usual pytest tests, plus some automated integration bash scripts that test various parts of the system (for example package preparation and installation).

For those "long" tests, I actually implemented selective tests this weekend in this PR: https://github.com/apache/airflow/pull/11417 - we split the tests into several different test types (based on the internal structure of the project) and we only run those tests that make sense for a given change. For example, if part of the "core" of Airflow is changed, we run "all tests", but if only some "providers" (which are just connectors to external systems) are changed, we only run the tests for those providers. That helps to bring the test time down by 40-50% for most of the PRs, because most of the PRs are not about the core but about some external stuff.

I am rather interested in how those kinds of cases might be handled better by Yetus - i.e. how much smarter it can be when selecting which parts of the tests should be run - and how you would define such a relation. What pre-commit is doing is rather straightforward (run checks on the files that changed); what I did in the tests takes into account the "structure" of the project and acts accordingly. And that is rather simple to implement. As you'd see in my PR, it's merely <100 lines of bash to find which files have changed and, based on some predefined rules, select which tests to run. I'd be really interested, though, if Yetus can provide some better ways of handling it?

J.
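To make the rule-based selection described above concrete, here is a rough Python sketch. It is not the code from the PR (that one is bash), and everything specific in it is an assumption made for illustration: the directory layout (airflow/providers/<name>/..., tests/providers/<name>/...), the base branch name "main", and the function names changed_files and select_test_dirs.

```python
import subprocess
from typing import Iterable, List, Set


def changed_files(base: str = "main") -> List[str]:
    """List files changed on the current branch relative to the base branch."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def select_test_dirs(files: Iterable[str]) -> Set[str]:
    """Map changed files to the test directories worth running."""
    selected: Set[str] = set()
    for path in files:
        parts = path.split("/")
        in_provider = parts[:2] in (["airflow", "providers"], ["tests", "providers"])
        if in_provider and len(parts) > 3:
            # A change inside one provider selects only that provider's tests.
            selected.add(f"tests/providers/{parts[2]}")
        elif parts[0] in ("airflow", "tests"):
            # Anything else in the source or test tree counts as "core":
            # stop looking and run everything.
            return {"tests"}
        # Other files (docs, README, ...) do not select any tests.
    return selected
```

The resulting set could then be passed to pytest as positional arguments, e.g. pytest tests/providers/google. The interesting design question is exactly the one raised above: the rules here are hard-coded path prefixes, and anything smarter needs some way to declare the relation between source files and tests.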
On Tue, Oct 13, 2020 at 7:21 PM Allen Wittenauer <a...@effectivemachines.com.invalid> wrote:

> > On Oct 13, 2020, at 9:02 AM, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >
> > Yep having pre-commits is cool and we extensively use it as part of our
> > setup in Airflow. Since we are heavily Pythonic project we are using the
> > fantastic https://pre-commit.com/ framework.
>
> Is pre-commit still "dumb?" i.e., it treats PRs and branches the
> same? Because Yetus doesn't. It gives targeted advice based upon the
> change. Which makes it faster during the PR cycle which is why the bigger
> the project, the bigger the speed bump.

--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer
M: +48 660 796 129