Pre-commit (when you run checks on your local changes) works only on the
files that you changed (you can make it a bit more complex, e.g. "always
run" hooks, etc.). Pre-commit takes care of running those checks only on
the changed files, and it automatically splits the list of changed files
into "batches" - as many as you have processors on your development
machine. Usually those checks are fast enough that for an average-size
change they complete in < 2-3 seconds. When you run the full checks on all
files, they take ~10-12 minutes. So pre-commit is fantastic to set up as
the actual "git pre-commit hook" - and this is mainly what pre-commit is
about: you install it once, and then it automatically manages the set of
checks that you run (the .pre-commit-config.yaml is part of the repo, and
when you get a new version, pre-commit automatically manages the
virtualenvs, node, and whatever else the hooks need).
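
To make this concrete, here is roughly what the one-time setup looks like.
These are standard pre-commit CLI commands; the file path passed to
--files is just a placeholder, not a real Airflow path:

    # Install the framework and register it as the git pre-commit hook (done once):
    pip install pre-commit
    pre-commit install

    # Run every hook against the whole repository (the "slow" ~10-12 minute run):
    pre-commit run --all-files

    # Or run the hooks only against explicitly listed files:
    pre-commit run --files path/to/changed_file.py
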
Then those same "fast" pre-commit checks are always run in the PR for all
files (just to be sure). But these are the "fast" ones, so from the
static-checks point of view it is perfect for us. Locally you run the
pre-commit hooks just on your changes, so you get immediate feedback even
before you make a commit. What's more, some of those hooks (like the
license check) will actually fix files for you when you try to commit
them. I do not remember the last time I had to manually add a license
header to a file, because pre-commit does it for me when I attempt to
commit the file. The commit will fail, and I can then run "git add -p .",
see what pre-commit changed, and choose to add/discard/commit the changes.
This all includes black formatting, automatically generating some
documentation, and a lot, lot, lot more.
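
To illustrate that "hook fixes the file for you" flow, a typical sequence
looks something like this (the commit message is just an example):

    # First attempt: the license-header hook rewrites the new file and the commit fails
    git commit -m "Add new module"

    # Review and stage the changes the hook made
    git add -p .

    # Second attempt: the hooks now pass and the commit succeeds
    git commit -m "Add new module"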

But those are just static checks. On top of that, we run actual Python
tests on multiple database/Python/Kubernetes versions - about 50
combinations of those - and this is the part that takes a really long
time. Those have nothing to do with the pre-commits; they are regular
pytest tests, plus some automated integration bash scripts that test
various parts of the system (for example package preparation and
installation).
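
Just to sketch what such a matrix means in practice - this is purely
illustrative, not Airflow's actual CI script, and run-tests.sh is a
hypothetical wrapper:

    # Iterate over a few Python versions and database backends and run the test suite for each
    for python in 3.6 3.7 3.8; do
      for backend in sqlite postgres mysql; do
        PYTHON_VERSION="${python}" BACKEND="${backend}" ./run-tests.sh
      done
    done
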
For those "long" tests - I actually (this weekend) implemented selective
tests this week in this PR: https://github.com/apache/airflow/pull/11417 -
we split the tests into several different test types (based on the internal
structure of the project) and we only run those tests that make sense for a
given change. For example, if part of the "core" of Airflow is changed, we
run "all tests" but in case only some "providers" (which is just connector
to an external system) - we only run those tests for that provider. That
helps to bring the tests time down by 40-50% for most of the PRs - most of
the PRs are not about the core but for some external stuff.
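
The underlying idea is simple enough to sketch in a few lines of bash
(this is a simplified illustration of the approach, not the actual code
from that PR - the path patterns are only examples):

    # Find the files changed against the target branch
    changed_files=$(git diff --name-only origin/master...HEAD)

    # Default to the narrow selection; widen it as soon as a non-provider file is touched
    selected_tests="Providers"
    for file in ${changed_files}; do
      case "${file}" in
        airflow/providers/*|tests/providers/*) ;;  # provider-only change: keep narrow selection
        *) selected_tests="All" ;;                 # core (or anything else) changed: run everything
      esac
    done
    echo "Selected test types: ${selected_tests}"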

I am rather interested in how those kinds of cases might be handled better
by Yetus - i.e. how much smarter it can be when selecting which parts of
the tests should be run, and how you would define such a relation. What
pre-commit does is rather straightforward (run checks on the files that
changed); what I did in the tests takes into account the "structure" of
the project and acts accordingly. Both are rather simple to implement. As
you can see in my PR, it's merely <100 lines of bash to find which files
have changed and, based on some predefined rules, select which tests to
run. I'd be really interested, though, whether Yetus can provide some
better ways of handling this.

J.


On Tue, Oct 13, 2020 at 7:21 PM Allen Wittenauer
<a...@effectivemachines.com.invalid> wrote:

>
>
> > On Oct 13, 2020, at 9:02 AM, Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
> >
> > Yep having pre-commits is cool and we extensively use it as part of our
> > setup in Airflow. Since we are heavily Pythonic project we are using the
> > fantastic https://pre-commit.com/  framework.
>
>         Is pre-commit still "dumb?"  i.e., it treats PRs and branches the
> same?  Because Yetus doesn't.  It gives targeted advice based upon the
> change.  Which makes it faster during the PR cycle which is why the bigger
> the project, the bigger the speed bump.



-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
