Hello,

On 12/06/2024 at 16:57, Jakub Jelinek wrote:
On Wed, Jun 12, 2024 at 04:53:38PM +0200, Mikael Morin wrote:
Perhaps you could create a mirror version of the repo and do some experiments 
locally on that to identify where the bottle-neck is coming from?

Not sure where to start for that. Are the hooks published somewhere?

Yes: https://github.com/AdaCore/git-hooks/tree/master

Note, we use some tweaks on top of that, but that is mostly for the
release branches and trunk, so it would be interesting to just try
to reproduce that with the stock AdaCore git hooks.

I have finally taken some time to investigate this hook slowness, and here are my findings.

My tests were run with configs commit-extra-checker and commit-email-formatter disabled, and hooks.update-hook set to a minimal script (either "true" or "sleep 1"). With that config, I could not reproduce the slowness pushing to refs/users/mikael/*. The push finishes in less than a minute.

However, when I push to a normal tag, an email count check comes into play, and I can reproduce some slowness (details below). This email count check shouldn't happen on the gcc repository in my use case (as email checks don't apply to user references), but the slowness could well appear in cases other than the email count check, depending on the configuration: the problem is tied to the size of the list of new commits and is not restricted to the email count.

Anyway, even with the email count check triggering, each tag takes less than 2 minutes to be rejected in my test. With 330 tags to process, that gives an upper bound of 11 hours before the push is rejected in my test (I killed it after a few minutes). On the other hand, according to the information you gave upthread, the hook on the gcc repository seemed to be still processing the first tag after a few hours (assuming tags are processed in alphabetical order, which seems to be the case). So this still doesn't explain what was happening on the gcc repository.
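
As a quick sanity check of that upper bound, the arithmetic can be spelled out as below. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes that every tag costs the full 2 minutes and that tags are processed one after another.

```python
def worst_case_hours(ref_count: int, minutes_per_ref: float) -> float:
    """Upper bound on total hook time if references are handled sequentially."""
    return ref_count * minutes_per_ref / 60


# 330 tags at (at most) 2 minutes each
print(worst_case_hours(330, 2))  # 11.0
```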

Regarding the email count check slowness I mentioned above, I traced it back to the updates.AbstractUpdate class, whose (procedural) new_commits_for_ref attribute is a list of "new" commits. It contains both really new commits and commits that are new on the branch being updated but already known to the repository. For a tag or branch creation, the list of "new on the branch" commits would be huge, as everything is new on that reference, so the parent commits of the oldest "repository-new" commit are not picked up. But in my test the list still amounts to a little less than 80,000 commits, basically what happened on trunk in the last 8 years. Anything that walks such a big list is bound to be slow.

To sum up:
- The hooks support checking "new on the branch" commits in addition to "new on the repository" commits, and that is a feature, not a bug.
- In my use case, that means that the hooks process 80,000 commits, even if only 330 of them are new on the repository.
- As the hook is called on a per-reference basis, the same commits would be processed over and over again for every reference in my use case, so the best would be to push them one by one, in order.
- I still don't know why it took hours (without even finishing) to process just one tag the other day on the gcc repository.
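
The one-by-one push suggested in the summary could be scripted roughly as follows. This is a sketch, not something that was run against the gcc repository: the remote name, the refspecs, and the helper names are placeholders, and the caller is assumed to supply the references in the intended order (taken here to mean oldest first).

```python
import subprocess


def push_commands(remote, refspecs):
    """Build one `git push` command per refspec, preserving the given order."""
    return [["git", "push", remote, refspec] for refspec in refspecs]


def push_one_by_one(remote, refspecs):
    """Run the pushes sequentially and stop at the first failure."""
    for command in push_commands(remote, refspecs):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical refspecs, oldest first. Nothing is pushed here:
    # the commands are only printed for inspection.
    example = [
        "refs/tags/example-1:refs/users/mikael/tags/example-1",
        "refs/tags/example-2:refs/users/mikael/tags/example-2",
    ]
    for command in push_commands("origin", example):
        print(" ".join(command))
```

Pushing each reference separately means each hook invocation only sees the commits that the previous pushes have not already brought in, at the cost of one round trip per reference.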

Mikael
