I think we need to do two things at the same time: 1) cut CI pressure and
2) look for more resources to run CI.

Cut CI load:

   - I think the biggest cut would come from the scheduled jobs. For
   instance, change the 3.5 and 4.0 scheduled jobs from daily to once every
   three days, or even once per week.
   - Then for branch 4.x or other more active release branches, we could run
   post-merge CI daily instead of after each commit?
   - Meanwhile, we can explore ways to run only the tests on the code paths
   actually affected, to avoid full runs.
   - And optimize the tests themselves so they run faster.
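
To make the "run selected tests on the affected code path" idea a bit more
concrete, here is a minimal sketch. The path prefixes, the test group names,
and the select_tests helper are all hypothetical and only for illustration;
they are not our actual module layout or tooling.

```python
# Hypothetical mapping from source path prefixes to the test groups that
# should run when a file under that prefix changes. Illustrative only.
PATH_TO_TESTS = {
    "docs/": set(),                       # docs-only changes need no tests
    "python/": {"pyspark"},
    "sql/": {"sql", "pyspark"},
    "core/": {"core", "sql", "pyspark"},  # core changes affect everything
}
ALL_TESTS = {"core", "sql", "pyspark"}


def select_tests(changed_files):
    """Return the set of test groups to run for a list of changed paths."""
    selected = set()
    for path in changed_files:
        for prefix, groups in PATH_TO_TESTS.items():
            if path.startswith(prefix):
                selected |= groups
                break
        else:
            # Unknown path: be conservative and fall back to a full run.
            return set(ALL_TESTS)
    return selected


if __name__ == "__main__":
    print(sorted(select_tests(["python/foo.py", "docs/index.md"])))
```

The fallback to a full run for unmapped paths is a deliberate choice in this
sketch: it trades some savings for not silently skipping tests.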

Expand resources:

   - We can probably move some of the scheduled jobs out to another repo,
   as Apache Arrow did.
   - I wonder if self-hosted runners are acceptable to the community? This
   sounds like a longer-term solution if we were to introduce more checks in
   the future.
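
To put rough numbers on these options, here is a back-of-the-envelope sketch
using the approximate per-month figures from Tian's message quoted below. It
assumes that a scheduled job's cost scales linearly with how often it runs,
which is my assumption and not something we have measured. The rescale helper
and the job labels are made up for this example.

```python
# Approximate runner minutes per month, taken from Tian's message.
WEEKLY_LIMIT = 250_000
MONTHLY_BUDGET = 4 * WEEKLY_LIMIT  # roughly 1M per month

current = {
    "master per-commit": 280_000,
    "master scheduled": 300_000,
    "4.1 scheduled": 200_000,
    "4.0 scheduled": 200_000,
    "3.5 scheduled": 0,  # described as negligible in the thread
}

# Scenario from the thread: scheduled runs on 4.x (+200k) plus a
# post-merge check for every commit on 4.x (+300k).
with_4x = dict(current)
with_4x["4.x scheduled"] = 200_000
with_4x["4.x post-merge"] = 300_000


def rescale(usage, job, every_n_days):
    """Copy `usage` with `job` run once every `every_n_days` days.

    Assumption: the listed figure is for daily runs and cost is linear in
    run frequency.
    """
    out = dict(usage)
    out[job] = usage[job] / every_n_days
    return out


if __name__ == "__main__":
    scenarios = [
        ("current", current),
        ("with 4.x", with_4x),
        ("4.0 every 3 days", rescale(with_4x, "4.0 scheduled", 3)),
        ("4.0 weekly", rescale(with_4x, "4.0 scheduled", 7)),
    ]
    for name, usage in scenarios:
        total = sum(usage.values())
        print(f"{name}: {total:,.0f} min (budget {MONTHLY_BUDGET:,})")
```

Under these assumptions, thinning a single scheduled job does not by itself
bring the projected total back under the budget, so the cuts would likely
need to be combined with each other or with extra resources.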


Best regards,
Yicong Huang

On Wed, May 6, 2026 at 3:04 PM Hyukjin Kwon <[email protected]> wrote:

> We should probably reduce the scheduled build for the time being.
>
> As a reference, I worked on Apache Arrow, and they use an extra CI
> provided by a third party, e.g., see
> - PR: https://github.com/apache/arrow/pull/48915
> - You post a comment like
> https://github.com/apache/arrow/pull/48915#issuecomment-3852062184
> - It posts the CI link like
> https://github.com/apache/arrow/pull/48915#issuecomment-3852079993
> - The CI is defined at https://github.com/ursacomputing/crossbow
>
> I feel like this can be an alternative if any vendor is willing to support
> it.
>
> On Thu, 7 May 2026 at 04:09, Tian Gao via dev <[email protected]>
> wrote:
>
>> I did some quick calculations, and we can't afford the CI with our
>> existing infra.
>>
>> Per ASF policy (https://infra.apache.org/github-actions-policy.html),
>> the maximum weekly runner minutes we have is 250k. That's about 1M per
>> month, and last month we hit almost exactly that number: 1,082,721 minutes.
>>
>> Our current CI consists of a few components (all numbers are per month):
>> * each commit on the master branch - ~280k
>> * 4.1 scheduled run - ~200k
>> * 4.0 scheduled run - ~200k
>> * 3.5 scheduled run - negligible because we don't run many tests
>> * master scheduled run - ~300k
>>
>> With the new release cadence, even if we only do scheduled run on 4.x
>> (which we shouldn't because it's an active dev branch but that's another
>> story), we need an extra 200k. With a 6-month maintenance window, we will
>> always have at least 3 active maintained versions (including LTS) that
>> require CI.
>>
>> If it's just 200k extra, maybe it's manageable. But I really believe we
>> need tests for the 4.x branch - we should treat that branch more like
>> master than, say, 4.2. Even if we don't do pre-merge checks on it, we
>> should do a post-merge check for every commit. A daily check on an active
>> dev branch sounds a bit too risky to me. That would be another 300k.
>>
>> This does not include the discussion about any pre-merge check for 4.x,
>> which we should actually think about in the future.
>>
>> So the question is - how do we deal with that? The solutions I can think
>> of are
>> * Get some self-hosted runners to increase our CI capacity, which is
>> limited by ASF policy
>> * Optimize our CI and tests so they take less time to run
>> * Reduce the coverage of our tests so we can at least test all branches
>> Any idea is welcome.
>>
>> Tian
>>
>
