I can prepare a short presentation for the next dev call (22 Aug) to explain the architecture we tried to implement, why we chose it, and what the blockers are (mainly related to the infrastructure team).
Also, I am still interested in completing this work.

On Mon, Aug 12, 2024 at 3:58 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Added as agenda item for the next dev call (22 Aug)
>
> On Mon, 12 Aug 2024 at 00:25, Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > I will let Hussein (if he has time) to share some more details :).
> >
> > Generally speaking we are using Github Actions as CI - so what we
> > **really** need is auto-scaling k8S cluster where K8S Controller is
> > deployed and connected (via ASF infrastructure's Github APP)
> > https://github.com/actions/actions-runner-controller. The last state
> > we had - as far as I remember - Hussein already had a (Terraform?)
> > deployment for it and it generally was depending on the ASF/ Infra
> > authorisation / setup. Then some fine-tuning / labels
> > (small/medium/big instances) to define/finalize and extend it to be
> > able to also run ARM instances.
> >
> > J.
> >
> > On Mon, Aug 12, 2024 at 1:10 AM Neil <neil4r...@gmail.com> wrote:
> >
> > > I have solid AWS and EKS knowledge, I'd offer my help if my skills
> > > are applicable.
> > > Which Infrastructure as Code and CI/CD frameworks are being
> > > utilized for the testing Terraform Cloudformation?
> > > I've had good experiences with Pulumi python.
> > > Have you considered using EFS to handle the disk space needs?
> > >
> > > On Sun, Aug 11, 2024 at 6:18 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > >
> > > > Hello here,
> > > >
> > > > It would be great to have someone (or better two people) to get
> > > > engaged in our test infrastructure work - this will improve
> > > > everyone's experience. I **REALLY** think we should have other
> > > > people that have engaged so far, so that we can decrease the bus
> > > > factor we have for our infrastructure.
> > > >
> > > > Just after I was away for 5 days and without too much
> > > > connectivity our main was broken (lack of disk space for
> > > > constraints generation) and some mypy checks were failing for
> > > > the last few days.
> > > >
> > > > This is unsustainable and we need to find people who will know
> > > > and be able to fix this infrastructure.
> > > >
> > > > *Early warning* - I am planning 3 weeks holidays after Airflow
> > > > Summit - and I won't be looking at my email/github during those
> > > > days, which means that whoever will be working on Airflow 3
> > > > might be severely impacted by some of those failures.
> > > >
> > > > Just to remind - until we have the k8S controller set up on our
> > > > AWS account and connected to our repo - we won't be able to use
> > > > the credits that we got recently. So this is a good start.
> > > >
> > > > I created a high-level issue for that
> > > > https://github.com/apache/airflow/issues/41388 and it waits for
> > > > some volunteers to pick it up. It's a very important thing to
> > > > do - we can speed up many parts of our builds (for example
> > > > release preparation - but also likely most of our tests) up to
> > > > 4 times, which means that a lot of time can be saved for
> > > > waiting.
> > > >
> > > > Kaxil - I propose we should add a point at the next devcall -
> > > > and keep it as an unresolved Airflow 3 issue until it is well,
> > > > unresolved.
> > > >
> > > > J.