this will be a relatively big update, as there are many many moving pieces with short, medium and long term goals.

TLDR1: we're shutting jenkins down at the end of 2021.
TLDR2: i know we're way behind on pretty much everything. most of the hardware is at or beyond EOL, and random systemic build failures (like k8s/minikube) are randomly popping up. i've had to restrict access due to new campus policies, and i will be dealing with that shortly and only for a few contributors.

long term (until EOY):
* decide what the future of spark builds and releases will look like
  - do we need jenkins?
  - if we do, who's responsible for hosting + ops?
* we will permanently shut down amplab jenkins by the end of 2021
  - uc berkeley has funded this for over 10 years, and both the funds and staff (only me, for 7 years) are going away. i'm staying at cal, but have a much different job now. :)

medium term (in 6 months):
* prepare jenkins worker ansible configs and stick in the spark repo
  - nothing fancy, but enough to config ubuntu workers
  - could be used to create docker containers for testing in <wavey-hands>THE CLOUD</wavey-hands>
* train up brian shiratsuki (cced) to help w/ops tasks and upgrades over the next ~6m
* get to all of the python version, library installation, etc etc jira requests

short term (weeks):
* debug and figure out why minikube/k8s broke
  - https://issues.apache.org/jira/browse/SPARK-34738
  - i really could use some help here...
* bring up additional workers
  - finish hardware/system level repairs on the bare metal
  - see above, re k8s jira
* stabilize cluster
  - recent jenkins LTS upgrade broke the web GUI
  - finish deploying monitoring/alerting
  - this hardware is OLD and literally falling over, so we have lots of random disk and ram failures. it's literally whack-a-mole and each trip to the colo to repair literally takes a full day

i'm only able to spend a few hours a week on the build system, so expect random downtime, reboots, restarts, and testing. we're testing new nodes as we deploy, and hoping to fix anything before releasing them into the wild, but some things might be flaky.

but the biggest question is what you all need w/regards to build infrastructure... and who's going to be responsible for it.

thanks for reading! :)

shane
--
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu