> Sure. Build tools can even be GPL, and something like a linter isn't a
> hard dependency for Airflow anyway. +1
Indeed.

> > But we are just about to start releasing Production Image and Helm Chart
> > for Apache Airflow and I started to wonder if this is still acceptable
> > practice when - by releasing the code - we make our users depend on those
> > images.
>
> Just checking: surely a production Airflow Docker image doesn't have
> hadolint in it?

Yes. It does not :). It's just for the earlier A) case.

> > We are going to officially support both - image and helm chart - by the
> > community, and once we release the image and helm chart officially, those
> > external images and downloads will become dependencies of our official
> > "releases". We are allowing our users to use our official Dockerfile
> > to build a new image (with the user's configuration), and the Helm Chart
> > is going to be officially available for anyone to install Airflow.
>
> Sounds like a good step for your project.

Indeed.

> First question: Is it the *only* way you can run Airflow? Does it end up
> in the source tarball? If so, you need to review the ASF licensing
> requirements and make sure you're not in violation there. (Just checking!)

It's one of the ways. You don't *have to* use the helm chart or docker
image. We also have official INSTALL instructions that simply install
Airflow directly from the sources using just Python's pip dependencies
(provided that you have all the required apt deps installed). And in the
sources, we just have the names/references to the images (not the images
themselves), and they are all released with liberal licenses when it
comes to using them.

> Second: Most of these look like *testing* dependencies, not runtime
> dependencies.

True. The only "runtime" deps are the Astronomer ones:

- astronomerinc/ap-statsd-exporter:0.11.0
- astronomerinc/ap-pgbouncer:1.8.1
- astronomerinc/ap-pgbouncer-exporter:0.5.0-1

> How hard would it be for the Airflow community to import the Dockerfiles
> and build the images themselves? And keep those imported forks up to
> date?
> We do this a lot in CouchDB for our dependencies (not just Docker)
> where it's a personal project of someone in the community, or even where
> it's some corporate thing that we want to be sure we don't break on when
> they implement a change for their own reasons.

Not hard. I think it would be rather an easy task, and we automate
everything - including building and testing our own images (production
and CI ones, for every pull request). I even created a description of how
to build a robust setup where GitHub Actions and DockerHub work together
and images are built and cached in the GitHub Registry but then published
nightly to DockerHub (the Infra team asked me to do that when I shared it
with them):

https://cwiki.apache.org/confluence/display/INFRA/Github+Actions+to+DockerHub

So we are fully capable of doing it.

> Automating building these and pushing them isn't hard these days, even
> on ASF hardware if you want. The nice thing about Docker is that, for
> you to do that, you really only need "docker build" (or "docker buildx"
> for cross-platform) and a build machine or two to keep things current.

Indeed. We use GitHub Actions and the DockerHub build integration, so
that would be rather easy.

> > 4) If some images are not acceptable, should we bring them in and
> > release them in a community-managed registry?
>
> I don't think you need a dedicated registry, but I would recommend
> setting up your own Docker Hub user and pushing at least the CI images
> you need there. (We have the couchdbdev user, for instance, with images
> we keep up to date with all of our build/test dependencies for Jenkins
> use.) And of course there's a bunch of images under
> https://hub.docker.com/u/apache for many ASF projects at this point.

Yeah. We have our own DockerHub Airflow account where we publish
(following the CI process above) our CI and production images nightly:
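As a rough illustration of the "docker build" / "docker buildx" flow Joan
mentions: the image name, tag, and platform list below are made-up, and the
DRY_RUN guard (my addition) makes the script only print the commands, so it
can be read or run without a Docker daemon.

```shell
# Sketch only: hypothetical image name/tag; DRY_RUN=1 echoes instead of running.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

IMAGE=apache/airflow-statsd-exporter   # hypothetical in-project fork
TAG=0.11.0

# Single-platform build, then push to the registry:
run docker build -t "$IMAGE:$TAG" .
run docker push "$IMAGE:$TAG"

# Or a multi-arch build with buildx, built and pushed in one step:
run docker buildx build --platform linux/amd64,linux/arm64 \
    -t "$IMAGE:$TAG" --push .
```

Set DRY_RUN=0 (with a real Dockerfile in the current directory) to execute
the commands for real.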
https://hub.docker.com/repository/docker/apache/airflow

And indeed I thought about something separate like your couchdbdev user,
but thought about making it under the "apache" umbrella. However, you are
quite right that we likely do not have to have an "apache"-managed account
for that; we could easily have our own, and then it would be easier to
have multiple repositories under an "apachedev" user, for example. I think
that is going to be our setup eventually for dev dependencies.

> For runtime dependency "sidecars" for Helm and other Docker images, I
> don't have a strong opinion. If they're essential to bring-up for
> Airflow, I'd encourage you to bring them in-project and re-build them
> yourselves.

They are quite important for the helm chart installation method. In this
case it's the pgbouncer and statsd daemon. The first one is there to limit
the number of postgres connections opened to the database, the second to
monitor metrics of a running instance. Both are really important for
"productionising" the installation of Airflow.

> I recommend using a Git repo in which you maintain an
> upstream branch for each Docker file, and PR regularly to your
> main/master branch. Then, you can tag the main/master branch with tags
> like "Airflow-#.#.#" and reference those tags to prevent any sort of
> breakage. It's not Docker, but you can see how we do this here:
> https://github.com/apache/couchdb-jiffy

Right. DockerHub integration is easy to do for us indeed.

> Hope this helps,
> Joan "CouchDB build maestro" Touzet

It does, thanks! The fact that I am not the only one concerned about it,
and that other projects do it already as well, is reassuring that it might
make sense to bring it all in-house. And the idea to have a separate user
on DockerHub rather than use the apache one is a very good one.

I wonder if others have similar experiences.

J.
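(For readers following along: Joan's upstream-branch-plus-tags workflow can
be played out in a throwaway local repo roughly as below. The branch names,
Dockerfile contents, and the "Airflow-1.10.10" tag are all assumptions for
illustration, not anything the projects actually use.)

```shell
# Sketch of the fork-and-tag workflow in a temporary repo (all names hypothetical).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.org"   # throwaway identity for the demo
git config user.name "Demo"

# "upstream" tracks the external project's Dockerfile verbatim:
git checkout -q -b upstream
printf 'FROM debian:buster-slim\nRUN apt-get update\nCMD ["pgbouncer"]\n' > Dockerfile
git add Dockerfile
git commit -qm "Import upstream Dockerfile"

# "master" carries the project's own changes on top:
git checkout -q -b master
echo 'LABEL maintainer="Apache Airflow (hypothetical)"' >> Dockerfile
git commit -qam "Airflow-specific tweaks"

# When upstream changes, refresh the upstream branch and merge it in
# (Joan suggests doing this via a regular PR):
git checkout -q upstream
sed -i.bak 's/buster/bullseye/' Dockerfile && rm -f Dockerfile.bak
git commit -qam "Sync with upstream"
git checkout -q master
git merge -q --no-edit upstream

# Tag so image builds and charts can pin an exact, unbreakable revision:
git tag Airflow-1.10.10
```

The tag is what the Helm chart or Dockerfile would then reference, so a
later upstream change can never break an already-released version.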
--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>