Hi Jarek,

Thanks a lot for the detailed feedback and for sharing the Airflow story; this is exactly what I was hoping to hear in response from the mailing list!
600+ dependencies is very impressive, so I'd be happy to chat more and learn from your experience.

On Wed, Aug 24, 2022 at 5:50 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Comment (from a bit outsider)
>
> Fantastic document Valentyn.
>
> Very, very insightful and interesting. We feel a lot of the same pain in
> Apache Airflow (actually even more because we have not 20 but 620+
> dependencies) but we are also a bit more advanced in the way how we are
> managing the dependencies - some of the ideas you had there are already
> tested and tried in Airflow, some of them are a bit different but we can
> definitely share "principles" and we are a little higher in the "supply
> chain" (i.e. Apache Beam Python SDK is our dependency).
>
> I left some suggestions and some comments describing in detail how the
> same problems look like in Airflow and how we addressed them (if we did)
> and I am happy to participate in further discussions. I am "the dependency
> guy" in Airflow and happy to share my experiences and help to work out some
> problems - and especially help to solve problems coming from using multiple
> google-client libraries and diamond dependencies (we are just now dealing
> with similar issue - where likely we will have to do a massive update of
> several of our clients - hopefully with the involvement of Composer team.
> And I'd love to be involved in a joint discussion with the google client
> team to work out some common and expectations that we can rely on when we
> define our future upgrade strategy for google clients.
>
> I will watch it here and be happy to spend quite some time on helping to
> hash it out.
>
> BTW. You can also watch my talk I gave last year at PyWaw about "Managing
> Python dependencies at Scale"
> https://www.youtube.com/watch?v=_SjMdQLP30s&t=2549s where I explain the
> approach we took, reasoning behind it etc.
>
> J.
>
> On Wed, Aug 24, 2022 at 2:45 AM Valentyn Tymofieiev via dev <dev@beam.apache.org> wrote:
>
>> Hi everyone,
>>
>> Recently, several issues [1-3] have highlighted outage risks and
>> developer inconveniences due to dependency management practices in Beam
>> Python.
>>
>> With dependabot and other tooling that we have integrated with Beam, one
>> of the missing pieces seems to be having a clear guideline of how we should
>> be specifying requirements for our dependencies and when and how we should
>> be updating them to have a sustainable process.
>>
>> As a conversation starter, I put together a retrospective [4]
>> covering a recent incident and would like to get community opinions on the
>> open questions.
>>
>> In particular, if you have experience managing dependencies for other
>> Python libraries with rich dependency chains, knowledge of available
>> tooling or first hand experience dealing with other dependency issues in
>> Beam, your input would be greatly appreciated.
>>
>> Thanks,
>> Valentyn
>>
>> [1] https://github.com/apache/beam/issues/22218
>> [2] https://github.com/apache/beam/pull/22550#issuecomment-1217348455
>> [3] https://github.com/apache/beam/issues/22533
>> [4] https://docs.google.com/document/d/1gxQF8mciRYgACNpCy1wlR7TBa8zN-Tl6PebW-U8QvBk/edit