We already have a hook at https://github.com/apache/beam/blob/release-2.40.0/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L141 that allows arbitrary code to be imported and executed on worker startup. (Perhaps we could generalize it to also accept a reference to a function to call, rather than just a module to import.)
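
To make the generalization concrete, here is a rough sketch of what I have in mind. It is not the existing code in sdk_worker_main.py; the helper name run_startup_hook and the "module:function" spec format are assumptions made up for illustration. A plain module path is imported as today, and a spec with a colon also calls the named function.

```python
import importlib


def run_startup_hook(spec):
    """Resolve a startup hook spec (hypothetical format, not a Beam API).

    "pkg.module"           -> import the module and return it.
    "pkg.module:func_name" -> import the module, call func_name() with no
                              arguments, and return its result.
    """
    module_name, sep, func_name = spec.partition(":")
    module = importlib.import_module(module_name)
    if not sep:
        # No function part: importing the module is the whole hook.
        return module
    func = getattr(module, func_name)
    if not callable(func):
        raise TypeError(f"{spec!r} does not refer to a callable")
    return func()
```

For example, run_startup_hook("json") just imports json, while run_startup_hook("os:getcwd") imports os and calls os.getcwd(). The colon separator is only one possible choice; it keeps the module path unambiguous, so we would not have to guess which trailing component is the function.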

On Fri, Dec 13, 2024 at 12:52 PM Danny McCormick via dev <dev@beam.apache.org> wrote:
>
> Thanks - I actually was thinking about this today and was annoyed that we
> don't have this ability. I'm +1 to the proposed approach.
>
> I dropped a comment, but also upleveling in case there is broader interest;
> it would be nice to have a similar capability for expansion service
> containers as well.
>
> On Fri, Dec 13, 2024 at 3:23 PM Valentyn Tymofieiev via dev
> <dev@beam.apache.org> wrote:
>>
>> Hi everyone,
>>
>> Currently we don't have a straightforward and documented way to do simple
>> initialization steps on every Beam Python SDK worker before data processing
>> starts. It is a rough edge that I've encountered on several occasions myself
>> and in conversations with Beam users
>>
>> I put together some thoughts on how we could provide that capability in
>> https://s.apache.org/python_sdk_worker_initialization . Looking forward to
>> your ideas and other feedback on this topic.
>>
>> Thanks,
>> Valentyn