Thanks for the feedback here and in the comments. I made a few updates and
added another alternative that discusses making the environment initializer
a resource hint. Looking forward to continuing the discussion.

On Tue, Dec 17, 2024 at 7:36 PM Robert Bradshaw via dev <dev@beam.apache.org>
wrote:

> On Tue, Dec 17, 2024 at 9:31 AM Kenneth Knowles <k...@apache.org> wrote:
> >
> > So is it just a documentation / examples / getting the knowledge out
> > there problem?


> Possibly.


> > Incidentally I'm not a fan of modules that "do" things when you import
> > them, nor am I a fan of the "try it as a module then a class" sort of
> > fallback stuff vs just choosing the type you expect and sticking with it,
> > giving very clear error messages. Also "ImportError" is going to be
> > misinterpreted 99% of the time. Having something that calls a named
> > function seems like it'll be a better experience all around.
>
> This was initially introduced to register things like filesystems, as
> Python doesn't have the service provider interface stuff that Java
> has, so we need to "run some code on startup" to register it. I agree
> a named function would be better, just thinking it might be preferable
> to avoid two distinct ways of doing almost the same thing.
>

This option wouldn't work for single-file pipelines, but I checked, and it
can be used for pipelines that are structured as a package. It is a bit
awkward to use, since the initialization has to be defined in the top-level
module code.
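
To make the difference concrete, here is a rough sketch in plain Python (no
Beam APIs involved) that contrasts initialization as an import-time side
effect with calling a named function. The package name `my_pipeline_pkg`, the
`module:function` string format, and the helper `call_named_function` are all
hypothetical and only serve the illustration; this is not how the SDK worker
does it today.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical package standing in for a pipeline structured as a package.
PACKAGE_SOURCE = textwrap.dedent('''
    EVENTS = []

    # Import-time side effect: runs as soon as the module is imported.
    EVENTS.append("imported")

    def initialize():
        # Explicit entry point: runs only when something calls it.
        EVENTS.append("initialized")
''')


def call_named_function(spec):
    """Resolve a 'module:function' string (hypothetical format) and call it."""
    module_name, sep, func_name = spec.partition(":")
    if not (sep and module_name and func_name):
        raise ValueError(f"expected 'module:function', got {spec!r}")
    module = importlib.import_module(module_name)
    func = getattr(module, func_name, None)
    if not callable(func):
        # A specific message instead of a bare ImportError.
        raise AttributeError(
            f"{module_name!r} has no callable {func_name!r} to use as initializer")
    func()
    return module


with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "my_pipeline_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(PACKAGE_SOURCE)
    sys.path.insert(0, tmp)
    try:
        # Approach 1: importing the top-level module is the initialization.
        pkg = importlib.import_module("my_pipeline_pkg")
        events_after_import = list(pkg.EVENTS)
        # Approach 2: a named function is looked up and called explicitly.
        call_named_function("my_pipeline_pkg:initialize")
        events_after_call = list(pkg.EVENTS)
    finally:
        sys.path.remove(tmp)
```

In the first approach the initialization code has to live at the top level of
the module, which is the awkward part mentioned above. In the second, the
module can be imported without side effects and a missing or misspelled
function name produces a targeted error.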

>
> >
> > Kenn
> >
> > On Fri, Dec 13, 2024 at 4:38 PM Robert Bradshaw via dev
> > <dev@beam.apache.org> wrote:
> >>
> >> We already have
> >>
> >> https://github.com/apache/beam/blob/release-2.40.0/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L141
> >> that allows arbitrary code to be imported and executed on worker
> >> startup. (Perhaps we could generalize to let it also reference a
> >> function to be called rather than just a module.)
> >>
> >> On Fri, Dec 13, 2024 at 12:52 PM Danny McCormick via dev
> >> <dev@beam.apache.org> wrote:
> >> >
> >> > Thanks - I actually was thinking about this today and was annoyed
> >> > that we don't have this ability. I'm +1 to the proposed approach.
> >> >
> >> > I dropped a comment, but also upleveling in case there is broader
> >> > interest; it would be nice to have a similar capability for expansion
> >> > service containers as well.
> >> >
> >> > On Fri, Dec 13, 2024 at 3:23 PM Valentyn Tymofieiev via dev
> >> > <dev@beam.apache.org> wrote:
> >> >>
> >> >> Hi everyone,
> >> >>
> >> >> Currently we don't have a straightforward and documented way to do
> >> >> simple initialization steps on every Beam Python SDK worker before data
> >> >> processing starts. It is a rough edge that I've encountered on several
> >> >> occasions myself and in conversations with Beam users.
> >> >>
> >> >> I put together some thoughts on how we could provide that capability
> >> >> in https://s.apache.org/python_sdk_worker_initialization . Looking
> >> >> forward to your ideas and other feedback on this topic.
> >> >>
> >> >> Thanks,
> >> >> Valentyn
>
