> For ENTRYPOINT, as long as your image copies the launcher file (like
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/Dockerfile#L38),
you can just do `ENTRYPOINT ["/opt/apache/beam/boot"]`.

This is accurate: when launching a Flex Template, Dataflow overrides the
entrypoint on the template image with
/opt/google/dataflow/python_template_launcher if that file is present, even
if the image configures a different entrypoint.

This approach lets you use the same image for the launcher and the worker.
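A minimal single-image Dockerfile along these lines (a sketch only: the base
image tags, file layout, and `main.py` name are assumptions adapted from the
linked sample, not from this thread):

```dockerfile
# Beam SDK image provides the worker entrypoint /opt/apache/beam/boot.
FROM apache/beam_python3.11_sdk:2.60.0

# Copy the Flex Template launcher in; when this file is present,
# Dataflow switches the entrypoint to it at template launch time.
COPY --from=gcr.io/dataflow-templates-base/python311-template-launcher-base:latest \
    /opt/google/dataflow/python_template_launcher \
    /opt/google/dataflow/python_template_launcher

WORKDIR /template
COPY . /template

# Install the pipeline package and its dependencies at build time,
# so they are present on the workers too.
RUN pip install --no-cache-dir .

ENV FLEX_TEMPLATE_PYTHON_PY_FILE=/template/main.py

# Worker entrypoint; Dataflow overrides it when launching the template.
ENTRYPOINT ["/opt/apache/beam/boot"]
```

Because the launcher file and the Beam boot entrypoint both exist in the one
image, it can be passed both as the template image and as
sdk_container_image.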

> What is not completely clear to me is why you need the setup.py to run on
the launcher image, and not the worker. Also, if you need:

What matters is that the package must be installed in the image running on
the worker. You can install the package manually when building the image
(preferred), or you can have Beam install it for you by providing the
--extra_package or --setup_file option.

On Mon, Nov 4, 2024 at 8:23 AM Henry Tremblay via user <user@beam.apache.org>
wrote:

> My launcher image has:
>
> ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"]
>
> If I use this as my sdk_harness_image, as I did before, the job will hang,
> probably because the worker is trying to launch a job. So I need a separate
> image with:
>
> ENTRYPOINT ["/opt/apache/beam/boot"]
>
> I think Google suggests that you actually include the ENTRYPOINT as an arg
> in the Dockerfile, so you can use the same image, and then build it twice
> with a different argument.
>
> I also don’t need:
>
> ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${BASE}/${PY_FILE}"
>
> ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE=${BASE}/$SETUP
>
> In my worker image.
>
> What is not completely clear to me is why you need the setup.py to run on
> the launcher image, and not the worker. Also, if you need:
>
> pipeline_options.view_as(SetupOptions).save_main_session = save_main_session
>
> That line is supposed to save the code in the main session and transfer it
> to the worker (via pickle), but I am wondering if you need this at all.
>
> *From:* XQ Hu via user <user@beam.apache.org>
> *Sent:* Monday, November 4, 2024 6:31 AM
> *To:* user@beam.apache.org
> *Cc:* XQ Hu <x...@google.com>
> *Subject:* Re: Solution to import problem
>
> For ENTRYPOINT, as long as your image copies the launcher file (like
> https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/Dockerfile#L38),
> you can just do `ENTRYPOINT ["/opt/apache/beam/boot"]`.
>
> Again, using one container image is more convenient once you start managing
> more Python package dependencies.
>
> On Mon, Nov 4, 2024 at 1:16 AM Henry Tremblay <paulhtremb...@gmail.com>
> wrote:
>
> Sorry, yes, you are correct, though Google does not document this.
>
> 1. Previously I could import psycopg and requests because they are in the
> worker image you linked to.
>
> 2. secretmanager cannot be imported because it is not in the worker image.
>
> 3. passing the parameter --parameters
> sdk_container_image=$IMAGE_URL_WORKER causes the worker to use the
> pre-built image
>
> 4. I cannot use the same Docker image for both launcher and worker because
> of the ENTRYPOINT
>
> On Sun, Nov 3, 2024 at 1:53 PM XQ Hu via user <user@beam.apache.org>
> wrote:
>
> I think the problem is you do not specify sdk_container_image when running
> your template.
>
> https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies#run-the-template
> has more details.
>
> Basically, you do not need to use
> https://github.com/paulhtremblay/data-engineering/blob/main/dataflow_/flex_proj_with_secret_manager/Dockerfile
> for your template launcher.
>
> You can use the same image for both the launcher and the Dataflow workers.
> You only need to copy python_template_launcher into your image, as in
> https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/Dockerfile#L38.
> When running the template job, add --parameters
> sdk_container_image=$SDK_CONTAINER_IMAGE.
>
> On Sun, Nov 3, 2024 at 4:18 PM Henry Tremblay <paulhtremb...@gmail.com>
> wrote:
>
> A few weeks ago I had posted a problem I had with importing the Google
> Cloud Secret Manager library in Python.
>
> Here is the problem and solution:
>
> https://github.com/paulhtremblay/data-engineering/tree/main/dataflow_/flex_proj_with_secret_manager
>
> --
>
> Henry Tremblay
>
> Data Engineer, Best Buy
>
> --
>
> Henry Tremblay
>
> Data Engineer, Best Buy
>
>