My launcher image has:

ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"]

If I use this image as my sdk_harness_image, as I did before, the job hangs, 
probably because each worker tries to launch the template instead of running 
the SDK harness. So I need a separate worker image with:

ENTRYPOINT ["/opt/apache/beam/boot"]

I think Google actually suggests passing the ENTRYPOINT as a build argument in 
the Dockerfile, so you can keep a single Dockerfile and build it twice with 
different arguments (see the sketch below).
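
Something like this might work, as an untested sketch (the build-arg name, base 
image, and wrapper script are my own invention; note that exec-form ENTRYPOINT 
does not expand variables, so the chosen path has to be baked in at build time):

FROM python:3.11-slim

# Hypothetical build argument; defaults to the worker boot binary.
ARG MAIN_ENTRYPOINT=/opt/apache/beam/boot

# Bake the chosen path into a tiny wrapper script, since exec-form
# ENTRYPOINT cannot expand build arguments.
RUN printf '#!/bin/sh\nexec %s "$@"\n' "${MAIN_ENTRYPOINT}" > /entrypoint.sh \
    && chmod +x /entrypoint.sh

ENTRYPOINT ["/entrypoint.sh"]

The worker image would come from a plain docker build, and the launcher image 
from docker build --build-arg MAIN_ENTRYPOINT=/opt/google/dataflow/python_template_launcher .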

I also don’t need the following in my worker image:

ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${BASE}/${PY_FILE}"
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${BASE}/${SETUP}"

What is not completely clear to me is why setup.py needs to run on the 
launcher image and not on the worker. Also, whether you need:

pipeline_options.view_as(SetupOptions).save_main_session = save_main_session

That line is supposed to pickle the state of the main session and ship it to 
the workers, but I am wondering whether it is needed at all.
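
For reference, here is that line in context, as a minimal sketch (the option 
names are standard Beam; the surrounding pipeline is illustrative):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

def run(argv=None, save_main_session=True):
    pipeline_options = PipelineOptions(argv)
    # Pickle the state of the main session (module-level imports and
    # globals in __main__) and ship it to the workers. DoFns that only
    # use imports from installed packages should not need it.
    pipeline_options.view_as(SetupOptions).save_main_session = save_main_session
    with beam.Pipeline(options=pipeline_options) as p:
        _ = p | beam.Create([1, 2, 3]) | beam.Map(lambda x: x * 2)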


From: XQ Hu via user <user@beam.apache.org>
Sent: Monday, November 4, 2024 6:31 AM
To: user@beam.apache.org
Cc: XQ Hu <x...@google.com>
Subject: Re: Solution to import problem

For ENTRYPOINT, as long as your image copies the launcher file (like 
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/Dockerfile#L38),
 you can just do `ENTRYPOINT ["/opt/apache/beam/boot"]`.
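
Spelled out, the combined image looks roughly like this (a sketch modeled on 
that sample Dockerfile; the base image and version tags are illustrative):

FROM python:3.11-slim

# Beam SDK harness boot binary, for the workers.
COPY --from=apache/beam_python3.11_sdk:2.60.0 /opt/apache/beam /opt/apache/beam

# Flex Template launcher binary, invoked by the template launch environment.
COPY --from=gcr.io/dataflow-templates-base/python311-template-launcher-base:latest /opt/google/dataflow/python_template_launcher /opt/google/dataflow/python_template_launcher

ENTRYPOINT ["/opt/apache/beam/boot"]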
Again, using one container image is more convenient once you start managing 
more Python package dependencies.

On Mon, Nov 4, 2024 at 1:16 AM Henry Tremblay 
<paulhtremb...@gmail.com> wrote:
Sorry, yes, you are correct, though Google does not document this.

1. Previously I could import psycopg2 and requests because they are in the 
worker image you linked to.
2. secretmanager cannot be imported because it is not in the worker image (see 
the sketch after this list).
3. Passing --parameters sdk_container_image=$IMAGE_URL_WORKER makes the 
workers use the pre-built image.
4. I cannot use the same Docker image for both launcher and worker because of 
the ENTRYPOINT.
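
For item 2, the usual fix is to bake the missing client library into the 
worker image, e.g. (a sketch; where this line goes in your Dockerfile is up to 
you):

# In the worker image's Dockerfile: install the Secret Manager client.
RUN pip install --no-cache-dir google-cloud-secret-manager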

On Sun, Nov 3, 2024 at 1:53 PM XQ Hu via user 
<user@beam.apache.org> wrote:
I think the problem is that you do not specify sdk_container_image when 
running your template.

https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies#run-the-template
 has more details.

Basically, you do not need the custom Dockerfile at 
https://github.com/paulhtremblay/data-engineering/blob/main/dataflow_/flex_proj_with_secret_manager/Dockerfile
 for your template launcher.

You can use the same image for both the launcher and the Dataflow workers. You 
only need to copy python_template_launcher into your image, as in 
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/Dockerfile#L38.
 When running the template job, you need to add --parameters 
sdk_container_image=$SDK_CONTAINER_IMAGE (see the example below).
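
For example (a sketch; the job name and variables are placeholders):

gcloud dataflow flex-template run "my-job-$(date +%Y%m%d-%H%M%S)" \
  --template-file-gcs-location "$TEMPLATE_FILE_GCS_PATH" \
  --region "$REGION" \
  --parameters sdk_container_image="$SDK_CONTAINER_IMAGE"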


On Sun, Nov 3, 2024 at 4:18 PM Henry Tremblay 
<paulhtremb...@gmail.com> wrote:
A few weeks ago I posted about a problem I had importing the Google Cloud 
Secret Manager library in Python.

Here is the problem and solution:

https://github.com/paulhtremblay/data-engineering/tree/main/dataflow_/flex_proj_with_secret_manager

--
Henry Tremblay
Data Engineer, Best Buy


