My launcher image has:

    ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"]

If I use this image as my sdk_harness_image, as I did before, the job hangs, probably because the worker keeps trying to launch a job instead of processing work. So I need a separate image with:

    ENTRYPOINT ["/opt/apache/beam/boot"]

I think Google suggests that you pass the ENTRYPOINT as a build argument, so you can keep a single Dockerfile and build it twice with a different argument, along these lines:
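Something like the following should work (an untested sketch, not my actual Dockerfile; the base images, tags, and file paths are placeholders):

    # Sketch only: one Dockerfile, built twice with different --build-arg values:
    #   docker build --build-arg ENTRYPOINT_PATH=/opt/google/dataflow/python_template_launcher -t my-launcher .
    #   docker build --build-arg ENTRYPOINT_PATH=/opt/apache/beam/boot -t my-worker .
    FROM apache/beam_python3.11_sdk:2.60.0

    ARG ENTRYPOINT_PATH=/opt/apache/beam/boot

    # Copy the flex-template launcher binary so either build can act as the launcher.
    COPY --from=gcr.io/dataflow-templates-base/python3-template-launcher-base:latest \
        /opt/google/dataflow/python_template_launcher /opt/google/dataflow/python_template_launcher

    WORKDIR /template
    COPY main.py setup.py /template/
    RUN pip install --no-cache-dir .

    # The launcher reads these; the Beam boot worker simply ignores them.
    ENV FLEX_TEMPLATE_PYTHON_PY_FILE=/template/main.py
    ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE=/template/setup.py

    # Exec-form ENTRYPOINT does not substitute build args, so stash the choice
    # in an ENV and dispatch through a shell at container start.
    ENV ENTRYPOINT_PATH=${ENTRYPOINT_PATH}
    ENTRYPOINT ["/bin/bash", "-c", "exec ${ENTRYPOINT_PATH} \"$@\"", "--"]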
I also don’t need:

    ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${BASE}/${PY_FILE}"
    ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE=${BASE}/$SETUP

in my worker image. What is not completely clear to me is why you need the setup.py to run on the launcher image and not on the worker. Also, whether you need:

    pipeline_options.view_as(SetupOptions).save_main_session = save_main_session

That line is supposed to pickle the main session and ship it to the workers, but I am wondering whether you need it at all.

From: XQ Hu via user <user@beam.apache.org>
Sent: Monday, November 4, 2024 6:31 AM
To: user@beam.apache.org
Cc: XQ Hu <x...@google.com>
Subject: Re: Solution to import problem

For ENTRYPOINT, as long as your image copies the launcher file (like https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/Dockerfile#L38), you can just do `ENTRYPOINT ["/opt/apache/beam/boot"]`. Again, using one container image is more convenient once you start managing more Python package dependencies.

On Mon, Nov 4, 2024 at 1:16 AM Henry Tremblay <paulhtremb...@gmail.com> wrote:

Sorry, yes, you are correct, though Google does not document this.

1. Formerly I could import psycopg2 and requests because they are in the worker image you linked to.
2. secretmanager cannot be imported because it is not in the worker image.
3. Passing --parameters sdk_container_image=$IMAGE_URL_WORKER causes the worker to use the pre-built image.
4. I cannot use the same Docker image for both launcher and worker because of the ENTRYPOINT.

On Sun, Nov 3, 2024 at 1:53 PM XQ Hu via user <user@beam.apache.org> wrote:

I think the problem is that you do not specify sdk_container_image when running your template. https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies#run-the-template has more details. Basically, you do not need https://github.com/paulhtremblay/data-engineering/blob/main/dataflow_/flex_proj_with_secret_manager/Dockerfile just for your template launcher. You can use the same image for both the launcher and the Dataflow workers; you only need to copy python_template_launcher into your image, as in https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/Dockerfile#L38. When running the template job, you need to add --parameters sdk_container_image=$SDK_CONTAINER_IMAGE.

On Sun, Nov 3, 2024 at 4:18 PM Henry Tremblay <paulhtremb...@gmail.com> wrote:

A few weeks ago I posted about a problem I had importing the Google Cloud Secret Manager library in Python. Here is the problem and solution: https://github.com/paulhtremblay/data-engineering/tree/main/dataflow_/flex_proj_with_secret_manager

--
Henry Tremblay
Data Engineer, Best Buy