My launcher image has:
ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"]
If I use this as my sdk_container_image (the worker image), as I did before,
the job hangs, probably because the worker keeps trying to launch a job
instead of running one. So I need a separate image with:
ENTRYPOINT ["/opt/apache/beam/boot"]
I think Google actually suggests passing the ENTRYPOINT as a build argument in
the Dockerfile, so you can keep a single Dockerfile and build it twice with a
different argument (see the sketch below).
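I have not seen this spelled out in Google's docs, so the names below are my
own; a minimal sketch of the build-arg pattern, assuming the rest of the
Dockerfile already puts both binaries in place:

# Default to the worker boot binary; the launcher build overrides this.
ARG ENTRYPOINT_PATH=/opt/apache/beam/boot
# Exec-form ENTRYPOINT does not expand variables, so persist the build arg
# as an ENV and use shell form with exec.
ENV ENTRYPOINT_PATH=${ENTRYPOINT_PATH}
ENTRYPOINT exec "${ENTRYPOINT_PATH}"

Then build it twice:

docker build -t "$IMAGE_URL_WORKER" .
docker build -t "$IMAGE_URL_LAUNCHER" \
  --build-arg ENTRYPOINT_PATH=/opt/google/dataflow/python_template_launcher .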
I also don’t need these in my worker image:
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${BASE}/${PY_FILE}"
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${BASE}/${SETUP}"
What is not completely clear to me is why setup.py needs to run on the
launcher image and not on the worker. I am also unsure whether you need:
pipeline_options.view_as(SetupOptions).save_main_session = save_main_session
That line is supposed to pickle the state of the main session and transfer it
to the workers, but I am wondering if you need it at all (a sketch of how it
is used follows).
From: XQ Hu via user <[email protected]>
Sent: Monday, November 4, 2024 6:31 AM
To: [email protected]
Cc: XQ Hu <[email protected]>
Subject: Re: Solution to import problem
For ENTRYPOINT, as long as your image copies the launcher file (like
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/Dockerfile#L38),
you can just do `ENTRYPOINT ["/opt/apache/beam/boot"]`.
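Something like this, loosely based on that sample (the base image tags and
file names here are my approximations, not copied from it):

# The Beam SDK image already provides /opt/apache/beam/boot for the workers.
FROM apache/beam_python3.11_sdk:2.60.0

# Bring the Flex Template launcher binary into the same image.
COPY --from=gcr.io/dataflow-templates-base/python311-template-launcher-base:latest \
    /opt/google/dataflow/python_template_launcher /opt/google/dataflow/python_template_launcher

WORKDIR /template
COPY . /template
RUN pip install --no-cache-dir .

# Read only by the launcher; the workers ignore them.
ENV FLEX_TEMPLATE_PYTHON_PY_FILE=/template/main.py
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE=/template/setup.py

# Worker entrypoint; per the note above, the Flex Template service finds the
# copied launcher binary on its own.
ENTRYPOINT ["/opt/apache/beam/boot"]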
Again, using one container image is more convenient once you start managing
more Python package dependencies.
On Mon, Nov 4, 2024 at 1:16 AM Henry Tremblay <[email protected]> wrote:
Sorry, yes, you are correct, though Google does not document this.
1. Formerly I could import psycopg2 and requests because they are in the
worker image you linked to.
2. secretmanager cannot be imported because it is not in the worker image.
3. passing --parameters sdk_container_image=$IMAGE_URL_WORKER causes the
workers to use the pre-built image (see the example command after this list)
4. I cannot use the same Docker image for both launcher and worker because of
the ENTRYPOINT
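For number 3, the full run command looks roughly like this (a sketch; every
name except the sdk_container_image parameter is a placeholder of mine):

gcloud dataflow flex-template run "flex-secret-manager-$(date +%Y%m%d-%H%M%S)" \
  --template-file-gcs-location="$TEMPLATE_FILE_GCS_PATH" \
  --region="$REGION" \
  --parameters sdk_container_image="$IMAGE_URL_WORKER"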
On Sun, Nov 3, 2024 at 1:53 PM XQ Hu via user <[email protected]> wrote:
I think the problem is you do not specify sdk_container_image when running your
template.
https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies#run-the-template
has more details.
Basically, you do not need a launcher-only image like
https://github.com/paulhtremblay/data-engineering/blob/main/dataflow_/flex_proj_with_secret_manager/Dockerfile
for your template launcher.
You can use the same image for both the launcher and the Dataflow workers. You
only need to copy python_template_launcher into your image, as in
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/Dockerfile#L38.
When running the template job, you need to add --parameters
sdk_container_image=$SDK_CONTAINER_IMAGE.
On Sun, Nov 3, 2024 at 4:18 PM Henry Tremblay <[email protected]> wrote:
A few weeks ago I posted about a problem I had importing the Google Cloud
Secret Manager library in Python.
Here is the problem and solution:
https://github.com/paulhtremblay/data-engineering/tree/main/dataflow_/flex_proj_with_secret_manager
--
Henry Tremblay
Data Engineer, Best Buy