Hi Henry,

Could you share the Google support ticket number?

XQ

On Mon, Nov 4, 2024 at 2:03 PM Valentyn Tymofieiev via user <
user@beam.apache.org> wrote:

> I meant python packages broadly, such as third-party dependencies of your
> pipeline, or the package that has the modules comprising your pipeline.
>
> > What happens if I don’t set
> pipeline_options.view_as(SetupOptions).save_main_session to true?
>
> Then we don't save and load the content of the main session on the worker.
> See:
> https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#pickling-and-managing-the-main-session
>
> Saving the main session might be necessary when your main entrypoint code
> imports functions/modules used in your pipeline. When your pipeline is in
> a separate package, your main entrypoint can be minimal, like so:
> https://github.com/GoogleCloudPlatform/python-docs-samples/blob/4fe5495e7bfc7d7bc3c3dd41411e84af833a282e/dataflow/flex-templates/pipeline_with_dependencies/main.py#L35
> , and there is no need to save the main session. We only need to make sure
> that the relevant package, such as `my_package` is installed in the worker
> runtime environment.
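
To make the pickling point above concrete, here is a stdlib-only sketch (no Beam required) of why code defined at the top level of your main module needs the main session saved: Python pickles functions by reference, so whatever process unpickles them must be able to re-create the name the pickle points at.

```python
import pickle

# Functions are pickled by reference, not by value: the payload records the
# defining module and qualified name (e.g. "__main__.double"), so whoever
# unpickles it must be able to import that module and find that name.
def double(x):
    return x * 2

payload = pickle.dumps(double)

# The serialized bytes name the function rather than embedding its code.
assert b"double" in payload

# Round-tripping works here only because `double` still exists in this
# process; a remote worker has no copy of your __main__ unless
# save_main_session re-creates it, or the function lives in an installed
# package the worker can import.
restored = pickle.loads(payload)
assert restored(21) == 42
```

Per the docs linked above, save_main_session works around this for `__main__`-level definitions by pickling the session contents; putting the code in an installed package avoids the issue entirely.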
>
> On Mon, Nov 4, 2024 at 10:49 AM Henry Tremblay <henry.tremb...@paccar.com>
> wrote:
>
>>
>>
>>
>>
>> *From:* Valentyn Tymofieiev <valen...@google.com>
>> *Sent:* Monday, November 4, 2024 10:36 AM
>> *To:* user@beam.apache.org
>> *Cc:* Henry Tremblay <henry.tremb...@paccar.com>
>> *Subject:* Re: Solution to import problem
>>
>>
>>
>>
>>
>>
>> What matters is the package must be installed on the image running on the
>> worker. You can manually install the package during image building
>> (preferred), or you can tell Beam to install the package for you, if you
>> provide --extra_package or --setup_file options.
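
As an illustration of those two routes (image names, file names, and paths below are placeholders, not from this thread):

```sh
# Route 1 (preferred): install dependencies while building the worker image.
# In the Dockerfile:
#   RUN pip install --no-cache-dir google-cloud-secret-manager

# Route 2: have Beam install your package on workers, either from a
# setup.py describing your pipeline package:
python main.py \
  --runner DataflowRunner \
  --setup_file ./setup.py

# or from a pre-built distribution:
python main.py \
  --runner DataflowRunner \
  --extra_package dist/my_package-0.1.tar.gz
```
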
>>
>>
>>
>> When you say package, this can mean different things: for example, the
>> secretmanager library, your own libraries, and the functions in the main.py
>> file (where the pipeline is created). What happens if I don’t set
>> pipeline_options.view_as(SetupOptions).save_main_session to true?
>>
>>
>>
>> On Mon, Nov 4, 2024 at 8:23 AM Henry Tremblay via user <
>> user@beam.apache.org> wrote:
>>
>> My launcher image has:
>>
>>
>>
>> ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"]
>>
>>
>>
>> If I use this as my sdk_harness_image, as I did before, the job will
>> hang, probably because the worker is trying to launch a job. So I need a
>> separate image with:
>>
>>
>>
>> ENTRYPOINT ["/opt/apache/beam/boot"]
>>
>>
>>
>> I think Google suggests passing the ENTRYPOINT as a build argument in the
>> Dockerfile, so you can use the same Dockerfile for both images and build it
>> twice with a different argument.
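
A sketch of that build-arg pattern (the base image tag and launcher-base image are assumptions, and the launcher copy follows Google's sample Dockerfile referenced elsewhere in this thread):

```Dockerfile
# One Dockerfile for both images; the entrypoint is chosen at build time.
FROM apache/beam_python3.11_sdk:2.60.0

# Copy in the Flex Template launcher so the same image can launch templates.
COPY --from=gcr.io/dataflow-templates-base/python311-template-launcher-base:latest \
    /opt/google/dataflow/python_template_launcher /opt/google/dataflow/python_template_launcher

# Default to the worker entrypoint; override with --build-arg for the launcher.
ARG ENTRYPOINT_CMD=/opt/apache/beam/boot
ENV ENTRYPOINT_CMD=${ENTRYPOINT_CMD}
# Shell trampoline so the env var expands at container start; "$@" forwards
# the runtime arguments to whichever binary was selected.
ENTRYPOINT ["/bin/bash", "-c", "exec ${ENTRYPOINT_CMD} \"$@\"", "--"]
```

Built twice, e.g. `docker build -t worker .` and `docker build --build-arg ENTRYPOINT_CMD=/opt/google/dataflow/python_template_launcher -t launcher .`.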
>>
>>
>>
>> I also don’t need:
>>
>>
>>
>> ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${BASE}/${PY_FILE}"
>>
>> ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE=${BASE}/$SETUP
>>
>> in my worker image.
>>
>>
>>
>> What is not completely clear to me is why you need the setup.py to run on
>> the launcher image, and not the worker. Also, if you need:
>>
>>
>>
>> pipeline_options.view_as(SetupOptions).save_main_session =
>> save_main_session
>>
>>
>>
>> That line is supposed to save the code in the main session and transfer
>> it to the worker (via pickle), but I am wondering if you need this at all.
>>
>>
>>
>>
>>
>> *From:* XQ Hu via user <user@beam.apache.org>
>> *Sent:* Monday, November 4, 2024 6:31 AM
>> *To:* user@beam.apache.org
>> *Cc:* XQ Hu <x...@google.com>
>> *Subject:* Re: Solution to import problem
>>
>>
>>
>> For ENTRYPOINT, as long as your image copies the launcher file (like
>> https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/Dockerfile#L38),
>> you can just do `ENTRYPOINT ["/opt/apache/beam/boot"]`.
>>
>> Again, using one container image is more convenient as you start managing
>> more Python package dependencies.
>>
>>
>>
>> On Mon, Nov 4, 2024 at 1:16 AM Henry Tremblay <paulhtremb...@gmail.com>
>> wrote:
>>
>> Sorry, yes, you are correct, though Google does not document this.
>>
>>
>>
>> 1. Previously I could import psycopg and requests because they are in the
>> worker image you linked to.
>>
>> 2. secretmanager cannot be imported because it is not in the worker image.
>>
>> 3. passing the parameter --parameters
>> sdk_container_image=$IMAGE_URL_WORKER causes the worker to use the
>> pre-built image.
>>
>> 4. I cannot use the same Docker image for both launcher and worker
>> because of the ENTRYPOINT.
>>
>>
>>
>> On Sun, Nov 3, 2024 at 1:53 PM XQ Hu via user <user@beam.apache.org>
>> wrote:
>>
>> I think the problem is you do not specify sdk_container_image when
>> running your template.
>>
>>
>>
>>
>> https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies#run-the-template
>> has more details.
>>
>>
>>
>> Basically, you do not need to use
>> https://github.com/paulhtremblay/data-engineering/blob/main/dataflow_/flex_proj_with_secret_manager/Dockerfile
>> for your template launcher.
>>
>>
>>
>> You can use the same image for both the launcher and Dataflow workers: you
>> only need to copy python_template_launcher into your image, like
>> https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/Dockerfile#L38.
>> Then, when running the template job, add --parameters
>> sdk_container_image=$SDK_CONTAINER_IMAGE.
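
For reference, the run step then looks roughly like this (the job name, bucket, template path, and region are placeholders):

```sh
gcloud dataflow flex-template run "my-job" \
    --template-file-gcs-location "gs://my-bucket/templates/my-template.json" \
    --region "us-central1" \
    --parameters sdk_container_image="$SDK_CONTAINER_IMAGE"
```
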
>>
>>
>>
>>
>>
>> On Sun, Nov 3, 2024 at 4:18 PM Henry Tremblay <paulhtremb...@gmail.com>
>> wrote:
>>
>> A few weeks ago I had posted a problem I had with importing the Google
>> Cloud Secret Manager library in Python.
>>
>>
>>
>> Here is the problem and solution:
>>
>>
>>
>>
>> https://github.com/paulhtremblay/data-engineering/tree/main/dataflow_/flex_proj_with_secret_manager
>>
>>
>>
>> --
>>
>> Henry Tremblay
>>
>> Data Engineer, Best Buy
>>
>>
>>
>>
>>
>>
