Hi Henry,

Could you share the Google support ticket number?
XQ

On Mon, Nov 4, 2024 at 2:03 PM Valentyn Tymofieiev via user <user@beam.apache.org> wrote:

I meant python packages broadly, such as third-party dependencies of your pipeline, or the package that has the modules comprising your pipeline.

> What happens if I don’t set
> pipeline_options.view_as(SetupOptions).save_main_session to true?

Then we don't save and load the content of the main session on the worker. See:
https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#pickling-and-managing-the-main-session

Saving the main session might be necessary when your main entrypoint code imports functions/modules used in your pipeline. When your pipeline is in a separate package, your main entrypoint can be minimalistic, like so:
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/4fe5495e7bfc7d7bc3c3dd41411e84af833a282e/dataflow/flex-templates/pipeline_with_dependencies/main.py#L35
and there is no need to save the main session. We only need to make sure that the relevant package, such as `my_package`, is installed in the worker runtime environment.

On Mon, Nov 4, 2024 at 10:49 AM Henry Tremblay <henry.tremb...@paccar.com> wrote:

From: Valentyn Tymofieiev <valen...@google.com>
Sent: Monday, November 4, 2024 10:36 AM
To: user@beam.apache.org
Cc: Henry Tremblay <henry.tremb...@paccar.com>
Subject: Re: Solution to import problem

What matters is that the package must be installed on the image running on the worker. You can manually install the package during image building (preferred), or you can tell Beam to install the package for you, if you provide the --extra_package or --setup_file options.
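The pickling behavior Valentyn describes can be illustrated with plain `pickle`. This is a sketch of the general Python mechanism, not of Beam's internals (Beam uses a dill/cloudpickle-style pickler, but the module-reference issue is the same), and the function name `greet` is made up for the example:

```python
import pickle

def greet():
    return "hello from the main session"

# pickle serializes a top-level function as a *reference* (module name plus
# qualified name), not as code. For a script, that reference is roughly
# "__main__.greet" -- the name "greet" is stored in the payload, the body is not.
payload = pickle.dumps(greet)

# Loading works in this process because the module still defines greet...
restored = pickle.loads(payload)
print(restored())

# ...but a worker unpickling the same payload has a different __main__, so
# the lookup fails there unless save_main_session ships the main module's
# state to the worker, or the function lives in an installed package the
# worker can import (Valentyn's minimal-entrypoint approach).
```

This is why a minimal `main.py` that only imports from an installed package avoids the problem: the pickled references then point at the package, which the worker can import on its own.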
When you say package, this can mean different things: for example, the secretmanager library, your own libraries, and the functions in the main.py file (where the pipeline is created). What happens if I don’t set pipeline_options.view_as(SetupOptions).save_main_session to true?

On Mon, Nov 4, 2024 at 8:23 AM Henry Tremblay via user <user@beam.apache.org> wrote:

My launcher image has:

ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"]

If I use this as my sdk_harness_image, as I did before, the job will hang, probably because the worker is trying to launch a job. So I need a separate image with:

ENTRYPOINT ["/opt/apache/beam/boot"]

I think Google suggests that you actually include the ENTRYPOINT as an arg in the Dockerfile, so you can use the same Dockerfile and build it twice with a different argument.

I also don’t need:

ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${BASE}/${PY_FILE}"
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE=${BASE}/$SETUP

in my worker image.

What is not completely clear to me is why you need the setup.py to run on the launcher image, and not the worker. Also, do you need:

pipeline_options.view_as(SetupOptions).save_main_session = save_main_session

That line is supposed to save the code in the main session and transfer it to the worker (via pickle), but I am wondering if you need this at all.
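The single-image setup discussed here could look roughly like the sketch below. This is an assumption-laden illustration, not a documented Google recipe: the base-image tags, `WORKDIR`, and file names are placeholders; the two binary paths and the `FLEX_TEMPLATE_PYTHON_*` variables are the ones quoted in this thread.

```dockerfile
# Hypothetical single-image Dockerfile for a Flex Template; tags and
# paths other than those quoted in the thread are illustrative.
FROM apache/beam_python3.11_sdk:2.60.0

# Copy the Flex Template launcher binary into the Beam SDK image, so the
# same image can serve as both the template launcher and the SDK worker.
COPY --from=gcr.io/dataflow-templates-base/python311-template-launcher-base:latest \
    /opt/google/dataflow/python_template_launcher /opt/google/dataflow/python_template_launcher

WORKDIR /template
COPY . .

# Install the pipeline package (and dependencies such as
# google-cloud-secret-manager) into the image, so workers can import it
# without relying on save_main_session.
RUN pip install --no-cache-dir .

# Tell the launcher where the entrypoint and setup files live. These are
# only read by the launcher, which is why a separate worker-only image
# would not need them.
ENV FLEX_TEMPLATE_PYTHON_PY_FILE=/template/main.py
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE=/template/setup.py

# Worker entrypoint; the launcher binary is found at the conventional
# path copied above when the image is used to launch the template.
ENTRYPOINT ["/opt/apache/beam/boot"]
```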
From: XQ Hu via user <user@beam.apache.org>
Sent: Monday, November 4, 2024 6:31 AM
To: user@beam.apache.org
Cc: XQ Hu <x...@google.com>
Subject: Re: Solution to import problem

For ENTRYPOINT, as long as your image copies the launcher file (like https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/Dockerfile#L38), you can just do `ENTRYPOINT ["/opt/apache/beam/boot"]`.

Again, using one container image is more convenient once you start managing more Python package dependencies.

On Mon, Nov 4, 2024 at 1:16 AM Henry Tremblay <paulhtremb...@gmail.com> wrote:

Sorry, yes, you are correct, though Google does not document this.

1. Previously I could import pyscog and requests because they are in the worker image you linked to.
2. secretmanager cannot be imported because it is not in the worker image.
3. Passing the parameter --parameters sdk_container_image=$IMAGE_URL_WORKER causes the worker to use the pre-built image.
4. I cannot use the same Docker image for both launcher and worker because of the ENTRYPOINT.

On Sun, Nov 3, 2024 at 1:53 PM XQ Hu via user <user@beam.apache.org> wrote:

I think the problem is that you do not specify sdk_container_image when running your template.

https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies#run-the-template has more details.

Basically, you do not need https://github.com/paulhtremblay/data-engineering/blob/main/dataflow_/flex_proj_with_secret_manager/Dockerfile for your template launcher.

You can use the same image for both launcher and dataflow workers.
You only need to copy python_template_launcher to your image, like https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/Dockerfile#L38. Then you can use this image for both launcher and dataflow workers. When running the template job, you need to add --parameters sdk_container_image=$SDK_CONTAINER_IMAGE.

On Sun, Nov 3, 2024 at 4:18 PM Henry Tremblay <paulhtremb...@gmail.com> wrote:

A few weeks ago I had posted a problem I had with importing the Google Cloud Secret Manager library in Python.

Here is the problem and solution:

https://github.com/paulhtremblay/data-engineering/tree/main/dataflow_/flex_proj_with_secret_manager

--
Henry Tremblay
Data Engineer, Best Buy
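The submission step XQ describes (passing the worker image as a template parameter) would look roughly like the sketch below. The job name, region, bucket path, and image URL are placeholders, so verify the exact flags against `gcloud dataflow flex-template run --help` before using this:

```shell
# Hypothetical invocation; all names are placeholders.
export SDK_CONTAINER_IMAGE="us-central1-docker.pkg.dev/MY_PROJECT/my-repo/my-image:latest"

gcloud dataflow flex-template run "my-job-$(date +%Y%m%d-%H%M%S)" \
  --template-file-gcs-location "gs://MY_BUCKET/templates/my_template.json" \
  --region "us-central1" \
  --parameters sdk_container_image="$SDK_CONTAINER_IMAGE"
```

Without the `sdk_container_image` parameter, the workers fall back to the stock Beam SDK image, which is why imports like secretmanager fail even when they are installed in the launcher image.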