Thanks. It appears that i did not fully read the documentation and i missed
this in my dataflow flex-template run:
, '--parameters'
, 'sdk_container_image=$_SDK_CONTAINER_IMAGE'
All my other jobs use a dodgy Dockerfile which does not require the
parameter above...
I should be fine for the time being; at least my pipeline is no longer
plagued by import errors
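
For anyone hitting the same import errors, here is a minimal sketch of how the launch command might be assembled so that the custom SDK container image is passed as a pipeline parameter. The function name and all values in the usage block (job name, template path, region, image URI) are hypothetical placeholders, and the flag layout assumes the `gcloud dataflow flex-template run` form used in the pipeline_with_dependencies sample.

```python
from typing import List


def flex_template_run_args(job_name: str, template_gcs_path: str, region: str,
                           sdk_container_image: str) -> List[str]:
    """Build the argv for launching a flex template with a custom SDK container.

    Nothing is executed here; the caller decides how to run the command.
    """
    return [
        "gcloud", "dataflow", "flex-template", "run", job_name,
        "--template-file-gcs-location", template_gcs_path,
        "--region", region,
        # The parameter that was missing in this thread: it tells the workers
        # to start from the custom image instead of the default SDK image.
        "--parameters", f"sdk_container_image={sdk_container_image}",
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    args = flex_template_run_args(
        "dftester-job",
        "gs://my-bucket/templates/dftester.json",
        "us-central1",
        "us-docker.pkg.dev/my-project/my-repo/dftester:latest",
    )
    print(" ".join(args))
```

The resulting list can be handed to `subprocess.run` or mapped onto the `args` of a Cloud Build step.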
thanks all for helping out
kind regards
Marco
On Sun, Jun 16, 2024 at 6:27 PM Utkarsh Parekh <[email protected]>
wrote:
> You have “mypackage” incorrectly built. Please check and confirm that.
>
> Utkarsh
>
> On Sun, Jun 16, 2024 at 12:48 PM Sofia’s World <[email protected]>
> wrote:
>
>> The error is the same (see bottom).
>> i have tried to ssh into the container and the directory is set up as
>> expected, so i am not quite sure where the issue is.
>> i will try to start from the pipeline-with-dependencies sample and work
>> out from there without bothering the list
>>
>> thanks again for following up
>> Marco
>>
>> Could not load main session. Inspect which external dependencies are used
>> in the main module of your pipeline. Verify that corresponding packages are
>> installed in the pipeline runtime environment and their installed versions
>> match the versions used in pipeline submission environment. For more
>> information, see:
>> https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
>> Traceback (most recent call last):
>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 115, in create_harness
>>     _load_main_session(semi_persistent_directory)
>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 354, in _load_main_session
>>     pickler.load_session(session_file)
>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/pickler.py", line 65, in load_session
>>     return desired_pickle_lib.load_session(file_path)
>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/dill_pickler.py", line 446, in load_session
>>     return dill.load_session(file_path)
>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 368, in load_session
>>     module = unpickler.load()
>>              ^^^^^^^^^^^^^^^^
>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 472, in load
>>     obj = StockUnpickler.load(self)
>>           ^^^^^^^^^^^^^^^^^^^^^^^^^
>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 827, in _import_module
>>     return getattr(__import__(module, None, None, [obj]), obj)
>>                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> ModuleNotFoundError: No module named 'mypackage'
>>
>>
>>
>> On Sun, 16 Jun 2024, 14:50 XQ Hu via user, <[email protected]> wrote:
>>
>>> What is the error message now?
>>> You can easily ssh to your docker container and check everything is
>>> installed correctly by:
>>> docker run --rm -it --entrypoint=/bin/bash $CUSTOM_CONTAINER_IMAGE
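
Seeing the files in the image does not by itself show that Python can import them. As a rough sketch, a check along these lines could be run inside that container shell; the helper name is hypothetical and `mypackage` is simply the package name from this thread.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if the import system can locate module_name."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised for dotted names with a missing parent package, or for
        # already-imported modules that have no spec.
        return False


if __name__ == "__main__":
    name = "mypackage"  # package name taken from this thread
    print(f"{name} importable: {is_importable(name)}")
    # The search path shows where the interpreter looks for packages.
    for entry in sys.path:
        print("  ", entry)
```

If the package is reported as not importable even though the directory exists, it is probably not on `sys.path` and was never installed into the image's site-packages.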
>>>
>>>
>>> On Sun, Jun 16, 2024 at 5:18 AM Sofia’s World <[email protected]>
>>> wrote:
>>>
>>>> Valentyn, many thanks... i actually spotted the reference in the setup
>>>> file.
>>>> However, after correcting it, i am still at square one: somehow my
>>>> runtime environment does not see the package. So i added some debugging
>>>> to my Dockerfile to check if i forgot to copy something,
>>>> and here's the output, where i can see that mypackage has been copied
>>>>
>>>> here's my directory structure
>>>>
>>>> ---- mypackage
>>>>      __init__.py
>>>>      obb_utils.py
>>>>      launcher.py
>>>> __init__.py
>>>> dataflow_tester.py
>>>> setup_dftester.py (copied to setup.py)
>>>>
>>>> i can see the directory structure has been maintained when i copy my
>>>> files to docker as i added some debug to my dockerfile
>>>>
>>>> Step #0 - "dftester-image": Removing intermediate container 4c4e763289d2
>>>> Step #0 - "dftester-image": ---> cda378f70a9e
>>>> Step #0 - "dftester-image": Step 6/23 : COPY requirements.txt .
>>>> Step #0 - "dftester-image": ---> 9a43da08b013
>>>> Step #0 - "dftester-image": Step 7/23 : COPY setup_dftester.py setup.py
>>>> Step #0 - "dftester-image": ---> 5a6bf71df052
>>>> Step #0 - "dftester-image": Step 8/23 : COPY dataflow_tester.py .
>>>> Step #0 - "dftester-image": ---> 82cfe1f1f9ed
>>>> Step #0 - "dftester-image": Step 9/23 : COPY mypackage mypackage
>>>> Step #0 - "dftester-image": ---> d86497b791d0
>>>> Step #0 - "dftester-image": Step 10/23 : COPY __init__.py
>>>> ${WORKDIR}/__init__.py
>>>> Step #0 - "dftester-image": ---> 337d149d64c7
>>>> Step #0 - "dftester-image": Step 11/23 : RUN echo '----- listing
>>>> workdir'
>>>> Step #0 - "dftester-image": ---> Running in 9d97d8a64319
>>>> Step #0 - "dftester-image": ----- listing workdir
>>>> Step #0 - "dftester-image": Removing intermediate container 9d97d8a64319
>>>> Step #0 - "dftester-image": ---> bc9a6a2aa462
>>>> Step #0 - "dftester-image": Step 12/23 : RUN ls -la ${WORKDIR}
>>>> Step #0 - "dftester-image": ---> Running in cf164108f9d6
>>>> Step #0 - "dftester-image": total 24
>>>> Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 .
>>>> Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 ..
>>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root 0 Jun 16 08:57
>>>> __init__.py
>>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root 135 Jun 16 08:57
>>>> dataflow_tester.py
>>>> Step #0 - "dftester-image": drwxr-xr-x 2 root root 4096 Jun 16 08:59
>>>> mypackage
>>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root 64 Jun 16 08:57
>>>> requirements.txt
>>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root 736 Jun 16 08:57
>>>> setup.py
>>>> Step #0 - "dftester-image": Removing intermediate container cf164108f9d6
>>>> Step #0 - "dftester-image": ---> eb1a080b7948
>>>> Step #0 - "dftester-image": Step 13/23 : RUN echo '--- listing modules
>>>> -----'
>>>> Step #0 - "dftester-image": ---> Running in 884f03dd81d6
>>>> Step #0 - "dftester-image": --- listing modules -----
>>>> Step #0 - "dftester-image": Removing intermediate container 884f03dd81d6
>>>> Step #0 - "dftester-image": ---> 9f6f7e27bd2f
>>>> Step #0 - "dftester-image": Step 14/23 : RUN ls -la
>>>> ${WORKDIR}/mypackage
>>>> Step #0 - "dftester-image": ---> Running in bd74ade37010
>>>> Step #0 - "dftester-image": total 16
>>>> Step #0 - "dftester-image": drwxr-xr-x 2 root root 4096 Jun 16 08:59 .
>>>> Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 ..
>>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root 0 Jun 16 08:57
>>>> __init__.py
>>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root 1442 Jun 16 08:57
>>>> launcher.py
>>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root 607 Jun 16 08:57
>>>> obb_utils.py
>>>> Step #0 - "dftester-image": Removing intermediate container bd74ade37010
>>>>
>>>>
>>>> i have this in my setup.py
>>>>
>>>> import setuptools
>>>>
>>>> # Packages installed alongside mypackage.
>>>> REQUIRED_PACKAGES = [
>>>>     'openbb',
>>>>     "apache-beam[gcp]",  # Must match the version in `Dockerfile`.
>>>>     'sendgrid',
>>>>     'pandas_datareader',
>>>>     'vaderSentiment',
>>>>     'numpy',
>>>>     'bs4',
>>>>     'lxml',
>>>>     'pandas_datareader',
>>>>     'beautifulsoup4',
>>>>     'xlrd',
>>>>     'openpyxl'
>>>> ]
>>>>
>>>>
>>>> setuptools.setup(
>>>>     name='mypackage',
>>>>     version='0.0.1',
>>>>     description='Shres Runner Package.',
>>>>     install_requires=REQUIRED_PACKAGES,
>>>>     # Picks up every directory that contains an __init__.py.
>>>>     packages=setuptools.find_packages()
>>>> )
>>>>
>>>>
>>>> and this is my dataflow_tester.py
>>>>
>>>> from mypackage import launcher
>>>> import logging
>>>> if __name__ == '__main__':
>>>>     logging.getLogger().setLevel(logging.INFO)
>>>>     launcher.run()
>>>>
>>>>
>>>>
>>>> i have compared my setup vs
>>>> https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies
>>>> and all looks the same (apart from my copying the __init__.py from
>>>> the directory where the main file (dataflow_tester.py) resides).
>>>>
>>>> Would you know how else i can debug what is going on and why my
>>>> mypackage subdirectory is not being seen?
>>>>
>>>> Kind regards
>>>> Marco
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, Jun 15, 2024 at 7:27 PM Valentyn Tymofieiev via user <
>>>> [email protected]> wrote:
>>>>
>>>>> Your pipeline launcher refers to a package named 'modules', but this
>>>>> package is not available in the runtime environment.
>>>>>
>>>>> On Sat, Jun 15, 2024 at 11:17 AM Sofia’s World <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Sorry, i cheered too early.
>>>>>> i can successfully build the image; however, at runtime the code always
>>>>>> fails with this exception and i cannot figure out why
>>>>>>
>>>>>> i mimicked the sample directory structure
>>>>>>
>>>>>>
>>>>>> ---- mypackage
>>>>>>      __init__.py
>>>>>>      dftester.py
>>>>>>      obb_utils.py
>>>>>>
>>>>>> dataflow_tester_main.py
>>>>>>
>>>>>> this is the content of my dataflow_tester_main.py
>>>>>>
>>>>>> from mypackage import dftester
>>>>>> import logging
>>>>>> if __name__ == '__main__':
>>>>>>     logging.getLogger().setLevel(logging.INFO)
>>>>>>     dftester.run()
>>>>>>
>>>>>>
>>>>>> and this is my dockerfile
>>>>>>
>>>>>>
>>>>>> https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/shareloader/Dockerfile_tester
>>>>>>
>>>>>> and at the bottom of this email is my exception.
>>>>>> I am puzzled about where the error is coming from, as i have almost
>>>>>> copied this sample
>>>>>> https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/main.py
>>>>>>
>>>>>> thanks and regards
>>>>>> Marco
>>>>>>
>>>>>> Traceback (most recent call last):
>>>>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 115, in create_harness
>>>>>>     _load_main_session(semi_persistent_directory)
>>>>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 354, in _load_main_session
>>>>>>     pickler.load_session(session_file)
>>>>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/pickler.py", line 65, in load_session
>>>>>>     return desired_pickle_lib.load_session(file_path)
>>>>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/dill_pickler.py", line 446, in load_session
>>>>>>     return dill.load_session(file_path)
>>>>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 368, in load_session
>>>>>>     module = unpickler.load()
>>>>>>              ^^^^^^^^^^^^^^^^
>>>>>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 472, in load
>>>>>>     obj = StockUnpickler.load(self)
>>>>>>           ^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 462, in find_class
>>>>>>     return StockUnpickler.find_class(self, module, name)
>>>>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>> ModuleNotFoundError: No module named 'modules'
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 14, 2024 at 5:52 AM Sofia’s World <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Many thanks Hu, worked like a charm
>>>>>>>
>>>>>>> few qq
>>>>>>> so in my reqs.txt i should put all beam requirements PLUS my own?
>>>>>>>
>>>>>>> and in the setup.py, shall i just declare
>>>>>>>
>>>>>>> "apache-beam[gcp]==2.54.0", # Must match the version in `Dockerfile`.
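
Since the comment above says the Beam pin must match the version in the Dockerfile, a small check can catch the two drifting apart. The sketch below only extracts an exact `==` pin from requirement strings; the function name is hypothetical, and how the Dockerfile's Beam version is obtained is left to the caller.

```python
import re
from typing import Iterable, Optional

# Matches lines such as 'apache-beam[gcp]==2.54.0' or 'apache_beam == 2.54.0'.
_BEAM_PIN = re.compile(
    r"^\s*apache[-_]beam(\[[^\]]*\])?\s*==\s*([0-9][^\s;#,]*)",
    re.IGNORECASE,
)


def pinned_beam_version(requirements: Iterable[str]) -> Optional[str]:
    """Return the exact Beam version pinned in the requirement lines, if any."""
    for line in requirements:
        match = _BEAM_PIN.match(line)
        if match:
            return match.group(2)
    return None  # Beam is absent or not pinned with '=='


if __name__ == "__main__":
    reqs = ["openbb", "apache-beam[gcp]==2.54.0", "sendgrid"]
    dockerfile_version = "2.54.0"  # assumed to be read from the Dockerfile
    print("pin matches Dockerfile:",
          pinned_beam_version(reqs) == dockerfile_version)
```

An unpinned entry such as `"apache-beam[gcp]"` yields `None`, which is a useful signal that the version is not being controlled at all.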
>>>>>>>
>>>>>>> thanks and kind regards
>>>>>>> Marco
>>>>>>>
>>>>>>> On Wed, Jun 12, 2024 at 1:48 PM XQ Hu <[email protected]> wrote:
>>>>>>>
>>>>>>>> Any reason to use this?
>>>>>>>>
>>>>>>>> RUN pip install avro-python3 pyarrow==0.15.1 apache-beam[gcp]==2.30.0 pandas-datareader==0.9.0
>>>>>>>>
>>>>>>>> It is typically recommended to use the latest Beam and build the
>>>>>>>> docker image using the requirements released for each Beam, for
>>>>>>>> example,
>>>>>>>> https://github.com/apache/beam/blob/release-2.56.0/sdks/python/container/py311/base_image_requirements.txt
>>>>>>>>
>>>>>>>> On Wed, Jun 12, 2024 at 1:31 AM Sofia’s World <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Sure, apologies, it crossed my mind it would have been useful to
>>>>>>>>> refer to it
>>>>>>>>>
>>>>>>>>> so this is the docker file
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://github.com/mmistroni/GCP_Experiments/edit/master/dataflow/shareloader/Dockerfile_tester
>>>>>>>>>
>>>>>>>>> I was using a setup.py as well, but then i commented out its usage
>>>>>>>>> in the Dockerfile after checking some flex templates which said it is
>>>>>>>>> not needed
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/shareloader/setup_dftester.py
>>>>>>>>>
>>>>>>>>> thanks in advance
>>>>>>>>> Marco
>>>>>>>>>
>>>>>>>>> On Tue, Jun 11, 2024 at 10:54 PM XQ Hu <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Can you share your Dockerfile?
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 11, 2024 at 4:43 PM Sofia’s World <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> thanks all, it seemed to work but now i am getting a different
>>>>>>>>>>> problem, having issues in building pyarrow...
>>>>>>>>>>>
>>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":
>>>>>>>>>>> <string>:36: DeprecationWarning: pkg_resources is deprecated
>>>>>>>>>>> as an API. See
>>>>>>>>>>> https://setuptools.pypa.io/en/latest/pkg_resources.html
>>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":
>>>>>>>>>>> WARNING setuptools_scm.pyproject_reading toml section missing
>>>>>>>>>>> 'pyproject.toml does not contain a tool.setuptools_scm section'
>>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":
>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":
>>>>>>>>>>> File
>>>>>>>>>>> "/tmp/pip-build-env-meihcxsp/overlay/lib/python3.11/site-packages/setuptools_scm/_integration/pyproject_reading.py",
>>>>>>>>>>> line 36, in read_pyproject
>>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":
>>>>>>>>>>> section = defn.get("tool", {})[tool_name]
>>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":
>>>>>>>>>>> ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
>>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":
>>>>>>>>>>> KeyError: 'setuptools_scm'
>>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":
>>>>>>>>>>> running bdist_wheel
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It is somehow getting messed up with a toml ?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Could anyone advise?
>>>>>>>>>>>
>>>>>>>>>>> thanks
>>>>>>>>>>>
>>>>>>>>>>> Marco
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 11, 2024 at 1:00 AM XQ Hu via user <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies
>>>>>>>>>>>> is a great example.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 10, 2024 at 4:28 PM Valentyn Tymofieiev via user <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> In this case the Python version will be defined by the Python
>>>>>>>>>>>>> version installed in the docker image of your flex template. So,
>>>>>>>>>>>>> you'd have to build your flex template from a base image with
>>>>>>>>>>>>> Python 3.11.
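
One way to confirm which interpreter a built image actually ships is to run a small check inside the container. This is only a sketch: the helper name is hypothetical, and 3.11 is the target version discussed in this thread.

```python
import sys
from typing import Tuple


def python_matches(expected: Tuple[int, int],
                   actual: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if the (major, minor) interpreter version equals expected."""
    return tuple(actual) == tuple(expected)


if __name__ == "__main__":
    ok = python_matches((3, 11))
    print(f"running Python {sys.version_info.major}.{sys.version_info.minor}; "
          f"matches 3.11: {ok}")
```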
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jun 10, 2024 at 12:50 PM Sofia’s World <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello
>>>>>>>>>>>>>> no, i am running my pipeline on GCP directly via a flex
>>>>>>>>>>>>>> template, configured using a Dockerfile.
>>>>>>>>>>>>>> Is there any chance to do something in the Dockerfile to force the
>>>>>>>>>>>>>> version at runtime?
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jun 10, 2024 at 7:24 PM Anand Inguva via user <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Are you running your pipeline from a Python 3.11
>>>>>>>>>>>>>>> environment? If you are, and you don't use a custom Docker
>>>>>>>>>>>>>>> container image, DataflowRunner (assuming Apache Beam on GCP
>>>>>>>>>>>>>>> means Apache Beam on DataflowRunner) will use Python 3.11.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Anand
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>