Valentyn, many thanks. I actually spotted the reference in the setup file.
However, after correcting it, I am still at square one: somehow my
runtime environment does not see the package. So I added some debugging to my
Dockerfile to check whether I forgot to copy something. The output is below,
and it shows that mypackage has been copied.
Here's my directory structure:
mypackage/
    __init__.py
    obb_utils.py
    launcher.py
__init__.py
dataflow_tester.py
setup_dftester.py (copied to setup.py)
I can see the directory structure is maintained when I copy my files
into the Docker image, as shown by the debug steps I added to my Dockerfile:
Step #0 - "dftester-image": Removing intermediate container 4c4e763289d2
Step #0 - "dftester-image": ---> cda378f70a9e
Step #0 - "dftester-image": Step 6/23 : COPY requirements.txt .
Step #0 - "dftester-image": ---> 9a43da08b013
Step #0 - "dftester-image": Step 7/23 : COPY setup_dftester.py setup.py
Step #0 - "dftester-image": ---> 5a6bf71df052
Step #0 - "dftester-image": Step 8/23 : COPY dataflow_tester.py .
Step #0 - "dftester-image": ---> 82cfe1f1f9ed
Step #0 - "dftester-image": Step 9/23 : COPY mypackage mypackage
Step #0 - "dftester-image": ---> d86497b791d0
Step #0 - "dftester-image": Step 10/23 : COPY __init__.py ${WORKDIR}/__init__.py
Step #0 - "dftester-image": ---> 337d149d64c7
Step #0 - "dftester-image": Step 11/23 : RUN echo '----- listing workdir'
Step #0 - "dftester-image": ---> Running in 9d97d8a64319
Step #0 - "dftester-image": ----- listing workdir
Step #0 - "dftester-image": Removing intermediate container 9d97d8a64319
Step #0 - "dftester-image": ---> bc9a6a2aa462
Step #0 - "dftester-image": Step 12/23 : RUN ls -la ${WORKDIR}
Step #0 - "dftester-image": ---> Running in cf164108f9d6
Step #0 - "dftester-image": total 24
Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 .
Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 ..
Step #0 - "dftester-image": -rw-r--r-- 1 root root 0 Jun 16 08:57 __init__.py
Step #0 - "dftester-image": -rw-r--r-- 1 root root 135 Jun 16 08:57 dataflow_tester.py
Step #0 - "dftester-image": drwxr-xr-x 2 root root 4096 Jun 16 08:59 mypackage
Step #0 - "dftester-image": -rw-r--r-- 1 root root 64 Jun 16 08:57 requirements.txt
Step #0 - "dftester-image": -rw-r--r-- 1 root root 736 Jun 16 08:57 setup.py
Step #0 - "dftester-image": Removing intermediate container cf164108f9d6
Step #0 - "dftester-image": ---> eb1a080b7948
Step #0 - "dftester-image": Step 13/23 : RUN echo '--- listing modules -----'
Step #0 - "dftester-image": ---> Running in 884f03dd81d6
Step #0 - "dftester-image": --- listing modules -----
Step #0 - "dftester-image": Removing intermediate container 884f03dd81d6
Step #0 - "dftester-image": ---> 9f6f7e27bd2f
Step #0 - "dftester-image": Step 14/23 : RUN ls -la ${WORKDIR}/mypackage
Step #0 - "dftester-image": ---> Running in bd74ade37010
Step #0 - "dftester-image": total 16
Step #0 - "dftester-image": drwxr-xr-x 2 root root 4096 Jun 16 08:59 .
Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 ..
Step #0 - "dftester-image": -rw-r--r-- 1 root root 0 Jun 16 08:57 __init__.py
Step #0 - "dftester-image": -rw-r--r-- 1 root root 1442 Jun 16 08:57 launcher.py
Step #0 - "dftester-image": -rw-r--r-- 1 root root 607 Jun 16 08:57 obb_utils.py
Step #0 - "dftester-image": Removing intermediate container bd74ade37010
I have this in my setup.py:

import setuptools

# Runtime dependencies installed together with the package.
REQUIRED_PACKAGES = [
    'openbb',
    "apache-beam[gcp]",  # Must match the version in `Dockerfile`.
    'sendgrid',
    'pandas_datareader',
    'vaderSentiment',
    'numpy',
    'bs4',
    'lxml',
    'pandas_datareader',
    'beautifulsoup4',
    'xlrd',
    'openpyxl'
]

setuptools.setup(
    name='mypackage',
    version='0.0.1',
    description='Shres Runner Package.',
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages()
)
And this is my dataflow_tester.py:

from mypackage import launcher
import logging

if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    launcher.run()
I have compared my setup with
https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies
and it all looks the same (apart from my copying the __init__.py from the
directory where the main file, dataflow_tester.py, resides).
Would you know how else I can debug what is going on and why my mypackage
subdirectory is not being seen?
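One check that might help, as a minimal sketch: the ls output above only shows that the files exist in the image, not that Python can import them. The helper below (describe_module is a hypothetical name, not part of Beam) uses only the standard library to report where a module would be loaded from, or None if it is not importable.

```python
import importlib.util
import sys


def describe_module(name):
    """Return where `name` would be imported from, or None if it is not importable."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # ImportError: a parent package of a dotted name is missing.
        # ValueError: the module is loaded but carries no usable spec.
        return None
    if spec is None:
        return None
    # Namespace packages may have no origin; fall back to their search locations.
    return spec.origin or str(list(spec.submodule_search_locations or []))


if __name__ == "__main__":
    print("sys.path:")
    for entry in sys.path:
        print("   ", entry)
    # 'modules' is the name from the earlier traceback; the others are from this thread.
    for name in ("mypackage", "mypackage.launcher", "modules"):
        print(f"{name!r} -> {describe_module(name)}")
```

Running this with the same Python interpreter the worker uses, for example from a temporary RUN step in the Dockerfile (assuming the script is copied into the image first), would show whether mypackage resolves and from which path.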
Kind regards
Marco
On Sat, Jun 15, 2024 at 7:27 PM Valentyn Tymofieiev via user <
[email protected]> wrote:
> Your pipeline launcher refers to a package named 'modules', but this
> package is not available in the runtime environment.
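A small illustration of this failure mode, offered as a sketch only. It uses the standard library pickle as a stand-in for the dill-based pickler that appears in the traceback (an assumption made to keep the example self-contained), and modules_demo is a hypothetical module name. A function is pickled by reference to its module, so unpickling fails when that module cannot be imported.

```python
import importlib
import os
import pickle
import sys
import tempfile

# Write a throwaway module named `modules_demo` (hypothetical) into a temp directory.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "modules_demo.py"), "w") as fh:
    fh.write("def run():\n    return 'ok'\n")

# "Launcher" side: the module is importable, so pickling a reference to it works.
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
modules_demo = importlib.import_module("modules_demo")

# Functions are pickled by reference: only the module and attribute names are stored.
payload = pickle.dumps(modules_demo.run)

# "Worker" side: simulate an environment where the module was never installed.
sys.path.remove(tmp_dir)
del sys.modules["modules_demo"]
importlib.invalidate_caches()

try:
    pickle.loads(payload)
    error = None
except ModuleNotFoundError as exc:
    error = exc

print(error)
```

The payload itself unpickles fine wherever modules_demo is importable, which is why the image can build cleanly and the failure only appears at runtime.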
>
> On Sat, Jun 15, 2024 at 11:17 AM Sofia’s World <[email protected]>
> wrote:
>
>> Sorry, I cheered too early.
>> I can successfully build the image; however, at runtime the code always
>> fails with this exception and I cannot figure out why.
>>
>> i mimicked the sample directory structure
>>
>>
>> mypackage/
>>     __init__.py
>>     dftester.py
>>     obb_utils.py
>>
>> dataflow_tester_main.py
>>
>> this is the content of my dataflow_tester_main.py
>>
>> from mypackage import dftester
>> import logging
>> if __name__ == '__main__':
>>     logging.getLogger().setLevel(logging.INFO)
>>     dftester.run()
>>
>>
>> and this is my dockerfile
>>
>>
>> https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/shareloader/Dockerfile_tester
>>
>> and at the bottom of this email is my exception.
>> I am puzzled about where the error is coming from, as I have almost copied
>> this sample:
>> https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/main.py
>>
>> thanks and regards
>> Marco
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Traceback (most recent call last):
>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 115, in create_harness
>>     _load_main_session(semi_persistent_directory)
>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 354, in _load_main_session
>>     pickler.load_session(session_file)
>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/pickler.py", line 65, in load_session
>>     return desired_pickle_lib.load_session(file_path)
>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/dill_pickler.py", line 446, in load_session
>>     return dill.load_session(file_path)
>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 368, in load_session
>>     module = unpickler.load()
>>              ^^^^^^^^^^^^^^^^
>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 472, in load
>>     obj = StockUnpickler.load(self)
>>           ^^^^^^^^^^^^^^^^^^^^^^^^^
>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 462, in find_class
>>     return StockUnpickler.find_class(self, module, name)
>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> ModuleNotFoundError: No module named 'modules'
>>
>>
>>
>>
>>
>>
>>
>> On Fri, Jun 14, 2024 at 5:52 AM Sofia’s World <[email protected]>
>> wrote:
>>
>>> Many thanks Hu, worked like a charm
>>>
>>> A few quick questions:
>>> so in my reqs.txt I should put all the Beam requirements PLUS my own?
>>>
>>> And in the setup.py, shall I just declare
>>>
>>> "apache-beam[gcp]==2.54.0", # Must match the version in `Dockerfile`.
>>>
>>> thanks and kind regards
>>> Marco
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Jun 12, 2024 at 1:48 PM XQ Hu <[email protected]> wrote:
>>>
>>>> Any reason to use this?
>>>>
>>>> RUN pip install avro-python3 pyarrow==0.15.1 apache-beam[gcp]==2.30.0
>>>> pandas-datareader==0.9.0
>>>>
>>>> It is typically recommended to use the latest Beam and build the docker
>>>> image using the requirements released for each Beam, for example,
>>>> https://github.com/apache/beam/blob/release-2.56.0/sdks/python/container/py311/base_image_requirements.txt
>>>>
>>>> On Wed, Jun 12, 2024 at 1:31 AM Sofia’s World <[email protected]>
>>>> wrote:
>>>>
>>>>> Sure, apologies, it crossed my mind it would have been useful to
>>>>> refer to it
>>>>>
>>>>> so this is the docker file
>>>>>
>>>>>
>>>>> https://github.com/mmistroni/GCP_Experiments/edit/master/dataflow/shareloader/Dockerfile_tester
>>>>>
>>>>> I was using a setup.py as well, but then i commented out the usage in
>>>>> the dockerfile after checking some flex templates which said it is not
>>>>> needed
>>>>>
>>>>>
>>>>> https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/shareloader/setup_dftester.py
>>>>>
>>>>> thanks in advance
>>>>> Marco
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jun 11, 2024 at 10:54 PM XQ Hu <[email protected]> wrote:
>>>>>
>>>>>> Can you share your Dockerfile?
>>>>>>
>>>>>> On Tue, Jun 11, 2024 at 4:43 PM Sofia’s World <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> thanks all, it seemed to work but now i am getting a different
>>>>>>> problem, having issues in building pyarrow...
>>>>>>>
>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": <string>:36: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": WARNING setuptools_scm.pyproject_reading toml section missing 'pyproject.toml does not contain a tool.setuptools_scm section'
>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": Traceback (most recent call last):
>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":   File "/tmp/pip-build-env-meihcxsp/overlay/lib/python3.11/site-packages/setuptools_scm/_integration/pyproject_reading.py", line 36, in read_pyproject
>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":     section = defn.get("tool", {})[tool_name]
>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":               ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": KeyError: 'setuptools_scm'
>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": running bdist_wheel
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> It is somehow getting messed up with a toml ?
>>>>>>>
>>>>>>>
>>>>>>> Could anyone advise?
>>>>>>>
>>>>>>> thanks
>>>>>>>
>>>>>>> Marco
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 11, 2024 at 1:00 AM XQ Hu via user <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies
>>>>>>>> is a great example.
>>>>>>>>
>>>>>>>> On Mon, Jun 10, 2024 at 4:28 PM Valentyn Tymofieiev via user <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> In this case the Python version will be defined by the Python
>>>>>>>>> version installed in the docker image of your flex template. So, you'd
>>>>>>>>> have to build your flex template from a base image with Python 3.11.
>>>>>>>>>
>>>>>>>>> On Mon, Jun 10, 2024 at 12:50 PM Sofia’s World <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>> no, I am running my pipeline on GCP directly via a flex
>>>>>>>>>> template, configured using a Dockerfile.
>>>>>>>>>> Is there any chance to do something in the Dockerfile to force the
>>>>>>>>>> version at runtime?
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 10, 2024 at 7:24 PM Anand Inguva via user <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> Are you running your pipeline from the python 3.11 environment?
>>>>>>>>>>> If you are running from a python 3.11 environment and don't use a custom
>>>>>>>>>>> docker container image, DataflowRunner (assuming Apache Beam on GCP means
>>>>>>>>>>> Apache Beam on DataflowRunner) will use Python 3.11.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Anand
>>>>>>>>>>>
>>>>>>>>>>