Sorry, I cheered too early.
I can successfully build the image; however, at runtime the code always
fails with this exception, and I cannot figure out why.

I mimicked the sample directory structure:


---- mypackage
   --- __init__.py
       dftester.py
       obb_utils.py

dataflow_tester_main.py

This is the content of my dataflow_tester_main.py:

from mypackage import dftester
import logging
if __name__ == '__main__':
  logging.getLogger().setLevel(logging.INFO)
  dftester.run()


And this is my Dockerfile:

https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/shareloader/Dockerfile_tester

My exception is at the bottom of this email.
I am puzzled about where the error is coming from, as I have almost copied
this sample:
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/main.py

thanks and regards
 Marco


Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 115, in create_harness
    _load_main_session(semi_persistent_directory)
  File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 354, in _load_main_session
    pickler.load_session(session_file)
  File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/pickler.py", line 65, in load_session
    return desired_pickle_lib.load_session(file_path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/dill_pickler.py", line 446, in load_session
    return dill.load_session(file_path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 368, in load_session
    module = unpickler.load()
             ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 462, in find_class
    return StockUnpickler.find_class(self, module, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'modules'

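A quick check that could be run inside the built image to narrow this down.
This is only a sketch: `missing_modules` is a hypothetical helper, not part of
Beam, and the names passed to it are taken from the traceback ('modules') and
from my directory layout ('mypackage'). It only reports whether Python can find
those modules on the import path; it does not by itself explain why the pickled
main session refers to them.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # find_spec raises when the parent package of a dotted name is
            # missing, or when an already-imported module has no spec.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # 'modules' comes from the traceback; 'mypackage' is my own package.
    print(missing_modules(["modules", "mypackage"]))
```

Any name printed by this script is one the container's Python cannot import, which would be consistent with the ModuleNotFoundError raised while the worker loads the main session.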

On Fri, Jun 14, 2024 at 5:52 AM Sofia’s World <[email protected]> wrote:

> Many thanks Hu, worked like a charm
>
> A few quick questions:
> In my reqs.txt, should I put all the Beam requirements PLUS my own?
>
> And in the setup.py, shall I just declare
>
> "apache-beam[gcp]==2.54.0",  # Must match the version in `Dockerfile`.
>
> thanks and kind regards
> Marco
>
>
>
>
>
>
> On Wed, Jun 12, 2024 at 1:48 PM XQ Hu <[email protected]> wrote:
>
>> Any reason to use this?
>>
>> RUN pip install avro-python3 pyarrow==0.15.1 apache-beam[gcp]==2.30.0
>>  pandas-datareader==0.9.0
>>
>> It is typically recommended to use the latest Beam and build the docker
>> image using the requirements released for each Beam, for example,
>> https://github.com/apache/beam/blob/release-2.56.0/sdks/python/container/py311/base_image_requirements.txt
>>
>> On Wed, Jun 12, 2024 at 1:31 AM Sofia’s World <[email protected]>
>> wrote:
>>
>>> Sure, apologies, it crossed my mind that it would have been useful to
>>> refer to it.
>>>
>>> So this is the Dockerfile:
>>>
>>>
>>> https://github.com/mmistroni/GCP_Experiments/edit/master/dataflow/shareloader/Dockerfile_tester
>>>
>>> I was using a setup.py as well, but then I commented out its usage in
>>> the Dockerfile after checking some flex templates which said it is not
>>> needed:
>>>
>>>
>>> https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/shareloader/setup_dftester.py
>>>
>>> thanks in advance
>>>  Marco
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jun 11, 2024 at 10:54 PM XQ Hu <[email protected]> wrote:
>>>
>>>> Can you share your Dockerfile?
>>>>
>>>> On Tue, Jun 11, 2024 at 4:43 PM Sofia’s World <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks all, it seemed to work, but now I am getting a different
>>>>> problem: I am having issues building pyarrow...
>>>>>
>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": <string>:36: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": WARNING setuptools_scm.pyproject_reading toml section missing 'pyproject.toml does not contain a tool.setuptools_scm section'
>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": Traceback (most recent call last):
>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":   File "/tmp/pip-build-env-meihcxsp/overlay/lib/python3.11/site-packages/setuptools_scm/_integration/pyproject_reading.py", line 36, in read_pyproject
>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":     section = defn.get("tool", {})[tool_name]
>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":               ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": KeyError: 'setuptools_scm'
>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": running bdist_wheel
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Is it somehow getting confused by a TOML file?
>>>>>
>>>>>
>>>>> Could anyone advise?
>>>>>
>>>>> thanks
>>>>>
>>>>>  Marco
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jun 11, 2024 at 1:00 AM XQ Hu via user <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies
>>>>>> is a great example.
>>>>>>
>>>>>> On Mon, Jun 10, 2024 at 4:28 PM Valentyn Tymofieiev via user <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> In this case the Python version will be defined by the Python
>>>>>>> version installed in the docker image of your flex template. So, you'd
>>>>>>> have to build your flex template from a base image with Python 3.11.
>>>>>>>
>>>>>>> On Mon, Jun 10, 2024 at 12:50 PM Sofia’s World <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>> No, I am running my pipeline on GCP directly via a flex template,
>>>>>>>> configured using a Dockerfile.
>>>>>>>> Is there any chance of doing something in the Dockerfile to force
>>>>>>>> the version at runtime?
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Mon, Jun 10, 2024 at 7:24 PM Anand Inguva via user <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Are you running your pipeline from the Python 3.11 environment?
>>>>>>>>> If you are running from a Python 3.11 environment and don't use a
>>>>>>>>> custom Docker container image, DataflowRunner (assuming Apache Beam
>>>>>>>>> on GCP means Apache Beam on DataflowRunner) will use Python 3.11.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Anand
>>>>>>>>>
>>>>>>>>
