I recommend putting all top-level dependencies for your pipeline in the
install_requires section of setup.py and autogenerating requirements.txt,
which would then include all transitive dependencies and ensure
reproducible builds.

For approaches to generating the requirements.txt file from the top-level
requirements specified in setup.py, see:
https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies#optional-update-the-dependencies-in-the-requirements-file-and-rebuild-the-docker-images
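
To illustrate how the two files relate, here is a rough sketch, not taken
from the linked example. However you generate it, requirements.txt should
pin everything listed in install_requires plus the transitive dependencies.
The helper names, the pandas-datareader entry, and the sample
requirements.txt content (including the pyarrow version) are made up for
illustration.

```python
import re


def canonical_name(requirement: str) -> str:
    """Return the normalized project name of a requirement string."""
    # Keep only the name, dropping extras, version specifiers and markers.
    name = re.split(r"[\[<>=!~;\s]", requirement.strip(), maxsplit=1)[0]
    # PEP 503 normalization: lowercase, runs of "-", "_", "." become "-".
    return re.sub(r"[-_.]+", "-", name).lower()


def unpinned_top_level(install_requires, requirements_txt):
    """List top-level dependencies with no '==' pin in requirements.txt."""
    pinned = set()
    for line in requirements_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "==" in line:
            pinned.add(canonical_name(line))
    return [r for r in install_requires if canonical_name(r) not in pinned]


# Hypothetical top-level dependencies, as they would appear in setup.py.
INSTALL_REQUIRES = [
    "apache-beam[gcp]==2.54.0",  # must match the Beam version in the Dockerfile
    "pandas-datareader==0.9.0",
]

# Made-up sample of a generated requirements.txt (versions are placeholders).
GENERATED = """\
apache-beam[gcp]==2.54.0
pandas-datareader==0.9.0
pyarrow==14.0.2  # transitive dependency
"""

print(unpinned_top_level(INSTALL_REQUIRES, GENERATED))
```

An empty list means every top-level dependency has a pin in the generated
file; anything returned is a sign that requirements.txt needs to be
regenerated.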

Valentyn

On Thu, Jun 13, 2024 at 9:52 PM Sofia’s World <[email protected]> wrote:

> Many thanks Hu, worked like a charm
>
> few qq
> so in my reqs.txt i should put all beam requirements PLUS my own?
>
> and in the setup.py, shall i just declare
>
> "apache-beam[gcp]==2.54.0",  # Must match the version in `Dockerfile`.
>
> thanks and kind regards
> Marco
>
>
>
>
>
>
> On Wed, Jun 12, 2024 at 1:48 PM XQ Hu <[email protected]> wrote:
>
>> Any reason to use this?
>>
>> RUN pip install avro-python3 pyarrow==0.15.1 apache-beam[gcp]==2.30.0
>>  pandas-datareader==0.9.0
>>
>> It is typically recommended to use the latest Beam and build the docker
>> image using the requirements released for each Beam, for example,
>> https://github.com/apache/beam/blob/release-2.56.0/sdks/python/container/py311/base_image_requirements.txt
>>
>> On Wed, Jun 12, 2024 at 1:31 AM Sofia’s World <[email protected]>
>> wrote:
>>
>>> Sure, apologies, it crossed my mind it would have been useful to refer
>>> to it
>>>
>>> so this is the docker file
>>>
>>>
>>> https://github.com/mmistroni/GCP_Experiments/edit/master/dataflow/shareloader/Dockerfile_tester
>>>
>>> I was using a setup.py as well, but then i commented out the usage in
>>> the dockerfile after checking some flex templates which said it is not
>>> needed
>>>
>>>
>>> https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/shareloader/setup_dftester.py
>>>
>>> thanks in advance
>>>  Marco
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jun 11, 2024 at 10:54 PM XQ Hu <[email protected]> wrote:
>>>
>>>> Can you share your Dockerfile?
>>>>
>>>> On Tue, Jun 11, 2024 at 4:43 PM Sofia’s World <[email protected]>
>>>> wrote:
>>>>
>>>>> thanks all,  it seemed to work but now i am getting a different
>>>>> problem, having issues in building pyarrow...
>>>>>
>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": <string>:36: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": WARNING setuptools_scm.pyproject_reading toml section missing 'pyproject.toml does not contain a tool.setuptools_scm section'
>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": Traceback (most recent call last):
>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":   File "/tmp/pip-build-env-meihcxsp/overlay/lib/python3.11/site-packages/setuptools_scm/_integration/pyproject_reading.py", line 36, in read_pyproject
>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":     section = defn.get("tool", {})[tool_name]
>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":               ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": KeyError: 'setuptools_scm'
>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": running bdist_wheel
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> It is somehow getting messed up with a toml ?
>>>>>
>>>>>
>>>>> Could anyone advise?
>>>>>
>>>>> thanks
>>>>>
>>>>>  Marco
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jun 11, 2024 at 1:00 AM XQ Hu via user <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies
>>>>>> is a great example.
>>>>>>
>>>>>> On Mon, Jun 10, 2024 at 4:28 PM Valentyn Tymofieiev via user <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> In this case the Python version will be defined by the Python
>>>>>>> version installed in the docker image of your flex template. So, you'd
>>>>>>> have to build your flex template from a base image with Python 3.11.
>>>>>>>
>>>>>>> On Mon, Jun 10, 2024 at 12:50 PM Sofia’s World <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello
>>>>>>>>  no i am running my pipeline on GCP directly via a flex template,
>>>>>>>> configured using a Dockerfile
>>>>>>>> Any chances to do something in the Dockerfile to force the version
>>>>>>>> at runtime?
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Mon, Jun 10, 2024 at 7:24 PM Anand Inguva via user <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Are you running your pipeline from a Python 3.11 environment?
>>>>>>>>> If you are running from a Python 3.11 environment and don't use a
>>>>>>>>> custom docker container image, DataflowRunner (assuming Apache Beam
>>>>>>>>> on GCP means Apache Beam on DataflowRunner) will use Python 3.11.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Anand
>>>>>>>>>
>>>>>>>>
