Re: Installing non-native Python dependencies in Dataflow

2017-06-08 Thread Dmitry Demeshchuk
Thanks for all your help, Ahmet! Comments inline. On Thu, Jun 8, 2017 at 6:32 PM, Ahmet Altay wrote: > Thank you for the update, some questions inline. > > On Thu, Jun 8, 2017 at 6:21 PM, Dmitry Demeshchuk > wrote: > >> FYI, I tried to install a psycopg2 wheel from a file using the >> "extra_p

Re: Installing non-native Python dependencies in Dataflow

2017-06-08 Thread Ahmet Altay
Thank you for the update, some questions inline. On Thu, Jun 8, 2017 at 6:21 PM, Dmitry Demeshchuk wrote: > FYI, I tried to install a psycopg2 wheel from a file using the > "extra_packages" argument (although, wheels installation is apparently > still an experimental feature), but this led to a

Re: Installing non-native Python dependencies in Dataflow

2017-06-08 Thread Dmitry Demeshchuk
FYI, I tried to install a psycopg2 wheel from a file using the "extra_packages" argument (although wheel installation is apparently still an experimental feature), but this led to UCS-2 vs UCS-4 compatibility issues (looks like the Dataflow version of Python is using UCS-2, while w
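
A narrow (UCS-2) versus wide (UCS-4) Unicode build mismatch like the one described above can be checked from inside the interpreter. The sketch below is only illustrative: the helper name `unicode_build` is made up for this example, and it relies on `sys.maxunicode`, which is 0xFFFF on narrow CPython 2 builds and 0x10FFFF on wide ones (and on every Python 3.3+ build). On CPython 2.7, wheels built for the two variants carry different ABI tags (`cp27m` for narrow, `cp27mu` for wide), so a wheel built on one kind of interpreter will not install on the other.

```python
import sys


def unicode_build(maxunicode=None):
    """Return 'UCS-2' for a narrow Unicode build and 'UCS-4' for a wide one.

    `maxunicode` defaults to the running interpreter's sys.maxunicode; it is
    a parameter only so the logic can be exercised with both values.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Narrow builds cap code points at 0xFFFF; wide builds go to 0x10FFFF.
    return "UCS-4" if maxunicode > 0xFFFF else "UCS-2"


# Report which kind of build this interpreter is.
print("This interpreter is a %s build" % unicode_build())
```

Running the same check on the machine that builds the wheel and inside a worker would show whether the two interpreters agree.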

Re: Installing non-native Python dependencies in Dataflow

2017-06-06 Thread Dmitry Demeshchuk
Yeah, I wasn't really pinning it myself, it's one of the dependency packages that depends on that specific version. Thanks for the information, I'll try to explicitly install 33.1.1 and see if it changes anything. On Tue, Jun 6, 2017 at 7:13 PM, Ahmet Altay wrote: > Pinning setuptools is genera

Re: Installing non-native Python dependencies in Dataflow

2017-06-06 Thread Ahmet Altay
Pinning setuptools is generally not a good practice. The reason is that at installation time it might cause removal of the setuptools that is being used to install packages. FWIW, Dataflow workers should have setuptools 33.1.1, which was released on 2017/01/16. Ahmet On Tue, Jun 6, 2017 at 6:53 P
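
Rather than pinning, one option is simply to log which setuptools version an environment already has. The following is a minimal sketch, assuming Python 3.8 or newer (for `importlib.metadata`, which did not exist in the Python 2.7 environment this thread is about); the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print the setuptools version visible to this interpreter, if any.
print("setuptools: %s" % installed_version("setuptools"))
```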

Re: Installing non-native Python dependencies in Dataflow

2017-06-06 Thread Dmitry Demeshchuk
Thanks, Ahmet, it turned out that Stackdriver had more logs than just the Dataflow logs section. So, I ended up finding this step, which fails constantly: I Running setup.py install for dataflow: started I Running setup.py install for dataflow: finished with status 'error' I Comp

Re: Installing non-native Python dependencies in Dataflow

2017-06-06 Thread Ahmet Altay
On Tue, Jun 6, 2017 at 2:07 PM, Dmitry Demeshchuk wrote: > Hi Ahmet, > > Thanks a lot for pointing out that doc, I somehow missed it from the > official Python SDK page! > > One thing that comes to my mind is that generally one should probably use > the 'install' command in setuptools, not 'build

Re: Installing non-native Python dependencies in Dataflow

2017-06-06 Thread Dmitry Demeshchuk
Hi Ahmet, Thanks a lot for pointing out that doc, I somehow missed it on the official Python SDK page! One thing that comes to my mind is that generally one should probably use the 'install' command in setuptools, not 'build', like it's done in https://github.com/apache/beam/blob/master/sdks/py

Re: Installing non-native Python dependencies in Dataflow

2017-06-06 Thread Ahmet Altay
Hi, Please see Managing Python Pipeline Dependencies [1] for various ways of installing additional dependencies. The section on non-Python dependencies is relevant to your question. Thank you, Ahmet [1] https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/ On Mon, Jun 5, 2017
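
The approach to non-Python dependencies in the referenced doc is to ship a setup.py whose custom command runs shell commands on each worker before the Python package is built. Below is a rough sketch of that idea, not the doc's actual code: the names `CUSTOM_COMMANDS`, `run_custom_commands` and `make_cmdclass` are hypothetical, the `libpq-dev` package is just an illustrative choice, and hooking into `build_py` assumes the package contains at least one pure-Python module so that `build_py` actually runs.

```python
import subprocess

# Hypothetical shell commands to run on a worker before the package is built.
# The apt package name is illustrative only.
CUSTOM_COMMANDS = [
    ["apt-get", "update"],
    ["apt-get", "--assume-yes", "install", "libpq-dev"],
]


def run_custom_commands(commands, runner=subprocess.check_call):
    """Run each command in order; check_call raises on the first failure."""
    for command in commands:
        print("Running command: %s" % " ".join(command))
        runner(command)


def make_cmdclass():
    """Return a cmdclass mapping that runs CUSTOM_COMMANDS before build_py.

    setuptools is imported lazily so run_custom_commands stays usable
    without it.
    """
    import setuptools
    from setuptools.command.build_py import build_py as _build_py

    class CustomCommands(setuptools.Command):
        """Command that executes the shell commands listed above."""
        user_options = []

        def initialize_options(self):
            pass

        def finalize_options(self):
            pass

        def run(self):
            run_custom_commands(CUSTOM_COMMANDS)

    class build_py(_build_py):
        """build_py variant that first triggers CustomCommands."""

        def run(self):
            self.run_command("CustomCommands")
            _build_py.run(self)

    return {"build_py": build_py, "CustomCommands": CustomCommands}
```

In an actual setup.py, the mapping returned by `make_cmdclass()` would be passed as the `cmdclass` argument of `setuptools.setup()`, and the file would be handed to the pipeline through the `--setup_file` option.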

Re: Installing non-native Python dependencies in Dataflow

2017-06-05 Thread Morand, Sebastien
Hi, I'm interested too. For instance, it would be nice to add an SFTP BoundedSource, but that requires compiling paramiko against the SSL library (and so installing ssl-dev). Regards, *Sébastien MORAND* Team Lead Solution Architect Technology & Operations / Digital Factory Veolia - Group Information Systems & Techno

Installing non-native Python dependencies in Dataflow

2017-06-05 Thread Dmitry Demeshchuk
Hi again, folks, How should I go about installing Python packages that need to be built and/or require native dependencies like shared libraries or such? I guess I could potentially build the C-based modules using the same version of the kernel and glibc that Dataflow is running, but doesn't seem