Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-24 Thread Gourav Sengupta
Dear Sean, I agree with you to a certain extent; it makes sense. Perhaps I am wrong to ask for native integrations instead of depending on over-engineered external solutions, which have their own performance issues and bottlenecks in live production environments. But asking and stating one's opinion

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-24 Thread Gourav Sengupta
Hi Bitfox, yes, distributed training using PyTorch and TensorFlow is really superb and great, and you are spot on. There is actually absolutely no need for solutions like Ray/Petastorm etc... But in case I want to preprocess data in SPARK and push the results to these deep learning libraries, the

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-24 Thread Sean Owen
On the contrary, distributed deep learning is not data parallel. It's dominated by the need to share parameters across workers. Gourav, I don't understand what you're looking for. Have you looked at Petastorm and Horovod? They _use Spark_, not another platform like Ray. Why recreate this which has
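
To make the parameter-sharing point concrete, here is a minimal pure-Python sketch of synchronous data-parallel SGD on a toy one-weight linear model. It is not Horovod's or TensorFlow's API: each hypothetical worker computes a gradient on its own shard, but the gradients must be averaged before every update. The `allreduce_mean` helper is an illustrative stand-in for a real allreduce, so communication happens on every single step.

```python
from statistics import fmean


def local_gradient(w, shard):
    """Gradient of 0.5 * (w*x - y)^2, averaged over one worker's shard."""
    return fmean((w * x - y) * x for x, y in shard)


def allreduce_mean(values):
    """Stand-in for an allreduce: every worker receives the same average."""
    avg = fmean(values)
    return [avg] * len(values)


def train_data_parallel(data, num_workers=4, lr=0.1, steps=200):
    # Split the data round-robin into one shard per hypothetical worker.
    shards = [data[i::num_workers] for i in range(num_workers)]
    # Each worker keeps its own replica of the (single) model parameter.
    replicas = [0.0] * num_workers
    for _ in range(steps):
        # Data-parallel part: gradients are computed independently per shard.
        grads = [local_gradient(w, shard) for w, shard in zip(replicas, shards)]
        # Communication part: without this step each replica would follow
        # only its own shard's gradient instead of the global one.
        synced = allreduce_mean(grads)
        replicas = [w - lr * g for w, g in zip(replicas, synced)]
    return replicas
```

With data such as `[(k / 10, 3.0 * k / 10) for k in range(1, 21)]` split into four equal shards, the average of the shard gradients equals the full-batch gradient, so every replica ends with the same weight, very close to 3.0.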

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-24 Thread Bitfox
I have been using TensorFlow for a long time; it's not hard at all to implement a distributed training job, either by model parallelism or data parallelism. I don't think there is much need to develop Spark to support TensorFlow jobs. Just my thoughts... On Thu, Feb 24, 2022 at 4:36 PM Go

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-24 Thread Gourav Sengupta
Hi, I do not think that there is any reason for using over-engineered platforms like Petastorm and Ray, except for certain use cases. What Ray is doing, except for certain use cases, could easily have been done by SPARK, I think, had the open source community been given that steer. But maybe I am wrong

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-23 Thread Dennis Suhari
Currently we are trying AnalyticsZoo and Ray. Sent from my iPhone > On 23.02.2022 at 04:53, Bitfox wrote: > tensorflow itself can implement the distributed computing via a parameter server. Why did you want spark here? > regards. >> On Wed, Feb 23, 2022 at 11:27 AM Vijaya

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-23 Thread Sean Owen
Petastorm (https://github.com/uber/petastorm) does that, in the sense that it feeds Spark DataFrames to those frameworks for distributed training. I'm not sure what you mean by a native integration that is different. These tools do just what you are talking about, and have for a while. On Wed, Feb 23, 2022 at 7

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-23 Thread Gourav Sengupta
Hi, I am sure those who have actually built a data processing pipeline whose output then has to be delivered to TensorFlow or PyTorch (not for a POC, a blog post to get clicks, or resolving symptomatic bugs, but in a real-life end-to-end application) will perhaps understand some of the issu

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-23 Thread Sean Owen
Spark does do distributed ML, but not TensorFlow. Barrier execution mode is an element that things like Horovod use. Not sure what you are getting at. Ray is not Spark. As I say, Horovod does this already. The upside over TF distributed is that Spark sets up and manages the daemon processes rath
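
Barrier execution mode (added in Spark 2.4) launches all tasks of a barrier stage together and lets them synchronize; in PySpark that is `rdd.barrier().mapPartitions(fn)`, with `BarrierTaskContext.get().barrier()` called inside `fn`. The sketch below does not use Spark at all: it uses Python's `threading.Barrier` as a stand-in to illustrate the gang-synchronization semantics that, per this thread, frameworks like Horovod rely on. The "setup" and "train" phases are hypothetical placeholders.

```python
import threading


def run_barrier_stage(num_tasks):
    """Run num_tasks threads that must all finish setup before any starts training."""
    barrier = threading.Barrier(num_tasks)
    lock = threading.Lock()
    log = []  # (phase, task_id) events in the order they happened

    def task(task_id):
        with lock:
            log.append(("setup", task_id))
        # Analogous to BarrierTaskContext.get().barrier(): block until every task arrives.
        barrier.wait()
        with lock:
            log.append(("train", task_id))

    threads = [threading.Thread(target=task, args=(i,)) for i in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Whatever order the threads are scheduled in, every "setup" event is logged before any "train" event, which is the property a distributed training job needs before workers start exchanging parameters.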

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-23 Thread Gourav Sengupta
Hi, the SPARK community should have been able to build distributed ML capabilities, and as far as I remember, that was initially the idea behind the SPARK 3.x roadmap (barrier execution mode, https://issues.apache.org/jira/browse/SPARK-24579). Ray, another Berkeley Labs output like SPARK, is trying to

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-22 Thread Bitfox
TensorFlow itself can implement distributed computing via a parameter server. Why do you want Spark here? Regards. On Wed, Feb 23, 2022 at 11:27 AM Vijayant Kumar wrote: > Thanks Sean for your response!! > Want to add some more background here. > I am using Spark3.0+ version
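
The parameter-server pattern mentioned here can be sketched without any framework: a server holds the shared weights, and each worker pulls the current value, computes a gradient on its own shard, and pushes it back. This is a minimal single-process sketch on a toy one-weight model; `ParameterServer`, `worker_gradient`, and `train_with_ps` are hypothetical names, the workers' pushes are simulated by taking turns, and it is not TensorFlow's `tf.distribute` API.

```python
class ParameterServer:
    """Hypothetical server that owns the shared model weight."""

    def __init__(self, w=0.0, lr=0.1):
        self.w = w
        self.lr = lr

    def pull(self):
        # Workers fetch the current weight before computing a gradient.
        return self.w

    def push(self, grad):
        # Workers send a gradient back; the server applies the update.
        self.w -= self.lr * grad


def worker_gradient(w, shard):
    # Gradient of 0.5 * (w*x - y)^2, averaged over this worker's shard.
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def train_with_ps(data, num_workers=2, lr=0.1, rounds=300):
    ps = ParameterServer(lr=lr)
    shards = [data[i::num_workers] for i in range(num_workers)]
    for _ in range(rounds):
        # Workers take turns here; in a real cluster they would run
        # concurrently, often pushing asynchronously.
        for shard in shards:
            w = ps.pull()
            ps.push(worker_gradient(w, shard))
    return ps.pull()
```

If every shard comes from the same line, e.g. `[(k / 10, 2.0 * k / 10) for k in range(1, 11)]`, both workers push the weight toward the same value and the final weight ends very close to 2.0.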

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-22 Thread Sean Owen
Dependencies? Sure, like any Python library. What are you asking about there? I don't know of a modern alternative on Spark. Did you read the docs or search? There are plenty of examples. On Tue, Feb 22, 2022, 9:27 PM Vijayant Kumar wrote: > Thanks Sean for your response!! > Want to add some more

RE: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-22 Thread Vijayant Kumar
Thanks Sean for your response!! I want to add some more background here. I am using Spark 3.0+ with TensorFlow 2.0+. My use case is not image data but time-series data, where I am using LSTMs and transformers to forecast. I evaluated SparkFlow and spark_tensorflow_distribut
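
For the time-series use case, a common preprocessing step before handing data to an LSTM or transformer is turning each series into fixed-length (input window, target) pairs. The sketch below assumes a single univariate series held in a Python list; `make_windows`, `lookback`, and `horizon` are illustrative names, not part of SparkFlow, spark_tensorflow_distributor, or TensorFlow.

```python
def make_windows(series, lookback, horizon=1):
    """Turn a 1-D series into (input window, target) pairs for a sequence model."""
    if lookback < 1 or horizon < 1:
        raise ValueError("lookback and horizon must be >= 1")
    samples = []
    # Slide a window of `lookback` points; the next `horizon` points are the target.
    for start in range(len(series) - lookback - horizon + 1):
        end = start + lookback
        samples.append((series[start:end], series[end:end + horizon]))
    return samples
```

For example, `make_windows([1, 2, 3, 4, 5, 6], lookback=3)` yields `([1, 2, 3], [4])`, `([2, 3, 4], [5])` and `([3, 4, 5], [6])`. On Spark 3.0+, one option is to apply the same logic per series with `groupBy(...).applyInPandas(...)` before feeding the windows to the training framework.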