If someone wants to actually use deep learning algorithms, their focus should be on choosing the technology stack that gives them maximum flexibility to explore the nuances of their algorithms.
From a personal perspective, I always prefer to use libraries which provide the best flexibility and extensibility in terms of the science/mathematics of the subject. For example, open a book on linear regression and check whether all of its mathematical formulations are actually available in the Spark regression module. It is always better to choose a technology that fits the nuances and rigor of the science, rather than choose a technology and then try to fit the science into it.

Regards,
Gourav

On Sun, May 5, 2019 at 2:23 PM Jason Dai <jason....@gmail.com> wrote:
> You may find talks from Analytics Zoo users at
> https://analytics-zoo.github.io/master/#presentations/; in particular,
> some recent user examples on Analytics Zoo:
>
> - Mastercard:
> https://software.intel.com/en-us/articles/deep-learning-with-analytic-zoo-optimizes-mastercard-recommender-ai-service
> - Azure:
> https://software.intel.com/en-us/articles/use-analytics-zoo-to-inject-ai-into-customer-service-platforms-on-microsoft-azure-part-1
> - CERN:
> https://db-blog.web.cern.ch/blog/luca-canali/machine-learning-pipelines-high-energy-physics-using-apache-spark-bigdl
> - Midea/KUKA:
> https://software.intel.com/en-us/articles/industrial-inspection-platform-in-midea-and-kuka-using-distributed-tensorflow-on-analytics
> - Talroo:
> https://software.intel.com/en-us/articles/talroo-uses-analytics-zoo-and-aws-to-leverage-deep-learning-for-job-recommendations
>
> Thanks,
> -Jason
>
> On Sun, May 5, 2019 at 6:29 AM Riccardo Ferrari <ferra...@gmail.com> wrote:
>
>> Thank you for your answers!
>>
>> While it is clear that each DL framework can handle distributed model
>> training on its own (some better than others), I still see a lot of
>> value in having Spark on the ETL/pre-processing part; hence my question.
>> I am trying to avoid managing multiple stacks/workflows, hoping to
>> unify my system. Projects like TensorflowOnSpark or Analytics-Zoo (to
>> name a couple) feel like they could help; I really appreciate your
>> comments and anyone who could add some value to this discussion. Does
>> anyone have experience with them?
>>
>> Thanks
>>
>> On Sat, May 4, 2019 at 8:01 PM Pat Ferrel <p...@occamsmachete.com> wrote:
>>
>>> @Riccardo
>>>
>>> Spark does not do the DL learning part of the pipeline (afaik), so it
>>> is limited to data ingestion and transforms (ETL). It is therefore
>>> optional, and other ETL options might be better for you.
>>>
>>> Most of the technologies @Gourav mentions have their own scaling based
>>> on their own compute engines specialized for their DL implementations,
>>> so be aware that Spark scaling has nothing to do with scaling most of
>>> the DL engines; they have their own solutions.
>>>
>>> From: Gourav Sengupta <gourav.sengu...@gmail.com>
>>> Reply: Gourav Sengupta <gourav.sengu...@gmail.com>
>>> Date: May 4, 2019 at 10:24:29 AM
>>> To: Riccardo Ferrari <ferra...@gmail.com>
>>> Cc: User <user@spark.apache.org>
>>> Subject: Re: Deep Learning with Spark, what is your experience?
>>>
>>> Try using MXNet with Horovod directly (I think MXNet is worth a try):
>>> 1.
>>> https://medium.com/apache-mxnet/distributed-training-using-apache-mxnet-with-horovod-44f98bf0e7b7
>>> 2.
>>> https://docs.nvidia.com/deeplearning/dgx/mxnet-release-notes/rel_19-01.html
>>> 3. https://aws.amazon.com/mxnet/
>>> 4.
>>> https://aws.amazon.com/blogs/machine-learning/aws-deep-learning-amis-now-include-horovod-for-faster-multi-gpu-tensorflow-training-on-amazon-ec2-p3-instances/
>>>
>>> Of course, TensorFlow is backed by Google's advertisement team as well:
>>> https://aws.amazon.com/blogs/machine-learning/scalable-multi-node-training-with-tensorflow/
>>>
>>> Regards,
>>>
>>> On Sat, May 4, 2019 at 10:59 AM Riccardo Ferrari <ferra...@gmail.com>
>>> wrote:
>>>
>>>> Hi list,
>>>>
>>>> I am trying to understand if it makes sense to leverage Spark as an
>>>> enabling platform for Deep Learning.
>>>>
>>>> My open questions to you are:
>>>>
>>>> - Do you use Apache Spark in your DL pipelines?
>>>> - How do you use Spark for DL? Is it just a stand-alone stage in
>>>> the workflow (i.e. a data preparation script), or is it more integrated?
>>>>
>>>> I see a major advantage in leveraging Spark as a unified entry point:
>>>> for example, you can easily abstract data sources and leverage existing
>>>> team skills for data pre-processing and training. On the flip side, you
>>>> may hit limitations such as supported framework versions.
>>>> What is your experience?
>>>>
>>>> Thanks!
>>>
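[Editor's note] The "stand-alone ETL stage" pattern discussed above (Spark for data preparation, a dedicated engine for training) can be sketched by keeping the feature logic in plain Python, so it is unit-testable outside the cluster and wrappable as a Spark UDF inside it. A minimal sketch only: the `amount` column and both features are hypothetical, and the commented UDF wiring assumes PySpark is available.

```python
import math

def amount_features(amount: float) -> dict:
    """Pure-Python feature logic: testable locally, reusable as a Spark UDF.

    The 'amount' input and both derived features are hypothetical examples.
    """
    return {
        "amount_log": math.log1p(amount),  # log(1 + x), stable near zero
        "is_large": amount > 1000.0,
    }

# Inside a Spark ETL job, the same function would be wrapped as a UDF, e.g.:
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import DoubleType
#   amount_log = udf(lambda a: amount_features(a)["amount_log"], DoubleType())
# so the DL-bound preprocessing stays identical inside and outside the cluster,
# and the resulting DataFrame can be written to Parquet for any DL framework.

print(amount_features(0.0)["amount_log"])  # prints 0.0
```

Keeping the transformation pure (no Spark imports in the function itself) is what lets the same code serve both the ETL stage and local experimentation, which is one way to avoid maintaining two preprocessing stacks.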