If someone is trying to actually use deep learning algorithms, their focus
should be on choosing the technology stack that gives them maximum
flexibility to try out the nuances of their algorithms.

From a personal perspective, I always prefer libraries that provide the
best flexibility and extensibility in terms of the science and mathematics
of the subject. For example, open a book on linear regression and then
check whether all of its mathematical formulations are available in
Spark's regression module.
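
To make that check concrete: the Spark MLlib documentation describes linear
regression as minimising an average squared-error loss plus an elastic-net
penalty, controlled by `regParam` (lambda) and `elasticNetParam` (alpha). The
plain-Python sketch below writes that objective out so it can be compared
term by term with a textbook formulation. It assumes the form given in the
MLlib linear-methods docs and deliberately ignores the intercept and the
feature standardization Spark applies by default, so treat it as an
approximation, not Spark's exact code path.

```python
from typing import Sequence


def elastic_net_objective(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    w: Sequence[float],
    reg_param: float = 0.0,          # lambda (Spark ML's regParam)
    elastic_net_param: float = 0.0,  # alpha (Spark ML's elasticNetParam)
) -> float:
    """(1/n) * sum_i 0.5 * (w.x_i - y_i)^2
    + lambda * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)

    Assumption: no intercept term and no feature standardization.
    """
    n = len(y)
    # Average squared-error loss, with the 1/2 factor used in the MLlib docs.
    loss = sum(
        0.5 * (sum(wj * xj for wj, xj in zip(w, xi)) - yi) ** 2
        for xi, yi in zip(X, y)
    ) / n
    l1 = sum(abs(wj) for wj in w)   # ||w||_1
    l2 = sum(wj * wj for wj in w)   # ||w||_2^2
    penalty = reg_param * (
        elastic_net_param * l1 + (1.0 - elastic_net_param) / 2.0 * l2
    )
    return loss + penalty


if __name__ == "__main__":
    X, y = [[1.0], [2.0]], [1.0, 2.0]
    print(elastic_net_objective(X, y, [0.0]))
    print(elastic_net_objective(X, y, [1.0], reg_param=1.0, elastic_net_param=1.0))
```

With `elastic_net_param=0.0` the penalty reduces to ridge (L2) and with
`1.0` to lasso (L1); anything in the book that does not fit this shape is
what you would then look for in the Spark module.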

It is always better to choose a technology that fits the nuances and
precision of the science than to choose a technology first and then try to
fit the science into it.

Regards,
Gourav

On Sun, May 5, 2019 at 2:23 PM Jason Dai <jason....@gmail.com> wrote:

> You may find talks from Analytics Zoo users at
> https://analytics-zoo.github.io/master/#presentations/; in particular,
> here are some recent user examples of Analytics Zoo:
>
>    - Mastercard:
>      https://software.intel.com/en-us/articles/deep-learning-with-analytic-zoo-optimizes-mastercard-recommender-ai-service
>    - Azure:
>      https://software.intel.com/en-us/articles/use-analytics-zoo-to-inject-ai-into-customer-service-platforms-on-microsoft-azure-part-1
>    - CERN:
>      https://db-blog.web.cern.ch/blog/luca-canali/machine-learning-pipelines-high-energy-physics-using-apache-spark-bigdl
>    - Midea/KUKA:
>      https://software.intel.com/en-us/articles/industrial-inspection-platform-in-midea-and-kuka-using-distributed-tensorflow-on-analytics
>    - Talroo:
>      https://software.intel.com/en-us/articles/talroo-uses-analytics-zoo-and-aws-to-leverage-deep-learning-for-job-recommendations
>
> Thanks,
> -Jason
>
> On Sun, May 5, 2019 at 6:29 AM Riccardo Ferrari <ferra...@gmail.com>
> wrote:
>
>> Thank you for your answers!
>>
>> It is clear that each DL framework can handle distributed model
>> training on its own (some better than others). Still, I see a lot of
>> value in having Spark for the ETL/pre-processing part, hence my
>> question.
>> I am trying to avoid managing multiple stacks/workflows and hoping to
>> unify my system. Projects like TensorFlowOnSpark or Analytics Zoo (to name
>> a couple) feel like they can help, and I would really appreciate your
>> comments and those of anyone who could add some value to this discussion.
>> Does anyone have experience with them?
>>
>> Thanks
>>
>> On Sat, May 4, 2019 at 8:01 PM Pat Ferrel <p...@occamsmachete.com> wrote:
>>
>>> @Riccardo
>>>
>>> Spark does not do the DL training part of the pipeline (afaik), so it is
>>> limited to data ingestion and transforms (ETL). It is therefore optional,
>>> and other ETL options might be better for you.
>>>
>>> Most of the technologies @Gourav mentions scale using their own compute
>>> engines, specialized for their DL implementations, so be aware that Spark
>>> scaling has nothing to do with scaling most of the DL engines; they have
>>> their own solutions.
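
One common way to combine the two, sketched here under assumptions: the
Spark job writes the preprocessed data as a set of part files, and each DL
worker, identified by a rank and a world size (as exposed by Horovod's
`hvd.rank()` and `hvd.size()`), reads only its own subset. The helper and the
file names below are hypothetical; the point is that the split follows the
DL engine's own worker layout, not Spark's executors.

```python
from typing import List


def shards_for_rank(part_files: List[str], rank: int, size: int) -> List[str]:
    """Round-robin assignment of Spark output files to one training worker.

    Hypothetical helper: `part_files` stands for the part-* files a Spark job
    wrote; `rank` and `size` would come from the DL framework (e.g. Horovod).
    """
    if size <= 0 or not 0 <= rank < size:
        raise ValueError("need size > 0 and 0 <= rank < size")
    ordered = sorted(part_files)  # identical order on every worker
    return ordered[rank::size]    # every size-th file, starting at rank


if __name__ == "__main__":
    files = [f"part-{i:05d}.parquet" for i in range(5)]  # hypothetical names
    for r in range(2):
        print(r, shards_for_rank(files, rank=r, size=2))
```

Sorting first matters because each worker computes its assignment
independently; without a shared order, files could be read twice or skipped.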
>>>
>>> From: Gourav Sengupta <gourav.sengu...@gmail.com>
>>> Reply: Gourav Sengupta <gourav.sengu...@gmail.com>
>>> Date: May 4, 2019 at 10:24:29 AM
>>> To: Riccardo Ferrari <ferra...@gmail.com>
>>> Cc: User <user@spark.apache.org>
>>> Subject: Re: Deep Learning with Spark, what is your experience?
>>>
>>> Try using MXNet and Horovod directly as well (I think MXNet is worth a
>>> try):
>>> 1. https://medium.com/apache-mxnet/distributed-training-using-apache-mxnet-with-horovod-44f98bf0e7b7
>>> 2. https://docs.nvidia.com/deeplearning/dgx/mxnet-release-notes/rel_19-01.html
>>> 3. https://aws.amazon.com/mxnet/
>>> 4. https://aws.amazon.com/blogs/machine-learning/aws-deep-learning-amis-now-include-horovod-for-faster-multi-gpu-tensorflow-training-on-amazon-ec2-p3-instances/
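
For context on what Horovod does during training: each worker computes
gradients on its own data and the workers then combine them with an
allreduce; Horovod's design builds on the ring-allreduce algorithm (in
practice it delegates the communication to libraries such as NCCL or MPI).
The pure-Python simulation below only illustrates that algorithm on
in-memory lists, assuming one equal-length list per worker. It is not
Horovod's implementation and moves no data over a network.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate ring-allreduce (sum) over n workers arranged in a ring."""
    n = len(buffers)
    length = len(buffers[0])
    data = [list(b) for b in buffers]  # each worker's local copy
    # Split every buffer into n contiguous chunks (some may be empty).
    bounds = [(k * length // n, (k + 1) * length // n) for k in range(n)]

    # Phase 1, scatter-reduce: in each step worker i sends one chunk to its
    # right neighbour, which adds it in. After n-1 steps worker i holds the
    # fully summed chunk (i + 1) % n.
    for step in range(n - 1):
        messages = []
        for i in range(n):
            c = (i - step) % n
            lo, hi = bounds[c]
            messages.append(((i + 1) % n, c, data[i][lo:hi]))
        for dst, c, payload in messages:  # apply all sends "simultaneously"
            lo, _ = bounds[c]
            for j, v in enumerate(payload):
                data[dst][lo + j] += v

    # Phase 2, allgather: pass the finished chunks around the ring so every
    # worker ends up with every summed chunk.
    for step in range(n - 1):
        messages = []
        for i in range(n):
            c = (i + 1 - step) % n
            lo, hi = bounds[c]
            messages.append(((i + 1) % n, c, data[i][lo:hi]))
        for dst, c, payload in messages:
            lo, hi = bounds[c]
            data[dst][lo:hi] = payload
    return data


if __name__ == "__main__":
    print(ring_allreduce([[1.0, 2.0], [3.0, 4.0]]))
```

Each worker sends and receives 2(n-1) chunks of roughly length/n elements,
so per-worker traffic stays close to constant as workers are added;
averaging gradients is then just dividing the sum by n.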
>>>
>>>
>>> Of course, TensorFlow is backed by Google's advertising team as well:
>>> https://aws.amazon.com/blogs/machine-learning/scalable-multi-node-training-with-tensorflow/
>>>
>>>
>>> Regards,
>>>
>>> On Sat, May 4, 2019 at 10:59 AM Riccardo Ferrari <ferra...@gmail.com>
>>> wrote:
>>>
>>>> Hi list,
>>>>
>>>> I am trying to understand whether it makes sense to leverage Spark as
>>>> an enabling platform for deep learning.
>>>>
>>>> My open questions to you are:
>>>>
>>>>    - Do you use Apache Spark in your DL pipelines?
>>>>    - How do you use Spark for DL? Is it just a stand-alone stage in
>>>>    the workflow (i.e. a data preparation script) or is it more integrated?
>>>>
>>>> I see a major advantage in leveraging Spark as a unified entry point:
>>>> for example, you can easily abstract data sources and leverage existing
>>>> team skills for data pre-processing and training. On the flip side, you
>>>> may hit some limitations, including supported versions and so on.
>>>> What is your experience?
>>>>
>>>> Thanks!
>>>>
>>>
