If someone wants to actually use deep learning algorithms, their focus should be on choosing the technology stack that gives them maximum flexibility to explore the nuances of their algorithms.
From a personal perspective, I always prefer to use libraries which provide the best flexibility and extensibility in terms of the science/mathematics of the subject. For example, open a book on linear regression and check whether all of its mathematical formulations are actually available in the Spark regression module. It is always better to choose a technology that fits the nuances and rigor of the science, rather than choose a technology and then try to fit the science into it.

Regards,
Gourav

On Sun, May 5, 2019 at 2:23 PM Jason Dai <jason....@gmail.com> wrote:
> You may find talks from Analytics Zoo users at
> https://analytics-zoo.github.io/master/#presentations/; in particular,
> some recent user examples on Analytics Zoo:
>
> - Mastercard:
> https://software.intel.com/en-us/articles/deep-learning-with-analytic-zoo-optimizes-mastercard-recommender-ai-service
> - Azure:
> https://software.intel.com/en-us/articles/use-analytics-zoo-to-inject-ai-into-customer-service-platforms-on-microsoft-azure-part-1
> - CERN:
> https://db-blog.web.cern.ch/blog/luca-canali/machine-learning-pipelines-high-energy-physics-using-apache-spark-bigdl
> - Midea/KUKA:
> https://software.intel.com/en-us/articles/industrial-inspection-platform-in-midea-and-kuka-using-distributed-tensorflow-on-analytics
> - Talroo:
> https://software.intel.com/en-us/articles/talroo-uses-analytics-zoo-and-aws-to-leverage-deep-learning-for-job-recommendations
>
> Thanks,
> -Jason
>
> On Sun, May 5, 2019 at 6:29 AM Riccardo Ferrari <ferra...@gmail.com> wrote:
>
>> Thank you for your answers!
>>
>> While it is clear that each DL framework can handle distributed model
>> training on its own (some better than others), I still see a lot of
>> value in having Spark on the ETL/pre-processing part; hence my question.
>> I am trying to avoid managing multiple stacks/workflows, hoping to
>> unify my system. Projects like TensorflowOnSpark or Analytics-Zoo (to
>> name a couple) feel like they could help; I really appreciate your
>> comments and anyone who could add some value to this discussion. Does
>> anyone have experience with them?
>>
>> Thanks
>>
>> On Sat, May 4, 2019 at 8:01 PM Pat Ferrel <p...@occamsmachete.com> wrote:
>>
>>> @Riccardo
>>>
>>> Spark does not do the DL learning part of the pipeline (afaik), so it
>>> is limited to data ingestion and transforms (ETL). It is therefore
>>> optional, and other ETL options might be better for you.
>>>
>>> Most of the technologies @Gourav mentions have their own scaling based
>>> on their own compute engines specialized for their DL implementations,
>>> so be aware that Spark scaling has nothing to do with scaling most of
>>> the DL engines; they have their own solutions.
>>>
>>> From: Gourav Sengupta <gourav.sengu...@gmail.com>
>>> Reply: Gourav Sengupta <gourav.sengu...@gmail.com>
>>> Date: May 4, 2019 at 10:24:29 AM
>>> To: Riccardo Ferrari <ferra...@gmail.com>
>>> Cc: User <user@spark.apache.org>
>>> Subject: Re: Deep Learning with Spark, what is your experience?
>>>
>>> Try using MXNet with Horovod directly (I think MXNet is worth a try):
>>> 1.
>>> https://medium.com/apache-mxnet/distributed-training-using-apache-mxnet-with-horovod-44f98bf0e7b7
>>> 2.
>>> https://docs.nvidia.com/deeplearning/dgx/mxnet-release-notes/rel_19-01.html
>>> 3. https://aws.amazon.com/mxnet/
>>> 4.
>>> https://aws.amazon.com/blogs/machine-learning/aws-deep-learning-amis-now-include-horovod-for-faster-multi-gpu-tensorflow-training-on-amazon-ec2-p3-instances/
>>>
>>> Of course, TensorFlow is backed by Google's advertisement team as well:
>>> https://aws.amazon.com/blogs/machine-learning/scalable-multi-node-training-with-tensorflow/
>>>
>>> Regards,
>>>
>>> On Sat, May 4, 2019 at 10:59 AM Riccardo Ferrari <ferra...@gmail.com>
>>> wrote:
>>>
>>>> Hi list,
>>>>
>>>> I am trying to understand if it makes sense to leverage Spark as an
>>>> enabling platform for Deep Learning.
>>>>
>>>> My open questions to you are:
>>>>
>>>> - Do you use Apache Spark in your DL pipelines?
>>>> - How do you use Spark for DL? Is it just a stand-alone stage in
>>>> the workflow (i.e. a data preparation script), or is it more integrated?
>>>>
>>>> I see a major advantage in leveraging Spark as a unified entry point:
>>>> for example, you can easily abstract data sources and leverage existing
>>>> team skills for data pre-processing and training. On the flip side, you
>>>> may hit limitations such as supported framework versions.
>>>> What is your experience?
>>>>
>>>> Thanks!
>>>
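[Editor's note] The "stand-alone ETL stage" pattern discussed above (Spark for data preparation, a dedicated engine for training) can be sketched by keeping the feature logic in plain Python, so it is unit-testable outside the cluster and wrappable as a Spark UDF inside it. A minimal sketch only: the `amount` column and both features are hypothetical, and the commented UDF wiring assumes PySpark is available.

```python
import math

def amount_features(amount: float) -> dict:
    """Pure-Python feature logic: testable locally, reusable as a Spark UDF.

    The 'amount' input and both derived features are hypothetical examples.
    """
    return {
        "amount_log": math.log1p(amount),  # log(1 + x), stable near zero
        "is_large": amount > 1000.0,
    }

# Inside a Spark ETL job, the same function would be wrapped as a UDF, e.g.:
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import DoubleType
#   amount_log = udf(lambda a: amount_features(a)["amount_log"], DoubleType())
# so the DL-bound preprocessing stays identical inside and outside the cluster,
# and the resulting DataFrame can be written to Parquet for any DL framework.

print(amount_features(0.0)["amount_log"])  # prints 0.0
```

Keeping the transformation pure (no Spark imports in the function itself) is what lets the same code serve both the ETL stage and local experimentation, which is one way to avoid maintaining two preprocessing stacks.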