Hi!

I know I'm late, but I just want to point out some highlights of our use
case. We currently:


   - Use Spark as an ETL tool, followed by
   - a Python (numpy/pandas-based) pipeline to preprocess the data, and
   - TensorFlow for training our neural networks
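In code, the hand-off between the three stages looks roughly like this. A
minimal pure-Python stand-in for the shape of the pipeline: in reality stage 1
is a Spark job, stage 2 is numpy/pandas code, and stage 3 is TensorFlow, and
every function name below is hypothetical.

```python
import math


def etl(raw_rows):
    """Stage 1 (a Spark job in practice): drop malformed rows."""
    return [(x, y) for x, y in raw_rows if x is not None]


def preprocess(rows):
    """Stage 2 (numpy/pandas in practice): standardize the feature."""
    xs = [x for x, _ in rows]
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [((x - mean) / std, y) for x, y in rows]


def train(examples):
    """Stage 3 (TensorFlow in practice): stubbed out here."""
    return {"n_examples": len(examples)}


raw = [(1.0, 0), (None, 1), (3.0, 1), (5.0, 0)]
model = train(preprocess(etl(raw)))
```

The pain point is the seam between stage 1 and stage 2: data leaves the typed,
distributed Spark world and re-enters a single-process Python world.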


What we'd love to do, and why we don't:


   - Start using Spark for our full preprocessing pipeline. Because type
   safety. And distributed computation. And Catalyst. But mainly because
   *not-Python.*
   Our main issues:
      - We want to use the same code for online serving. We're not willing
      to duplicate the preprocessing operations. Spark is not
      *serving-friendly*.
      - If we want to preprocess online, we need to copy/paste our
      custom transformations into MLeap.
      - It's an issue to communicate with a TensorFlow API to hand it the
      preprocessed data for serving.
   - Use Spark to do hyperparameter tuning.
   We'd need:
      - GPU integration with Spark, letting us achieve finer tuning.
      - Better TensorFlow integration
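On the duplication point above: what we'd like is to write each custom
transformation once, as a plain function, and reuse it both in the Spark batch
job and in the online serving path, instead of copy/pasting it into MLeap. A
minimal sketch, assuming a hypothetical `log_bucket` transformation (the
pyspark wrapping is shown only as a comment; the point is that the same
function body would be reused verbatim when serving):

```python
import math


def log_bucket(amount):
    """Hypothetical custom transformation: coarse log10 bucket of a value.

    Written once as plain Python so batch and serving share the exact code.
    """
    return int(math.log10(amount)) if amount >= 1 else 0


# Batch path (Spark): wrap the very same function as a UDF, e.g.
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import IntegerType
#   df = df.withColumn("bucket", udf(log_bucket, IntegerType())("amount"))

# Serving path: call the same function directly on one incoming record.
record = {"amount": 2500.0}
features = {"bucket": log_bucket(record["amount"])}
```

A plain Python UDF loses Catalyst optimization and still needs a Python
runtime at serving time, which is exactly why a serving-friendly story inside
Spark itself would help.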


Would love to hear about other use cases, and whether others run into the
same issues as us.

On Wed, Jun 6, 2018 at 21:10, Holden Karau (<hol...@pigscanfly.ca>)
wrote:

> At Spark Summit some folks were talking about model serving and we wanted
> to collect requirements from the community.
> --
> Twitter: https://twitter.com/holdenkarau
>
