Hi! I know I'm late, but I just want to point out some highlights of our use case. We currently:

- use Spark as an ETL tool, followed by
- a Python (numpy/pandas based) pipeline to preprocess the data, and
- TensorFlow to train our neural networks.

What we'd love to do, and why we don't:

- Start using Spark for our full preprocessing pipeline. Because of type safety. And distributed computation. And Catalyst. But mainly because of *not-Python*. Our main issues:
  - We want to use the same code for online serving, and we're not willing to duplicate the preprocessing operations. Spark is not *serving-friendly*.
  - If we want it to preprocess online, we need to copy/paste our custom transformations into MLeap.
  - It's a problem to communicate with a TensorFlow API to hand it the preprocessed data to serve.
- Use Spark for hyperparameter tuning. We'd need:
  - GPU integration with Spark, letting us achieve finer tuning.
  - Better TensorFlow integration.

Would love to hear about other use cases, and whether others are hitting the same issues as us.

On Wed, Jun 6, 2018 at 21:10, Holden Karau (<hol...@pigscanfly.ca>) wrote:
> At Spark Summit some folks were talking about model serving and we wanted
> to collect requirements from the community.
> --
> Twitter: https://twitter.com/holdenkarau
>
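To make the duplication problem concrete, here's a minimal pandas sketch (the column names and the transformation itself are hypothetical, just the shape of what our pipeline does): a step like this exists once in the offline Python pipeline, and today it has to be rewritten by hand as a Spark/MLeap transformer before it can run in an online serving path.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical offline preprocessing step.

    The exact same logic must behave identically at serving time,
    which today means reimplementing it outside pandas.
    """
    out = df.copy()
    # Fill missing numeric values, then log-scale a skewed feature.
    out["amount"] = out["amount"].fillna(0.0)
    out["log_amount"] = np.log1p(out["amount"])
    return out

# Offline batch: works fine. Online, a single-row request would need
# a separate (duplicated) implementation of this transformation.
batch = pd.DataFrame({"amount": [0.0, 9.0, None]})
print(preprocess(batch)["log_amount"].tolist())
```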