It sounds like you may be interested in a solution that implements the Lambda
Architecture <https://en.wikipedia.org/wiki/Lambda_architecture>, such as
Oryx2 <http://oryx.io/>. At a high level, this lets you request and receive
predictions immediately (serving layer), generated from the current model,
while that model is updated incrementally as new data arrives (speed layer)
and periodically rebuilt from the full data set in the background (batch
layer).
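
To make that split concrete, here is a minimal sketch of the pattern (all
names are hypothetical; it assumes the batch layer periodically persists an
updated model somewhere the serving process can reload it):

import java.util.concurrent.atomic.AtomicReference
import java.util.concurrent.{Executors, TimeUnit}

// Serve requests immediately from whatever model is current, while a
// background task swaps in newer models produced by the batch layer.
class ModelServer[M](loadLatest: () => M, refreshMinutes: Long) {
  private val current = new AtomicReference[M](loadLatest())
  private val refresher = Executors.newSingleThreadScheduledExecutor()
  refresher.scheduleAtFixedRate(
    new Runnable { def run(): Unit = current.set(loadLatest()) },
    refreshMinutes, refreshMinutes, TimeUnit.MINUTES)

  // Request threads call this; they never wait on the refresh.
  def withModel[R](f: M => R): R = f(current.get())
}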

An example of a pre-built model in this scenario might be a predictive model
that classifies an incoming piece of data (e.g., does this email look like
spam or not?).
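
For example, with Spark's RDD-based MLlib API (a sketch; the model path,
feature values, and label encoding are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.linalg.Vectors

object SpamCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("spam-check"))
    // Load a model that was trained and saved earlier.
    val model = LogisticRegressionModel.load(sc, "/models/spam-classifier")
    // Feature vector extracted from the incoming email (placeholder values).
    val email = Vectors.dense(0.0, 1.0, 0.3)
    println(s"predicted class: ${model.predict(email)}") // e.g. 1.0 = spam
    sc.stop()
  }
}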

On Fri, Jan 15, 2016 at 5:00 PM, feribg <fer...@gmail.com> wrote:

> I'm fairly new to Spark and MLlib, but I'm doing some research into
> multi-tenancy for an MLlib-based app. The idea is to provide the ability
> to train models on demand under certain constraints (executor size) and
> then serve predictions from those models via a REST layer.
>
> So far from my research I've gathered the following:
>
> 1) It's fairly easy to schedule training jobs and define the size of the
> job's executors with something like Spark Job Server or via the command
> line. I'd imagine you need separate contexts here anyway, because if
> there's one big context shared amongst different tenants, it won't allow
> training different models in parallel for the most part. So the solution
> here seems to be a context per tenant, with training via Spark Job Server.
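>
> Each training job would look roughly like this (a simplified sketch; the
> data path, algorithm, and model location are placeholders):
>
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
> import org.apache.spark.mllib.util.MLUtils
>
> object TrainTenantModel {
>   def main(args: Array[String]): Unit = {
>     val Array(tenantId, dataPath) = args
>     // Executor size is constrained per tenant at submit time, e.g. via
>     // spark.executor.memory and spark.executor.cores.
>     val sc = new SparkContext(new SparkConf().setAppName(s"train-$tenantId"))
>     val training = MLUtils.loadLibSVMFile(sc, dataPath)
>     val model =
>       new LogisticRegressionWithLBFGS().setNumClasses(2).run(training)
>     model.save(sc, s"/models/$tenantId/latest")
>     sc.stop()
>   }
> }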
>
> 2) The second part seems a bit more tricky, as it must expose the results
> of the trained models to the outside world via some form of API. So far
> I've been able to create a new context inside a simple Spring REST
> application, load the persisted model, and call predict and return the
> results.
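>
> In outline (a simplified sketch, assuming Spring MVC with Jackson on the
> classpath; names and paths are placeholders):
>
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.mllib.classification.LogisticRegressionModel
> import org.apache.spark.mllib.linalg.Vectors
> import org.springframework.web.bind.annotation._
>
> @RestController
> class PredictController {
>   // One context and one loaded model per serving application.
>   val sc = new SparkContext(
>     new SparkConf().setAppName("serving").setMaster("local[*]"))
>   val model = LogisticRegressionModel.load(sc, "/models/tenant-a/latest")
>
>   @RequestMapping(value = Array("/predict"),
>                   method = Array(RequestMethod.POST))
>   def predict(@RequestBody features: Array[Double]): Double =
>     model.predict(Vectors.dense(features))
> }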
>
> My main problem with this approach is that I now need to load a whole
> Spark context for each single model instance, and a single tenant can
> potentially have a bunch of them, which also means at least one JVM per
> tenant; this is quite wasteful. The actual prediction step seems fairly
> simple, and I was wondering if there is a way to share multiple models on
> the same context and predict from any of them. Would that allow parallel
> predictions (i.e., model B doesn't have to wait for a prediction from
> model A to complete before returning)?
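>
> Conceptually, something like this is what I have in mind (a hypothetical
> sketch):
>
> import java.util.concurrent.ConcurrentHashMap
> import org.apache.spark.mllib.classification.LogisticRegressionModel
> import org.apache.spark.mllib.linalg.Vector
>
> // One JVM, one SparkContext, many loaded models keyed by model id.
> // predict() on a loaded linear model is a local computation, so REST
> // request threads could call it concurrently.
> val models = new ConcurrentHashMap[String, LogisticRegressionModel]()
>
> def predict(modelId: String, features: Vector): Double =
>   models.get(modelId).predict(features)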
>
> Given this simple scenario, do you see a better way to architect this?
> Maybe I'm missing certain Spark features that would facilitate it in a
> cleaner and more efficient manner.
>
> Thanks!
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Multi-tenancy-REST-and-MLlib-tp25979.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
