Model sizes are in the 10M x rank and 100k x rank range (the two factor matrices).

For recommendation/topic modeling I can run a batch recommendAll and then
keep serving the model from a distributed cache, but then I can't
re-predict per user when user feedback makes the current top-k stale. I
have to wait for the next batch refresh, which might be 1 hr away.
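
For reference, here is a minimal sketch of that batch step, assuming an
MLlib ALS model and a hypothetical KvClient for the distributed cache
(recommendProductsForUsers is in MLlib as of Spark 1.4):

import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}

// ratings: RDD[Rating] is assumed to exist already
val model: MatrixFactorizationModel =
  ALS.train(ratings, /* rank = */ 50, /* iterations = */ 10, /* lambda = */ 0.01)

// batch top-k for every user: RDD[(Int, Array[Rating])]
val topK = model.recommendProductsForUsers(10)

// push results to the distributed cache; KvClient is a made-up client
topK.foreachPartition { users =>
  val kv = KvClient.connect("cache-host:11211")
  users.foreach { case (user, recs) =>
    kv.put(user, recs.map(r => (r.product, r.rating)))
  }
  kv.close()
}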

Spark Job Server + Spark SQL can get me fresh updates, but running a
predict each time might be slow.
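
Roughly what I have in mind on the job server side, as a sketch only; it
assumes spark-jobserver's SparkJob trait and a hypothetical path where the
model was saved:

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}

object PredictForUser extends SparkJob {
  // keep validation trivial for the sketch
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any = {
    val userId = config.getInt("userId")
    // "/models/als-latest" is a hypothetical location for the saved model
    val model = MatrixFactorizationModel.load(sc, "/models/als-latest")
    model.recommendProducts(userId, 10).map(r => (r.product, r.rating))
  }
}

Each request pays Spark job submission overhead, which is where the latency
concern below comes from.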

I am guessing the better idea might be to start with batch recommendAll and
then update the per-user model when it gets stale, but that needs access to
the key-value store and to the model over an API like Spark Job Server. I
am running experiments with the job server. In general it would be nice if
my key-value store and model were both managed by the same Akka-based API.
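
For the staleness path, one option I'm looking at is rescoring a single
user in the serving layer itself, without launching a Spark job: the item
factors (100k x rank) are small enough to hold in memory on the serving
node. A sketch, where userVec is the user's (possibly refreshed) factor
vector:

// collect item factors once per model refresh (100k x rank fits in memory)
val itemFactors: Array[(Int, Array[Double])] = model.productFeatures.collect()

// rescore one user against all items and keep the top k
def topKForUser(userVec: Array[Double], k: Int): Array[(Int, Double)] = {
  itemFactors.map { case (item, vec) =>
    var dot = 0.0
    var i = 0
    while (i < vec.length) { dot += userVec(i) * vec(i); i += 1 }
    (item, dot)
  }.sortBy(-_._2).take(k)
}

The refreshed top-k then goes back into the key-value store, same as the
batch path.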

Yes, Spark SQL is there to filter/boost recommendation results using
business logic, e.g. user demographics.
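
Something along these lines (a sketch; the table names and the boost rule
are made up):

// recsDF(user, item, score) and usersDF(user, age_band, country) are hypothetical
recsDF.registerTempTable("recs")
usersDF.registerTempTable("users")

val boosted = sqlContext.sql("""
  SELECT r.user, r.item,
         CASE WHEN u.age_band = '18-24' THEN r.score * 1.2 ELSE r.score END AS score
  FROM recs r
  JOIN users u ON r.user = u.user
  WHERE u.country = 'US'
""")
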
On Jun 23, 2015 2:07 AM, "Sean Owen" <so...@cloudera.com> wrote:

> Yes, and typically needs are <100ms. Now imagine even 10 concurrent
> requests. My experience has been that this approach won't nearly
> scale. The best you could probably do is async mini-batch
> near-real-time scoring, pushing results to some store for retrieval,
> which could be entirely suitable for your use case.
>
> On Tue, Jun 23, 2015 at 8:52 AM, Nick Pentreath
> <nick.pentre...@gmail.com> wrote:
> > If your recommendation needs are real-time (<1s) I am not sure job server
> > and computing the recs with Spark will do the trick (though those new
> > BLAS-based methods may have given sufficient speed up).
>
