Model sizes are in the 10M x rank and 100K x rank range (user and item factor matrices). For recommendation/topic modeling I can run a batch recommendAll and then keep serving the results from a distributed cache, but then I can't re-predict for a single user whose feedback has made the current top-K stale. I have to wait for the next batch refresh, which might be an hour away.
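Concretely, the batch leg I have in mind is roughly the following (a sketch against Spark 1.4's BLAS-based recommendProductsForUsers; the KV-store client is hypothetical, substitute Redis/Cassandra/whatever):

import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD

// Batch leg: factorize, compute top-K for every user, push to the KV store.
def refreshAll(ratings: RDD[Rating]): MatrixFactorizationModel = {
  val model = ALS.trainImplicit(ratings, rank = 50, iterations = 10, lambda = 0.01, alpha = 40.0)

  // Spark 1.4's BLAS-based batch method: RDD[(userId, Array[Rating])]
  val topK = model.recommendProductsForUsers(10)

  topK.foreachPartition { partition =>
    // one connection per partition to the KV store -- hypothetical client
    // val kv = KeyValueClient.connect(host, port)
    partition.foreach { case (userId, recs) =>
      val payload = recs.map(r => (r.product, r.rating)).toSeq
      // kv.put(s"recs:$userId", payload)
      println(s"user=$userId -> $payload") // placeholder for kv.put
    }
  }
  model
}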
Spark Job Server + Spark SQL can get me fresh updates, but running a predict each time might be slow. I am guessing the better idea is to start with a batch recommendAll and then update a user's recommendations when they go stale, but that needs access to both the key-value store and the model over an API like Spark Job Server (rough sketch of that re-score path below the quoted thread). I am running experiments with Job Server. In general it would be nice if my key-value store and model were both managed by the same Akka-based API. And yes, Spark SQL is there to filter/boost recommendation results using business logic, user demographics for example.

On Jun 23, 2015 2:07 AM, "Sean Owen" <so...@cloudera.com> wrote:

> Yes, and typically needs are <100ms. Now imagine even 10 concurrent
> requests. My experience has been that this approach won't nearly
> scale. The best you could probably do is async mini-batch
> near-real-time scoring, pushing results to some store for retrieval,
> which could be entirely suitable for your use case.
>
> On Tue, Jun 23, 2015 at 8:52 AM, Nick Pentreath
> <nick.pentre...@gmail.com> wrote:
> > If your recommendation needs are real-time (<1s) I am not sure job server
> > and computing the recs with spark will do the trick (though those new
> > BLAS-based methods may have given sufficient speed up).
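To make the per-user refresh concrete, here is a rough sketch of the re-score path mentioned above, assuming the 100K x rank item factors fit in memory on the serving side, so no Spark job is needed. All names are hypothetical; allowedProducts stands in for the demographics-based filtering I'd otherwise push into Spark SQL:

// Re-score one stale user: ~100K dot products plus a top-K sort,
// which should fit a <100ms budget when done in-process.
def rescoreUser(userFactor: Array[Double],
                itemFactors: Array[(Int, Array[Double])],
                k: Int,
                allowedProducts: Int => Boolean): Array[(Int, Double)] = {
  itemFactors.iterator
    .filter { case (productId, _) => allowedProducts(productId) } // business-logic filter/boost hook
    .map { case (productId, f) =>
      var dot = 0.0
      var i = 0
      while (i < f.length) { dot += f(i) * userFactor(i); i += 1 }
      (productId, dot)
    }
    .toArray
    .sortBy(-_._2)
    .take(k)
}

The userFactor would come from model.userFeatures.lookup(userId) at batch time (or a fold-in update from the fresh feedback), and the result would overwrite the stale top-K entry in the key-value store.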