As Sean says, precomputing recommendations is pretty inefficient. Though with 500k items it's easy to get all the item vectors in memory, so precomputing is not too bad.
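To put rough numbers on the in-memory point, here is a minimal sketch in plain Python. The figure of 50 latent features and 4-byte floats is my assumption (the thread does not give the model rank), and `item_vector_bytes` and `top_k` are hypothetical helper names, not part of Spark or any serving layer.

```python
import heapq

def item_vector_bytes(num_items, num_features, bytes_per_value=4):
    """Raw size of the item-factor matrix, ignoring container overhead."""
    return num_items * num_features * bytes_per_value

def top_k(user_vector, item_vectors, k):
    """Score every item by dot product with the user vector and keep the best k.

    item_vectors is assumed to be a dict of item_id -> list of floats,
    e.g. the item factors exported from an offline matrix-factorization model.
    """
    scored = (
        (item_id, sum(u * v for u, v in zip(user_vector, vec)))
        for item_id, vec in item_vectors.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

if __name__ == "__main__":
    # 500k items x 50 features x 4 bytes = 100,000,000 bytes (about 100 MB).
    print(item_vector_bytes(500_000, 50))
    items = {"a": [1, 0], "b": [0, 1], "c": [1, 1]}
    print(top_k([2, 1], items, k=2))
```

A full scan like this is linear in the number of items per request, which is why the same function works both for an offline precompute loop and for scoring on demand.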
Still, since you plan to serve these via a REST service anyway, computing on demand via a serving layer such as Oryx or PredictionIO (or the newly open-sourced Seldon.io) is a good option. You can also cache the recommendations quite aggressively: once you compute a user or item top-K list, just stick the result in memcached / redis / whatever and evict it when you recompute your offline model, or every hour or so.

On Sun, Mar 15, 2015 at 3:03 PM, Shashidhar Rao <[email protected]> wrote:
> Thanks Sean, your suggestions and the links provided are just what I needed
> to start off with.
>
> On Sun, Mar 15, 2015 at 6:16 PM, Sean Owen <[email protected]> wrote:
>> I think you're assuming that you will pre-compute recommendations and
>> store them in Mongo. That's one way to go, with certain tradeoffs. You
>> can precompute offline easily, and serve results at large scale
>> easily, but, you are forced to precompute everything -- lots of wasted
>> effort, not completely up to date.
>>
>> The front-end part of the stack looks right.
>>
>> Spark would do the model building; you'd have to write a process to
>> score recommendations and store the result. Mahout is the same thing,
>> really.
>>
>> 500K items isn't all that large. Your requirements aren't driven just
>> by items though. Number of users and latent features matter too. It
>> matters how often you want to build the model too. I'm guessing you
>> would get away with a handful of modern machines for a problem this
>> size.
>>
>> In a way what you describe reminds me of Wibidata, since it built
>> recommender-like solutions on top of data and results published to a
>> NoSQL store. You might glance at the related OSS project Kiji
>> (http://kiji.org/) for ideas about how to manage the schema.
>>
>> You should have a look at things like Nick's architecture for
>> Graphflow, however it's more concerned with computing recommendation
>> on the fly, and describes a shift from an architecture originally
>> built around something like a NoSQL store:
>>
>> http://spark-summit.org/wp-content/uploads/2014/07/Using-Spark-and-Shark-to-Power-a-Realt-time-Recommendation-and-Customer-Intelligence-Platform-Nick-Pentreath.pdf
>>
>> This is also the kind of ground the oryx project is intended to cover,
>> something I've worked on personally:
>> https://github.com/OryxProject/oryx -- a layer on and around the
>> core model building in Spark + Spark Streaming to provide a whole
>> recommender (for example), down to the REST API.
>>
>> On Sun, Mar 15, 2015 at 10:45 AM, Shashidhar Rao
>> <[email protected]> wrote:
>> > Hi,
>> >
>> > Can anyone who has developed a recommendation engine suggest what could be the
>> > possible software stack for such an application?
>> >
>> > I am basically new to recommendation engines; I just found out about Mahout and
>> > Spark MLlib, which are available.
>> > I am thinking of the software stack below.
>> >
>> > 1. The user is going to use an Android app.
>> > 2. REST API call sent to the app server from the Android app to get recommendations.
>> > 3. Spark MLlib as the core engine for recommendations.
>> > 4. MongoDB database backend.
>> >
>> > I would like to know more about the cluster configuration (how many nodes etc.)
>> > part of Spark for calculating the recommendations for 500,000 items. These
>> > items include products for day care etc.
>> >
>> > Other software stack suggestions would also be very useful. It has to run on
>> > multiple vendor machines.
>> >
>> > Please suggest.
>> >
>> > Thanks
>> > shashi
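
The caching suggestion in the reply at the top of this thread could look roughly like the following. This is only a sketch: it uses an in-process dict as a stand-in for memcached or redis, and `TopKCache`, `compute_fn` and `on_model_rebuilt` are hypothetical names I made up for illustration, not the API of Oryx, PredictionIO or any cache client.

```python
import time

class TopKCache:
    """Caches top-K lists per key; entries expire after ttl_seconds
    and are all dropped when the offline model is rebuilt."""

    def __init__(self, compute_fn, ttl_seconds=3600.0, clock=time.monotonic):
        self._compute = compute_fn   # key -> top-K list, e.g. an on-demand scorer
        self._ttl = ttl_seconds      # "every hour or so" by default
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (expires_at, result)

    def on_model_rebuilt(self):
        """Call after the offline model is recomputed; evicts everything."""
        self._entries.clear()

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]          # cache hit, still fresh
        result = self._compute(key)  # miss or expired: recompute and store
        self._entries[key] = (now + self._ttl, result)
        return result

if __name__ == "__main__":
    cache = TopKCache(lambda user_id: [f"item-for-{user_id}"])
    print(cache.get("user-1"))  # computed
    print(cache.get("user-1"))  # served from the cache
```

With a real memcached or redis deployment the TTL would normally be set on the stored key itself, and eviction on model rebuild could be done by flushing or by including a model version in the key; which is better depends on the setup.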
