I think you're assuming that you will pre-compute recommendations and store them in Mongo. That's one way to go, with certain tradeoffs: precomputing offline is easy, and serving the results at large scale is easy, but you are forced to precompute everything. That means a lot of wasted effort on results nobody asks for, and recommendations that are never completely up to date.
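
To make that tradeoff concrete, here is a rough sketch of what the batch "precompute everything" step amounts to. It is plain Python with made-up names (top_n_for_user, precompute_all, the toy factor vectors); in practice the factors would come out of something like an ALS model built in Spark, and the dict below is only standing in for a MongoDB collection keyed by user ID.

```python
import heapq

def top_n_for_user(user_vec, item_factors, n):
    """Return the n highest-scoring (item_id, score) pairs for one user.

    The score is the dot product of the user's and the item's latent
    feature vectors, as in a matrix-factorization recommender.
    """
    scored = (
        (item_id, sum(u * v for u, v in zip(user_vec, item_vec)))
        for item_id, item_vec in item_factors.items()
    )
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])

def precompute_all(user_factors, item_factors, n):
    """Score every user against every item and keep the top n per user.

    The returned dict is a stand-in (assumption) for the NoSQL store
    that the app server would read from.
    """
    store = {}
    for user_id, user_vec in user_factors.items():
        store[user_id] = top_n_for_user(user_vec, item_factors, n)
    return store

# Toy, hypothetical factors with 2 latent features.
user_factors = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
item_factors = {
    "crib": [0.9, 0.1],
    "stroller": [0.2, 0.8],
    "bottle": [0.5, 0.5],
}

store = precompute_all(user_factors, item_factors, n=2)
for user_id, recs in store.items():
    print(user_id, [item_id for item_id, _ in recs])
```

Serving is then a single key lookup, which is why it scales easily. The cost is that every rebuild does roughly users x items x features work, whether or not a given user ever asks for recommendations, and anything that happens after the batch run is invisible until the next one.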

The front-end part of the stack looks right. Spark would do the model building; you'd have to write a process to score recommendations and store the result. Mahout is the same thing, really.

500K items isn't all that large. Your requirements aren't driven just by items, though: the number of users and the number of latent features matter too, as does how often you want to rebuild the model. I'm guessing you would get away with a handful of modern machines for a problem this size.

In a way, what you describe reminds me of Wibidata, since it built recommender-like solutions on top of data and results published to a NoSQL store. You might glance at the related OSS project Kiji (http://kiji.org/) for ideas about how to manage the schema.

You should also have a look at things like Nick's architecture for Graphflow. It's more concerned with computing recommendations on the fly, though, and describes a shift away from an architecture originally built around something like a NoSQL store:

http://spark-summit.org/wp-content/uploads/2014/07/Using-Spark-and-Shark-to-Power-a-Realt-time-Recommendation-and-Customer-Intelligence-Platform-Nick-Pentreath.pdf

This is also the kind of ground the Oryx project is intended to cover, something I've worked on personally: https://github.com/OryxProject/oryx -- a layer on and around the core model building in Spark + Spark Streaming to provide a whole recommender (for example), down to the REST API.

On Sun, Mar 15, 2015 at 10:45 AM, Shashidhar Rao <raoshashidhar...@gmail.com> wrote:
> Hi,
>
> Can anyone who has developed recommendation engine suggest what could be the
> possible software stack for such an application.
>
> I am basically new to recommendation engine , I just found out Mahout and
> Spark Mlib which are available .
> I am thinking the below software stack.
>
> 1. The user is going to use Android app.
> 2. Rest Api sent to app server from the android app to get recommendations.
> 3. Spark Mlib core engine for recommendation engine
> 4. MongoDB database backend.
>
> I would like to know more on the cluster configuration( how many nodes etc)
> part of spark for calculating the recommendations for 500,000 items. This
> items include products for day care etc.
>
> Other software stack suggestions would also be very useful.It has to run on
> multiple vendor machines.
>
> Please suggest.
>
> Thanks
> shashi