Re: Software stack for Recommendation engine with spark mlib

Nick Pentreath Sun, 15 Mar 2015 10:12:06 -0700

As Sean says, precomputing recommendations is pretty inefficient. Though with 
500k items it's easy to hold all the item vectors in memory, so pre-computing 
is not too bad.

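To make the "item vectors in memory" point concrete, here is a minimal sketch 
of on-demand top-K scoring. It assumes a plain dot-product score between a 
user's factor vector and each item's factor vector (as in a typical matrix 
factorization model); the function name, the dict layout and the toy vectors 
are made up for illustration and are not from any particular library. As a 
rough size check, at an assumed 50 latent features, 500k items is 25M numbers, 
or about 200 MB as raw 8-byte doubles.

```python
import heapq


def top_k_items(user_vector, item_vectors, k=10):
    """Return the k best (item_id, score) pairs, highest score first.

    item_vectors is a hypothetical in-memory dict: item_id -> list of floats.
    The score is the dot product of the user and item vectors.
    """
    def score(item_vector):
        return sum(u * v for u, v in zip(user_vector, item_vector))

    scored = ((item_id, score(vec)) for item_id, vec in item_vectors.items())
    # nlargest keeps only k candidates at a time instead of sorting everything.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy data standing in for factors exported from an offline model.
toy_items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
best = top_k_items([1.0, 0.2], toy_items, k=2)
```

A real serving layer would more likely keep the factors in a dense array and 
score them with a vectorized matrix-vector product, but the idea is the same.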

Still, since you plan to serve these via a REST service anyway, computing on 
demand via a serving layer such as Oryx or PredictionIO (or the newly 
open-sourced Seldon.io) is a good option. You can also cache the 
recommendations quite aggressively: once you compute a user or item top-K 
list, just stick the result in memcached / Redis / whatever and evict it when 
you recompute your offline model, or after a fixed interval such as an hour.

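The caching idea can be sketched roughly as follows. This is an in-process 
stand-in for memcached or Redis, not a client for either; the class name, 
method names and the one-hour default are assumptions made for illustration. 
Entries expire after a TTL, and everything is dropped when the offline model 
is rebuilt.

```python
import time


class RecommendationCache:
    """Hypothetical in-memory cache for top-K lists, keyed by user or item id."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._entries = {}          # key -> (stored_at, top_k_list)

    def on_model_rebuilt(self):
        # A new offline model makes every cached list stale, so drop them all.
        self._entries.clear()

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None:
            stored_at, top_k = entry
            if now - stored_at < self.ttl_seconds:
                return top_k        # still fresh: serve from cache
        # Missing or expired: compute on demand and remember the result.
        top_k = compute(key)
        self._entries[key] = (now, top_k)
        return top_k
```

With a shared store such as Redis, the per-key TTL would usually be handled by 
the store itself, and the rebuild eviction could be done by putting a model 
version into the key so that old entries simply stop being read.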

On Sun, Mar 15, 2015 at 3:03 PM, Shashidhar Rao
<[email protected]> wrote:

> Thanks Sean, your suggestions and the links provided are just what I needed
> to start off with.
> On Sun, Mar 15, 2015 at 6:16 PM, Sean Owen <[email protected]> wrote:
>> I think you're assuming that you will pre-compute recommendations and
>> store them in Mongo. That's one way to go, with certain tradeoffs. You
>> can precompute offline easily, and serve results at large scale
>> easily, but, you are forced to precompute everything -- lots of wasted
>> effort, not completely up to date.
>>
>> The front-end part of the stack looks right.
>>
>> Spark would do the model building; you'd have to write a process to
>> score recommendations and store the result. Mahout is the same thing,
>> really.
>>
>> 500K items isn't all that large. Your requirements aren't driven just
>> by items though. Number of users and latent features matter too. It
>> matters how often you want to build the model too. I'm guessing you
>> would get away with a handful of modern machines for a problem this
>> size.
>>
>>
>> In a way what you describe reminds me of Wibidata, since it built
>> recommender-like solutions on top of data and results published to a
>> NoSQL store. You might glance at the related OSS project Kiji
>> (http://kiji.org/) for ideas about how to manage the schema.
>>
>> You should have a look at things like Nick's architecture for
>> Graphflow. However, it's more concerned with computing recommendations
>> on the fly, and describes a shift away from an architecture originally
>> built around something like a NoSQL store:
>>
>> http://spark-summit.org/wp-content/uploads/2014/07/Using-Spark-and-Shark-to-Power-a-Realt-time-Recommendation-and-Customer-Intelligence-Platform-Nick-Pentreath.pdf
>>
>> This is also the kind of ground the Oryx project is intended to cover,
>> something I've worked on personally:
>> https://github.com/OryxProject/oryx   -- a layer on and around the
>> core model building in Spark + Spark Streaming to provide a whole
>> recommender (for example), down to the REST API.
>>
>> On Sun, Mar 15, 2015 at 10:45 AM, Shashidhar Rao
>> <[email protected]> wrote:
>> > Hi,
>> >
>> > Can anyone who has developed a recommendation engine suggest a possible
>> > software stack for such an application?
>> >
>> > I am new to recommendation engines; I just found out that Mahout and
>> > Spark MLlib are available.
>> > I am thinking of the software stack below.
>> >
>> > 1. The user is going to use an Android app.
>> > 2. The Android app sends a REST API request to the app server to get
>> >    recommendations.
>> > 3. Spark MLlib as the core recommendation engine.
>> > 4. MongoDB as the database backend.
>> >
>> > I would like to know more about the Spark cluster configuration (how many
>> > nodes etc.) for calculating the recommendations for 500,000 items. These
>> > items include products for day care etc.
>> >
>> > Other software stack suggestions would also be very useful. It has to run
>> > on multiple vendor machines.
>> >
>> > Please suggest.
>> >
>> > Thanks
>> > shashi
>>
