Re: User based recommender

Pat Ferrel Sat, 29 Nov 2014 13:40:18 -0800

The Mahout site is a good starting point for using any of the recommenders.


http://mahout.apache.org/users/recommender/intro-itembased-hadoop.html

On Nov 29, 2014, at 1:33 PM, Yash Patel <[email protected]> wrote:

Can you give me some more details on the Hadoop MapReduce item-based
cooccurrence recommender?


Best Regards,
Yash Patel

On Fri, Nov 28, 2014 at 7:21 PM, Pat Ferrel <[email protected]> wrote:

> I built this app with it: https://guide.finderbots.com
> 
> The app uses MongoDB, Ruby on Rails, and Solr 4.3. Once the model comes
> out of the job it is CSV text, and therefore language and architecture
> neutral. I load the data from spark-itemsimilarity into MongoDB using
> Java. Solr is set up for full-text indexing and queries using data from
> MongoDB. The queries are made to Solr through REST from the Ruby UX code.
> You can replace any component in this stack with whatever you wish and
> use whatever language you are comfortable with.
> 
> Alternatively you could modify the UI of Solr or Elasticsearch; both are
> in Java.
> 
> If you use any of the other Mahout recommenders they create all recs for
> all known users so you’ll still need to build a way to serve those results.
> People often use DBs for this and integrate with their web app framework.
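The REST query step described above (a user's item history sent to Solr, ranked item ids coming back) might look roughly like the following. This is only a sketch: the host, the core name `items`, the field name `indicators`, and the sample item ids are all assumptions made for illustration, not details of the app mentioned in this thread. It only builds the URL for Solr's standard select handler and does not contact a server.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, core, field, history, rows=10):
    """Build a Solr select URL that matches a user's item history against one field."""
    # Quote each item id so ids containing spaces or punctuation stay one term.
    terms = " ".join('"{}"'.format(item.replace('"', '\\"')) for item in history)
    params = {
        "q": "{}:({})".format(field, terms),  # hypothetical indicator field
        "fl": "id,score",                     # return only the id and relevance score
        "rows": rows,                         # number of recommendations wanted
        "wt": "json",
    }
    return "{}/{}/select?{}".format(base_url.rstrip("/"), core, urlencode(params))


# Hypothetical host, core, field, and item ids.
url = build_solr_query_url(
    "http://localhost:8983/solr", "items", "indicators", ["iphone", "ipad"], rows=5
)
print(url)
```

Any HTTP client in any language can issue the resulting GET request, which is why the serving side of the stack is easy to swap out.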
> 
> On Nov 28, 2014, at 10:03 AM, Yash Patel <[email protected]> wrote:
> 
> I looked up Spark row similarity, but I am not sure it will suit my needs,
> as I want to build my recommender as a Java application, possibly with an
> interface.
> 
> 
> On Fri, Nov 28, 2014 at 5:43 PM, Pat Ferrel <[email protected]> wrote:
> 
>> Some references:
>> 
>> small free book here, which talks about the general idea:
>> https://www.mapr.com/practical-machine-learning
>> preso, which talks about mixing actions or other indicators:
>> http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
>> two blog posts:
>> http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/
>> http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
>> mahout docs:
>> http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
>> 
>> Build Mahout from this source: https://github.com/apache/mahout
>> This will run stand-alone on a dev machine; if your data is too big for
>> a single machine, you can run it on a Spark + Hadoop cluster. The data
>> this creates can be put into a DB or indexed directly by a search engine
>> (Solr or Elasticsearch). Choose the search engine you want; queries made
>> up of a user’s item ID history go there, and the results are an ordered
>> list of item IDs to recommend.
>> 
>> The core piece is the command line job “mahout spark-itemsimilarity”,
>> which can parse CSV data. The options specify which columns are used for
>> IDs.
>> 
>> Start out simple by looking only at user and item IDs. Then you can add
>> other cross-cooccurrence indicators for multiple actions later pretty
>> easily.
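As a rough illustration of the "start simple" advice above, here is one way to cut a wider CSV down to just user id and item id before feeding it to a job. It is a sketch, not part of Mahout: the function name is hypothetical, the sample rows are made up, and the column order simply follows the field list given earlier in this thread (customer id, item id, postal code, item category, potential growth, territory).

```python
import csv
import io


def extract_user_item_pairs(src, dst, user_col, item_col, delimiter=","):
    """Copy only the user-id and item-id columns from src to dst; return rows written."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    written = 0
    for row in reader:
        # Skip blank or short rows rather than failing on them.
        if len(row) <= max(user_col, item_col):
            continue
        user, item = row[user_col].strip(), row[item_col].strip()
        if user and item:
            writer.writerow([user, item])
            written += 1
    return written


# Made-up sample rows in the assumed column order.
sample = io.StringIO(
    "1001,widget-a,90210,tools,high,west\n"
    "1002,widget-b,10001,toys,low,east\n"
    "\n"
    "1001,widget-b,90210,toys,high,west\n"
)
out = io.StringIO()
count = extract_user_item_pairs(sample, out, user_col=0, item_col=1)
print(out.getvalue())
```

Since the job's options can select columns directly, this step may be unnecessary; it is mainly useful for inspecting or cleaning the data first.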
>> 
>> 
>> On Nov 28, 2014, at 12:14 AM, Yash Patel <[email protected]> wrote:
>> 
>> The Mahout + search engine recommender seems like the best fit for the
>> data I have.
>> 
>> Kindly get back to me at your earliest convenience.
>> 
>> 
>> 
>> Best Regards,
>> Yash Patel
>> 
>> On Thu, Nov 27, 2014 at 9:58 PM, Pat Ferrel <[email protected]> wrote:
>> 
>>> Mahout has several recommenders so no need to create one from
>>> components. They all make use of the similarity of preferences between
>>> users, which is why they are in the category of collaborative filtering.
>>> 
>>> Primary Mahout Recommenders:
>>> 1) Hadoop mapreduce item-based cooccurrence recommender. Creates all
>>> recs for all users. Uses “Mahout IDs”
>>> 2) ALS-WR hadoop mapreduce, uses matrix factorization to reduce noise in
>>> the data. Sometimes better for small data sets than #1. Uses “Mahout
>>> IDs”
>>> 3) Mahout + search engine: cooccurrence type. Extremely flexible, works
>>> with multiple actions (multi-modal), works for new users that have some
>>> history, has a scalable server (from the search engine) but is more
>>> difficult to integrate than #1 or #2. Uses your own ids and reads csv
>>> files.
>>> 
>>> The rest of the data seems to apply either to the user or the item and
>>> so would be used in different ways. #1 and #2 can only use user id and
>>> item id, but some post-recommendation weighting or filtering can be
>>> applied. #3 can use multiple attributes in different ways. For instance,
>>> if category is an item attribute you can create two actions,
>>> user-pref-for-an-item and user-pref-for-a-category. Assuming you want to
>>> recommend an item (not a category) you can create a cross-cooccurrence
>>> indicator for the second action and use the data to make your item recs
>>> better. #3 is the only method that supports this.
>>> 
>>> Pick a recommender and we can help more with data prep.
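To make the cross-cooccurrence idea in the message above a little more concrete, here is a toy sketch that counts how often an item preference co-occurs, per user, with a second kind of action (a category preference). It is not Mahout's implementation: it uses raw counts on tiny in-memory dictionaries, whereas the Mahout job, as its documentation describes, scores pairs with a log-likelihood ratio test. The function name and the sample users, items, and categories are made up.

```python
from collections import defaultdict
from itertools import product


def cross_cooccurrence(primary, secondary):
    """Count, for each (primary item, secondary thing) pair, the users who did both."""
    counts = defaultdict(int)
    for user, items in primary.items():
        # Pair every item the user preferred with every secondary action by that user.
        for item, other in product(items, secondary.get(user, set())):
            counts[(item, other)] += 1
    return dict(counts)


# Made-up data: user -> items bought, and user -> categories preferred.
purchases = {"u1": {"a", "b"}, "u2": {"a", "c"}, "u3": {"b"}}
categories = {"u1": {"tools"}, "u2": {"tools", "toys"}, "u3": {"toys"}}

# Plain cooccurrence is the same computation with one action on both sides,
# minus the trivial pairing of an item with itself.
cooc = {
    pair: n
    for pair, n in cross_cooccurrence(purchases, purchases).items()
    if pair[0] != pair[1]
}
cross = cross_cooccurrence(purchases, categories)

print(sorted(cooc.items()))
print(sorted(cross.items()))
```

In this toy data, item "a" ends up most strongly tied to the "tools" category, which is the kind of secondary signal that can then be used alongside the plain item-to-item indicators.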
>>> 
>>> 
>>> On Nov 26, 2014, at 1:34 PM, Yash Patel <[email protected]> wrote:
>>> 
>>> Hello everyone,
>>> 
>>> Wow, I am quite happy to see so much input from people.
>>> 
>>> I apologize for not providing more details.
>>> 
>>> Although this is not my complete dataset, the fields I have chosen to use
>>> are:
>>> 
>>> customer id - numeric
>>> item id - text
>>> postal code - text
>>> item category - text
>>> potential growth - text
>>> territory - text
>>> 
>>> 
>>> Basically I was thinking of finding similar users and recommending them
>>> items that users like them have bought but they haven't.
>>> 
>>> I would very much like to hear your opinions, though, as I am not so
>>> familiar with clustering, classifiers, etc.
>>> 
>>> I found that Mahout takes sequence files converted into vectors, but I
>>> couldn't understand how I would do that with my data specifically and,
>>> more importantly, make a recommender system out of it.
>>> 
>>> I am also wondering how to factor in the importance of a specific
>>> customer through the potential growth attribute.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Best Regards,
>>> Yash Patel
>>> 
>>> On Wed, Nov 26, 2014 at 9:03 PM, Pat Ferrel <[email protected]> wrote:
>>> 
>>>> All very good points but note that spark-itemsimilarity may take the
>>>> input directly since you specify column numbers for
>>>> <UID><ITEMID><PREF_VALUE>
>>>> 
>>>> On Nov 26, 2014, at 11:43 AM, parnab kumar <[email protected]> wrote:
>>>> 
>>>> Kindly elaborate on your requirements, your dataset fields, and what
>>>> you want to recommend to a user. Usually a set of items is recommended
>>>> to a user. In your case, what are your items?
>>>> 
>>>> The standard input is <UID><ITEMID><PREF_VALUE>. Clearly your data is
>>>> not in this format, which is what would let you use the algorithms in
>>>> Mahout directly.
>>>> 
>>>> A little more info from your side will help us give you the right
>>>> pointers.
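One possible way to get text item ids into the <UID><ITEMID><PREF_VALUE> shape mentioned above is to assign each distinct item id an integer and keep the mapping so results can be translated back. This is a minimal sketch under assumptions: the function name is hypothetical, the sample ids are made up, and the constant preference value of 1.0 is an assumed stand-in for "this customer bought this item", since the data described in this thread has no explicit rating.

```python
def to_numeric_triples(pairs, pref_value=1.0):
    """Assign each distinct text item id an integer id and emit (uid, itemid, pref) rows."""
    item_index = {}
    triples = []
    for user_id, item_id in pairs:
        # The first time an item is seen it gets the next free integer.
        numeric_item = item_index.setdefault(item_id, len(item_index))
        triples.append((int(user_id), numeric_item, pref_value))
    return triples, item_index


# Made-up (customer id, item id) pairs.
pairs = [("1001", "widget-a"), ("1002", "widget-b"), ("1001", "widget-b")]
triples, item_index = to_numeric_triples(pairs)

for uid, item, pref in triples:
    print("{},{},{}".format(uid, item, pref))
```

Keep `item_index` around (for example, written to a file), because the recommender's output will refer to the integer ids and needs to be mapped back to the original item ids.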
>>>> 
>>>> On Wed, Nov 26, 2014 at 7:16 PM, Yash Patel <[email protected]>
>>>> wrote:
>>>> 
>>>>> Dear Mahout Team,
>>>>> 
>>>>> I am a student new to machine learning and I am trying to build a
>>>>> user-based recommender using Mahout.
>>>>> 
>>>>> My dataset is a CSV file, but many of its fields are text and I
>>>>> understand Mahout needs numeric values.
>>>>> 
>>>>> Can you give me a head start on where I should begin and what kind of
>>>>> tools I need to parse the text columns?
>>>>> 
>>>>> Also, an idea of which classifiers or clustering methods I should use
>>>>> would be highly appreciated.
>>>>> 
>>>>> 
>>>>> Best Regards,
>>>>> Yash Patel
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
>
