implementation of context-aware recommender in Mahout

2015-03-06 Thread Efi Koulouri
Hi all,

I am trying to implement a context-aware recommender in Mahout. As I
haven't used the library before, I don't have much experience with it, so I
would really appreciate your response!

What I want to do is to implement the two context-aware approaches that
have been proposed, pre-filtering and post-filtering. The former filters
the dataset based on the value of a contextual factor before the
collaborative filtering, while the latter rescores the recommendations after
the collaborative filtering.

I have already read older, similar questions about implementing a
context-aware recommender in Mahout, and I know that the post-filtering
method can be implemented using the IDRescorer. For the pre-filtering
approach there is the option of using the CandidateItemsStrategy in the case
of the item-based recommender. On the other hand, if we want to implement
this approach using the user-based recommender, no such option is available.

In order to implement pre-filtering with the user-based recommender, I
was thinking of filtering out the unrelated (user, item) pairs from the
dataset before the data model is created. This means that the data model
will take as input a subset of the initial dataset.
Does this approach sound correct? I also have some concerns about the
evaluation of the recommender. Does filtering the data this way affect it?
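
To make the idea concrete, here is a rough sketch of the pre-filtering step
in plain Python rather than Mahout's API. The record layout, the "context"
field, its values, and the helper names prefilter and build_model are all
made up for illustration.

```python
from collections import defaultdict

# Hypothetical records: (user_id, item_id, rating, context).
# The "context" field and its values are illustrative only.
RATINGS = [
    (1, 101, 5.0, "weekend"),
    (1, 102, 3.0, "weekday"),
    (2, 101, 4.0, "weekend"),
    (2, 103, 2.0, "weekend"),
    (3, 102, 4.5, "weekday"),
]


def prefilter(ratings, context):
    """Keep only the (user, item, rating) triples observed in the given context."""
    return [(u, i, r) for (u, i, r, c) in ratings if c == context]


def build_model(triples):
    """Group preferences by user; a stand-in for constructing a data model."""
    model = defaultdict(dict)
    for user, item, rating in triples:
        model[user][item] = rating
    return dict(model)


weekend_model = build_model(prefilter(RATINGS, "weekend"))
print(weekend_model)
```

In this toy data, user 3 has no "weekend" preferences and therefore does not
appear in the filtered model at all, which is part of why I am unsure about
the evaluation.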

Thank you in advance!

Regards,
Efi


spark-itemsimilarity question: what's the difference between indicator-matrix and cross-indicator-matrix

2015-03-06 Thread Kevin Zhang
May I say that the indicator-matrix is for the main action (for example,
purchase) and the cross-indicator-matrix is for the secondary action?

Thanks a lot,
Kevin

Re: spark-itemsimilarity question: what's the difference between indicator-matrix and cross-indicator-matrix

2015-03-06 Thread Ted Dunning

The terms main and secondary are a bit confusing. 

The easiest definition is that cooccurrence analyzes the record of actions you 
want to recommend. Cross-occurrence tries to transfer information from one 
behavior to another. 

In practice, it has been common to conflate many behaviors into one precisely 
because cross-occurrence analysis was not feasible. Now that it is available, 
standard practice is moving toward retaining the distinction where possible.
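
One common way to write the distinction down is with user-by-item matrices:
cooccurrence multiplies the matrix of the action you want to recommend by
itself, and cross-occurrence multiplies it by the matrix of a different
behavior. The sketch below uses plain Python lists and made-up toy data; the
names purchases and views are only examples of two behaviors.

```python
# Rows are users, columns are items; 1 means the user took that action on
# the item. Both matrices are made-up toy data.
purchases = [  # the action we want to recommend
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 0],
]
views = [  # a second behavior for the same users and items
    [1, 1, 1],
    [1, 1, 1],
    [1, 0, 0],
]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Item-by-item counts of users who purchased both items.
cooccurrence = matmul(transpose(purchases), purchases)
# Rows are purchased items, columns are viewed items.
cross_occurrence = matmul(transpose(purchases), views)

print(cooccurrence)      # [[2, 2, 0], [2, 3, 1], [0, 1, 1]]
print(cross_occurrence)  # [[2, 1, 1], [3, 2, 2], [1, 1, 1]]
```

These are raw counts only; no significance test is applied here.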



Re: spark-itemsimilarity question: what's the difference between indicator-matrix and cross-indicator-matrix

2015-03-06 Thread Pat Ferrel
Yes, you have it right. The user’s history of the primary action (purchase) is 
used as a query against the indicator-matrix and the user’s history of the 
secondary action (detail-view for instance) is used against the 
“cross-indicator”.

But the terminology is being changed to reflect what Ted is saying.
1) The new (current master) names of the outputs are “similarity-matrix” and 
“cross-similarity-matrix”, which are LLR-measured cooccurrence and 
cross-cooccurrence. A cross-indicator is not really a thing and is a confusing 
name.
2) The secondary actions may be many. The CLI job only supports 1 primary and 1 
secondary, but you can run it in pairs with 1 primary and many secondaries. Also, 
the internal code can calculate correlation between the action you want to 
recommend and many other actions. All of these create indicators, each of which 
you query with a different history.
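
For readers unfamiliar with LLR: it is the log-likelihood ratio test applied
to a 2x2 table of counts. The sketch below is one standard entropy-based way
of computing it, written in plain Python; it is not the spark-itemsimilarity
code, and the argument names k11, k12, k21, k22 are just labels for the four
cells (both events, first only, second only, neither).

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio score for a 2x2 contingency table of counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp at zero to guard against tiny negative values from rounding.
    return max(0.0, 2.0 * (row + col - mat))


independent = llr(10, 10, 10, 10)  # counts show no association
associated = llr(10, 0, 0, 10)     # the two events always occur together
print(independent, associated)
```

A table whose cells show no association scores close to zero, while a table
in which the two events always occur together scores much higher.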




Re: implementation of context-aware recommender in Mahout

2015-03-06 Thread Pat Ferrel
The new Spark based recommender can easily handle context in many forms. See 
the top references section here 
http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html

It does not use the IDRescorer approach at all so perhaps you should describe 
what you want to use as context.

In the demo site for the new stuff (a guide to online video) 
https://guide.finderbots.com you’ll see a couple of examples of “context”. For 
instance, you are viewing a video that has several genre tags. You’ll see at 
least 3 lists of recommendations:
1) people who like the video you are looking at also like these other 
videos (non-personalized recs)
2) people who like this video liked these, from similar genres
3) personalized recs from all genres based on your “liking” history

Many other things can be used as context, like time of day, location, mobile or 
desktop, user profile attributes, etc. The way it does this is through the 
search engine, which can take filters and boost certain item attributes. So I 
could show only recommendations released in the same year as the viewed movie, 
or use the year to bias recommendations by boosting the “release-date” field in 
the recommender query. The recommender is also multimodal and so can use many 
user actions to improve the quality of recs.
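
The difference between a filter and a boost can be sketched in a few lines of
plain Python. This is only an illustration of the idea, not how the search
engine implements it; the candidate list, the scores, and the multiplicative
boost factor are all assumptions.

```python
# Hypothetical candidates: (item_id, base_score, release_year).
CANDIDATES = [
    ("a", 0.90, 1999),
    ("b", 0.80, 2010),
    ("c", 0.70, 2010),
    ("d", 0.60, 2011),
]


def filter_by_year(candidates, year):
    """Hard filter: drop everything not released in the given year."""
    return [c for c in candidates if c[2] == year]


def boost_by_year(candidates, year, boost=1.5):
    """Soft bias: scale the score of matching items, keep the rest, re-rank."""
    rescored = [(i, s * boost if y == year else s, y) for (i, s, y) in candidates]
    return sorted(rescored, key=lambda c: c[1], reverse=True)


filtered = [c[0] for c in filter_by_year(CANDIDATES, 2010)]
boosted = [c[0] for c in boost_by_year(CANDIDATES, 2010)]
print(filtered)  # ['b', 'c']
print(boosted)   # ['b', 'c', 'a', 'd']
```

The filter removes items from other years entirely, while the boost only
reorders the list, so strong items from other years can still appear.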

Removing some of your data, in what you call pre-filtering, may not get you what 
you want. Removing data that is actual user behavior can reduce the quality of 
recommendations, so please give an example.




Random Forest on old mapred API

2015-03-06 Thread Wei Li
Hi All:

For various reasons, we need to re-implement the Random Forest in Mahout
on top of the old MapRed API so that it runs on our Hadoop deployment. We
know that the old MapRed API differs from the new MapReduce API. Could you
please give me some hints on how to do this? Many thanks.

Best
Wei