[jira] Commented: (MAHOUT-305) Combine both cooccurrence-based CF M/R jobs

Ted Dunning (JIRA) Mon, 26 Apr 2010 07:23:56 -0700

    [ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860931#action_12860931 ]


Ted Dunning commented on MAHOUT-305:
------------------------------------


I think that the key to speed with a cooccurrence counter is not to prune items 
with small support, but rather to prune the most common items.  The problem is 
that the overall running time is proportional to the average of the square of 
the number of items per user.  You can either limit each item to being counted 
as related to a bounded number of users, or limit the number of items each user 
is considered to have seen.  Either has essentially the desired effect, and 
neither causes any significant loss of information (partly because users who 
touch far too many items are often spammers, and partly because we don't learn 
anything new about them after the first several hundred items).
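A minimal Python sketch of the per-user cap, assuming each user's history is an in-memory list of item IDs (the actual jobs run as Hadoop M/R). The names `pair_work`, `cap_user_histories`, `cooccurrence_counts` and `max_items_per_user` are hypothetical, and random downsampling is only one way to choose which items to keep; the comment does not specify a selection rule. It shows how a single heavy user dominates the number of emitted pairs and how the cap bounds that work.

```python
from collections import Counter
from itertools import combinations
import random


def pair_work(histories):
    """Pairs a naive counter emits: sum over users of n_u * (n_u - 1) / 2.

    This grows with the square of each user's history length, so a few
    very heavy users can dominate the total running time.
    """
    total = 0
    for items in histories.values():
        n = len(set(items))
        total += n * (n - 1) // 2
    return total


def cap_user_histories(histories, max_items_per_user, seed=0):
    """Randomly downsample each user's distinct items to at most max_items_per_user.

    Hypothetical helper: random sampling is an assumption, not a rule taken
    from the comment.
    """
    rng = random.Random(seed)
    capped = {}
    for user, items in histories.items():
        distinct = sorted(set(items))
        if len(distinct) > max_items_per_user:
            distinct = sorted(rng.sample(distinct, max_items_per_user))
        capped[user] = distinct
    return capped


def cooccurrence_counts(histories):
    """Count, for each unordered item pair, how many users saw both items."""
    counts = Counter()
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return counts


# Usage: two light users and one user who touched 1000 items.
histories = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "b"],
    "heavy": [f"item{i}" for i in range(1000)],
}
capped = cap_user_histories(histories, max_items_per_user=100)
print(pair_work(histories))                   # 499504
print(pair_work(capped))                      # 4954
print(cooccurrence_counts(capped)[("a", "b")])  # 2
```

Capping the number of users per item works the same way with the roles of users and items swapped; either cap bounds the quadratic term.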

Another option is to use a decomposition-based approximation of the 
cooccurrence matrix.  I think both techniques should be available, so the 
direct counting approach is still valuable.
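One way to make the decomposition-based option concrete, assuming a truncated SVD (the comment does not say which decomposition is meant): let $A$ be the binary user-by-item matrix, so the item cooccurrence matrix is $C = A^\top A$.

```latex
A = U \Sigma V^\top
\quad\Longrightarrow\quad
C = A^\top A = V \Sigma^{2} V^\top

C \approx C_k = V_k \Sigma_k^{2} V_k^\top,
\qquad
(C_k)_{ij} = \sum_{l=1}^{k} \sigma_l^{2}\, V_{il} V_{jl}
```

Here $V_k$ holds the first $k$ right singular vectors and $\Sigma_k$ the top $k$ singular values. Each approximate entry is a dot product of two $k$-dimensional item vectors, so it can be computed on demand without materializing $C$, and $C_k$ is the best rank-$k$ approximation of $C$ in the Frobenius norm.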

> Combine both cooccurrence-based CF M/R jobs
> -------------------------------------------
>
>                 Key: MAHOUT-305
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-305
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.2
>            Reporter: Sean Owen
>            Assignee: Ankur
>            Priority: Minor
>
> We have two different but essentially identical MapReduce jobs to make 
> recommendations based on item co-occurrence: 
> org.apache.mahout.cf.taste.hadoop.{item,cooccurrence}. They ought to be 
> merged. Not sure exactly how to approach that but noting this in JIRA, per 
> Ankur.

