[https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860931#action_12860931]
Ted Dunning commented on MAHOUT-305:
------------------------------------
I think that the key to speed with a cooccurrence counter is not to prune items
with small support, but rather to prune the most common items. The problem is
that the overall running time is proportional to the average of the square of
the number of items per user, so a few very heavy users dominate the cost. You
can either limit each item to being related to a bounded number of users, or
you can limit the number of items each user is considered to have seen. Either
has roughly the desired effect, and neither causes any significant loss of
information (partly because users who touch too many items are often spammers
and partly because we don't learn anything new about them after the first
several hundred items).
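
A minimal sketch of the pruning argument above, assuming user histories fit in
memory as a dict from user id to item ids (the actual Mahout jobs are Hadoop
MapReduce). The helper names pair_work, cap_items_per_user, cap_users_per_item
and cooccurrence_counts are hypothetical, and keeping a seeded random sample is
an assumed pruning policy, not something the comment specifies. pair_work
counts the item pairs a naive counter emits, which grows with the square of
each user's item count.

```python
"""Sketch: item cooccurrence counting with per-user and per-item caps.

Assumptions (not from the JIRA thread): histories fit in memory as a dict
mapping user id -> iterable of item ids, and pruning keeps a seeded random
sample. Mahout's actual jobs run as Hadoop MapReduce.
"""
import random
from collections import Counter, defaultdict
from itertools import combinations


def pair_work(user_items):
    """Item pairs a naive counter emits: sum over users of n*(n-1)/2.

    Grows with the square of each user's item count, so heavy users dominate.
    """
    total = 0
    for items in user_items.values():
        n = len(set(items))
        total += n * (n - 1) // 2
    return total


def cap_items_per_user(user_items, max_items, seed=0):
    """Keep at most max_items distinct items per user (hypothetical policy)."""
    rng = random.Random(seed)
    capped = {}
    for user, items in user_items.items():
        distinct = sorted(set(items))
        if len(distinct) > max_items:
            distinct = sorted(rng.sample(distinct, max_items))
        capped[user] = distinct
    return capped


def cap_users_per_item(user_items, max_users, seed=0):
    """Keep each item in at most max_users users' histories (hypothetical policy)."""
    rng = random.Random(seed)
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)
    kept = defaultdict(set)
    for item, users in item_users.items():
        chosen = sorted(users)
        if len(chosen) > max_users:
            chosen = rng.sample(chosen, max_users)
        for user in chosen:
            kept[user].add(item)
    return {user: sorted(kept[user]) for user in user_items}


def cooccurrence_counts(user_items):
    """Count, for each unordered item pair, how many users saw both items."""
    counts = Counter()
    for items in user_items.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    users = {
        "u1": ["a", "b", "c"],
        "u2": ["a", "b"],
        "heavy": [f"i{k}" for k in range(1000)],
    }
    print(pair_work(users))  # 3 + 1 + 499500 = 499504
    capped = cap_items_per_user(users, max_items=100)
    print(pair_work(capped))  # 3 + 1 + 4950 = 4954
    print(cooccurrence_counts(capped)[("a", "b")])  # 2
```

In the demo, one user with 1000 items accounts for almost all of the pair
work; capping that user at 100 items cuts the total by about a factor of 100
while leaving the counts among light users untouched. Random sampling is only
one choice; keeping the most recent items would be an equally plausible policy.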

Another alternative is to use a decomposition-based approximation of the
cooccurrence matrix. I think both techniques should be available, so the direct
counting approach remains valuable.
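
One possible decomposition, assuming a truncated SVD (the comment does not name
a specific method): with A the binary user-by-item matrix, the cooccurrence
matrix is C = A^T A, and a rank-k factorization of A gives a rank-k
approximation of C, using U_k^T U_k = I_k.

```latex
C = A^\top A, \qquad A \approx U_k \Sigma_k V_k^\top
\;\Longrightarrow\;
C \approx V_k \Sigma_k U_k^\top U_k \Sigma_k V_k^\top = V_k \Sigma_k^2 V_k^\top
```
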
> Combine both cooccurrence-based CF M/R jobs
> -------------------------------------------
>
> Key: MAHOUT-305
> URL: https://issues.apache.org/jira/browse/MAHOUT-305
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.2
> Reporter: Sean Owen
> Assignee: Ankur
> Priority: Minor
>
> We have two different but essentially identical MapReduce jobs to make
> recommendations based on item co-occurrence:
> org.apache.mahout.cf.taste.hadoop.{item,cooccurrence}. They ought to be
> merged. Not sure exactly how to approach that, but noting this in JIRA, per
> Ankur.
--
This message is automatically generated by JIRA.