GitHub user jackylk opened a pull request:

    https://github.com/apache/spark/pull/2847

    [SPARK-4001][MLlib] adding apriori algorithm for frequent item set mining 
in Spark

    Apriori is the classic algorithm for frequent item set mining in a 
transactional data set. It would be useful to add the Apriori algorithm to 
MLlib in Spark. This PR adds an implementation of it.
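
    For readers unfamiliar with the algorithm, here is a minimal plain-Python 
sketch of level-wise Apriori. It is not the code in this PR and does not use 
Spark; the function name `apriori` and the `min_support` parameter (an absolute 
count) are assumptions made for illustration only.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset seen in at least
    `min_support` transactions (hypothetical helper, not the PR's API)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(list(frequent), 2)
                      if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        # Count step: support is the number of transactions containing the candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

    Calling `apriori([["a", "b"], ["a", "c"], ["a", "b", "c"]], 2)` would 
report `{a}`, `{b}`, `{c}`, `{a, b}` and `{a, c}` as frequent, while `{b, c}` 
is dropped because it appears in only one transaction.
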
    There is one point where I am not sure whether the current approach is 
the most efficient. To filter out the eligible frequent item sets, I currently 
use a cartesian operation on two RDDs to calculate the support of each item 
set. I am not sure whether it would be better to use a broadcast variable to 
achieve the same result.
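
    To make the two options concrete, here is a rough plain-Python sketch that 
mimics both strategies without Spark. It is not the code in this PR. The 
function names are hypothetical, `itertools.product` stands in for 
`RDD.cartesian`, and a shared tuple stands in for a value created with 
`SparkContext.broadcast`. The sketch only shows that both strategies produce 
the same support counts; it says nothing about their relative speed on a 
cluster.

```python
from itertools import product


def support_by_cartesian(transactions, candidates):
    """Pair every candidate with every transaction, then count matches.
    This forms len(candidates) * len(transactions) pairs."""
    counts = {c: 0 for c in candidates}
    for c, t in product(candidates, transactions):
        if c <= t:  # candidate is a subset of the transaction
            counts[c] += 1
    return counts


def support_by_broadcast(partitions, candidates):
    """Each partition scans only its own transactions against a shared,
    read-only candidate list; partial counts are merged afterwards."""
    shared = tuple(candidates)  # stands in for a broadcast variable
    total = {c: 0 for c in shared}
    for part in partitions:
        partial = {}
        for t in part:
            for c in shared:
                if c <= t:
                    partial[c] = partial.get(c, 0) + 1
        # Merge step, comparable to reducing per-partition results.
        for c, n in partial.items():
            total[c] += n
    return total
```

    The broadcast-style variant assumes the candidate list is small enough to 
be held in memory wherever the transactions are scanned, which is the usual 
precondition for using a broadcast variable.
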
    
    I will add an example that uses this algorithm if required.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jackylk/spark apriori

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2847.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2847
    
----
commit da2cba7e063745aacef74ff555e7bd7c55a24f56
Author: Jacky Li <[email protected]>
Date:   2014-10-19T09:19:27Z

    adding apriori algorithm for frequent item set mining in Spark

commit 889b33fdfabcc222c82e3bce619aeb6c7031fc58
Author: Jacky Li <[email protected]>
Date:   2014-10-19T09:31:04Z

    modify per scalastyle check

----

