Hello all,

We have recently implemented two frequent itemset (pattern) mining algorithms for MapReduce. They are much faster than the available PFP implementation (see the paper). We would like to contribute these implementations to Mahout and maintain them as needed. You can find the code at the link below.

I understand that PFP was removed because of a lack of developer interest. As a research group, we are willing to provide new frequent pattern mining algorithms and to maintain the existing ones. Could you please help me with the steps I need to go through?

I have a few questions in mind, but any comments or suggestions are very welcome:

- Shall I open a JIRA issue?
- Shall I use the old fpm package for the algorithms?
- We use a file for configuration because it makes runs easier to reproduce. Is this OK under the Mahout guidelines?
- Is there a guideline for the trade-off between code quality and performance? For example, we preferred functional/integration tests to unit tests because making some classes unit-testable adds too much class creation overhead.

Link for the code and paper: http://adrem.ua.ac.be/bigfim

Thank you in advance for your help.

Cheers!
--
M. Emin Akşehirli