I want to use PrefixSpan, so I had a look at the code and the cited paper: "Distributed PrefixSpan Algorithm Based on MapReduce".
There is a result in the paper that I didn't really understand, and I couldn't find where it is used in the code. As stated in the paper:

"Suppose a sequence database S = {1,2...n}, a sequence <a...> is a length-(L-1) (2≤L≤n) sequential pattern, in projected databases which is a prefix of a length-(L-1) sequential pattern <a...a>, when the support count of <a> is not less than min_support, it is equal to obtaining a length-L sequential pattern <a...a> from projected databases that obtaining a length-L sequential pattern <a...a> from a sequence database S."

According to the paper, this result is supposed to add a pruning step to the reduce function, but I couldn't find where that happens in the code. The result seems to come from an earlier paper, "Wang Linlin, Fan Jun. Improved Algorithm for Sequential Pattern Mining Based on PrefixSpan [J]. Computer Engineering, 2009, 35(23): 56-61", but that paper didn't help me understand the result or how it improves the algorithm.

-- Alexis GILLAIN