Re: MLlib Prefixspan implementation

alexis GILLAIN Wed, 26 Aug 2015 00:12:13 -0700

A first use case for gap constraints is described in the article.
Another application would be customer shopping-sequence analysis, where you
want to constrain the time between two purchases for them to
be considered a relevant sequence.
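Here is a minimal sketch of what a gap constraint means in the shopping case. It assumes that each customer sequence is a list of (timestamp, item) pairs sorted by time, and that the constraint is a maximum gap between consecutive matched items. The function name `occurs_with_max_gap` is hypothetical. It is not part of MLlib's PrefixSpan API, and it is not the MG-FSM formulation from the article.

```python
from typing import Sequence, Tuple

# Hypothetical representation: (timestamp, item) pairs, sorted by timestamp.
Event = Tuple[float, str]


def occurs_with_max_gap(sequence: Sequence[Event], pattern: Sequence[str],
                        max_gap: float) -> bool:
    """True if `pattern` is a subsequence of `sequence` whose consecutive
    matched items are at most `max_gap` apart in time."""
    if not pattern:
        return True

    def extend(pos: int, k: int, last_time: float) -> bool:
        # Try to match pattern[k] at some index >= pos inside the gap window.
        if k == len(pattern):
            return True
        for i in range(pos, len(sequence)):
            t, item = sequence[i]
            if t - last_time > max_gap:
                break  # events are time-sorted, so later ones are even further away
            if item == pattern[k] and extend(i + 1, k + 1, t):
                return True  # backtrack to the next candidate otherwise
        return False

    # The first pattern item has no predecessor, so any occurrence may start a match.
    for i, (t, item) in enumerate(sequence):
        if item == pattern[0] and extend(i + 1, 1, t):
            return True
    return False


# Example: timestamps in days.
purchases = [(0, "phone"), (40, "case"), (45, "phone"), (50, "charger")]
print(occurs_with_max_gap(purchases, ["phone", "charger"], max_gap=10))  # True
print(occurs_with_max_gap(purchases, ["case", "charger"], max_gap=5))    # False
```

A greedy earliest-match scan is not enough here. In the example, the first "phone" is too far from "charger", so the search has to backtrack to the later "phone".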


An additional question regarding the code: what's the point of using
ReversedPrefix in LocalPrefixSpan? The prefix is used neither for finding the
frequent items of a projected database nor for computing a new projected
database, so it looks like it's stored in reverse order only to be reversed
again when it is converted to a sequence.
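One plausible motivation assumes the prefix is backed by a Scala immutable `List`, which is an assumption not confirmed in this thread. Prepending to an immutable linked list is O(1) and shares the parent's cells, whereas appending copies the whole list. In a depth-first search, every child prefix extends its parent by one item. Storing the items newest-first keeps each extension cheap, and the reversal cost is paid once per emitted pattern. The Python sketch below mimics that idea. `Cons`, `ReversedPrefix.append` and `to_sequence` are hypothetical names, not Spark's code.

```python
from typing import List, NamedTuple, Optional


class Cons(NamedTuple):
    """Immutable singly linked list cell; `tail` is shared, never copied."""
    head: int
    tail: Optional["Cons"]


class ReversedPrefix:
    """Hypothetical stand-in: keeps prefix items newest-first."""

    def __init__(self, items: Optional[Cons] = None, length: int = 0):
        self.items = items
        self.length = length

    def append(self, item: int) -> "ReversedPrefix":
        # O(1): the child points at the parent's cells instead of copying them.
        return ReversedPrefix(Cons(item, self.items), self.length + 1)

    def to_sequence(self) -> List[int]:
        # Walk newest-to-oldest, then reverse once when the pattern is emitted.
        out: List[int] = []
        node = self.items
        while node is not None:
            out.append(node.head)
            node = node.tail
        out.reverse()
        return out


ab = ReversedPrefix().append(1).append(2)
abc, abd = ab.append(3), ab.append(4)  # two siblings sharing the cells of <1 2>
print(abc.to_sequence(), abd.to_sequence())  # [1, 2, 3] [1, 2, 4]
```

If the prefix were kept in natural order in an immutable list, each sibling would copy the whole parent prefix instead.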

2015-08-25 12:15 GMT+08:00 Feynman Liang <fli...@databricks.com>:

> CCing the mailing list again.
>
> It's currently not on the radar. Do you have a use case for it? I can
> bring it up during 1.6 roadmap planning tomorrow.
>
> On Mon, Aug 24, 2015 at 8:28 PM, alexis GILLAIN <ila...@hotmail.com>
> wrote:
>
>> Hi,
>>
>> I just realized the article I mentioned is cited in the JIRA and not in
>> the code, so I guess you didn't use this result.
>>
>> Do you plan to implement sequences with timestamps and gap constraints, as
>> in:
>>
>> https://people.mpi-inf.mpg.de/~rgemulla/publications/miliaraki13mg-fsm.pdf
>>
>> 2015-08-25 7:06 GMT+08:00 Feynman Liang <fli...@databricks.com>:
>>
>>> Hi Alexis,
>>>
>>> Unfortunately, both of the papers you referenced appear to be
>>> translations and are quite difficult to understand. We followed
>>> http://doi.org/10.1109/ICDE.2001.914830 when implementing PrefixSpan.
>>> Perhaps you can find the relevant lines in there so I can elaborate further?
>>>
>>> Feynman
>>>
>>> On Thu, Aug 20, 2015 at 9:07 AM, alexis GILLAIN <ila...@hotmail.com>
>>> wrote:
>>>
>>>> I want to use PrefixSpan, so I had a look at the code and at the cited
>>>> paper: "Distributed PrefixSpan Algorithm Based on MapReduce".
>>>>
>>>> There is a result in the paper I didn't really understand, and I
>>>> couldn't find where it is used in the code.
>>>>
>>>> Suppose a sequence database S = {1,2...n}, a sequence <a...> is a
>>>> length-(L-1) (2≤L≤n) sequential pattern, in projected databases which is a
>>>> prefix of a length-(L-1) sequential pattern <a...a>, when the support count
>>>> of <a> is not less than min_support, it is equal to obtaining a length-L
>>>> sequential pattern < a ... a > from projected databases that obtaining a
>>>> length-L sequential pattern < a ... a > from a sequence database S.
>>>>
>>>> According to the paper, it's supposed to add a pruning step in the
>>>> reduce function, but I couldn't find where.
>>>>
>>>> This result seems to come from a previous paper: "Wang Linlin, Fan
>>>> Jun. Improved Algorithm for Sequential Pattern Mining Based on PrefixSpan
>>>> [J]. Computer Engineering, 2009, 35(23): 56-61", but it didn't help me
>>>> understand it or how it can improve the algorithm.
>>>>
>>>
>>>
>>
>
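For reference, here is a minimal sketch of the basic projected-database recursion in the PrefixSpan paper linked above (doi 10.1109/ICDE.2001.914830). It is simplified to sequences of single items rather than itemsets. It is not Spark's implementation, and it does not implement the pruning result quoted from the MapReduce paper. It only shows the standard `min_support` check, where infrequent items are dropped before a prefix is extended, to make clear where any extra pruning would have to fit. The function name `prefixspan` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[int, ...]


def prefixspan(db: Sequence[Sequence[int]], min_support: int) -> Dict[Pattern, int]:
    """Simplified PrefixSpan over sequences of single items.

    Returns every sequential pattern with support >= min_support,
    mapped to its support (number of sequences containing it).
    """
    results: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[Sequence[int]]) -> None:
        # Count each item at most once per suffix, since support counts sequences.
        counts: Counter = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item, support in counts.items():
            if support < min_support:
                continue  # standard pruning: an infrequent item cannot extend the prefix
            new_prefix = prefix + (item,)
            results[new_prefix] = support
            # Project: keep what follows the first occurrence of `item` in each suffix.
            new_projected = []
            for suffix in projected:
                if item in suffix:
                    rest = suffix[suffix.index(item) + 1:]
                    if rest:
                        new_projected.append(rest)
            mine(new_prefix, new_projected)

    mine((), list(db))
    return results


print(prefixspan([[1, 2, 3], [1, 3], [2, 3]], min_support=2))
```

With `min_support=2`, the example finds <1>, <2>, <3>, <1 3> and <2 3>, but not <1 2> (support 1).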
