Re: Spark FP-growth

Aditya Addepalli Thu, 07 May 2020 10:26:10 -0700

Hi,

I understand that this is not a priority with everything going on, but if
you think generating rules for only a single consequent adds value, I would
like to contribute.
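A minimal sketch of what generating rules for a single consequent could look like once frequent itemsets have been mined. The function name `rules_for_consequent` and the dict-of-frozensets layout are hypothetical and are not Spark's API. The point it illustrates is that only itemsets containing the target item need to be visited, while confidence still uses antecedent counts from the full dataset. This only prunes rule generation. Restricting the mining step itself would presumably need changes inside the FP-tree recursion.

```python
def rules_for_consequent(freq_counts, target, min_conf):
    """Hypothetical helper: build rules X -> target from precomputed frequent itemsets.

    freq_counts maps frozenset itemsets to absolute counts over the FULL dataset.
    Frequent itemsets are downward closed, so the antecedent X of any frequent
    itemset X | {target} is itself a key of freq_counts.
    """
    rules = []
    for itemset, count in freq_counts.items():
        # Skip itemsets that cannot yield a rule with the desired consequent.
        if target not in itemset or len(itemset) < 2:
            continue
        antecedent = itemset - {target}
        confidence = count / freq_counts[antecedent]
        if confidence >= min_conf:
            rules.append((antecedent, target, confidence))
    return rules


# Made-up counts from four transactions {a,b}, {a,b}, {a,b,c}, {a,c} at min count 2.
freq_counts = {
    frozenset({"a"}): 4,
    frozenset({"b"}): 3,
    frozenset({"c"}): 2,
    frozenset({"a", "b"}): 3,
    frozenset({"a", "c"}): 2,
}
print(rules_for_consequent(freq_counts, "b", 0.5))
```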


Thanks & Regards,
Aditya

On Sat, May 2, 2020 at 9:34 PM Aditya Addepalli <dyex...@gmail.com> wrote:

> Hi Sean,
>
> I understand your approach, but there's a slight problem.
>
> If we generate rules after filtering for our desired consequent, we are
> introducing some bias into our rules.
> The confidence of the rules on the filtered input can be very high but
> this may not be the case on the entire dataset.
> Thus we can get biased rules which wrongly depict the patterns in the data.
> This is why I think having a parameter to mention the consequent would
> help greatly.
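The bias is easy to see on a toy example. Once the input is restricted to transactions containing the consequent `y`, every transaction that contains an antecedent also contains `y`, so the confidence of any rule `X -> y` computed on the filtered data is 1. The transactions and the `confidence` helper below are made up for illustration.

```python
def confidence(transactions, antecedent, consequent):
    """conf(X -> Y) = count(X | Y) / count(X) over the given transactions."""
    x = set(antecedent)
    xy = x | set(consequent)
    n_x = sum(1 for t in transactions if x <= t)
    n_xy = sum(1 for t in transactions if xy <= t)
    return n_xy / n_x if n_x else 0.0


transactions = [{"a", "y"}, {"a"}, {"a"}, {"a"}, {"b", "y"}]

# On the full data, "a" appears 4 times but only once together with "y".
full = confidence(transactions, {"a"}, {"y"})

# Keeping only transactions with "y" drops the three {"a"} rows from the denominator.
filtered = [t for t in transactions if "y" in t]
biased = confidence(filtered, {"a"}, {"y"})

print(full, biased)
```

The count of `{a, y}` is the same on both inputs. Only the antecedent count in the denominator shrinks, which is what inflates the confidence.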
>
> Reducing the support doesn't really work in my case simply because rules
> for the consequents I am mining for occur very rarely in the data.
> Sometimes this can be 1e-4 or 1e-5, so my minSupport has to be less than
> that to capture the rules for that consequent.
>
> Thanks for your reply. Let me know what you think.
>
> Regards,
> Aditya Addepalli
>
>
>
>
> On Sat, 2 May, 2020, 9:13 pm Sean Owen, <sro...@gmail.com> wrote:
>
>> You could just filter the input for sets containing the desired item,
>> and discard the rest. That doesn't mean all of the item sets have that
>> item, and you'd still have to filter, but it may be much faster to
>> compute.
>> Increasing min support might generally have the effect of smaller
>> rules, though it doesn't impose a cap. That could help perf, if that's
>> what you're trying to improve.
>> I don't know if it's worth new params in the implementation, maybe. I
>> think there would have to be an argument that this generalizes.
>>
>> On Sat, May 2, 2020 at 3:13 AM Aditya Addepalli <dyex...@gmail.com>
>> wrote:
>> >
>> > Hi Everyone,
>> >
>> > I was wondering if we could make any enhancements to the FP-Growth
>> > algorithm in Spark/PySpark.
>> >
>> > Many times I am looking for a rule for a particular consequent, so I
>> > don't need the rules for all the other consequents. I know I can filter the
>> > rules to get the desired output, but if I could input this in the algorithm
>> > itself, the execution time would reduce drastically.
>> >
>> > Also, sometimes I want the rules to be small, maybe of length 5-6.
>> > Again, I can filter on length, but I was wondering if we could take this as
>> > input into the algorithm. Given the depth-first nature of FP-Growth, I am not
>> > sure that is feasible.
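On the feasibility question: in a depth-first pattern-growth search, each level of recursion extends the current pattern by one item. A maximum length can therefore be enforced by not recursing past that depth, which also prunes the search rather than filtering afterwards. The sketch below uses an Eclat-style vertical (transaction-id set) enumeration instead of an FP-tree, only to keep it short. The function `frequent_itemsets` and its `min_count`/`max_len` parameters are hypothetical and are not Spark's API. The same depth bound should carry over to FP-growth's conditional-tree recursion, since each level there also adds one item to the pattern.

```python
def frequent_itemsets(transactions, min_count, max_len):
    """Depth-first, Eclat-style frequent itemset mining with a length cap.

    This is NOT Spark's FP-growth. It shares the depth-first property that
    each recursion level extends a prefix by exactly one item.
    Returns a dict mapping frozenset itemsets to their absolute counts.
    """
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    items = sorted(i for i, tids in tidsets.items() if len(tids) >= min_count)
    result = {}

    def extend(prefix, prefix_tids, candidates):
        for idx, item in enumerate(candidates):
            # Transactions containing prefix + item (just item at the root level).
            tids = (prefix_tids & tidsets[item]) if prefix else tidsets[item]
            if len(tids) < min_count:
                continue
            itemset = prefix + (item,)
            result[frozenset(itemset)] = len(tids)
            # The length cap: stop descending once the pattern has max_len items.
            if len(itemset) < max_len:
                extend(itemset, tids, candidates[idx + 1:])

    extend((), set(), items)
    return result


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
capped = frequent_itemsets(transactions, min_count=2, max_len=2)
uncapped = frequent_itemsets(transactions, min_count=2, max_len=3)
print(sorted((sorted(s), c) for s, c in capped.items()))
```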
>> >
>> > I am willing to work on these suggestions if someone thinks they are
>> > feasible. Thanks to the dev team for all the hard work!
>> >
>> > Regards,
>> > Aditya Addepalli
>>
>
