Re: Spark FP-growth

2020-05-07 Thread Aditya Addepalli
Absolutely. I meant to say that the confidence calculation depends on the support calculations, and hence that is where the time would be saved. Thanks for pointing that out.

Re: Spark FP-growth

2020-05-07 Thread Sean Owen
The confidence calculation is pretty trivial; the work is finding the supports needed. Not sure how to optimize that.
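To see why the confidence step itself is cheap once the supports exist, here is a toy plain-Python sketch (not the Spark implementation): confidence(X -> Y) is a single division, support(X ∪ Y) / support(X).

```python
# Toy illustration (plain Python, not Spark): once support counts are known,
# confidence is one division per rule.
support = {
    frozenset(["a"]): 4,        # 4 transactions contain {a}
    frozenset(["a", "b"]): 3,   # 3 transactions contain {a, b}
}

# confidence({a} -> {b}) = support({a, b}) / support({a})
conf = support[frozenset(["a", "b"])] / support[frozenset(["a"])]
print(conf)  # 0.75
```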

Re: Spark FP-growth

2020-05-07 Thread Aditya Addepalli
Hi Sean,

1. I was thinking that by specifying the consequent we can (somehow?) skip the confidence calculation for all the other consequents. This would greatly reduce the time taken, as we avoid computation for consequents we don't care about.

2. Is limiting rule size even possible? I thought b

Re: Spark FP-growth

2020-05-07 Thread Sean Owen
Yes, you can get the correct support this way by accounting for how many rows were filtered out, but not the right confidence, as it depends on counting support in rows without the items of interest. But computing confidence depends on computing all that support; how would you optimize it even if
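The support point can be checked with toy data in plain Python (not Spark): any itemset containing the target item occurs in exactly the same rows before and after filtering, so dividing its filtered count by the original row count recovers the true support, while the rows needed for the antecedent-only count are gone.

```python
transactions = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}, {"b", "c"}]
target = "b"
filtered = [t for t in transactions if target in t]  # 3 of 5 rows survive

# Any itemset containing the target occurs in the same rows either way,
# so its raw count is unchanged by the filter.
count_full = sum(1 for t in transactions if {"a", "b"} <= t)  # 2
count_filt = sum(1 for t in filtered if {"a", "b"} <= t)      # still 2

# Dividing by the ORIGINAL row count recovers the true support.
sup = count_filt / len(transactions)
print(sup)  # 0.4

# But confidence({a} -> {b}) needs support({a}) counted over rows that may
# lack "b" -- and those rows were discarded, so it cannot be recovered.
```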

Re: Spark FP-growth

2020-05-07 Thread Aditya Addepalli
Hi,

I understand that this is not a priority with everything going on, but if you think generating rules for only a single consequent adds value, I would like to contribute.

Thanks & Regards,
Aditya

Re: Spark FP-growth

2020-05-02 Thread Aditya Addepalli
Hi Sean,

I understand your approach, but there's a slight problem. If we generate rules after filtering for our desired consequent, we are introducing some bias into our rules. The confidence of the rules on the filtered input can be very high, but this may not be the case on the entire dataset.
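The bias is easy to demonstrate with toy data in plain Python (not Spark): confidence computed on transactions pre-filtered for the consequent is inflated relative to the same rule's confidence on the full dataset.

```python
def confidence(txns, antecedent, consequent):
    """conf(A -> B) = count(A and B) / count(A) over the given transactions."""
    with_a = [t for t in txns if antecedent <= t]
    with_ab = [t for t in with_a if consequent <= t]
    return len(with_ab) / len(with_a)

transactions = [{"a", "b"}, {"a", "b"}, {"a"}, {"a"}, {"c"}]

full = confidence(transactions, {"a"}, {"b"})         # 2/4 = 0.5
pre_filtered = [t for t in transactions if "b" in t]  # keep only rows with b
biased = confidence(pre_filtered, {"a"}, {"b"})       # 2/2 = 1.0
print(full, biased)
```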

Re: Spark FP-growth

2020-05-02 Thread Sean Owen
You could just filter the input for sets containing the desired item, and discard the rest. That doesn't mean all of the item sets have that item, and you'd still have to filter, but it may be much faster to compute. Increasing min support might generally have the effect of smaller rules, though it do

Spark FP-growth

2020-05-02 Thread Aditya Addepalli
Hi Everyone,

I was wondering if we could make any enhancements to the FP-Growth algorithm in Spark/PySpark. Many times I am looking for a rule for a particular consequent, so I don't need the rules for all the other consequents. I know I can filter the rules to get the desired output, but if I co
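To illustrate the idea, here is a hypothetical plain-Python sketch (not the Spark implementation) of rule generation restricted to one target consequent: itemsets that don't contain the target are skipped entirely, so confidence is never computed for any other consequent.

```python
def rules_for_consequent(freq, target, min_conf):
    """freq maps frozenset(itemset) -> support count.
    Returns (antecedent, confidence) pairs for rules X -> {target} only."""
    rules = []
    for itemset, sup in freq.items():
        if target not in itemset or len(itemset) < 2:
            continue  # skip itemsets that cannot yield the target consequent
        antecedent = itemset - {target}
        if antecedent in freq:
            conf = sup / freq[antecedent]
            if conf >= min_conf:
                rules.append((antecedent, conf))
    return rules

freq = {
    frozenset(["a"]): 4,
    frozenset(["b"]): 3,
    frozenset(["c"]): 2,
    frozenset(["a", "b"]): 3,
    frozenset(["a", "c"]): 2,
}
print(rules_for_consequent(freq, "b", 0.5))  # [(frozenset({'a'}), 0.75)]
```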