Absolutely. I meant to say the confidence calculation depends on the support calculations and hence would reduce the time. Thanks for pointing that out.
On Thu, 7 May, 2020, 11:56 pm Sean Owen, <sro...@gmail.com> wrote: > The confidence calculation is pretty trivial, the work is finding the > supports needed. Not sure how to optimize that. > > On Thu, May 7, 2020, 1:12 PM Aditya Addepalli <dyex...@gmail.com> wrote: > >> Hi Sean, >> >> 1. >> I was thinking that by specifying the consequent we can (somehow?) skip >> the confidence calculation for all the other consequents. >> >> This would greatly reduce the time taken as we avoid computation for >> consequents we don't care about. >> >> >> 2. >> Is limiting rule size even possible? I thought because of FP growth's >> depth first nature it might not be possible. >> >> My experience with Fp-growth has largely been in python where the API is >> limited. I will take a look at the scala source code and get back to you >> with more concrete answers. >> >> Thanks & Regards, >> Aditya >> >> On Thu, 7 May, 2020, 11:21 pm Sean Owen, <sro...@gmail.com> wrote: >> >>> Yes, you can get the correct support this way by accounting for how >>> many rows were filtered out, but not the right confidence, as it >>> depends on counting support in rows without the items of interest. >>> >>> But computing confidence depends on computing all that support; how >>> would you optimize it even if you knew the consequent you cared about? >>> maybe there's a way, sure, I don't know the code well but it wasn't >>> obvious at a glance how to take advantage of it. >>> >>> I can see how limiting the rule size could help. >>> >>> On Sat, May 2, 2020 at 11:04 AM Aditya Addepalli <dyex...@gmail.com> >>> wrote: >>> > >>> > Hi Sean, >>> > >>> > I understand your approach, but there's a slight problem. >>> > >>> > If we generate rules after filtering for our desired consequent, we >>> are introducing some bias into our rules. >>> > The confidence of the rules on the filtered input can be very high but >>> this may not be the case on the entire dataset. >>> > Thus we can get biased rules which wrongly depict the patterns in the >>> data. >>> > This is why I think having a parameter to mention the consequent would >>> help greatly. >>> > >>> > Reducing the support doesn't really work in my case simply because >>> rules for the consequents I am mining for occur very rarely in the data. >>> > Sometimes this can be 1e-4 or 1e-5, so my minSupport has to be less >>> than that to capture the rules for that consequent. >>> > >>> > Thanks for your reply. Let me know what you think. >>> > >>> > Regards. >>> > Aditya Addepalli >>> > >>> > >>> > >>> > >>> > On Sat, 2 May, 2020, 9:13 pm Sean Owen, <sro...@gmail.com> wrote: >>> >> >>> >> You could just filter the input for sets containing the desired item, >>> >> and discard the rest. That doesn't mean all of the item sets have that >>> >> item, and you'd still have to filter, but may be much faster to >>> >> compute. >>> >> Increasing min support might generally have the effect of smaller >>> >> rules, though it doesn't impose a cap. That could help perf, if that's >>> >> what you're trying to improve. >>> >> I don't know if it's worth new params in the implementation, maybe. I >>> >> think there would have to be an argument this generalizes. >>> >> >>> >> On Sat, May 2, 2020 at 3:13 AM Aditya Addepalli <dyex...@gmail.com> >>> wrote: >>> >> > >>> >> > Hi Everyone, >>> >> > >>> >> > I was wondering if we could make any enhancements to the FP-Growth >>> algorithm in spark/pyspark. >>> >> > >>> >> > Many times I am looking for a rule for a particular consequent, so >>> I don't need the rules for all the other consequents. I know I can filter >>> the rules to get the desired output, but if I could input this in the >>> algorithm itself, the execution time would reduce drastically. >>> >> > >>> >> > Also, sometimes I want the rules to be small, maybe of length 5-6. >>> Again, I can filter on length but I was wondering if we could take this as >>> input into the algo. Given the Depth first nature of FP-Growth, I am not >>> sure that is feasible. >>> >> > >>> >> > I am willing to work on these suggestions, if someone thinks they >>> are feasible. Thanks to the dev team for all the hard work! >>> >> > >>> >> > Regards, >>> >> > Aditya Addepalli >>> >>