Hi, I understand that this is not a priority with everything going on, but if you think generating rules for only a single consequent adds value, I would like to contribute.
Thanks & Regards,
Aditya

On Sat, May 2, 2020 at 9:34 PM Aditya Addepalli <dyex...@gmail.com> wrote:

> Hi Sean,
>
> I understand your approach, but there's a slight problem.
>
> If we generate rules after filtering for our desired consequent, we are
> introducing some bias into our rules.
> The confidence of the rules on the filtered input can be very high but
> this may not be the case on the entire dataset.
> Thus we can get biased rules which wrongly depict the patterns in the data.
> This is why I think having a parameter to mention the consequent would
> help greatly.
>
> Reducing the support doesn't really work in my case simply because rules
> for the consequents I am mining for occur very rarely in the data.
> Sometimes this can be 1e-4 or 1e-5, so my minSupport has to be less than
> that to capture the rules for that consequent.
>
> Thanks for your reply. Let me know what you think.
>
> Regards.
> Aditya Addepalli
>
> On Sat, 2 May, 2020, 9:13 pm Sean Owen, <sro...@gmail.com> wrote:
>
>> You could just filter the input for sets containing the desired item,
>> and discard the rest. That doesn't mean all of the item sets have that
>> item, and you'd still have to filter, but may be much faster to
>> compute.
>> Increasing min support might generally have the effect of smaller
>> rules, though it doesn't impose a cap. That could help perf, if that's
>> what you're trying to improve.
>> I don't know if it's worth new params in the implementation, maybe. I
>> think there would have to be an argument this generalizes.
>>
>> On Sat, May 2, 2020 at 3:13 AM Aditya Addepalli <dyex...@gmail.com>
>> wrote:
>> >
>> > Hi Everyone,
>> >
>> > I was wondering if we could make any enhancements to the FP-Growth
>> > algorithm in spark/pyspark.
>> >
>> > Many times I am looking for a rule for a particular consequent, so I
>> > don't need the rules for all the other consequents. I know I can
>> > filter the rules to get the desired output, but if I could input this
>> > in the algorithm itself, the execution time would reduce drastically.
>> >
>> > Also, sometimes I want the rules to be small, maybe of length 5-6.
>> > Again, I can filter on length but I was wondering if we could take
>> > this as input into the algo. Given the Depth first nature of
>> > FP-Growth, I am not sure that is feasible.
>> >
>> > I am willing to work on these suggestions, if someone thinks they are
>> > feasible. Thanks to the dev team for all the hard work!
>> >
>> > Regards,
>> > Aditya Addepalli
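
The bias concern raised in the thread (confidence looks high on input filtered for the desired consequent, but not on the entire dataset) can be illustrated with a small sketch. This is plain Python, not Spark code; the toy transactions, the item names and the `confidence` helper are all made up for illustration and are not part of any Spark API.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def confidence(transactions: List[Transaction],
               antecedent: Iterable[str],
               consequent: str) -> float:
    """confidence(A -> c) = count(A and c together) / count(A)."""
    a = frozenset(antecedent)
    both = a | {consequent}
    n_antecedent = sum(1 for t in transactions if a <= t)
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_antecedent if n_antecedent else 0.0


# Hypothetical toy data: "x" is the rare consequent we care about.
transactions: List[Transaction] = [
    frozenset({"a", "b", "x"}),
    frozenset({"a", "b"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
]

target = "x"

# Pre-filtering the input: keep only transactions that contain the target.
filtered = [t for t in transactions if target in t]

# Confidence of {a, b} -> x measured on the entire dataset.
full_conf = confidence(transactions, {"a", "b"}, target)

# The same rule measured on the filtered input only.
filtered_conf = confidence(filtered, {"a", "b"}, target)

print(f"confidence on full data:     {full_conf:.3f}")
print(f"confidence on filtered data: {filtered_conf:.3f}")
```

Because every transaction kept by the filter contains the target item, any rule with that item as its consequent has confidence 1.0 on the filtered input, whatever its confidence on the full data. Under this sketch's assumptions, rules mined from filtered input would need their confidence recomputed against the entire dataset before being trusted.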