Re: is there any tool to visualize the spark physical plan or spark plan

2020-05-02 Thread Enrico Minack
Kelly Zhang, You can add a SparkListener to your Spark context: sparkContext.addSparkListener(new SparkListener {}). That listener can override onTaskEnd, which provides a SparkListenerTaskEnd for each task; that instance gives you access to the metrics. See: - https://spark.apache.org/doc
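The snippet above is Scala, since Spark's listener interface lives on the JVM side. As a rough plain-Python sketch of the same callback pattern — all names below are hypothetical stand-ins for illustration, not PySpark API:

```python
# Plain-Python sketch of the listener pattern described above. Spark's real
# API is JVM-side: sparkContext.addSparkListener(new SparkListener {...})
# with an onTaskEnd(taskEnd: SparkListenerTaskEnd) override. The names here
# (TaskEndEvent, MetricsListener, on_task_end) are hypothetical stand-ins.
from dataclasses import dataclass

@dataclass
class TaskEndEvent:
    # Stand-in for SparkListenerTaskEnd: carries per-task metrics.
    task_id: int
    metrics: dict

class MetricsListener:
    """Collects metrics from every task-end event, like an onTaskEnd override."""
    def __init__(self):
        self.records = []

    def on_task_end(self, event):
        self.records.append((event.task_id, event.metrics))

# The "scheduler" side: fire one event per finished task.
listener = MetricsListener()
for tid in range(3):
    listener.on_task_end(TaskEndEvent(tid, {"runTime": 10 * (tid + 1)}))

total_run_time = sum(m["runTime"] for _, m in listener.records)
```

In real Spark the scheduler fires these events for you; the listener's only job is to record or aggregate the metrics it is handed.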

Re: Spark FP-growth

2020-05-02 Thread Aditya Addepalli
Hi Sean, I understand your approach, but there's a slight problem. If we generate rules after filtering for our desired consequent, we introduce some bias into our rules. The confidence of the rules on the filtered input can be very high, but this may not be the case on the entire dataset. T
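The bias being described can be shown with a toy example: once every transaction is guaranteed to contain the consequent, confidence against the filtered input inflates. The items and numbers below are made up for illustration.

```python
# Toy illustration of the bias: confidence of {bread} -> {butter} measured on
# the full dataset vs. on transactions pre-filtered for the consequent.
transactions = [
    {"bread", "butter"},
    {"bread"},
    {"bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

def confidence(txns, antecedent, consequent):
    # confidence(A -> C) = support(A and C) / support(A)
    has_ante = [t for t in txns if antecedent <= t]
    if not has_ante:
        return 0.0
    return sum(1 for t in has_ante if consequent <= t) / len(has_ante)

# On the full dataset: bread appears in 4 baskets, bread+butter in 2.
full_conf = confidence(transactions, {"bread"}, {"butter"})      # 0.5

# After filtering for the consequent, every basket contains butter, so any
# basket with bread trivially has butter too: confidence inflates to 1.0.
filtered = [t for t in transactions if "butter" in t]
filtered_conf = confidence(filtered, {"bread"}, {"butter"})      # 1.0
```

In the extreme case above, confidence computed on the filtered input is always 1.0 for any antecedent, which is exactly why it can't be trusted as an estimate over the whole dataset.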

Re: Spark FP-growth

2020-05-02 Thread Sean Owen
You could just filter the input for sets containing the desired item, and discard the rest. That doesn't mean all of the item sets have that item, and you'd still have to filter, but it may be much faster to compute. Increasing min support might generally have the effect of smaller rules, though it do
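The two steps of this suggestion — shrink the input first, then post-filter the mined sets — can be sketched in plain Python. The brute-force enumeration below stands in for FP-Growth; data and support threshold are made up.

```python
from itertools import combinations

# Toy data; the target item we care about as a consequent.
transactions = [
    {"bread", "butter"},
    {"bread"},
    {"bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]
target = "butter"

# Step 1: keep only baskets containing the target -> a smaller mining input.
filtered = [t for t in transactions if target in t]

def frequent_itemsets(txns, min_support):
    # Brute-force enumeration up to size 3; FP-Growth finds these efficiently.
    items = sorted(set().union(*txns))
    n = len(txns)
    freq = {}
    for k in (1, 2, 3):
        for combo in combinations(items, k):
            s = set(combo)
            support = sum(1 for t in txns if s <= t) / n
            if support >= min_support:
                freq[frozenset(combo)] = support
    return freq

freq = frequent_itemsets(filtered, min_support=0.5)

# Step 2: itemsets without the target still appear (e.g. {"bread"}),
# so a post-filter on the mined sets/rules is still needed.
without_target = [s for s in freq if target not in s]
```

Note the speed-up comes purely from mining 3 baskets instead of 5; the mined output still contains itemsets like {"bread"} that never mention the target, which is the residual filtering step the message refers to.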

Spark FP-growth

2020-05-02 Thread Aditya Addepalli
Hi Everyone, I was wondering if we could make any enhancements to the FP-Growth algorithm in spark/pyspark. Many times I am looking for rules with a particular consequent, so I don't need the rules for all the other consequents. I know I can filter the rules to get the desired output, but if I co
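The rule filtering mentioned here can be sketched in plain Python. In PySpark one would instead filter the FPGrowth model's associationRules DataFrame on its consequent column; the tuples and numbers below are made-up stand-ins for those rows.

```python
# Stand-in for the (antecedent, consequent, confidence) rows produced by
# FPGrowth's associationRules in pyspark.ml.fpm; plain tuples here so the
# filtering step is visible. Values are illustrative, not real output.
rules = [
    (frozenset({"bread"}), frozenset({"butter"}), 0.50),
    (frozenset({"milk"}),  frozenset({"butter"}), 0.66),
    (frozenset({"milk"}),  frozenset({"bread"}),  0.40),
]

target = frozenset({"butter"})

# Keep only rules whose consequent is the target item.
wanted = [(a, c, conf) for a, c, conf in rules if c == target]
```

As the thread discusses, this filtering is applied after mining the full rule set, so the mining cost itself is unchanged — which is presumably the enhancement being asked about.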