Hi,
I had a similar problem. For me, using the RDD StatCounter helped a lot.
Check out
http://stackoverflow.com/questions/41169873/spark-dynamic-dag-is-a-lot-slower-and-different-from-hard-coded-dag
and
http://stackoverflow.com/questions/41445571/spark-migrate-sql-window-function-to-rdd-for-better
Hi guys,
I have this situation:
1. A data frame with 22 columns.
2. I need to add some columns (feature engineering) using the existing
columns; 12 columns will be added for each column in a list.
3. I created a loop, but at the 5th item (column) the loop starts to go
very slow in the join part, I can obse