Re: Problem with Execution plan using loop

2017-04-15 Thread Georg Heiler
Hi, I had a similar problem. For me, using the RDD StatCounter helped a lot. Check out http://stackoverflow.com/questions/41169873/spark-dynamic-dag-is-a-lot-slower-and-different-from-hard-coded-dag and http://stackoverflow.com/questions/41445571/spark-migrate-sql-window-function-to-rdd-for-better
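[A minimal sketch of the RDD StatCounter approach mentioned above, assuming the derived columns boil down to per-column statistics. The input path and the column name "amount" are hypothetical placeholders, not from the thread.]

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("stat-counter-sketch").getOrCreate()
val df = spark.read.parquet("/path/to/input")   // hypothetical input

// Pull one numeric column down to an RDD[Double]; stats() computes count,
// mean, stdev, min and max in a single pass and returns a StatCounter.
val stats = df.select("amount").rdd.map(_.getDouble(0)).stats()
println(s"count=${stats.count} mean=${stats.mean} stdev=${stats.stdev} max=${stats.max}")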

Fwd: Problem with Execution plan using loop

2017-04-15 Thread Javier Rey
Hi guys, I have this situation: 1. A data frame with 22 columns. 2. I need to add some columns (feature engineering) using existing columns; 12 columns will be added for each column in the list. 3. I created a loop, but at the 5th item (column) in the loop the join part starts to go very slow. I can obse…

Problem with Execution plan using loop

2017-04-15 Thread Javier Rey
Hi guys, I have this situation: 1. A data frame with 22 columns. 2. I need to add some columns (feature engineering) using existing columns; 12 columns will be added for each column in the list. 3. I created a loop, but at the 5th item (column) in the loop the join part starts to go very slow. I can obse…
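[A hedged sketch of the looped-join pattern described above, not the poster's actual code: all names (df, featureCols, the "id" key, the aggregations) are hypothetical placeholders. It illustrates how joining derived columns back inside a loop grows the logical plan on every iteration, which is why planning slows down dramatically around the 5th column.]

import org.apache.spark.sql.{DataFrame, SparkSession, functions => F}

val spark = SparkSession.builder().appName("loop-join-sketch").getOrCreate()
val df = spark.read.parquet("/path/to/input")        // hypothetical 22-column frame with an "id" key
val featureCols = Seq("c1", "c2", "c3", "c4", "c5")  // hypothetical columns to engineer from

var result: DataFrame = df
for (c <- featureCols) {
  // derive new feature columns per key and join them back (12 per source column in the thread)
  val derived = df.groupBy("id").agg(
    F.avg(F.col(c)).as(s"${c}_avg"),
    F.max(F.col(c)).as(s"${c}_max"))
  result = result.join(derived, Seq("id"))           // the logical plan grows on every pass
}

// One commonly suggested workaround (not from this thread) is to cut the lineage
// between iterations, e.g. with checkpoint() or persist() followed by an action.
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")
result = result.checkpoint()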