Re: Job hangs in blocked task in final parquet write stage

2018-11-29 Thread Christopher Petrino
ther jobs that
> hang in the same manner, the thread dump didn't have any blocked threads,
> so that might be a red herring.
>
> On Wed, Nov 28, 2018 at 4:34 PM Christopher Petrino <
> christopher.petr...@gmail.com> wrote:
>
>> I ran into problems using 5.19 s

Re: Job hangs in blocked task in final parquet write stage

2018-11-28 Thread Christopher Petrino
I ran into problems using 5.19 so I reverted to 5.17 and it resolved my issues.

On Wed, Nov 28, 2018 at 2:48 AM Conrad Lee wrote:
> Hello Vadim,
>
> Interesting. I've only been running this job at scale for a couple weeks
> so I can't say whether this is related to recent EMR changes.
>
> Much

Spark column combinations and combining multiple dataframes (pyspark)

2018-11-26 Thread Christopher Petrino
Hi all,

I'm working on a problem where it is necessary to find all combinations of columns for a dataframe.

THE PROBLEM: Let's say there is a dataframe with columns: [ col_a, col_b, col_c, col_d, col_e, result ]. The number of combinations can vary between 1 and 5, but let's say 3 for this case. T
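As a rough illustration of the column-combination step described above, here is a minimal sketch using Python's standard `itertools.combinations`. It works on the column names only, so it runs without a Spark session. The helper name `column_combinations` is hypothetical, and leaving `result` out of the combined columns is an assumption, since the rest of the post is cut off.

```python
from itertools import combinations

# Column names taken from the post. "result" is left out on the assumption
# that it is the target column rather than one to combine (the post is truncated).
feature_columns = ["col_a", "col_b", "col_c", "col_d", "col_e"]


def column_combinations(columns, size):
    """Return every unordered selection of `size` column names as a list of tuples."""
    return list(combinations(columns, size))


# All 3-column combinations of the five feature columns.
combos = column_combinations(feature_columns, 3)

for combo in combos:
    print(combo)
```

In PySpark, each tuple could then be used to pick columns from the dataframe, for example `df.select(*combo, "result")`, assuming `df` is the dataframe from the post. How the per-combination dataframes should be combined afterwards depends on the part of the question that is not shown here.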