Thank you for the suggestions. We looked at the live Spark UI and YARN app
logs and found what we think is the issue: in Spark 1.5.2, the FPGrowth
algorithm doesn't require you to specify the number of partitions for your
input data. Without specifying, FPGrowth puts all of its data into one
partition.
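A minimal sketch of the fix, assuming the input is an RDD of transactions (the input format, support threshold, and partition count below are illustrative, not the actual job): pass `numPartitions` to `FPGrowth.train`, which the MLlib API accepts, so the work is spread across executors instead of collapsing into one partition.

```python
NUM_PARTITIONS = 32  # illustrative; size this to your cluster


def train_partitioned(sc, input_path, min_support=0.01):
    """Run MLlib FPGrowth with an explicit partition count."""
    from pyspark.mllib.fpm import FPGrowth

    # One transaction per line, items separated by spaces (assumed format).
    transactions = sc.textFile(input_path).map(lambda line: line.strip().split(" "))

    # Without numPartitions, FPGrowth can end up doing all of its work in a
    # single partition; passing it explicitly avoids that.
    return FPGrowth.train(transactions,
                          minSupport=min_support,
                          numPartitions=NUM_PARTITIONS)

# Call from a Spark driver, e.g.:
#   sc = SparkContext(appName="fpgrowth-partitioned")
#   model = train_partitioned(sc, "hdfs:///path/to/input")
```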
Hi James,
You can try writing in another format, e.g., Parquet, to see whether it is an
ORC-specific issue or a more generic issue.
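A sketch of that experiment (the paths and toy DataFrame are made up): write the same small DataFrame once as Parquet and once as ORC, so a failure in only the second write points at the ORC path. Note that on Spark 1.5, ORC support comes through HiveContext.

```python
def write_both_formats(df):
    """Write df as Parquet and then ORC, to hypothetical /tmp paths."""
    # If this succeeds...
    df.write.format("parquet").save("/tmp/fpgrowth_parquet")
    # ...but this hangs or fails, the problem is ORC-specific.
    df.write.format("orc").save("/tmp/fpgrowth_orc")

# Driver-side usage, e.g.:
#   from pyspark.sql import HiveContext, Row
#   sqlContext = HiveContext(sc)  # ORC on Spark 1.5 needs HiveContext
#   df = sqlContext.createDataFrame([Row(items=["a", "b"], freq=3)])
#   write_both_formats(df)
```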
Thanks.
Zhan Zhang
On Feb 23, 2016, at 6:05 AM, James Barney <jamesbarne...@gmail.com> wrote:
> I'm trying to write an ORC file after running the FPGrowth alg…
Have you checked the live Spark UI and YARN app logs?
On Tue, Feb 23, 2016 at 10:05 PM, James Barney wrote:
> I'm trying to write an ORC file after running the FPGrowth algorithm on a
> dataset of around just 2GB in size. The algorithm performs well and can
> display results if I take(n) the freqItemsets() of the result after
> converting that to a DF.
I'm trying to write an ORC file after running the FPGrowth algorithm on a
dataset of around just 2GB in size. The algorithm performs well and can
display results if I take(n) the freqItemsets() of the result after
converting that to a DF.
I'm using Spark 1.5.2 on HDP 2.3.4 and Python 3.4.2 on YARN.
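For reference, the shape of the pipeline described above (the input path, delimiter, and support threshold are placeholders, not the original code):

```python
TOP_N = 10  # the take(n) that works; value illustrative


def run_pipeline(sc, sqlContext, input_path):
    from pyspark.mllib.fpm import FPGrowth

    # One comma-separated transaction per line (assumed format).
    transactions = sc.textFile(input_path).map(lambda line: line.split(","))
    model = FPGrowth.train(transactions, minSupport=0.01)

    # Converting the frequent itemsets to a DataFrame and take(n)-ing works...
    rows = model.freqItemsets().map(lambda fi: (fi.items, fi.freq))
    df = sqlContext.createDataFrame(rows, ["items", "freq"])
    print(df.take(TOP_N))

    # ...but the full ORC write is where the job gets stuck.
    df.write.format("orc").save("/tmp/fpgrowth_orc")  # hypothetical path

# Submit with spark-submit on YARN; sqlContext should be a HiveContext,
# since ORC on Spark 1.5 is provided through Hive support.
```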