Thank you for the suggestions. We looked at the live Spark UI and YARN app
logs and found what we think is the issue: in Spark 1.5.2, the FPGrowth
algorithm doesn't require you to specify the number of partitions for your
input data. Without specifying, FPGrowth puts all of its data into one
partition.
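A minimal sketch of the fix, assuming the input is an RDD of transactions (the input format, support threshold, and partition count below are illustrative, not the actual job): pass `numPartitions` to `FPGrowth.train`, which the MLlib API accepts, so the work is spread across executors instead of collapsing into one partition.

```python
NUM_PARTITIONS = 32  # illustrative; size this to your cluster


def train_partitioned(sc, input_path, min_support=0.01):
    """Run MLlib FPGrowth with an explicit partition count."""
    from pyspark.mllib.fpm import FPGrowth

    # One transaction per line, items separated by spaces (assumed format).
    transactions = sc.textFile(input_path).map(lambda line: line.strip().split(" "))

    # Without numPartitions, FPGrowth can end up doing all of its work in a
    # single partition; passing it explicitly avoids that.
    return FPGrowth.train(transactions,
                          minSupport=min_support,
                          numPartitions=NUM_PARTITIONS)

# Call from a Spark driver, e.g.:
#   sc = SparkContext(appName="fpgrowth-partitioned")
#   model = train_partitioned(sc, "hdfs:///path/to/input")
```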
Hi James,
You can try writing in another format, e.g., Parquet, to see whether it is an
ORC-specific issue or a more generic issue.
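A sketch of that experiment (the paths and toy DataFrame are made up): write the same small DataFrame once as Parquet and once as ORC, so a failure in only the second write points at the ORC path. Note that on Spark 1.5, ORC support comes through HiveContext.

```python
def write_both_formats(df):
    """Write df as Parquet and then ORC, to hypothetical /tmp paths."""
    # If this succeeds...
    df.write.format("parquet").save("/tmp/fpgrowth_parquet")
    # ...but this hangs or fails, the problem is ORC-specific.
    df.write.format("orc").save("/tmp/fpgrowth_orc")

# Driver-side usage, e.g.:
#   from pyspark.sql import HiveContext, Row
#   sqlContext = HiveContext(sc)  # ORC on Spark 1.5 needs HiveContext
#   df = sqlContext.createDataFrame([Row(items=["a", "b"], freq=3)])
#   write_both_formats(df)
```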
Thanks.
Zhan Zhang
On Feb 23, 2016, at 6:05 AM, James Barney <jamesbarne...@gmail.com> wrote:
> I'm trying to write an ORC file after running the FPGrowth alg…
Have you checked the live Spark UI and YARN app logs?
On Tue, Feb 23, 2016 at 10:05 PM, James Barney wrote:
> I'm trying to write an ORC file after running the FPGrowth algorithm on a
> dataset of around just 2GB in size. The algorithm performs well and can
> display results if I take(n) the freqItemsets() of the result after
> converting that to a DF.
I'm trying to write an ORC file after running the FPGrowth algorithm on a
dataset of around just 2GB in size. The algorithm performs well and can
display results if I take(n) the freqItemsets() of the result after
converting that to a DF.
I'm using Spark 1.5.2 on HDP 2.3.4 and Python 3.4.2 on YARN.
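For reference, the shape of the pipeline described above (the input path, delimiter, and support threshold are placeholders, not the original code):

```python
TOP_N = 10  # the take(n) that works; value illustrative


def run_pipeline(sc, sqlContext, input_path):
    from pyspark.mllib.fpm import FPGrowth

    # One comma-separated transaction per line (assumed format).
    transactions = sc.textFile(input_path).map(lambda line: line.split(","))
    model = FPGrowth.train(transactions, minSupport=0.01)

    # Converting the frequent itemsets to a DataFrame and take(n)-ing works...
    rows = model.freqItemsets().map(lambda fi: (fi.items, fi.freq))
    df = sqlContext.createDataFrame(rows, ["items", "freq"])
    print(df.take(TOP_N))

    # ...but the full ORC write is where the job gets stuck.
    df.write.format("orc").save("/tmp/fpgrowth_orc")  # hypothetical path

# Submit with spark-submit on YARN; sqlContext should be a HiveContext,
# since ORC on Spark 1.5 is provided through Hive support.
```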