Re: ORC file writing hangs in pyspark

Jeff Zhang Tue, 23 Feb 2016 18:24:08 -0800

Have you checked the live Spark UI and the YARN application logs?
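
If the history server shows nothing, the aggregated container logs can usually still be pulled with the `yarn logs` CLI. Below is a minimal sketch, assuming the `yarn` command is on the PATH and log aggregation is enabled on the cluster. The helper names and the sample application ID are hypothetical. In PySpark the real ID should be available as `sc.applicationId`.

```python
import subprocess


def yarn_logs_command(application_id):
    """Build the argument list for `yarn logs` for a single application."""
    return ["yarn", "logs", "-applicationId", application_id]


def fetch_yarn_logs(application_id):
    """Run `yarn logs` and return its combined stdout/stderr as text.

    Assumes the `yarn` CLI is installed and that the current user is
    allowed to read the application's logs.
    """
    result = subprocess.run(
        yarn_logs_command(application_id),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


# Hypothetical ID for illustration; substitute the value of sc.applicationId.
command = yarn_logs_command("application_0000000000000_0001")
```

While the job is hanging, the Stages and Executors tabs of the live UI should show whether any write tasks were actually launched.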

On Tue, Feb 23, 2016 at 10:05 PM, James Barney <jamesbarne...@gmail.com>
wrote:


> I'm trying to write an ORC file after running the FPGrowth algorithm on a
> dataset of only about 2 GB. The algorithm performs well, and I can display
> results if I take(n) the freqItemsets() of the result after converting it
> to a DataFrame.
>
> I'm using Spark 1.5.2 on HDP 2.3.4 and Python 3.4.2 on Yarn.
>
> I get the results from querying a Hive table, also ORC format, running a
> number of maps, joins, and filters on the data.
>
> When the program attempts to write the files:
>     result.write.orc('/data/staged/raw_result')
>     size_1_buckets.write.orc('/data/staged/size_1_results')
>     filter_size_2_buckets.write.orc('/data/staged/size_2_results')
>
> The first path, /data/staged/raw_result, is created with a _temporary
> folder, but the data is never written. The job hangs at this point,
> apparently indefinitely.
>
> Additionally, no logs are recorded or available for the jobs on the
> history server.
>
> What could be the problem?
>



-- 
Best Regards

Jeff Zhang
