Have you checked the live Spark UI and the YARN application logs?

On Tue, Feb 23, 2016 at 10:05 PM, James Barney <jamesbarne...@gmail.com> wrote:
> I'm trying to write an ORC file after running the FPGrowth algorithm on a
> dataset of only around 2GB. The algorithm performs well and can display
> results if I take(n) the freqItemSets() of the result after converting
> that to a DF.
>
> I'm using Spark 1.5.2 on HDP 2.3.4 and Python 3.4.2 on YARN.
>
> I get the results by querying a Hive table, also in ORC format, and then
> running a number of maps, joins, and filters on the data.
>
> When the program attempts to write the files:
>
>     result.write.orc('/data/staged/raw_result')
>     size_1_buckets.write.orc('/data/staged/size_1_results')
>     filter_size_2_buckets.write.orc('/data/staged/size_2_results')
>
> The first path, /data/staged/raw_result, is created with a _temporary
> folder, but the data is never written. The job hangs at this point,
> apparently indefinitely.
>
> Additionally, no logs are recorded or available for the jobs on the
> history server.
>
> What could be the problem?

--
Best Regards
Jeff Zhang
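One way to follow up on the suggestion to check the live Spark UI while the write is hanging is to query the UI's monitoring REST API (available since Spark 1.4, so it should exist in 1.5.2) and list the stages still marked ACTIVE. The sketch below assumes the driver UI is reachable at the default port 4040 on the driver host; on YARN it is often reached through the ResourceManager proxy instead, so the base URL is an assumption you would adjust. The helper names `fetch_stages` and `stuck_candidates` are hypothetical, not part of Spark.

```python
import json
from urllib.request import urlopen

# Assumption: driver UI address. 4040 is Spark's default UI port; on YARN the
# UI is usually proxied through the ResourceManager, so adjust this as needed.
DEFAULT_UI = "http://localhost:4040"


def fetch_stages(ui_base, app_id):
    """Fetch stage summaries from the Spark monitoring REST API (hypothetical helper)."""
    url = "{}/api/v1/applications/{}/stages".format(ui_base, app_id)
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def stuck_candidates(stages):
    """Return (stageId, name, active tasks, completed tasks) for stages still ACTIVE.

    A write stage that stays ACTIVE with a few active tasks that never finish
    is a hint about where the job is hanging.
    """
    return [
        (s["stageId"], s["name"], s["numActiveTasks"], s["numCompleteTasks"])
        for s in stages
        if s.get("status") == "ACTIVE"
    ]
```

Usage would look like `stuck_candidates(fetch_stages(DEFAULT_UI, app_id))`, where `app_id` is the application ID shown in the UI or by YARN. Filtering on ACTIVE keeps the output short; the same stage's tasks can then be inspected in the UI to see which executor they are stuck on, and that executor's YARN container logs are the next place to look.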