SQL warehouse dir

2017-02-10 Thread Joseph Naegele
Hi all, I've read the docs for Spark SQL 2.1.0 but I'm still having issues with the warehouse and related details. I'm not using Hive proper, so my hive-site.xml consists only of the property javax.jdo.option.ConnectionURL with the value jdbc:derby:;databaseName=/mnt/data/spark/metastore_db;create=true I've set "sp

Spark SQL 1.6.3 ORDER BY and partitions

2017-01-06 Thread Joseph Naegele
I have two separate but similar issues that I've narrowed down to a pretty good level of detail. I'm using Spark 1.6.3, particularly Spark SQL. I'm concerned with a single dataset for now, although the details apply to other, larger datasets. I'll call it "table". It's around 160 M records, ave

Storage history in web UI

2017-01-03 Thread Joseph Naegele
Hi all, Is there any way to observe Storage history in Spark, i.e. which RDDs were cached and where, etc. after an application completes? It appears the Storage tab in the History Server UI is useless. Thanks --- Joe Naegele Grier Forensics ---

RE: [Spark SQL] Task failed while writing rows

2016-12-19 Thread Joseph Naegele
. Thanks --- Joe Naegele Grier Forensics From: Michael Stratton [mailto:michael.strat...@komodohealth.com] Sent: Monday, December 19, 2016 10:00 AM To: Joseph Naegele Cc: user Subject: Re: [Spark SQL] Task failed while writing rows It seems like an issue w/ Hadoop. What do you get when

[Spark SQL] Task failed while writing rows

2016-12-18 Thread Joseph Naegele
Hi all, I'm having trouble with a relatively simple Spark SQL job. I'm using Spark 1.6.3. I have a dataset of around 500M rows (average 128 bytes per record). Its current compressed size is around 13 GB, but my problem started when it was much smaller, maybe 5 GB. This dataset is generated by

spark nightly builds with Hadoop 2.7

2016-09-09 Thread Joseph Naegele
Hello, I'm using the Spark nightly build "spark-2.1.0-SNAPSHOT-bin-hadoop2.7" from http://people.apache.org/~pwendell/spark-nightly/spark-master-bin/ due to bugs in Spark 2.0.0 (SPARK-16740, SPARK-16802), however I noticed that the recent builds only come in "-hadoop2.4-without-hive" and "-without