Re: Spark SQL High GC time

2015-05-25 Thread Nick Travers
Hi Yuming - I was running into the same issue with larger worker nodes a few weeks ago. The way I managed to get around the high GC time, as per the suggestion of some others, was to break each worker node up into multiple smaller workers of around 10G each. Divide your cores accordingly. The other
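
For reference, the split described above is typically done in conf/spark-env.sh via SPARK_WORKER_INSTANCES and SPARK_WORKER_MEMORY. A minimal Scala sketch of the related application-side knobs, assuming a standalone cluster (the memory and core values are placeholders to adapt):

    import org.apache.spark.{SparkConf, SparkContext}

    // Cap each executor's heap at ~10G so a single GC pass stays short,
    // and bound the total cores the app takes from the cluster.
    val conf = new SparkConf()
      .setAppName("gc-friendly-job")
      .set("spark.executor.memory", "10g")
      .set("spark.cores.max", "32") // divide your cores accordingly
    val sc = new SparkContext(conf)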

Spark SQL and java.lang.RuntimeException

2015-05-09 Thread Nick Travers
I'm getting the following error when reading a table from Hive. Note the misspelling 'Primitve' in the stack trace; I can't find it anywhere else online. It seems to occur only with this one particular table I am reading from. Occasionally the task fails completely; other times it
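
A minimal sketch of the kind of read that triggers the error, assuming an existing SparkContext sc and a placeholder table name:

    import org.apache.spark.sql.hive.HiveContext

    // Placeholder table name; per the report, the exception appears only
    // for one particular table, and only intermittently.
    val hc = new HiveContext(sc)
    val df = hc.table("problem_table")
    df.first()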

Re: Long GC pauses with Spark SQL 1.3.0 and billion row tables

2015-05-03 Thread Nick Travers
Could you be more specific about how this is done? The DataFrame class doesn't have that method. On Sun, May 3, 2015 at 11:07 PM, ayan guha wrote: > You can use custom partitioner to redistribution using partitionby > On 4 May 2015 15:37, "Nick Travers" wrote: > >> I&
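
For context, DataFrame in 1.3 has no partitionBy; the usual workaround is to drop to the pair-RDD API and repartition by key. A hedged sketch, where df is a placeholder DataFrame keyed on its first (Long) column and the partition count is a placeholder:

    import org.apache.spark.HashPartitioner

    // Key the rows by the join column, then redistribute with a HashPartitioner.
    val pairs = df.rdd.map(row => (row.getLong(0), row))
    val partitioned = pairs.partitionBy(new HashPartitioner(200))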

Long GC pauses with Spark SQL 1.3.0 and billion row tables

2015-05-03 Thread Nick Travers
I'm currently trying to join two large tables (on the order of 1B rows each) using Spark SQL (1.3.0) and am running into long GC pauses which bring the job to a halt. I'm reading in both tables using a HiveContext, with the underlying files stored as Parquet files. I'm using something along the lines of Hiv
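
A minimal sketch of the pattern described, assuming an existing SparkContext sc and placeholder table names registered in the Hive metastore:

    import org.apache.spark.sql.hive.HiveContext

    val hc = new HiveContext(sc)
    // Both tables are Parquet-backed Hive tables of ~1B rows each.
    val joined = hc.sql(
      """SELECT a.id, b.value
        |FROM table_a a
        |JOIN table_b b ON a.id = b.id""".stripMargin)
    joined.count()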

Re: Spark, snappy and HDFS

2015-04-02 Thread Nick Travers
/ byte[]. Review what > you are writing since it is not BytesWritable / Text. > > On Thu, Apr 2, 2015 at 3:40 AM, Nick Travers > wrote: > > I'm actually running this in a separate environment to our HDFS cluster. > > > > I think I've been able to sort out th
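
The point of the reply above: the read-side writable classes must match what was actually written. A hedged sketch for a (BytesWritable, Text) sequence file, assuming an existing SparkContext sc (the path is a placeholder):

    import org.apache.hadoop.io.{BytesWritable, Text}

    // Hadoop reuses writable instances across records,
    // so copy the contents out immediately.
    val path = "hdfs:///data/input.seq" // placeholder
    val rdd = sc.sequenceFile(path, classOf[BytesWritable], classOf[Text])
      .map { case (k, v) => (k.copyBytes(), v.toString) }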

Re: Spark, snappy and HDFS

2015-04-01 Thread Nick Travers
node (usually the executor) which gives the > java.lang.UnsatisfiedLinkError to see whether the libsnappy.so is in the > hadoop native lib path. > > On Thursday, April 2, 2015 at 10:22 AM, Nick Travers wrote: > > Thanks for the super quick response! > > I can read the file j
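
One way to make libsnappy.so visible from the application side, as a hedged sketch (the native-lib path is an assumption; adjust to your Hadoop installation):

    import org.apache.spark.SparkConf

    // Point both driver and executors at the Hadoop native libraries.
    val conf = new SparkConf()
      .set("spark.driver.extraLibraryPath", "/opt/hadoop/lib/native")
      .set("spark.executor.extraLibraryPath", "/opt/hadoop/lib/native")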

Re: Spark, snappy and HDFS

2015-04-01 Thread Nick Travers
Apr 1, 2015 at 7:19 PM, Xianjin YE wrote: > Can you read snappy compressed file in hdfs? Looks like the libsnappy.so > is not in the hadoop native lib path. > > On Thursday, April 2, 2015 at 10:13 AM, Nick Travers wrote: > > Has anyone else encountered the following error when

Spark, snappy and HDFS

2015-04-01 Thread Nick Travers
Has anyone else encountered the following error when trying to read a snappy compressed sequence file from HDFS? *java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z* The following works for me when the file is uncompressed: import org.apache.hadoop.io.
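
A minimal sketch of the read in question, assuming an existing SparkContext sc (the path and writable types are placeholders); with a snappy-compressed input it fails at task time unless the native snappy library can be loaded:

    import org.apache.hadoop.io.{BytesWritable, Text}

    val seq = sc.sequenceFile("hdfs:///data/input.seq",
      classOf[BytesWritable], classOf[Text])
    // Convert to plain Strings before collecting, since writables
    // are reused and not java-serializable.
    seq.map { case (_, v) => v.toString }.take(5).foreach(println)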

java.io.FileNotFoundException when using HDFS in cluster mode

2015-03-29 Thread Nick Travers
Hi List, I'm following this example with the following invocation: $SPARK_HOME/bin/spark-submit \ --deploy-mode cluster \ --master spark://host.domain.ex:7077 \ --class com.oreilly.learningsparkexamples.mini.scal
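
In cluster deploy mode the driver itself runs on a worker node, so a local file path in the job (or a local application jar) won't resolve there; a hedged sketch using a fully qualified HDFS URI, assuming an existing SparkContext sc (host and port are placeholders):

    // Fully qualified URI so the path resolves wherever the driver lands.
    val lines = sc.textFile("hdfs://host.domain.ex:8020/data/input.txt")
    println(lines.count())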