Re: How to access Spark UI through AWS

2015-08-25 Thread Kelly, Jonathan
I'm not sure why the UI appears broken like that either and haven't investigated it myself yet, but if you instead go to the YARN ResourceManager UI (port 8088 if you are using emr-4.x; port 9026 for 3.x, I believe), then you should be able to click on the ApplicationMaster link (or the History link
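As a rough sketch of reaching that UI from outside the cluster (the key file and hostname below are placeholders, not values from this thread), an SSH tunnel to the master node would look something like:

    # forward the YARN ResourceManager UI (8088 on emr-4.x) to the local machine
    ssh -i ~/mykey.pem -N -L 8088:localhost:8088 hadoop@<master-public-dns>
    # then browse to http://localhost:8088 and follow the ApplicationMaster link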

Re: Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf

2015-07-15 Thread Kelly, Jonathan
k existing users with these properties. I'm hoping to figure out a way to reconcile these by Spark 1.5. -Sandy On Wed, Jul 15, 2015 at 3:18 PM, Kelly, Jonathan <jonat...@amazon.com> wrote: Would there be any problem in having spark.executor.instances (or --num-executors) be

Re: Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf

2015-07-15 Thread Kelly, Jonathan
Would there be any problem in having spark.executor.instances (or --num-executors) be completely ignored (i.e., even for non-zero values) if spark.dynamicAllocation.enabled is true (i.e., rather than throwing an exception)? I can see how the exception would be helpful if, say, you tried to pass

Re: Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf

2015-07-15 Thread Kelly, Jonathan
bump From: Jonathan Kelly <jonat...@amazon.com> Date: Tuesday, July 14, 2015 at 4:23 PM To: "user@spark.apache.org" <user@spark.apache.org> Subject: Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf I've set up

Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf

2015-07-14 Thread Kelly, Jonathan
I've set up my cluster with a pre-calculated value for spark.executor.instances in spark-defaults.conf such that I can run a job and have it maximize the utilization of the cluster resources by default. However, if I want to run a job with dynamicAllocation (by passing -c spark.dynamicAllocation
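A minimal sketch of the conflicting setup (the property names are real Spark settings; the values, class, and jar are hypothetical):

    # spark-defaults.conf, written at cluster setup time
    spark.executor.instances    10

    # submitting with dynamic allocation enabled then fails validation
    spark-submit --conf spark.dynamicAllocation.enabled=true --class MyApp myapp.jar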

Re: Spark ec2 cluster lost worker

2015-06-24 Thread Kelly, Jonathan
luster lost worker Hi Jonathan, Thanks for this information! I will take a look into it. However, is there a way to reconnect the lost node, or is there nothing I can do to get the lost worker back? Thanks! Anny On Wed, Jun 24, 2015 at 6:06 PM, Kelly, Jonathan <jonat...@amazon.com

Re: Spark ec2 cluster lost worker

2015-06-24 Thread Kelly, Jonathan
Just curious, would you be able to use Spark on EMR rather than on EC2? Spark on EMR will handle lost nodes for you, and it will let you scale your cluster up and down or clone a cluster (its config, that is, not the data stored in HDFS), among other things. We also recently announced official supp

Re: [ERROR] Insufficient Space

2015-06-19 Thread Kelly, Jonathan
Would you be able to use Spark on EMR rather than on EC2? EMR clusters allow easy resizing of the cluster, and EMR also now supports Spark 1.3.1 as of EMR AMI 3.8.0. See http://aws.amazon.com/emr/spark ~ Jonathan From: Vadim Bichutskiy <vadim.bichuts...@gmail.com> Date: Friday, June 19

Re: Spark on EMR

2015-06-17 Thread Kelly, Jonathan
Yes, for now it is a wrapper around the old install-spark BA, but that will change soon. The currently supported version in AMI 3.8.0 is 1.3.1, as 1.4.0 was released too late to be included. Spark 1.4.0 support is coming soon though, of course. Unfortunately, though install-spark is

Re: Spark + Kinesis

2015-04-03 Thread Kelly, Jonathan
"user@spark.apache.org" <user@spark.apache.org> Subject: Re: Spark + Kinesis Thanks. So how do I fix it? On Fri,

Re: Spark + Kinesis

2015-04-03 Thread Kelly, Jonathan
"consumer-assembly.jar" assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala=false) Any help appreciated. Thanks, Vadim

Re: Spark + Kinesis

2015-04-02 Thread Kelly, Jonathan
It looks like you're attempting to mix Scala versions, so that's going to cause some problems. If you really want to use Scala 2.11.5, you must also use Spark package versions built for Scala 2.11 rather than 2.10. Anyway, that's not quite the correct way to specify Scala dependencies in build
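For reference, a minimal build.sbt sketch with consistent Scala versions (the versions are illustrative); the %% operator makes sbt append the Scala binary version to the artifact name, so the two cannot drift apart:

    scalaVersion := "2.10.4"

    libraryDependencies ++= Seq(
      // %% resolves to spark-core_2.10, matching scalaVersion above
      "org.apache.spark" %% "spark-core"                  % "1.3.0" % "provided",
      "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0"
    )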

Re: Spark and OpenJDK - jar: No such file or directory

2015-03-30 Thread Kelly, Jonathan
Ah, never mind, I found the jar command in the java-1.7.0-openjdk-devel package. I only had java-1.7.0-openjdk installed. Looks like I just need to install java-1.7.0-openjdk-devel then set JAVA_HOME to /usr/lib/jvm/java instead of /usr/lib/jvm/jre. ~ Jonathan Kelly From: Jonathan Kelly
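On Amazon Linux that amounts to something like the following (a sketch; package and path names can vary by distribution):

    # the jar command ships with the JDK (-devel) package, not the JRE-only one
    sudo yum install -y java-1.7.0-openjdk-devel
    # point JAVA_HOME at the JDK instead of the JRE
    export JAVA_HOME=/usr/lib/jvm/java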

Spark and OpenJDK - jar: No such file or directory

2015-03-30 Thread Kelly, Jonathan
I'm trying to use OpenJDK 7 with Spark 1.3.0 and noticed that the compute-classpath.sh script is not adding the datanucleus jars to the classpath, because compute-classpath.sh expects to find the jar command at $JAVA_HOME/bin/jar, which does not exist for OpenJDK. Is this an issue anybody e

Re: When will 1.3.1 release?

2015-03-30 Thread Kelly, Jonathan
Are you referring to SPARK-6330? If you are able to build Spark from source yourself, I believe you should just need to cherry-pick the following commits in order to backport the fix: 67fa6d1f830dee37244b5a30684d797093c7c134 [SPARK-6330] Fix fil
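The backport itself is just a checkout plus cherry-picks (a sketch showing only the first commit, since the preview truncates the full list; it assumes you start from the v1.3.0 tag):

    git clone https://github.com/apache/spark.git && cd spark
    git checkout v1.3.0
    # apply the SPARK-6330 fix on top of the release tag, then rebuild
    git cherry-pick 67fa6d1f830dee37244b5a30684d797093c7c134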

Using Spark with a SOCKS proxy

2015-03-17 Thread Kelly, Jonathan
I'm trying to figure out how I might be able to use Spark with a SOCKS proxy. That is, my dream is to be able to write code in my IDE then run it without much trouble on a remote cluster, accessible only via a SOCKS proxy between the local development machine and the master node of the cluster
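A typical starting point (a sketch; the key file, port, and hostname are placeholders) is dynamic port forwarding over SSH, which turns a local port into a SOCKS proxy into the cluster's network:

    # localhost:8157 becomes a SOCKS proxy tunneled through the master node
    ssh -i ~/mykey.pem -N -D 8157 hadoop@<master-node>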

Re: problems with spark-streaming-kinesis-asl and "sbt assembly" ("different file contents found")

2015-03-16 Thread Kelly, Jonathan
"org.apache.spark", name = "spark-streaming") ) (Note that ExclusionRule(organization = "org.apache.spark") without the "name" attribute does not work because that apparently causes it to exclude even spark-streaming-kinesis-asl.) Jonathan Kelly Elastic Map

Re: problems with spark-streaming-kinesis-asl and "sbt assembly" ("different file contents found")

2015-03-16 Thread Kelly, Jonathan
Date: Monday, March 16, 2015 at 12:45 PM To: Jonathan Kelly <jonat...@amazon.com> Cc: "user@spark.apache.org" <user@spark.apache.org> Subject: Re: problems with spark-streaming-kinesis-asl and "sbt assembly" ("dif

Re: sqlContext.parquetFile doesn't work with s3n in version 1.3.0

2015-03-16 Thread Kelly, Jonathan
See https://issues.apache.org/jira/browse/SPARK-6351 ~ Jonathan From: Shuai Zheng <szheng.c...@gmail.com> Date: Monday, March 16, 2015 at 11:46 AM To: "user@spark.apache.org" <user@spark.apache.org> Subject: sqlContext.parquetFile doesn't work with s3n

problems with spark-streaming-kinesis-asl and "sbt assembly" ("different file contents found")

2015-03-16 Thread Kelly, Jonathan
I'm attempting to use the Spark Kinesis Connector, so I've added the following dependency in my build.sbt: libraryDependencies += "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0" My app works fine with "sbt run", but I can't seem to get "sbt assembly" to work without failing with

Re: Spark v1.2.1 failing under BigTop build in External Flume Sink (due to missing Netty library)

2015-03-05 Thread Kelly, Jonathan
er -Pyarn - Patrick On Thu, Mar 5, 2015 at 12:47 PM, Kelly, Jonathan wrote: I confirmed that this has nothing to do with BigTop by running the same mvn command directly in a fresh clone of the Spark package at the v1.2.1 tag. I

Re: Spark v1.2.1 failing under BigTop build in External Flume Sink (due to missing Netty library)

2015-03-05 Thread Kelly, Jonathan
I confirmed that this has nothing to do with BigTop by running the same mvn command directly in a fresh clone of the Spark package at the v1.2.1 tag. I got the exact same error. Jonathan Kelly Elastic MapReduce - SDE Port 99 (SEA35) 08.220.C2 From: Jonathan Kelly <jonat...@amazon.com>

Spark v1.2.1 failing under BigTop build in External Flume Sink (due to missing Netty library)

2015-03-05 Thread Kelly, Jonathan
I'm running into an issue building Spark v1.2.1 (as well as the latest in branch-1.2 and v1.3.0-rc2 and the latest in branch-1.3) with BigTop (v0.9, which is not quite released yet). The build fails in the External Flume Sink subproject with the following error: [INFO] Compiling 5 Scala source
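For context, the failing build was invoked with something like the following (a sketch; the exact profiles and Hadoop version BigTop passes may differ):

    mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package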

Re: kinesis multiple records adding into stream

2015-01-16 Thread Kelly, Jonathan
Are you referring to the PutRecords method, which was added in 1.9.9? (See http://aws.amazon.com/releasenotes/1369906126177804) If so, can't you just depend upon the later version of the SDK in your app even though spark-streaming-kinesis-asl depends upon the earlier 1.9.3 version that
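In sbt, pinning the newer SDK could look like this (a sketch; the coordinates are the AWS SDK for Java artifact, and the version is the one named above):

    // force the newer SDK even though spark-streaming-kinesis-asl pulls in 1.9.3
    dependencyOverrides += "com.amazonaws" % "aws-java-sdk" % "1.9.9"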

Re: Issue with Parquet on Spark 1.2 and Amazon EMR

2015-01-05 Thread Kelly, Jonathan
I've noticed the same thing recently and will contact the appropriate owner soon. (I work for Amazon, so I'll go through internal channels and report back to this list.) In the meantime, I've found that editing spark-env.sh and putting the Spark assembly first in the classpath fixes the issue.
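One way to express that workaround (a sketch; the assembly path is a placeholder for wherever the jar lives on the AMI, and whether this env var actually lands first depends on how the AMI builds its classpath):

    # spark-env.sh: pull the Spark assembly to the front of the classpath
    export SPARK_CLASSPATH="/path/to/spark-assembly.jar:$SPARK_CLASSPATH"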

Re: SchemaRDD.saveAsTable() when schema contains arrays and was loaded from a JSON file using schema auto-detection

2014-11-27 Thread Kelly, Jonathan
the notion of "containsNull" for array values. So, for a Hive table, containsNull will always be true for an array, and we should ignore this field for Hive. This issue has been fixed by https://issues.apache.org/jira/browse/SPARK-4245, which will be released with 1.2. Thanks, Yin

Re: SchemaRDD.saveAsTable() when schema contains arrays and was loaded from a JSON file using schema auto-detection

2014-11-26 Thread Kelly, Jonathan
e other way around (which is the case here) should work. ~ Jonathan On 11/26/14, 5:23 PM, "Kelly, Jonathan" wrote: I've noticed some strange behavior when I try to use SchemaRDD.saveAsTable() with a SchemaRDD that I've loaded from a JSON file that contains elements wi

SchemaRDD.saveAsTable() when schema contains arrays and was loaded from a JSON file using schema auto-detection

2014-11-26 Thread Kelly, Jonathan
I've noticed some strange behavior when I try to use SchemaRDD.saveAsTable() with a SchemaRDD that I've loaded from a JSON file that contains elements with nested arrays. For example, with a file test.json that contains the single line: {"values":[1,2,3]} and with code like the following
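A minimal reproduction consistent with the description (a sketch against the Spark 1.1-era API; the app and table names are hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    // test.json contains the single line: {"values":[1,2,3]}
    val sc = new SparkContext(new SparkConf().setAppName("SaveAsTableRepro"))
    val hiveContext = new HiveContext(sc)

    val schemaRDD = hiveContext.jsonFile("test.json") // schema auto-detection
    schemaRDD.printSchema()       // values inferred as an array of integers
    schemaRDD.saveAsTable("test") // triggers the strange behavior described above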