spark-submit problems with --packages and --deploy-mode cluster

2015-12-11 Thread Greg Hill
I'm using Spark 1.5.0 with the standalone scheduler, and for the life of me I can't figure out why this isn't working. I have an application that works fine with --deploy-mode client that I'm trying to get to run in cluster mode so I can use --supervise. I ran into a few issues with my configu
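For concreteness, a sketch of the kind of invocation this thread concerns; the master URL, package coordinates, and jar path are all placeholders, and the command is only assembled and printed here, not run:

```shell
# Sketch only: a standalone cluster-mode submission using --packages and
# --supervise. Master URL, package coordinates, and jar path are placeholders.
SUBMIT_CMD="spark-submit \
 --master spark://master-host:7077 \
 --deploy-mode cluster \
 --supervise \
 --packages com.databricks:spark-csv_2.10:1.3.0 \
 /path/to/myapp.jar"
echo "$SUBMIT_CMD"
```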

Re: Can't submit job to stand alone cluster

2015-12-29 Thread Greg Hill
On 12/28/15, 5:16 PM, "Daniel Valdivia" wrote: > Hi, I'm trying to submit a job to a small spark cluster running in stand alone mode, however it seems like the jar file I'm submitting to the cluster is "not found" by the worker nodes. I might have understood wrong, but I thought the Driv
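The usual remedy discussed for this symptom is to make the application jar reachable from every node, since in standalone cluster mode the driver is launched on a worker. A sketch, with placeholder host names and paths (command assembled, not run):

```shell
# Sketch: reference the jar by a location every worker can reach, e.g. HDFS.
# Host names and paths are placeholders.
SUBMIT_CMD="spark-submit \
 --master spark://master-host:7077 \
 --deploy-mode cluster \
 hdfs://namenode-host:8020/jars/myapp.jar"
echo "$SUBMIT_CMD"
```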

Spark on YARN question

2014-09-02 Thread Greg Hill
I'm working on setting up Spark on YARN using the HDP technical preview - http://hortonworks.com/kb/spark-1-0-1-technical-preview-hdp-2-1-3/ I have installed the Spark JARs on all the slave nodes and configured YARN to find the JARs. It seems like everything is working. Unless I'm misunderstan

Re: Spark on YARN question

2014-09-02 Thread Greg Hill
Thanks. That sounds like how I was thinking it worked. I did have to install the JARs on the slave nodes for yarn-cluster mode to work, FWIW. It's probably just whichever node ends up spawning the application master that needs it, but it wasn't passed along from spark-submit. Greg From: And

spark history server trying to hit port 8021

2014-09-03 Thread Greg Hill
My Spark history server won't start because it's trying to hit the namenode on 8021, but the namenode is on 8020 (the default). How can I configure the history server to use the right port? I can't find any relevant setting in the docs: http://people.apache.org/~tdas/spark-1.0.0-rc11-docs/moni

Re: spark history server trying to hit port 8021

2014-09-03 Thread Greg Hill
Nevermind, PEBKAC. I had put the wrong port in the $LOG_DIR environment variable. Greg From: Greg <greg.h...@rackspace.com> Date: Wednesday, September 3, 2014 1:56 PM To: "user@spark.apache.org" <user@spark.apache.org> Subject: spark history serve
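Following the fix described above (the history server scripts in the HDP preview read an $LOG_DIR environment variable), a sketch with a placeholder host and path, pointing at the namenode's actual port:

```shell
# Sketch: point $LOG_DIR at the namenode's real port (8020, the HDFS
# default) rather than 8021. Host and path are placeholders; the variable
# name follows the HDP preview scripts mentioned in this thread.
export LOG_DIR="hdfs://namenode-host:8020/apps/spark/events"
echo "$LOG_DIR"
```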

Re: pyspark on yarn hdp hortonworks

2014-09-05 Thread Greg Hill
I'm running into a problem getting this working as well. I have spark-submit and spark-shell working fine, but pyspark in interactive mode can't seem to find the lzo jar: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found This is in /usr/lib/hadoop/lib/hadoo
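One common workaround for this class of error is to put the missing jar on the driver's classpath explicitly. A sketch; the jar file name is a placeholder, since only the directory is given above:

```shell
# Hypothetical workaround sketch: prepend the lzo jar to SPARK_CLASSPATH so
# the interactive pyspark driver can load LzoCodec. The jar file name is a
# placeholder; only the directory is known from the thread.
export SPARK_CLASSPATH="/usr/lib/hadoop/lib/hadoop-lzo.jar:${SPARK_CLASSPATH:-}"
echo "$SPARK_CLASSPATH"
```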

clarification for some spark on yarn configuration options

2014-09-08 Thread Greg Hill
Is SPARK_EXECUTOR_INSTANCES the total number of workers in the cluster or the workers per slave node? Is spark.executor.instances an actual config option? I found that in a commit, but it's not in the docs. What is the difference between spark.yarn.executor.memoryOverhead and spark.executor.m
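For orientation, the options under discussion as they would appear in spark-defaults.conf; the numbers are purely illustrative, and spark.executor.instances (the config-file form of SPARK_EXECUTOR_INSTANCES) counts total executors for the application, not executors per slave node:

```
# spark-defaults.conf sketch -- values illustrative only
spark.executor.instances             4
spark.executor.memory                2g
spark.yarn.executor.memoryOverhead   384
```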

spark on yarn history server + hdfs permissions issue

2014-09-09 Thread Greg Hill
I am running Spark on Yarn with the HDP 2.1 technical preview. I'm having issues getting the spark history server permissions to read the spark event logs from hdfs. Both sides are configured to write/read logs from: hdfs:///apps/spark/events The history server is running as user spark, the j

Re: spark on yarn history server + hdfs permissions issue

2014-09-11 Thread Greg Hill
To answer my own question, in case someone else runs into this. The spark user needs to be in the same group on the namenode, and hdfs caches that information for, it seems, at least an hour. It magically started working on its own. Greg From: Greg <greg.h...@rackspace.com> Date: Tuesd

Re: clarification for some spark on yarn configuration options

2014-09-22 Thread Greg Hill
bmit. If you are using Spark 1.1+, you may set "spark.driver.extraClassPath" in your spark-defaults.conf. There is also an environment variable you could set (SPARK_CLASSPATH), though this is now deprecated. Let me know if you have more questions about these options, -Andrew 2014-09-

Re: clarification for some spark on yarn configuration options

2014-09-22 Thread Greg Hill
k.driver.extraClassPath" in your spark-defaults.conf. There is also an environment variable you could set (SPARK_CLASSPATH), though this is now deprecated. Let me know if you have more questions about these options, -Andrew 2014-09-08 6:59 GMT-07:00 Greg Hill <greg.h...@rackspace.com

Re: clarification for some spark on yarn configuration options

2014-09-22 Thread Greg Hill
ser@spark.apache.org>> Subject: Re: clarification for some spark on yarn configuration options Greg, if you look carefully, the code is enforcing that the memoryOverhead be lower (and not higher) than spark.driver.memory. Thanks, Nishkam On Mon, Sep 22, 2014 at 1:26 PM, Greg Hill mailto

Re: clarification for some spark on yarn configuration options

2014-09-23 Thread Greg Hill
Nishkam Ravi <nr...@cloudera.com>: Maybe try --driver-memory if you are using spark-submit? Thanks, Nishkam On Mon, Sep 22, 2014 at 1:41 PM, Greg Hill <greg.h...@rackspace.com> wrote: Ah, I see. It turns out that my problem is that that comparison is ignoring SPARK

recommended values for spark driver memory?

2014-09-23 Thread Greg Hill
I know the recommendation is "it depends", but can people share what sort of memory allocations they're using for their driver processes? I'd like to get an idea of what the range looks like so we can provide sensible defaults without necessarily knowing what the jobs will look like. The custo

Re: Spark with YARN

2014-09-24 Thread Greg Hill
Do you have YARN_CONF_DIR set in your environment to point Spark to where your yarn configs are? Greg From: Raghuveer Chanda <raghuveer.cha...@gmail.com> Date: Wednesday, September 24, 2014 12:25 PM To: "u...@spark.incubator.apache.org" <u..
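The check suggested here, sketched with a placeholder path (a conventional location on many Hadoop installs):

```shell
# Sketch: YARN_CONF_DIR (or HADOOP_CONF_DIR) should name the directory
# containing yarn-site.xml. The path is a placeholder.
export YARN_CONF_DIR="/etc/hadoop/conf"
echo "$YARN_CONF_DIR"
```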

weird YARN errors on new Spark on Yarn cluster

2014-10-02 Thread Greg Hill
I haven't run into this until today. I spun up a fresh cluster to do some more testing, and it seems that every single executor fails because it can't connect to the driver. This is in the YARN logs: 14/10/02 16:24:11 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp:

Re: weird YARN errors on new Spark on Yarn cluster

2014-10-02 Thread Greg Hill
I or through: yarn logs -applicationId If an AM throws an exception then the executors may not be started properly. -Andrew 2014-10-02 9:47 GMT-07:00 Greg Hill <greg.h...@rackspace.com>: I haven't run into this until today. I spun up a fresh cluster to do some more test

Spark on YARN driver memory allocation bug?

2014-10-08 Thread Greg Hill
So, I think this is a bug, but I wanted to get some feedback before I reported it as such. On Spark on YARN, 1.1.0, if you specify the --driver-memory value to be higher than the memory available on the client machine, Spark errors out due to failing to allocate enough memory. This happens eve
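A sketch of the reported scenario: in yarn-cluster mode the --driver-memory request should only need to fit on the YARN node that hosts the driver, yet the client-side check fails first. The value and jar path are placeholders, and the command is assembled, not run:

```shell
# Sketch: a yarn-cluster submission asking for more driver memory than the
# submitting client machine has. Value and jar path are placeholders.
SUBMIT_CMD="spark-submit \
 --master yarn-cluster \
 --driver-memory 8g \
 /path/to/myapp.jar"
echo "$SUBMIT_CMD"
```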

Re: Spark on YARN driver memory allocation bug?

2014-10-09 Thread Greg Hill
memory allocation bug? Hi Greg, It does seem like a bug. What is the particular exception message that you see? Andrew 2014-10-08 12:12 GMT-07:00 Greg Hill <greg.h...@rackspace.com>: So, I think this is a bug, but I wanted to get some feedback before I reported it as such. On Sp

SPARK_SUBMIT_CLASSPATH question

2014-10-14 Thread Greg Hill
It seems to me that SPARK_SUBMIT_CLASSPATH does not offer the same ability as other tools to put wildcards in the paths you add. For some reason it doesn't pick up the classpath information from yarn-site.xml either, it seems, when running on YARN. I'm having to manually add every single depe

Re: SPARK_SUBMIT_CLASSPATH question

2014-10-15 Thread Greg Hill
I guess I was a little light on the details in my haste. I'm using Spark on YARN, and this is in the driver process in yarn-client mode (most notably spark-shell). I've had to manually add a bunch of JARs that I had thought it would just pick up like everything else does: export SPARK_SUBMIT
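A hypothetical workaround sketch consistent with the report above: expand the wildcard in the shell and export the joined list, since SPARK_SUBMIT_CLASSPATH itself reportedly does not expand wildcards. The directory is taken from typical Hadoop layouts and is a placeholder:

```shell
# Hypothetical workaround: build the colon-separated jar list by hand
# because SPARK_SUBMIT_CLASSPATH reportedly does not expand wildcards.
# The directory is a placeholder.
JARS=""
for j in /usr/lib/hadoop/lib/*.jar; do
  JARS="${JARS:+$JARS:}$j"
done
export SPARK_SUBMIT_CLASSPATH="$JARS"
echo "$SPARK_SUBMIT_CLASSPATH"
```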