Standalone Spark, How to find (driver's) final status for an application

2019-09-25 Thread Nilkanth Patel
I am setting up *Spark 2.2.0 in standalone mode* (https://spark.apache.org/docs/latest/spark-standalone.html) and submitting spark jobs programmatically using SparkLauncher sparkAppLauncher = new SparkLauncher(userNameMap).setMaster(sparkMaster).setAppName(appName).; SparkAppHandle sparkAppH
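
A minimal sketch of tracking the final status through the launcher API, assuming placeholder values for the master URL, jar path and main class (SparkAppHandle.State.isFinal covers the terminal states FINISHED, FAILED and KILLED):

    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    object LauncherStatus {
      def main(args: Array[String]): Unit = {
        // Master URL, jar path and main class below are placeholders.
        val handle = new SparkLauncher()
          .setMaster("spark://master-host:7077")
          .setAppName("my-app")
          .setAppResource("/path/to/app.jar")
          .setMainClass("com.example.MyApp")
          .startApplication(new SparkAppHandle.Listener {
            // Invoked on every state transition of the application.
            override def stateChanged(h: SparkAppHandle): Unit =
              println(s"state: ${h.getState}")
            override def infoChanged(h: SparkAppHandle): Unit = ()
          })
        // Block until a terminal state is reached, then report it.
        while (!handle.getState.isFinal) Thread.sleep(1000)
        println(s"final state: ${handle.getState}")
      }
    }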

Re: Question about standalone Spark cluster reading from Kerberosed hadoop

2017-06-23 Thread Mu Kong
see also: https://www.gitbook.com/book/steveloughran/kerberos_and_hadoop/details On Fri, Jun 23, 2017 at 5:10 PM, Mu Kong wrote: Hi, all! I was trying to read from a Kerberosed hadoop cluster from a standalone spark cluster.

Re: Question about standalone Spark cluster reading from Kerberosed hadoop

2017-06-23 Thread Steve Loughran
a Kerberosed hadoop cluster from a standalone spark cluster. Right now, I encountered some authentication issues with Kerberos: java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBERO

Re: Question about standalone Spark cluster reading from Kerberosed hadoop

2017-06-23 Thread Saisai Shao
Jerry On Fri, Jun 23, 2017 at 5:10 PM, Mu Kong wrote: Hi, all! I was trying to read from a Kerberosed hadoop cluster from a standalone spark cluster. Right now, I encountered some authentication issues with Kerberos: java.io.IOException: Failed on lo

Question about standalone Spark cluster reading from Kerberosed hadoop

2017-06-23 Thread Mu Kong
Hi, all! I was trying to read from a Kerberosed hadoop cluster from a standalone spark cluster. Right now, I encountered some authentication issues with Kerberos: java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client
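
Standalone mode has none of YARN's delegation-token plumbing, so each JVM that touches HDFS has to authenticate itself. A minimal sketch of an explicit keytab login before any HDFS access; the principal and keytab path are placeholders:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.UserGroupInformation

    object KerberosLogin {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        // Tell the Hadoop client libraries the cluster uses Kerberos.
        conf.set("hadoop.security.authentication", "kerberos")
        UserGroupInformation.setConfiguration(conf)
        // Placeholder principal and keytab path.
        UserGroupInformation.loginUserFromKeytab(
          "user@EXAMPLE.COM", "/etc/security/keytabs/user.keytab")
        println(UserGroupInformation.getCurrentUser)
      }
    }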

Native libraries using only one core in standalone spark cluster

2016-09-26 Thread guangweiyu
Hi, I'm trying to run a spark job that uses multiple cpu cores per spark executor. Specifically, it runs the gemm matrix multiply routine from each partition on a large matrix that cannot be distributed. For test purposes, I have a machine with 8 cores running standalone spa
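
One way to hand a multi-threaded native routine more than one core is to reserve several scheduler cores per task with spark.task.cpus; a sketch, where the master URL, core count and the OpenBLAS thread variable are illustrative assumptions:

    import org.apache.spark.{SparkConf, SparkContext}

    object MultiCoreGemm {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setMaster("spark://master-host:7077") // placeholder master URL
          .setAppName("gemm-test")
          // Reserve 8 scheduler cores per task so one gemm call may use
          // 8 native threads without oversubscribing the executor.
          .set("spark.task.cpus", "8")
          // Many BLAS builds read a thread-count env var; OPENBLAS_NUM_THREADS
          // is the OpenBLAS one (an assumption about which BLAS is in use).
          .setExecutorEnv("OPENBLAS_NUM_THREADS", "8")
        val sc = new SparkContext(conf)
        // ... run the per-partition gemm job here ...
        sc.stop()
      }
    }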

Re: Building standalone spark application via sbt

2016-07-20 Thread Sachin Mittal
appen often) Btw did you get the NoClassDefFoundException at compile time or run time? If at run time, what is your Spark version and what is the spark libraries version you used in your sbt? Are you using a Spark version pre 1.4? kr marco

Re: Building standalone spark application via sbt

2016-07-20 Thread Marco Mistroni
Spark version pre 1.4? kr marco On Wed, Jul 20, 2016 at 6:13 PM, Sachin Mittal wrote: NoClassDefFound error was for spark classes like say SparkContext. When running a standalone spark application I was not passing external jars using --jars option. However I have fi

Re: Building standalone spark application via sbt

2016-07-20 Thread Sachin Mittal
NoClassDefFound error was for spark classes like say SparkContext. When running a standalone spark application I was not passing external jars using --jars option. However I have fixed this by making a fat jar using sbt assembly plugin. Now all the dependencies are included in that jar and I use
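
A minimal sketch of that fat-jar setup with the sbt-assembly plugin; plugin and library versions are illustrative for the Spark 1.x line discussed here:

    // project/assembly.sbt
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

    // build.sbt
    name := "my-spark-app" // placeholder project name
    scalaVersion := "2.10.6"
    // Spark is already on the cluster's classpath, so mark it "provided"
    // to keep it (and its Hadoop dependencies) out of the fat jar.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.3" % "provided"
    // `sbt assembly` then writes target/scala-2.10/my-spark-app-assembly-<version>.jar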

Re: Building standalone spark application via sbt

2016-07-20 Thread Marco Mistroni
Hello Sachin pls paste the NoClassDefFound Exception so we can see what's failing, also please advise how you are running your Spark App. For an extremely simple case, let's assume you have your MyFirstSparkApp packaged in your myFirstSparkApp.jar Then all you need to do would be to kick off

Re: Building standalone spark application via sbt

2016-07-20 Thread Mich Talebzadeh
you need an uber jar file. Have you actually followed the dependencies and project sub-directory build? Check this: http://stackoverflow.com/questions/28459333/how-to-build-an-uber-jar-fat-jar-using-sbt-within-intellij-idea (of the three answers there, see the top one). I started reading the official SBT tu

Re: Building standalone spark application via sbt

2016-07-20 Thread Sachin Mittal
Hi, I am following the example under https://spark.apache.org/docs/latest/quick-start.html for a standalone scala application. I added all my dependencies via build.sbt (one dependency is under the lib folder). When I run sbt package I see the jar created under target/scala-2.10/ So compile seems to b

Re: Building standalone spark application via sbt

2016-07-19 Thread Andrew Ehrlich
Yes, spark-core will depend on Hadoop and several other jars. Here’s the list of dependencies: https://github.com/apache/spark/blob/master/core/pom.xml#L35 Whether you need spark-sql depends on whether you will use the DataFrame API

Building standalone spark application via sbt

2016-07-19 Thread Sachin Mittal
Hi, Can someone please guide me what all jars I need to place in my lib folder of the project to build a standalone scala application via sbt. Note I need to provide static dependencies and I cannot download the jars using libraryDependencies. So I need to provide all the jars upfront. So far I f

Re: [ Standalone Spark Cluster ] - Track node status

2016-06-09 Thread Mich Talebzadeh

Re: [ Standalone Spark Cluster ] - Track node status

2016-06-09 Thread Rutuja Kulkarni
On 9 June 2016 at 01:27, Rutuja Kulkarni wrote: Thank you for the quick response. So the workers section

Re: [ Standalone Spark Cluster ] - Track node status

2016-06-08 Thread Mich Talebzadeh
Rutuja Kulkarni wrote: Thank you for the quick response. So the workers section would list all the running worker nodes in the standalone Spark cluster? I was also wondering if this is the only way to retrieve worker nodes or is there something like a Web API or CLI I could use?

Re: [ Standalone Spark Cluster ] - Track node status

2016-06-08 Thread Rutuja Kulkarni
Thank you for the quick response. So the workers section would list all the running worker nodes in the standalone Spark cluster? I was also wondering if this is the only way to retrieve worker nodes or is there something like a Web API or CLI I could use? Thanks. Regards, Rutuja On Wed, Jun 8

Re: [ Standalone Spark Cluster ] - Track node status

2016-06-08 Thread Mich Talebzadeh
http://talebzadehmich.wordpress.com On 8 June 2016 at 23:56, Rutuja Kulkarni wrote: Hello! I'm trying to set up a standalone spark cluster and wondering how to track the status of all of its nodes. I wonder if something like a Yarn REST API or HDFS CLI exists

[ Standalone Spark Cluster ] - Track node status

2016-06-08 Thread Rutuja Kulkarni
Hello! I'm trying to set up a standalone spark cluster and wondering how to track the status of all of its nodes. I wonder if something like a Yarn REST API or HDFS CLI exists in the Spark world that can provide the status of nodes on such a cluster. Any pointers would be greatly appreciated. --
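
The standalone master's web UI also serves its state as JSON at /json on the UI port (8080 by default), which effectively is the Web API asked about; a small sketch, with the host name as a placeholder:

    import scala.io.Source

    object MasterStatus {
      def main(args: Array[String]): Unit = {
        // "master-host" is a placeholder for the standalone master.
        val json = Source.fromURL("http://master-host:8080/json").mkString
        // The payload includes a "workers" array with each worker's
        // state, cores and memory.
        println(json)
      }
    }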

python application cluster mode in standalone spark cluster

2016-05-25 Thread Jan Sourek
As the official documentation states, 'Currently only YARN supports cluster mode for Python applications.' I would like to know if work is being done or planned to support cluster mode for Python applications on standalone spark clusters? -- View this message in context: http://ap

Re: YARN vs Standalone Spark Usage in production

2016-04-14 Thread Takeshi Yamamuro
er Pivovarov wrote: AWS EMR includes Spark on Yarn Hortonworks and Cloudera platforms include Spark on Yarn as well On Thu, Apr 14, 2016 at 7:29 AM, Arkadiusz Bicz <

Re: YARN vs Standalone Spark Usage in production

2016-04-14 Thread Mark Hamstra
s Spark on Yarn Hortonworks and Cloudera platforms include Spark on Yarn as well On Thu, Apr 14, 2016 at 7:29 AM, Arkadiusz Bicz <arkadiusz.b...@gmail.com> wrote:

Re: YARN vs Standalone Spark Usage in production

2016-04-14 Thread Alexander Pivovarov
@gmail.com> wrote: Hello, Is there any statistics regarding YARN vs Standalone Spark Usage in production? I would like to choose most supported and used technology in production fo

Re: YARN vs Standalone Spark Usage in production

2016-04-14 Thread Sean Owen
n Hortonworks and Cloudera platforms include Spark on Yarn as well On Thu, Apr 14, 2016 at 7:29 AM, Arkadiusz Bicz wrote: Hello, Is there any statistics regarding YARN vs Standalone Spark Usage in production?

Re: YARN vs Standalone Spark Usage in production

2016-04-14 Thread Mich Talebzadeh
Hello, Is there any statistics regarding YARN vs Standalone Spark Usage in production? I would like to choose most supported and used technology in production for our project. BR, Arkadiusz Bicz

Re: YARN vs Standalone Spark Usage in production

2016-04-14 Thread Alexander Pivovarov
AWS EMR includes Spark on Yarn Hortonworks and Cloudera platforms include Spark on Yarn as well On Thu, Apr 14, 2016 at 7:29 AM, Arkadiusz Bicz wrote: Hello, Is there any statistics regarding YARN vs Standalone Spark Usage in production? I would like to choose

YARN vs Standalone Spark Usage in production

2016-04-14 Thread Arkadiusz Bicz
Hello, Is there any statistics regarding YARN vs Standalone Spark Usage in production? I would like to choose most supported and used technology in production for our project. BR, Arkadiusz Bicz

Re: Spark jobs run extremely slow on yarn cluster compared to standalone spark

2016-02-14 Thread Yuval.Itzchakov
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-jobs-run-extremely-slow-on-yarn-cluster-compared-to-standalone-spark-tp26215p26221.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Spark jobs run extremely slow on yarn cluster compared to standalone spark

2016-02-12 Thread pdesai
Hi there, I am doing a POC with Spark and I have noticed that if I run my job on a standalone spark installation, it finishes in a second (it's a small sample job). But when I run the same job on a spark cluster with Yarn, it takes 4-5 min in simple execution. Are there any best practices that I ne

Re: Cannot connect to standalone spark cluster

2015-10-14 Thread Akhil Das
I'm trying to run a java application that connects to a local standalone spark cluster. I start the cluster with the default configuration, using start-all.sh. When I go to the web page for the cluster, it is started ok. I can connect to this cluster with SparkR, but when I use the

Cannot connect to standalone spark cluster

2015-10-09 Thread ekraffmiller
Hi, I'm trying to run a java application that connects to a local standalone spark cluster. I start the cluster with the default configuration, using start-all.sh. When I go to the web page for the cluster, it is started ok. I can connect to this cluster with SparkR, but when I use the
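
The usual first checks here are that the spark:// URL matches what the master's web UI shows exactly, and that the client and cluster run the same Spark version. A minimal connection sanity check, sketched in Scala with a placeholder host:

    import org.apache.spark.{SparkConf, SparkContext}

    object ConnectStandalone {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          // Must match the URL shown at the top of the master's web UI.
          .setMaster("spark://master-host:7077") // placeholder host
          .setAppName("connection-test")
        val sc = new SparkContext(conf)
        println(sc.parallelize(1 to 10).count()) // should print 10
        sc.stop()
      }
    }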

Re: Convert Simple Kafka Consumer to standalone Spark JavaStream Consumer

2015-07-21 Thread Tathagata Das
SimpleHLConsumer simpleHLConsumer = new SimpleHLConsumer("localhost:2181", "testgroup", topic); simpleHLConsumer.testConsumer(); } } I want to get my messages through Spark Java Streaming with Kaf

Convert Simple Kafka Consumer to standalone Spark JavaStream Consumer

2015-07-21 Thread Hafsa Asif
); } } I want to get my messages through Spark Java Streaming with Kafka integration. Can anyone help me to reform this code so that I can get same output with Spark Kafka integration. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble
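
A minimal sketch of the receiver-based Kafka integration from that era (the spark-streaming-kafka artifact for Spark 1.x), reusing the quoted consumer's ZooKeeper address and group; the topic name and everything else are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaStreamDemo {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-stream").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(5))
        // Same ZooKeeper quorum and consumer group as the simple consumer;
        // the map value is the number of receiver threads for the topic.
        val messages = KafkaUtils.createStream(
          ssc, "localhost:2181", "testgroup", Map("mytopic" -> 1))
        messages.map(_._2).print() // each element is a (key, message) pair
        ssc.start()
        ssc.awaitTermination()
      }
    }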

Re: When to use underlying data management layer versus standalone Spark?

2015-06-24 Thread Sandy Ryza
On Jun 24, 2015 1:17 AM, "commtech" wrote: Hi, I work at a large financial institution in New York. We're looking into Spark and trying to learn more about the deployment/use cases for real-time analytics with Spark. W

Re: When to use underlying data management layer versus standalone Spark?

2015-06-23 Thread Sonal Goyal
case. On Jun 24, 2015 1:17 AM, "commtech" wrote: Hi, I work at a large financial institution in New York. We're looking into Spark and trying to learn more about the deployment/use cases for real-time analytics with Spark. When would it be better to deploy s

Re: When to use underlying data management layer versus standalone Spark?

2015-06-23 Thread canan chen
I work at a large financial institution in New York. We're looking into Spark and trying to learn more about the deployment/use cases for real-time analytics with Spark. When would it be better to deploy standalone Spark versus Spark on top of a more comprehensive

When to use underlying data management layer versus standalone Spark?

2015-06-23 Thread commtech
Hi, I work at a large financial institution in New York. We're looking into Spark and trying to learn more about the deployment/use cases for real-time analytics with Spark. When would it be better to deploy standalone Spark versus Spark on top of a more comprehensive data management

How to start Thrift JDBC server as part of standalone spark application?

2015-04-23 Thread Vladimir Grigor
Hello, I would like to export RDD/DataFrames via JDBC SQL interface from the standalone application for currently stable Spark v1.3.1. I found one way of doing it but it requires the use of @DeveloperAPI method HiveThriftServer2.startWithContext(sqlContext) Is there a better, production level ap
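
A minimal sketch of that startWithContext route for Spark 1.3.x; the data path and table name are placeholders, and the embedded server then listens on the usual Thrift JDBC port (10000 by default):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

    object EmbeddedThriftServer {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("thrift-demo"))
        val sqlContext = new HiveContext(sc)
        // Register something queryable; path and table name are placeholders.
        sqlContext.jsonFile("/path/to/data.json").registerTempTable("my_table")
        // Expose this context's temp tables over JDBC.
        HiveThriftServer2.startWithContext(sqlContext)
      }
    }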

distcp problems on ec2 standalone spark cluster

2015-03-09 Thread roni
I got past the issue of the cluster not starting by adding Yarn to mapreduce.framework.name. But when I try to distcp, if I use a URI with s3://path to my bucket I get an invalid path even though the bucket exists. If I use s3n:// it just hangs. Did anyone else face anything like that

Re: distcp on ec2 standalone spark cluster

2015-03-08 Thread Akhil Das
problem I am having where distcp with an s3 URI says incorrect folder path and s3n:// hangs. stuck for 2 days :( Thanks -R -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/distcp-on-ec2-standalone-sp

Re: distcp on ec2 standalone spark cluster

2015-03-07 Thread roni
http://apache-spark-user-list.1001560.n3.nabble.com/distcp-on-ec2-standalone-spark-cluster-tp13652p21957.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Standalone spark

2015-02-25 Thread boci
Thanks dude... I think I will pull up a docker container for integration test -- Skype: boci13, Hangout: boci.b...@gmail.com On Thu, Feb 26, 2015 at 12:22 AM, Sean Owen

Re: Standalone spark

2015-02-25 Thread Sean Owen
Yes, been on the books for a while ... https://issues.apache.org/jira/browse/SPARK-2356 That one just may always be a known 'gotcha' in Windows; it's kind of a Hadoop gotcha. I don't know that Spark 100% works on Windows and it isn't tested on Windows. On Wed, Feb 25, 2015 at 11:05 PM, boci wrote

Re: Standalone spark

2015-02-25 Thread boci
Thanks for your fast answer... in windows it's not working, because hadoop (surprise surprise) needs winutils.exe. Without this it's not working, but if you don't set the hadoop directory you simply get 15/02/26 00:03:16 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.I

Re: Standalone spark

2015-02-25 Thread Sean Owen
Spark and Hadoop should be listed as 'provided' dependencies in your Maven or SBT build. But that should make it available at compile time. On Wed, Feb 25, 2015 at 10:42 PM, boci wrote: Hi, I have a little question. I want to develop a spark based application, but spark depend to hadoop-cli
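
A sketch of that 'provided' arrangement in sbt, plus an exclude alternative for the hadoop-client question below; version numbers are illustrative:

    // build.sbt -- two alternatives
    // Option 1: compile against Spark but keep it out of the packaged artifact.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1" % "provided"
    // Option 2: keep Spark in the jar but drop the transitive Hadoop client
    // (may be fine if the job never reads HDFS; illustrative only).
    // libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1" exclude("org.apache.hadoop", "hadoop-client")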

Standalone spark

2015-02-25 Thread boci
Hi, I have a little question. I want to develop a spark based application, but spark depends on the hadoop-client library. I think it's not necessary (spark standalone) so I excluded it from the sbt file. The result is interesting. My trait where I create the spark context does not compile. The error: ... scal

Re: Whether standalone spark support kerberos?

2015-02-05 Thread Kostas Sakellis
We have a standalone spark cluster for kerberos test. But when reading from hdfs, i get error output: Can't get Master Kerberos principal for use as renewer. So Whether standalone spark support kerberos? can anyone con

Re: Whether standalone spark support kerberos?

2015-02-04 Thread Jander g
Hope someone helps me. Thanks. On Wed, Feb 4, 2015 at 6:14 PM, Jander g wrote: We have a standalone spark cluster for kerberos test. But when reading from hdfs, i get error output: Can't get Master Kerberos principal for use as renewer. So Whether standalone

Whether standalone spark support kerberos?

2015-02-04 Thread Jander g
We have a standalone spark cluster for kerberos test. But when reading from hdfs, i get error output: Can't get Master Kerberos principal for use as renewer. So Whether standalone spark support kerberos? can anyone confirm it? or what i missed? Thanks in advance. -- Thanks, Jander

Re: Standalone Spark program

2014-12-18 Thread Andrew Or
Hey Akshat, What is the class that is not found, is it a Spark class or classes that you define in your own application? If the latter, then Akhil's solution should work (alternatively you can also pass the jar through the --jars command line option in spark-submit). If it's a Spark class, howeve

Re: Standalone Spark program

2014-12-18 Thread Akhil Das
You can build a jar of your project and add it to the sparkContext (sc.addJar("/path/to/your/project.jar")) then it will get shipped to the worker and hence no ClassNotFoundException! Thanks Best Regards On Thu, Dec 18, 2014 at 10:06 PM, Akshat Aranya wrote: Hi, I am building a Spark-bas

Standalone Spark program

2014-12-18 Thread Akshat Aranya
Hi, I am building a Spark-based service which requires initialization of a SparkContext in a main(): def main(args: Array[String]) { val conf = new SparkConf(false) .setMaster("spark://foo.example.com:7077") .setAppName("foobar") val sc = new SparkContext(conf) val rdd =
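
A sketch combining that main() with the sc.addJar fix suggested in the replies; the jar path is a placeholder and the master URL is the one quoted above:

    import org.apache.spark.{SparkConf, SparkContext}

    object FooBarService {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf(false)
          .setMaster("spark://foo.example.com:7077")
          .setAppName("foobar")
        val sc = new SparkContext(conf)
        // Ship the service's own jar so executors can load application
        // classes (avoids ClassNotFoundException on the workers).
        sc.addJar("/path/to/project.jar") // placeholder path
        val rdd = sc.parallelize(1 to 100)
        println(rdd.sum())
        sc.stop()
      }
    }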

Re: Standalone spark cluster. Can't submit job programmatically -> java.io.InvalidClassException

2014-12-11 Thread sivarani
http://apache-spark-user-list.1001560.n3.nabble.com/Standalone-spark-cluster-Can-t-submit-job-programmatically-java-io-InvalidClassException-tp13456p20624.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Ye Xianjin
well, this means you didn't start a compute cluster. Most likely the wrong value of mapreduce.jobtracker.address means the slave node cannot start the node manager. (I am not familiar with the ec2 script, so I don't know whether the slave node has the node manager installed or not.) Can yo

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Tomer Benyamini
No tasktracker or nodemanager. This is what I see: On the master: org.apache.hadoop.yarn.server.resourcemanager.ResourceManager org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode org.apache.hadoop.hdfs.server.namenode.NameNode On the data node (slave): org.apache.hadoop.hdfs.server.datano

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Ye Xianjin
what did you see in the log? was there anything related to mapreduce? can you log into your hdfs (data) node, use jps to list all java process and confirm whether there is a tasktracker process (or nodemanager) running with datanode process -- Ye Xianjin Sent with Sparrow (http://www.sparrowma

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Tomer Benyamini
Still no luck, even when running stop-all.sh followed by start-all.sh. On Mon, Sep 8, 2014 at 5:57 PM, Nicholas Chammas wrote: Tomer, Did you try start-all.sh? It worked for me the last time I tried using distcp, and it worked for this guy too. Nick On Mon, Sep 8, 2014 at 3:28 A

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Nicholas Chammas
Tomer, Did you try start-all.sh? It worked for me the last time I tried using distcp, and it worked for this guy too. Nick On Mon, Sep 8, 2014 at 3:28 AM, Tomer Benyamini wrote: ~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2

Re: Standalone spark cluster. Can't submit job programmatically -> java.io.InvalidClassException

2014-09-08 Thread DrKhu
cation that was 2.4. When I changed the version of hadoop client to 1.2.1 in my app, I'm able to execute spark code on cluster. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Standalone-spark-cluster-Can-t-submit-job-programmatically-java-io-I

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Frank Austin Nothaft
Tomer, To use distcp, you need to have a Hadoop compute cluster up. start-dfs just restarts HDFS. I don’t have a Spark 1.0.2 cluster up right now, but there should be a start-mapred*.sh or start-all.sh script that will launch the Hadoop MapReduce cluster that you will need for distcp. Regards,

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Tomer Benyamini
~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2; I restarted hdfs using ~/ephemeral-hdfs/sbin/stop-dfs.sh and ~/ephemeral-hdfs/sbin/start-dfs.sh, but still getting the same error when trying to run distcp: ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered java

Re: distcp on ec2 standalone spark cluster

2014-09-07 Thread Josh Rosen
If I recall, you should be able to start Hadoop MapReduce using ~/ephemeral-hdfs/sbin/start-mapred.sh. On Sun, Sep 7, 2014 at 6:42 AM, Tomer Benyamini wrote: Hi, I would like to copy log files from s3 to the cluster's ephemeral-hdfs. I tried to use distcp, but I guess mapred is not run

Re: distcp on ec2 standalone spark cluster

2014-09-07 Thread Nicholas Chammas
I think you need to run start-all.sh or something similar on the EC2 cluster. MR is installed but is not running by default on EC2 clusters spun up by spark-ec2. On Sun, Sep 7, 2014 at 12:33 PM, Tomer Benyamini wrote: I've installed a spark standalone cluster on ec2 as defined here - https

Re: distcp on ec2 standalone spark cluster

2014-09-07 Thread Tomer Benyamini
I've installed a spark standalone cluster on ec2 as defined here - https://spark.apache.org/docs/latest/ec2-scripts.html. I'm not sure if mr1/2 is part of this installation. On Sun, Sep 7, 2014 at 7:25 PM, Ye Xianjin wrote: Distcp requires a mr1 (or mr2) cluster to start. Do you have a mapreduc

Re: distcp on ec2 standalone spark cluster

2014-09-07 Thread Ye Xianjin
Distcp requires a mr1 (or mr2) cluster to start. Do you have a mapreduce cluster on your hdfs? And from the error message, it seems that you didn't specify your jobtracker address. -- Ye Xianjin Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Sunday, September 7, 2014 at 9:42 PM, T

distcp on ec2 standalone spark cluster

2014-09-07 Thread Tomer Benyamini
Hi, I would like to copy log files from s3 to the cluster's ephemeral-hdfs. I tried to use distcp, but I guess mapred is not running on the cluster - I'm getting the exception below. Is there a way to activate it, or is there a spark alternative to distcp? Thanks, Tomer mapreduce.Cluster (Clust
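
Once a MapReduce layer is running (started via start-mapred.sh or start-all.sh, as the replies above suggest), the copy itself is a single command; a sketch driving it from Scala, with the bucket and target path as placeholders:

    import scala.sys.process._

    object S3ToHdfsCopy {
      def main(args: Array[String]): Unit = {
        // distcp runs as a MapReduce job, so the JobTracker/ResourceManager
        // must be up first. Bucket and target path are placeholders, and
        // s3n:// needs AWS credentials configured for the Hadoop client.
        val exitCode = Seq("hadoop", "distcp",
          "s3n://my-bucket/logs", "hdfs:///logs").!
        println(s"distcp exit code: $exitCode")
      }
    }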

Re: Adding quota to the ephemeral hdfs on a standalone spark cluster on ec2

2014-09-07 Thread Tomer Benyamini
Thanks! I found the hdfs ui via this port - http://[master-ip]:50070/. It shows 1 node hdfs though, although I have 4 slaves on my cluster. Any idea why? On Sun, Sep 7, 2014 at 4:29 PM, Ognen Duzlevski wrote: On 9/7/2014 7:27 AM, Tomer Benyamini wrote: 2. What should I do to increase th

Re: Adding quota to the ephemeral hdfs on a standalone spark cluster on ec2

2014-09-07 Thread Ognen Duzlevski
On 9/7/2014 7:27 AM, Tomer Benyamini wrote: 2. What should I do to increase the quota? Should I bring down the existing slaves and upgrade to ones with more storage? Is there a way to add disks to existing slaves? I'm using the default m1.large slaves set up using the spark-ec2 script. Take a l

Adding quota to the ephemeral hdfs on a standalone spark cluster on ec2

2014-09-07 Thread Tomer Benyamini
Hi, I would like to make sure I'm not exceeding the quota on the local cluster's hdfs. I have a couple of questions: 1. How do I know the quota? Here's the output of hadoop fs -count -q which essentially does not tell me a lot [root@ip-172-31-7-49 ~]$ hadoop fs -count -q / 2147483647 21474

Re: can't submit my application on standalone spark cluster

2014-08-06 Thread Andrew Or
Hi Andres, If you're using the EC2 scripts to start your standalone cluster, you can use "~/spark-ec2/copy-dir --delete ~/spark" to sync your jars across the cluster. Note that you will need to restart the Master and the Workers afterwards through "sbin/start-all.sh" and "sbin/stop-all.sh". If you

Re: can't submit my application on standalone spark cluster

2014-08-06 Thread Akhil Das
Looks like a netty conflict there, most likely you are having multiple versions of netty jars (eg: netty-3.6.6.Final.jar, netty-3.2.2.Final.jar, netty-all-4.0.13.Final.jar), you only require 3.6.6 i believe. a quick fix would be to remove the rest of them. Thanks Best Regards On Wed, Aug 6, 2014

can't submit my application on standalone spark cluster

2014-08-06 Thread Andres Gomez Ferrer
Hi all, My name is Andres and I'm starting to use Apache Spark. I try to submit my spark.jar to my cluster using this: spark-submit --class "net.redborder.spark.RedBorderApplication" --master spark://pablo02:7077 redborder-spark-selfcontained.jar But when I did it, my worker died .. and my dr