Distance Calculation in Spark K means clustering

2015-08-30 Thread ashensw
Hi all, I am currently working on a K-means clustering project. I want to get the distance of each data point to its cluster center after building the K-means model. Currently I get the cluster centers of each data point by sending the JavaRDD which includes all the data points to K means pre
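A minimal Scala sketch of one way to compute those distances with MLlib (assuming the data is already an RDD[Vector]; points, k and maxIterations are placeholder names):

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.{Vector, Vectors}

    val model = KMeans.train(points, k, maxIterations)

    // For every point, look up its assigned center and take the Euclidean distance to it.
    val distances = points.map { p =>
      val center = model.clusterCenters(model.predict(p))
      math.sqrt(Vectors.sqdist(p, center))
    }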

Re: Help Explain Tasks in WebUI:4040

2015-08-30 Thread Akhil Das
Are you doing a join/groupBy or similar operation? In that case I would suspect that the keys are not evenly distributed, and that's why a few of the tasks are spending way too much time doing the actual processing. You might want to look into custom partitioners
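A minimal skeleton of a custom Partitioner (deterministic, as the Partitioner contract requires; the class and routing rule are purely illustrative, and salting hot keys before the shuffle is another common way to handle skew):

    import org.apache.spark.Partitioner

    class CustomPartitioner(partitions: Int) extends Partitioner {
      require(partitions > 0)
      override def numPartitions: Int = partitions
      override def getPartition(key: Any): Int = key match {
        case null => 0
        case k =>
          // non-negative modulo; replace with domain-specific routing for known hot keys
          val h = k.hashCode % numPartitions
          if (h < 0) h + numPartitions else h
      }
    }

    // usage on a pair RDD: rdd.partitionBy(new CustomPartitioner(200))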

Slow Mongo Read from Spark

2015-08-30 Thread Deepesh Maheshwari
Hi, I am trying to read MongoDB in Spark via newAPIHadoopRDD. /* Code */ config.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat"); config.set("mongo.input.uri", SparkProperties.MONGO_OUTPUT_URI); config.set("mongo.input.query", "{host: 'abc.com'}"); JavaSparkContext sc = new Ja
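A rough Scala sketch of the same setup (the URI is a placeholder; the property keys are the ones quoted in the message above):

    import org.apache.hadoop.conf.Configuration
    import org.bson.BSONObject
    import com.mongodb.hadoop.MongoInputFormat

    val mongoConf = new Configuration()
    mongoConf.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat")
    mongoConf.set("mongo.input.uri", "mongodb://host:27017/db.collection")  // placeholder URI
    mongoConf.set("mongo.input.query", "{host: 'abc.com'}")

    // key = document _id, value = the BSON document
    val docs = sc.newAPIHadoopRDD(mongoConf, classOf[MongoInputFormat],
      classOf[Object], classOf[BSONObject])

Read performance often depends on how the connector splits the collection, so that is worth checking before tuning Spark itself.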

Re: spark-submit issue

2015-08-30 Thread Akhil Das
You can also add a System.exit(0) after the sc.stop. On 30 Aug 2015 23:55, "Pranay Tonpay" wrote: > yes, the context is being closed at the end. > -- > *From:* Akhil Das > *Sent:* Sunday, August 30, 2015 9:03 AM > *To:* Pranay Tonpay > *Cc:* user@spark.apache.org > *S
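In code form, a two-line sketch for the very end of the driver program:

    sc.stop()        // shut down the SparkContext cleanly
    System.exit(0)   // force the JVM out in case non-daemon threads keep it alive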

Re: Unable to build Spark 1.5, is build broken or can anyone successfully build?

2015-08-30 Thread Kevin Jung
I expect this is because the versions are not in the range defined in pom.xml. You should upgrade your Maven version to 3.3.3 and your JDK to 1.7. The Spark team already knows about this issue, so you can find more information on the developers' community board. Kevin

Spark SQL vs Spark Programming

2015-08-30 Thread satish chandra j
Hi All, As a developer I understand certain scenarios can be achieved by either Spark SQL or Spark programming (RDD transformations). Moreover, I need to consider the points below: performance, implementation approach, and specific use cases suited to each approach. Could you
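To make the comparison concrete, a rough sketch of the same average-age aggregation expressed both ways (Person, people and sc are hypothetical; Spark 1.x SQLContext API):

    import org.apache.spark.sql.SQLContext

    case class Person(name: String, age: Int)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Spark SQL: declarative, planned by the Catalyst optimizer.
    people.toDF().registerTempTable("people")
    val avgSql = sqlContext.sql("SELECT AVG(age) AS avg_age FROM people")

    // RDD API: the same aggregation written out by hand.
    val (sum, count) = people.map(p => (p.age.toLong, 1L))
      .reduce { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
    val avgRdd = sum.toDouble / count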

Re: submit_spark_job_to_YARN

2015-08-30 Thread Ajay Chander
Thanks everyone for your valuable time and information. It was helpful. On Sunday, August 30, 2015, Ted Yu wrote: > This is related: > SPARK-10288 Add a rest client for Spark on Yarn > > FYI > > On Sun, Aug 30, 2015 at 12:12 PM, Dawid Wysakowicz < > wysakowicz.da...@gmail.com > > wrote: > >> Hi

Re: submit_spark_job_to_YARN

2015-08-30 Thread Ted Yu
This is related: SPARK-10288 Add a rest client for Spark on Yarn FYI On Sun, Aug 30, 2015 at 12:12 PM, Dawid Wysakowicz < wysakowicz.da...@gmail.com> wrote: > Hi Ajay, > > In short story: No, there is no easy way to do that. But if you'd like to > play around this topic a good starting point wou

RE: Re: Job aborted due to stage failure: java.lang.StringIndexOutOfBoundsException: String index out of range: 18

2015-08-30 Thread Cheng, Hao
Hi, can you try something like: val rowRDD = sc.textFile("/user/spark/short_model").map { line => val p = line.split("\\t"); if (p.length >= 72) { Row(p(0), p(1)…) } else { throw new RuntimeException(s"failed in parsing $line") } } From the log "java.lang.ArrayIndexOutOfBoundsE
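A variant of the same idea that simply drops malformed lines instead of failing the job (the path and the 72-field threshold come from the thread; the Row columns are illustrative):

    import org.apache.spark.sql.Row

    val rowRDD = sc.textFile("/user/spark/short_model").flatMap { line =>
      val p = line.split("\\t")
      if (p.length >= 72) Some(Row(p(0), p(1))) else None  // keep only complete records
    }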

Re: spark-submit issue

2015-08-30 Thread Ted Yu
Pranay: Please take a look at the Redirector class inside: ./launcher/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java Cheers On Sun, Aug 30, 2015 at 11:25 AM, Pranay Tonpay wrote: > yes, the context is being closed at the end. > -- > *From:* Akhil Da

Re: submit_spark_job_to_YARN

2015-08-30 Thread Dawid Wysakowicz
Hi Ajay, Short story: no, there is no easy way to do that. But if you'd like to play around with this topic, a good starting point would be this blog post from SequenceIQ: blog . I heard rumors that there is some work going on to pre

Re: submit_spark_job_to_YARN

2015-08-30 Thread Ajay Chander
Hi David, Thanks for responding! My main intention was to submit a Spark job/jar to the YARN cluster from within my code in Eclipse. Is there any way that I could pass my YARN configuration somewhere in the code to submit the jar to the cluster? Thank you, Ajay On Sunday, August 30, 2015, David Mitc
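One hedged sketch of doing this programmatically is the SparkLauncher API (available since Spark 1.4); the paths and class name below are placeholders, and the client machine still needs spark-submit plus HADOOP_CONF_DIR/YARN_CONF_DIR pointing at the cluster configuration:

    import org.apache.spark.launcher.SparkLauncher

    val app = new SparkLauncher()
      .setSparkHome("/opt/spark")                 // placeholder
      .setAppResource("/path/to/wordcount.jar")   // placeholder
      .setMainClass("com.example.WordCount")      // placeholder
      .setMaster("yarn-cluster")
      .launch()

    app.waitFor()   // returns the spark-submit process exit code

The REST client mentioned elsewhere in this thread (SPARK-10288) is another direction being discussed.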

Re: Spark shell and StackOverFlowError

2015-08-30 Thread Sean Owen
I'm not sure how to reproduce it; this code does not produce an error in master. On Sun, Aug 30, 2015 at 7:26 PM, Ashish Shrowty wrote: > Do you think I should create a JIRA? > > > On Sun, Aug 30, 2015 at 12:56 PM Ted Yu wrote: >> >> I got StackOverFlowError as well :-( >> >> On Sun, Aug 30, 201

Re: Spark shell and StackOverFlowError

2015-08-30 Thread Ashish Shrowty
Do you think I should create a JIRA? On Sun, Aug 30, 2015 at 12:56 PM Ted Yu wrote: > I got StackOverFlowError as well :-( > > On Sun, Aug 30, 2015 at 9:47 AM, Ashish Shrowty > wrote: > >> Yep .. I tried that too earlier. Doesn't make a difference. Are you able >> to replicate on your side? >>

Re: spark-submit issue

2015-08-30 Thread Pranay Tonpay
yes, the context is being closed at the end. From: Akhil Das Sent: Sunday, August 30, 2015 9:03 AM To: Pranay Tonpay Cc: user@spark.apache.org Subject: Re: spark-submit issue Did you try putting a sc.stop at the end of your pipeline? Thanks Best Regards On Thu,

Re: submit_spark_job_to_YARN

2015-08-30 Thread David Mitchell
Hi Ajay, Are you trying to save to your local file system or to HDFS? // This would save to HDFS under "/user/hadoop/counter" counter.saveAsTextFile("/user/hadoop/counter"); David On Sun, Aug 30, 2015 at 11:21 AM, Ajay Chander wrote: > Hi Everyone, > > Recently we have installed spark on yar
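The path's scheme decides where the output lands; a small sketch (paths are placeholders):

    counter.saveAsTextFile("hdfs:///user/hadoop/counter")   // explicitly HDFS
    counter.saveAsTextFile("file:///tmp/counter")           // local file system of each worker node
    // With no scheme, Hadoop's fs.defaultFS (typically HDFS on a YARN cluster) is used.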

Re: Spark shell and StackOverFlowError

2015-08-30 Thread Ashish Shrowty
Yep .. I tried that too earlier. Doesn't make a difference. Are you able to replicate on your side? On Sun, Aug 30, 2015 at 12:08 PM Ted Yu wrote: > I see. > > What about using the following in place of variable a ? > > http://spark.apache.org/docs/latest/programming-guide.html#broadcast-variab

Re: Apache Spark Suitable JDBC Driver not found

2015-08-30 Thread shawon
Could you please elaborate? The Spark classpath in the spark-env.sh file?

Re: Spark shell and StackOverFlowError

2015-08-30 Thread Ted Yu
I see. What about using the following in place of variable a ? http://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables Cheers On Sun, Aug 30, 2015 at 8:54 AM, Ashish Shrowty wrote: > @Sean - Agree that there is no action, but I still get the > stackoverflowerror, its ver
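A minimal sketch of that broadcast suggestion (the lookup map and rdd are illustrative): instead of capturing a large driver-side variable in the closure, ship it to the executors once as a read-only broadcast:

    val lookup = Map("a" -> 1.0, "b" -> 2.0)   // illustrative data
    val lookupBC = sc.broadcast(lookup)

    // rdd is assumed to be an RDD[(String, Double)]
    val scaled = rdd.map { case (k, v) => (k, v * lookupBC.value.getOrElse(k, 0.0)) }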

submit_spark_job_to_YARN

2015-08-30 Thread Ajay Chander
Hi Everyone, Recently we have installed Spark on YARN in a Hortonworks cluster. Now I am trying to run a wordcount program in my Eclipse, and with setMaster("local") I see the results as expected. Now I want to submit the same job to my YARN cluster from my Eclipse. In Storm basically I w

Re: Spark Version upgrade isue:Exception in thread "main" java.lang.NoSuchMethodError

2015-08-30 Thread Ted Yu
Manohar: See if adding the following dependency to your project helps: +com.fasterxml.jackson.core +jackson-databind +${fasterxml.jackson.version} + + +com.fasterxml.jackson.module +jackson-module-scala_2.10 +${fasterxml.jackson.v
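For readability, a hedged reconstruction of the dependency block that diff adds to the project's pom.xml (the fasterxml.jackson.version property is assumed to be defined in the build; otherwise pin concrete versions):

    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>${fasterxml.jackson.version}</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.module</groupId>
      <artifactId>jackson-module-scala_2.10</artifactId>
      <version>${fasterxml.jackson.version}</version>
    </dependency>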

Re: Spark shell and StackOverFlowError

2015-08-30 Thread Ted Yu
Using the Spark shell: scala> import scala.collection.mutable.MutableList import scala.collection.mutable.MutableList scala> val lst = MutableList[(String,String,Double)]() lst: scala.collection.mutable.MutableList[(String, String, Double)] = MutableList() scala> Range(0,1).foreach(i=>lst+=(("1

Re: Spark Python with SequenceFile containing numpy deserialized data in str form

2015-08-30 Thread Peter Aberline
Hi, I saw the posting about storing NumPy values in sequence files: http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3cCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3e I’ve had a go at implementing this, and opened a pull request at https://github.com/apach

Re: Calculating Min and Max Values using Spark Transformations?

2015-08-30 Thread Ashen Weerathunga
Thanks everyone for the help! On Sat, Aug 29, 2015 at 2:55 AM, Alexey Grishchenko wrote: > If the data is already in RDD, the easiest way to calculate min/max for > each column would be an aggregate() function. It takes 2 functions as > arguments - first is used to aggregate RDD values to your
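For reference, a small sketch of the aggregate() approach from that reply, computing min and max of an RDD[Double] in one pass (values is a placeholder name):

    val (mn, mx) = values.aggregate((Double.MaxValue, Double.MinValue))(
      (acc, v) => (math.min(acc._1, v), math.max(acc._2, v)),    // fold a value into a partition's accumulator
      (a, b)   => (math.min(a._1, b._1), math.max(a._2, b._2))   // merge accumulators across partitions
    )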

Re: Spark shell and StackOverFlowError

2015-08-30 Thread Sean Owen
That can't cause any error, since there is no action in your first snippet. Even calling count on the result doesn't cause an error. You must be executing something different. On Sun, Aug 30, 2015 at 4:21 AM, ashrowty wrote: > I am running the Spark shell (1.2.1) in local mode and I have a simple