Issue with high no of skipped task

2015-09-20 Thread Saurav Sinha
Hi Users, I am new to Spark and have written a flow. When we deployed our code it was completing jobs in 4-5 min, but now it is taking 20+ min to complete with almost the same set of data. Can you please help me figure out the reason for it? -- Thanks and Regards, Saurav Sinha Contact: 9742879062

Re: Problem at sbt/sbt assembly

2015-09-20 Thread Sean Owen
Sbt asked for a bigger initial heap than the host had space for. It is a JVM error you can and should search for first. You will need more memory. On Mon, Sep 21, 2015, 2:11 AM Aaroncq4 <475715...@qq.com> wrote: > When I used "sbt/sbt assembly" to compile the Spark code of spark-1.5.0, I got a > probl
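Sean's advice above usually comes down to giving sbt's JVM a heap that actually fits the host. A minimal sketch, assuming a bash shell and that the host can spare about 2 GB (the exact sizes are illustrative, not recommendations):

```shell
# Cap the heap sbt's JVM asks for so it fits the host's available memory:
# -Xms is the initial heap, -Xmx the maximum. Keep both within what
# `free -m` reports as available, then re-run the build, e.g.:
#   SBT_OPTS="-Xms512m -Xmx2g" build/sbt assembly
export SBT_OPTS="-Xms512m -Xmx2g"
```

SBT_OPTS is honored by the sbt launcher scripts; some launcher versions also accept a `-mem` flag that sets both values at once.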

Re: word count (group by users) in spark

2015-09-20 Thread Aniket Bhatnagar
Unless I am mistaken, in a group-by operation it spills to disk in case the values for a key don't fit in memory. Thanks, Aniket On Mon, Sep 21, 2015 at 10:43 AM Huy Banh wrote: > Hi, > > If your input format is user -> comment, then you could: > > val comments = sc.parallelize(List(("u1", "one tw

Re: Class cast exception : Spark 1.5

2015-09-20 Thread sim
You likely need to add the Cassandra connector JAR to spark.jars so it is available to the executors. Hope this helps, Sim -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Class-cast-exception-Spark-1-5-tp24732p24753.html Sent from the Apache Spark User List
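One way to follow sim's suggestion is to set `spark.jars` when building the context, so the connector jar is shipped to every executor. A sketch only; the jar path and app name below are hypothetical, and passing `--jars` to spark-submit achieves the same thing:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Ship the Cassandra connector jar to the executors via spark.jars.
// The path below is a placeholder; point it at wherever your build
// (or download) placed the connector assembly jar.
val conf = new SparkConf()
  .setAppName("cassandra-example")
  .set("spark.jars", "/path/to/spark-cassandra-connector-assembly.jar")
val sc = new SparkContext(conf)
```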

Re: question building spark in a virtual machine

2015-09-20 Thread Eyal Altshuler
Anyone? On Sun, Sep 20, 2015 at 7:49 AM, Eyal Altshuler wrote: > I allocated almost 6GB of RAM to the ubuntu virtual machine and got the > same problem. > I will go over this post and try to zoom in on the java vm settings. > > meanwhile - can someone with a working ubuntu machine specify

What is a taskBinary for a ShuffleMapTask? What is its purpose?

2015-09-20 Thread Muler
Hi, What is the purpose of the taskBinary for a ShuffleMapTask? What does it contain, and how is it useful? Is it the representation of all the RDD operations that will be applied to the partition the task will be processing? (in the case below the task will process stage 0, partition 0) If it is

Re: word count (group by users) in spark

2015-09-20 Thread Huy Banh
Hi, If your input format is user -> comment, then you could: val comments = sc.parallelize(List(("u1", "one two one"), ("u2", "three four three"))) val wordCounts = comments. flatMap({case (user, comment) => for (word <- comment.split(" ")) yield(((user, word), 1)) }). reduceByKey(_
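Huy's snippet is cut off at `reduceByKey`. Completed, and written with plain Scala collections so the logic can be followed without a cluster (the RDD version is the same apart from `sc.parallelize` and using `reduceByKey(_ + _)` directly), it might look like:

```scala
// Per-user word count: (user, comment) pairs -> ((user, word), count).
val comments = List(("u1", "one two one"), ("u2", "three four three"))

val wordCounts = comments
  .flatMap { case (user, comment) =>
    comment.split(" ").map(word => ((user, word), 1))
  }
  // Collections equivalent of reduceByKey(_ + _) on an RDD:
  .groupBy(_._1)
  .map { case (key, pairs) => (key, pairs.map(_._2).sum) }

// wordCounts maps (user, word) -> count, e.g. ("u1", "one") -> 2
```

Note Aniket's point earlier in the thread still applies to the RDD version: a group-by spills to disk when the values for a key don't fit in memory, which is why `reduceByKey` (which combines map-side) is usually preferred over `groupByKey` for counting.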

Re: Problem at sbt/sbt assembly

2015-09-20 Thread Ted Yu
Have you seen this thread: http://search-hadoop.com/m/q3RTtVJJ3I15OJ251 Cheers On Sun, Sep 20, 2015 at 6:11 PM, Aaroncq4 <475715...@qq.com> wrote: > When I used "sbt/sbt assembly" to compile the Spark code of spark-1.5.0, I got a > problem and I did not know why. It says: > > NOTE: The sbt/sbt s

Problem at sbt/sbt assembly

2015-09-20 Thread Aaroncq4
When I used "sbt/sbt assembly" to compile the Spark code of spark-1.5.0, I got a problem and I did not know why. It says: NOTE: The sbt/sbt script has been relocated to build/sbt. Please update references to point to the new location. Invoking 'build/sbt assembly' now ... Using /usr/

Re: in joins, does one side stream?

2015-09-20 Thread Reynold Xin
We do - but I don't think it is feasible to duplicate every single algorithm in DF and in RDD. The only way for this to work is to make one underlying implementation work for both. Right now DataFrame knows how to serialize individual elements well and can manage memory that way -- the RDD API doe
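In the DataFrame API Reynold describes, a broadcast join is requested by hinting the small side. A sketch for the Spark 1.5-era API, assuming an existing SparkContext `sc` and two Parquet datasets at hypothetical paths:

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.broadcast

// broadcast() marks the small side so the planner replicates it to every
// executor instead of shuffling both sides (as a hash or sort-merge join
// would). The paths and join column here are illustrative only.
val sqlContext = new SQLContext(sc)
val large = sqlContext.read.parquet("/data/large")
val small = sqlContext.read.parquet("/data/small")
val joined = large.join(broadcast(small), "id")
```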

Re: in joins, does one side stream?

2015-09-20 Thread Koert Kuipers
sorry that was a typo. i meant to say: why do we have these features (broadcast join and sort-merge join) in DataFrame but not in RDD? they don't seem specific to structured data analysis to me. thanks! koert On Sun, Sep 20, 2015 at 2:46 PM, Koert Kuipers wrote: > why dont we want these (broa

Re: in joins, does one side stream?

2015-09-20 Thread Koert Kuipers
why dont we want these (broadcast join and sort-merge join) in DataFrame but not in RDD? they dont seem specific to structured data analysis to me. On Sun, Sep 20, 2015 at 2:41 AM, Rishitesh Mishra wrote: > Got it..thnx Reynold.. > On 20 Sep 2015 07:08, "Reynold Xin" wrote: > >> The RDDs thems

Re: Using Spark for portfolio manager app

2015-09-20 Thread Thúy Hằng Lê
Thanks all, Using external storage seems to be the best solution for now. Btw, has anyone heard about the following Spark Streaming module from Intel? https://github.com/Intel-bigdata/spark-streamingsql Seems it allows us to query a Spark stream on the fly; however, it hasn't been updated for 9 months,

Re: PrunedFilteredScan does not work for UDTs and Struct fields

2015-09-20 Thread Richard Eggert
Having to restructure my queries isn't a very satisfactory solution, unfortunately. I did notice that if I implement the CatalystScan interface instead, then the filters DO get passed in, but the column identifiers would need to be translated somewhat to be usable, so that's another option. Unfortu
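For context, the interface Richard is comparing against looks roughly like the following sketch (member names per the `org.apache.spark.sql.sources` API; the relation trait itself is hypothetical). Spark pushes down only the predicates it can translate into `Filter` instances, which is why filters on UDT and struct fields may never arrive, whereas CatalystScan receives the raw expressions:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}

// Sketch: a data source relation that receives column pruning and the
// subset of filters Spark could translate into source-API Filter objects.
trait MyRelation extends BaseRelation with PrunedFilteredScan {
  override def buildScan(requiredColumns: Array[String],
                         filters: Array[Filter]): RDD[Row]
}
```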

Re: Kafka createDirectStream ​issue

2015-09-20 Thread Petr Novak
val topics="first" shouldn't it be val topics = Set("first") ? On Sun, Sep 20, 2015 at 1:01 PM, Petr Novak wrote: > val topics="first" > > shouldn't it be val topics = Set("first") ? > > On Sat, Sep 19, 2015 at 10:07 PM, kali.tumm...@gmail.com < > kali.tumm...@gmail.com> wrote: > >> Hi , >> >>
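Petr's point, spelled out as a sketch for the Spark 1.x direct-stream API (the broker address and batch interval are illustrative, and an existing SparkConf `conf` is assumed): `createDirectStream` takes the topics as a `Set[String]`, so a bare `String` will not compile against that signature.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(conf, Seconds(10))
val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
// Must be a Set[String], not a String:
val topics = Set("first")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)
```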

Re: KafkaDirectStream can't be recovered from checkpoint

2015-09-20 Thread Petr Novak
Hi Michal, yes, it is logged twice there; it can be seen in the attached log in one of the previous posts with more details: 15/09/17 23:06:37 INFO StreamingContext: Invoking stop(stopGracefully=false) from shutdown hook 15/09/17 23:06:37 INFO StreamingContext: Invoking stop(stopGracefully=false) from shut
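A common cause of checkpoint-recovery failures like the one in this thread is building the DStream graph outside the factory function. A sketch of the recoverable pattern, assuming an existing SparkConf `conf` and an illustrative checkpoint path:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// All stream setup must happen inside the factory: on restart,
// getOrCreate rebuilds the context from the checkpoint and only calls
// createContext when no checkpoint exists yet.
def createContext(): StreamingContext = {
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint("/tmp/checkpoint")
  // ... build the Kafka direct stream and its output operations here ...
  ssc
}

val ssc = StreamingContext.getOrCreate("/tmp/checkpoint", createContext _)
ssc.start()
ssc.awaitTermination()
```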