Issue with high no of skipped task

2015-09-20 Thread Saurav Sinha
Hi Users, I am new to Spark and have written a flow. When we deployed our code it was completing jobs in 4-5 min, but now it is taking 20+ min to complete with almost the same set of data. Can you please help me figure out the reason for it? -- Thanks and Regards, Saurav Sinha Contact: 9742879062

Re: Problem at sbt/sbt assembly

2015-09-20 Thread Sean Owen
Sbt asked for a bigger initial heap than the host had space for. It is a JVM error you can and should search for first. You will need more memory. On Mon, Sep 21, 2015, 2:11 AM Aaroncq4 <475715...@qq.com> wrote: > When I used "sbt/sbt assembly" to compile the Spark code of spark-1.5.0, I got a > probl
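Sean's advice above usually comes down to giving sbt's JVM a heap that actually fits the host. A minimal sketch, assuming a bash shell and that the host can spare about 2 GB (the exact sizes are illustrative, not recommendations):

```shell
# Cap the heap sbt's JVM asks for so it fits the host's available memory:
# -Xms is the initial heap, -Xmx the maximum. Keep both within what
# `free -m` reports as available, then re-run the build, e.g.:
#   SBT_OPTS="-Xms512m -Xmx2g" build/sbt assembly
export SBT_OPTS="-Xms512m -Xmx2g"
```

SBT_OPTS is honored by the sbt launcher scripts; some launcher versions also accept a `-mem` flag that sets both values at once.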

Re: word count (group by users) in spark

2015-09-20 Thread Aniket Bhatnagar
Unless I am mistaken, in a group-by operation it spills to disk in case the values for a key don't fit in memory. Thanks, Aniket On Mon, Sep 21, 2015 at 10:43 AM Huy Banh wrote: > Hi, > > If your input format is user -> comment, then you could: > > val comments = sc.parallelize(List(("u1", "one tw

Re: Class cast exception : Spark 1.5

2015-09-20 Thread sim
You likely need to add the Cassandra connector JAR to spark.jars so it is available to the executors. Hope this helps, Sim -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Class-cast-exception-Spark-1-5-tp24732p24753.html Sent from the Apache Spark User List
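One way to follow sim's suggestion is to set `spark.jars` when building the context, so the connector jar is shipped to every executor. A sketch only; the jar path and app name below are hypothetical, and passing `--jars` to spark-submit achieves the same thing:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Ship the Cassandra connector jar to the executors via spark.jars.
// The path below is a placeholder; point it at wherever your build
// (or download) placed the connector assembly jar.
val conf = new SparkConf()
  .setAppName("cassandra-example")
  .set("spark.jars", "/path/to/spark-cassandra-connector-assembly.jar")
val sc = new SparkContext(conf)
```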

Re: question building spark in a virtual machine

2015-09-20 Thread Eyal Altshuler
Anyone? On Sun, Sep 20, 2015 at 7:49 AM, Eyal Altshuler wrote: > I allocated almost 6GB of RAM to the ubuntu virtual machine and got the > same problem. > I will go over this post and try to zoom in on the java vm settings. > > meanwhile - can someone with a working ubuntu machine specify

What is a taskBinary for a ShuffleMapTask? What is its purpose?

2015-09-20 Thread Muler
Hi, What is the purpose of the taskBinary for a ShuffleMapTask? What does it contain, and how is it useful? Is it the representation of all the RDD operations that will be applied to the partition the task will be processing? (in the case below the task will process stage 0, partition 0) If it is

Re: word count (group by users) in spark

2015-09-20 Thread Huy Banh
Hi, If your input format is user -> comment, then you could: val comments = sc.parallelize(List(("u1", "one two one"), ("u2", "three four three"))) val wordCounts = comments. flatMap({case (user, comment) => for (word <- comment.split(" ")) yield(((user, word), 1)) }). reduceByKey(_
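Huy's snippet is cut off at `reduceByKey`. Completed, and written with plain Scala collections so the logic can be followed without a cluster (the RDD version is the same apart from `sc.parallelize` and using `reduceByKey(_ + _)` directly), it might look like:

```scala
// Per-user word count: (user, comment) pairs -> ((user, word), count).
val comments = List(("u1", "one two one"), ("u2", "three four three"))

val wordCounts = comments
  .flatMap { case (user, comment) =>
    comment.split(" ").map(word => ((user, word), 1))
  }
  // Collections equivalent of reduceByKey(_ + _) on an RDD:
  .groupBy(_._1)
  .map { case (key, pairs) => (key, pairs.map(_._2).sum) }

// wordCounts maps (user, word) -> count, e.g. ("u1", "one") -> 2
```

Note Aniket's point earlier in the thread still applies to the RDD version: a group-by spills to disk when the values for a key don't fit in memory, which is why `reduceByKey` (which combines map-side) is usually preferred over `groupByKey` for counting.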

Re: Problem at sbt/sbt assembly

2015-09-20 Thread Ted Yu
Have you seen this thread: http://search-hadoop.com/m/q3RTtVJJ3I15OJ251 Cheers On Sun, Sep 20, 2015 at 6:11 PM, Aaroncq4 <475715...@qq.com> wrote: > When I used "sbt/sbt assembly" to compile the Spark code of spark-1.5.0, I got a > problem and I did not know why. It says: > > NOTE: The sbt/sbt s

Problem at sbt/sbt assembly

2015-09-20 Thread Aaroncq4
When I used "sbt/sbt assembly" to compile the Spark code of spark-1.5.0, I got a problem and I did not know why. It says: NOTE: The sbt/sbt script has been relocated to build/sbt. Please update references to point to the new location. Invoking 'build/sbt assembly' now ... Using /usr/

Re: in joins, does one side stream?

2015-09-20 Thread Reynold Xin
We do - but I don't think it is feasible to duplicate every single algorithm in DF and in RDD. The only way for this to work is to make one underlying implementation work for both. Right now DataFrame knows how to serialize individual elements well and can manage memory that way -- the RDD API doe
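In the DataFrame API Reynold describes, a broadcast join is requested by hinting the small side. A sketch for the Spark 1.5-era API, assuming an existing SparkContext `sc` and two Parquet datasets at hypothetical paths:

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.broadcast

// broadcast() marks the small side so the planner replicates it to every
// executor instead of shuffling both sides (as a hash or sort-merge join
// would). The paths and join column here are illustrative only.
val sqlContext = new SQLContext(sc)
val large = sqlContext.read.parquet("/data/large")
val small = sqlContext.read.parquet("/data/small")
val joined = large.join(broadcast(small), "id")
```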

Re: in joins, does one side stream?

2015-09-20 Thread Koert Kuipers
sorry that was a typo. i meant to say: why do we have these features (broadcast join and sort-merge join) in DataFrame but not in RDD? they don't seem specific to structured data analysis to me. thanks! koert On Sun, Sep 20, 2015 at 2:46 PM, Koert Kuipers wrote: > why dont we want these (broa

Re: in joins, does one side stream?

2015-09-20 Thread Koert Kuipers
why dont we want these (broadcast join and sort-merge join) in DataFrame but not in RDD? they dont seem specific to structured data analysis to me. On Sun, Sep 20, 2015 at 2:41 AM, Rishitesh Mishra wrote: > Got it..thnx Reynold.. > On 20 Sep 2015 07:08, "Reynold Xin" wrote: > >> The RDDs thems

Re: Using Spark for portfolio manager app

2015-09-20 Thread Thúy Hằng Lê
Thanks all, Using external storage seems to be the best solution for now. Btw, has anyone heard about the following Spark Streaming module from Intel? https://github.com/Intel-bigdata/spark-streamingsql Seems it allows us to query a Spark stream on the fly; however, it hasn't been updated for 9 months,

Re: PrunedFilteredScan does not work for UDTs and Struct fields

2015-09-20 Thread Richard Eggert
Having to restructure my queries isn't a very satisfactory solution, unfortunately. I did notice that if I implement the CatalystScan interface instead, then the filters DO get passed in, but the column identifiers would need to be translated somewhat to be usable, so that's another option. Unfortu
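For context, the interface Richard is comparing against looks roughly like the following sketch (member names per the `org.apache.spark.sql.sources` API; the relation trait itself is hypothetical). Spark pushes down only the predicates it can translate into `Filter` instances, which is why filters on UDT and struct fields may never arrive, whereas CatalystScan receives the raw expressions:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}

// Sketch: a data source relation that receives column pruning and the
// subset of filters Spark could translate into source-API Filter objects.
trait MyRelation extends BaseRelation with PrunedFilteredScan {
  override def buildScan(requiredColumns: Array[String],
                         filters: Array[Filter]): RDD[Row]
}
```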

Re: Kafka createDirectStream ​issue

2015-09-20 Thread Petr Novak
val topics="first" shouldn't it be val topics = Set("first") ? On Sun, Sep 20, 2015 at 1:01 PM, Petr Novak wrote: > val topics="first" > > shouldn't it be val topics = Set("first") ? > > On Sat, Sep 19, 2015 at 10:07 PM, kali.tumm...@gmail.com < > kali.tumm...@gmail.com> wrote: > >> Hi , >> >>
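Petr's point, spelled out as a sketch for the Spark 1.x direct-stream API (the broker address and batch interval are illustrative, and an existing SparkConf `conf` is assumed): `createDirectStream` takes the topics as a `Set[String]`, so a bare `String` will not compile against that signature.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(conf, Seconds(10))
val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
// Must be a Set[String], not a String:
val topics = Set("first")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)
```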

Re: KafkaDirectStream can't be recovered from checkpoint

2015-09-20 Thread Petr Novak
Hi Michal, yes, it is logged twice there; it can be seen in the attached log in one of the previous posts with more details: 15/09/17 23:06:37 INFO StreamingContext: Invoking stop(stopGracefully=false) from shutdown hook 15/09/17 23:06:37 INFO StreamingContext: Invoking stop(stopGracefully=false) from shut
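A common cause of checkpoint-recovery failures like the one in this thread is building the DStream graph outside the factory function. A sketch of the recoverable pattern, assuming an existing SparkConf `conf` and an illustrative checkpoint path:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// All stream setup must happen inside the factory: on restart,
// getOrCreate rebuilds the context from the checkpoint and only calls
// createContext when no checkpoint exists yet.
def createContext(): StreamingContext = {
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint("/tmp/checkpoint")
  // ... build the Kafka direct stream and its output operations here ...
  ssc
}

val ssc = StreamingContext.getOrCreate("/tmp/checkpoint", createContext _)
ssc.start()
ssc.awaitTermination()
```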