Hello,
I am running a streaming app on Spark 1.2.1. When running locally everything
works fine. When I try it on yarn-cluster it fails and I see a ClassCastException
in the log (see below). I can run Spark (non-streaming) apps on the cluster
with no problem.
Any ideas here? Thanks in advance!
WARN sche
Hello,
Just in case someone runs into the same issue: it was caused by running the
streaming app with a different version of the jars than the cluster's (the uber
jar contained both the yarn and spark jars).
Regards
J
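A minimal sketch of that fix, assuming an sbt build (the thread does not say which build tool was used): marking the Spark artifacts as "provided" keeps them out of the uber jar, so the versions already installed on the cluster are the ones used at runtime.

  // build.sbt -- minimal sketch, assuming sbt and Spark 1.2.1 on Scala 2.10.
  // "provided" keeps these jars out of the assembly; any hadoop/yarn client
  // dependencies should be kept out of the uber jar the same way.
  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core"      % "1.2.1" % "provided",
    "org.apache.spark" %% "spark-streaming" % "1.2.1" % "provided"
  )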
Hello,
I need to process a significant amount of data every day, about 4TB. This
will be processed in batches of about 140GB. The cluster this will be
running on doesn't have enough memory to hold the dataset at once, so I am
trying to understand how this works internally.
When using textFile to
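As an illustration of what is being asked about (the path and partition count below are hypothetical): textFile splits the input into partitions, and each task streams through its own split, so a 140GB batch never has to sit in memory as a whole.

  // hypothetical path and partition count, just to show the shape of the job
  val batch = sc.textFile("hdfs:///data/day-01/*", minPartitions = 1200)
  // count() is evaluated partition by partition on the executors; no single
  // node ever has to hold the full 140GB batch in memory
  val records = batch.filter(_.nonEmpty).count()
  println(s"records in this batch: $records")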
+1 :)
Hello Julaiti,
Maybe I am just asking the obvious :-) but did you check disk IO? Depending
on what you are doing, that could be the bottleneck.
In my case none of the HW resources was the bottleneck; instead it was some
distributed features that were blocking execution (e.g. Hazelcast). Could
that be you
If you do not want those progress indications to appear, just set
spark.ui.showConsoleProgress to false, e.g.:
System.setProperty("spark.ui.showConsoleProgress", "false");
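If you build the SparkContext yourself, the same can be done through SparkConf, as long as it happens before the context is created (a minimal sketch; the app name is a placeholder):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("my-app")                          // placeholder name
    .set("spark.ui.showConsoleProgress", "false")  // hide the console progress bar
  val sc = new SparkContext(conf)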
Regards
Hello Evo, Ranjitiyer,
I am also looking for the same thing. Using foreach is not useful for me, as
processing the RDD as a whole won't be distributed across workers, and that
would kill performance in my application :-/
Let me know if you find a solution for this.
Regards
Hello,
Maybe there is something I do not understand, but I believe this code
should not throw any serialization error when I run it in the spark shell.
Using similar code with map instead of mapPartitions works just fine.
import java.io.BufferedInputStream
import java.io.FileInputStream
Hi Sowen, the constructor just reads from the stream and stores only the data
it has read; it does not keep a reference to the stream itself, so I think this
should work as it is.
Akshat, will try your suggestion in any case, thanks!
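In case it helps anyone reading the archive, a minimal sketch of the pattern being discussed (the file paths and the byte-count computation are made up): if the stream is opened inside the mapPartitions closure, it is created on the executor and nothing non-serializable has to be shipped from the driver.

  import java.io.{BufferedInputStream, FileInputStream}

  val paths = sc.parallelize(Seq("/tmp/a.bin", "/tmp/b.bin"))  // hypothetical files
  val sizes = paths.mapPartitions { iter =>
    iter.map { path =>
      // the stream is opened and fully consumed here, on the executor;
      // only the derived value (the byte count) survives the closure
      val in = new BufferedInputStream(new FileInputStream(path))
      try Iterator.continually(in.read()).takeWhile(_ != -1).size
      finally in.close()
    }
  }
  sizes.collect().foreach(println)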
Hello,
I have a cluster with 1 master and 2 slaves running on 1.1.0. I am having
problems getting both slaves to work at the same time. When I launch the
driver on the master, one of the slaves is assigned the receiver task, and
initially both slaves start processing tasks. After a few tens of batches,
Thanks Akhil Das-2: I actually tried setting spark.default.parallelism, but it
had no effect :-/
I am running standalone and performing a mix of map/filter/foreachRDD.
I had to force parallelism with repartition to get both workers to process
tasks, but I do not think this should be required (and I am n
One detail: even when forcing partitions (repartition), Spark is still holding
some tasks; if I increase the load on the system (by increasing
spark.streaming.receiver.maxRate), even when all workers are used, the one
with the receiver gets twice as many tasks as the other workers.
Total del
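For the record, a minimal sketch of the repartition workaround (the socket source, batch interval and partition count are made up, since the thread does not say which receiver is used):

  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val ssc = new StreamingContext(sc, Seconds(10))
  val lines = ssc.socketTextStream("localhost", 9999)   // hypothetical receiver
  // spread the received blocks over all workers right after the receiver,
  // before the map/filter/foreachRDD work
  val spread = lines.repartition(4)                     // e.g. 2 workers x 2 cores
  spread.map(_.trim).filter(_.nonEmpty).foreachRDD { rdd =>
    println(s"batch size: ${rdd.count()}")
  }
  ssc.start()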
Hi Jon, I am looking for an answer to a similar question in the docs now; so
far no clue.
I would need to know what Spark's behaviour is in a situation like the example
you provided, but also taking into account that there are multiple
partitions/workers.
I could imagine it's possible that differen
Hello Mixtou, if you want to look at the partition ID, I believe you want to use
mapPartitionsWithIndex.
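Something like this (toy data, just for illustration) tags every element with the ID of the partition it lives in:

  val data = sc.parallelize(1 to 10, 3)   // 3 partitions, toy data
  val tagged = data.mapPartitionsWithIndex { (partitionId, elems) =>
    elems.map(x => (partitionId, x))
  }
  tagged.collect().foreach(println)       // (0,1), (0,2), ..., (2,10)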