Hello,
I am running a streaming app on Spark 1.2.1. When running locally everything
works fine. When I try it on yarn-cluster it fails and I see a ClassCastException
in the log (see below). I can run Spark (non-streaming) apps on the cluster
with no problem.
Any ideas here? Thanks in advance!
WARN sche
Hello,
Just in case someone runs into the same issue: it was caused by running the
streaming app with a different version of the jars than the cluster's (the uber
jar contained both the yarn and spark jars).
Regards
J
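A minimal sketch of that fix, assuming an sbt build (the thread does not say which build tool was used): marking the Spark artifacts as "provided" keeps them out of the uber jar, so the versions already installed on the cluster are the ones used at runtime.

  // build.sbt -- minimal sketch, assuming sbt and Spark 1.2.1 on Scala 2.10.
  // "provided" keeps these jars out of the assembly; any hadoop/yarn client
  // dependencies should be kept out of the uber jar the same way.
  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core"      % "1.2.1" % "provided",
    "org.apache.spark" %% "spark-streaming" % "1.2.1" % "provided"
  )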
Hello,
I need to process a significant amount of data every day, about 4TB. This
will be processed in batches of about 140GB. The cluster this will be
running on doesn't have enough memory to hold the dataset at once, so I am
trying to understand how this works internally.
When using textFile to
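As an illustration of what is being asked about (the path and partition count below are hypothetical): textFile splits the input into partitions, and each task streams through its own split, so a 140GB batch never has to sit in memory as a whole.

  // hypothetical path and partition count, just to show the shape of the job
  val batch = sc.textFile("hdfs:///data/day-01/*", minPartitions = 1200)
  // count() is evaluated partition by partition on the executors; no single
  // node ever has to hold the full 140GB batch in memory
  val records = batch.filter(_.nonEmpty).count()
  println(s"records in this batch: $records")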
+1 :)
Hello Julaiti,
Maybe I am just asking the obvious :-) but did you check disk IO? Depending
on what you are doing, that could be the bottleneck.
In my case none of the HW resources was the bottleneck; instead it was some
distributed features that were blocking execution (e.g. Hazelcast). Could
that be you
If you do not want those progress indications to appear, just set
spark.ui.showConsoleProgress to false, e.g.:
System.setProperty("spark.ui.showConsoleProgress", "false");
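If you build the SparkContext yourself, the same can be done through SparkConf, as long as it happens before the context is created (a minimal sketch; the app name is a placeholder):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("my-app")                          // placeholder name
    .set("spark.ui.showConsoleProgress", "false")  // hide the console progress bar
  val sc = new SparkContext(conf)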
Regards
Hello Evo, Ranjitiyer,
I am also looking for the same thing. Using foreach is not useful for me, as
processing the RDD as a whole won't be distributed across workers, and that
would kill performance in my application :-/
Let me know if you find a solution for this.
Regards
Hello,
Maybe there is something I do not understand, but I believe this code
should not throw any serialization error when I run it in the spark shell.
Using similar code with map instead of mapPartitions works just fine.
import java.io.BufferedInputStream
import java.io.FileInputStream
Hi Sowen, the constructor just reads from the stream and stores only the data
it has read; it does not keep a reference to the stream itself, so I think this
should work as it is.
Akshat, will try your suggestion in any case, thanks!
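In case it helps anyone reading the archive, a minimal sketch of the pattern being discussed (the file paths and the byte-count computation are made up): if the stream is opened inside the mapPartitions closure, it is created on the executor and nothing non-serializable has to be shipped from the driver.

  import java.io.{BufferedInputStream, FileInputStream}

  val paths = sc.parallelize(Seq("/tmp/a.bin", "/tmp/b.bin"))  // hypothetical files
  val sizes = paths.mapPartitions { iter =>
    iter.map { path =>
      // the stream is opened and fully consumed here, on the executor;
      // only the derived value (the byte count) survives the closure
      val in = new BufferedInputStream(new FileInputStream(path))
      try Iterator.continually(in.read()).takeWhile(_ != -1).size
      finally in.close()
    }
  }
  sizes.collect().foreach(println)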
Hello,
I have a cluster with 1 master and 2 slaves running on 1.1.0. I am having
problems getting both slaves to work at the same time. When I launch the
driver on the master, one of the slaves is assigned the receiver task, and
initially both slaves start processing tasks. After a few tens of batches,
Thanks Akhil Das-2: I actually tried setting spark.default.parallelism, but it
had no effect :-/
I am running standalone and performing a mix of map/filter/foreachRDD.
I had to force parallelism with repartition to get both workers to process
tasks, but I do not think this should be required (and I am n
One detail: even when forcing partitions (repartition), Spark is still holding
some tasks; if I increase the load on the system (by increasing
spark.streaming.receiver.maxRate), even when all workers are used, the one
with the receiver gets twice as many tasks as the other workers.
Total del
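For the record, a minimal sketch of the repartition workaround (the socket source, batch interval and partition count are made up, since the thread does not say which receiver is used):

  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val ssc = new StreamingContext(sc, Seconds(10))
  val lines = ssc.socketTextStream("localhost", 9999)   // hypothetical receiver
  // spread the received blocks over all workers right after the receiver,
  // before the map/filter/foreachRDD work
  val spread = lines.repartition(4)                     // e.g. 2 workers x 2 cores
  spread.map(_.trim).filter(_.nonEmpty).foreachRDD { rdd =>
    println(s"batch size: ${rdd.count()}")
  }
  ssc.start()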
Hi Jon, I am looking for an answer to a similar question in the docs now; so
far no clue.
I would need to know what Spark's behaviour is in a situation like the example
you provided, but also taking into account that there are multiple
partitions/workers.
I could imagine it's possible that differen
Hello Mixtou, if you want to look at the partition ID, I believe you want to use
mapPartitionsWithIndex.
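Something like this (toy data, just for illustration) tags every element with the ID of the partition it lives in:

  val data = sc.parallelize(1 to 10, 3)   // 3 partitions, toy data
  val tagged = data.mapPartitionsWithIndex { (partitionId, elems) =>
    elems.map(x => (partitionId, x))
  }
  tagged.collect().foreach(println)       // (0,1), (0,2), ..., (2,10)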