I think the real problem is "spark.akka.frameSize". It is too small for
passing the data. Every executor fails, and with no executors left, the
task hangs.
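If that's the cause, a hedged sketch of raising the limit (spark.akka.frameSize is in megabytes and defaulted to 10 in 0.9.x; the app name and 128 are hypothetical values):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: raise the Akka frame size so large task data fits
// in a single message. The value is in megabytes.
val conf = new SparkConf()
  .setAppName("WordMapping")  // hypothetical app name
  .set("spark.akka.frameSize", "128")
val sc = new SparkContext(conf)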
It's my fault! I uploaded the wrong jar when I changed the number of partitions,
and now it just works fine :)
The size of word_mapping is 2444185.
So does serializing a large object take a very long time? I don't think
two million entries is very large, because the cost of handling that much
data locally is typically small.
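A rough local benchmark in the spirit of that claim (a hedged sketch; the synthetic map and plain Java serialization stand in for the real word_mapping):

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Serialize a ~2.4M-entry map in-process and time it, to show the
// purely local cost is modest. The data here is synthetic.
val mapping = (0 until 2444185).map(i => ("word" + i, i)).toMap
val start = System.nanoTime
val bytes = new ByteArrayOutputStream()
new ObjectOutputStream(bytes).writeObject(mapping)
println("serialized " + bytes.size + " bytes in " +
  (System.nanoTime - start) / 1e9 + " s")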
How many values are in that sequence? I.e. what is its size?
You can also profile your program while it's running to see where it's spending
time. The easiest way is to get a single stack trace with jstack.
Maybe some of the serialization methods for this data are super inefficient, or
toSeq on the mapping is slow.
That doesn't work. I don't think it is just slow; it never ends (it ran for
30+ hours before I killed it).
Could it be that the default number of partitions used by
parallelize() is too small in this case? Try something like
spark.parallelize(word_mapping.value.toSeq, 60). (Given your setup, it
should already be 30, but perhaps that's not the case in YARN mode...)
On Fri, Apr 25, 2014 at 11:38
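For what it's worth, a minimal sketch of forcing an explicit partition count (sc and word_mapping are as in the thread; 60 is just the number suggested above):

// Without the second argument, parallelize falls back to
// sc.defaultParallelism, which may be lower than expected on YARN.
println(sc.defaultParallelism)
val rdd = sc.parallelize(word_mapping.value.toSeq, 60)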
parallelize is still so slow.
package com.semi.nlp

import org.apache.spark._
import SparkContext._
import scala.io.Source
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Registers application classes with Kryo. The original message is
// truncated here; the class registered below is an assumption.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[Map[String, Int]])
  }
}
reduceByKey(_+_).countByKey instead of plain countByKey seems to be fast.
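A minimal sketch of the difference, using illustrative (word, 1) pairs rather than the thread's real data:

// reduceByKey combines values on each partition before the shuffle,
// so the countByKey that follows sees one record per key instead of
// every raw occurrence.
val pairs = sc.parallelize(Seq("a", "b", "a").map(w => (w, 1)))
val counts = pairs.reduceByKey(_ + _).countByKey()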
This error came up just because I killed my app :(
Is there something wrong? The reduceByKey operation is extremely slow (slower
than with the default serializer).
I've tried to set a larger buffer, but reduceByKey still seems to fail. Need
help :)
14/04/26 12:31:12 INFO cluster.CoarseGrainedSchedulerBackend: Shutting down
all executors
14/04/26 12:31:12 INFO cluster.CoarseGrainedSchedulerBackend: Asking each
executor to shut down
14/04/26 12:31:12 INFO schedule
Kryo fails with the exception below:
com.esotericsoftware.kryo.KryoException
(com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0,
required: 1)
com.esotericsoftware.kryo.io.Output.require(Output.java:138)
com.esotericsoftware.kryo.io.Output.writeAscii_slow(Output.java:446)
com.esotericsof
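That stack trace is Kryo running out of output buffer. A hedged sketch of the usual remedy in Spark 0.9.x (spark.kryoserializer.buffer.mb is in megabytes and defaulted to 2; 64 is an arbitrary example):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: enable Kryo and give it a larger output buffer so
// big records serialize without "Buffer overflow".
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "com.semi.nlp.MyRegistrator")
  .set("spark.kryoserializer.buffer.mb", "64")
val sc = new SparkContext(conf)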
Try setting the serializer to org.apache.spark.serializer.KryoSerializer (see
http://spark.apache.org/docs/0.9.1/tuning.html); it should be considerably
faster.
Matei
On Apr 24, 2014, at 8:01 PM, Earthson Lu wrote:
> spark.parallelize(word_mapping.value.toSeq).saveAsTextFile("hdfs://ns1/nlp/w