
I was recently faced with a similar issue, but unfortunately I could not find out why it happened.

Here's the JIRA ticket from my previous post: https://issues.apache.org/jira/browse/SPARK-5081

Please check the shuffle I/O differences between the two versions in the Spark web UI, since this may be related to my case.

 

Thanks

Kevin

 

------- Original Message -------

Sender : Fengyun RAO<raofeng...@gmail.com>

Date : 2015-01-21 17:41 (GMT+09:00)

Title : Re: spark 1.2 three times slower than spark 1.1, why?

 

Maybe you mean a different spark-submit script?

We use the same spark-submit script for both versions, and thus the same memory, cores, etc. configuration.


2015-01-21 15:45 GMT+08:00 Sean Owen <so...@cloudera.com>:

I don't know of any reason to think the singleton pattern doesn't work or works differently. I wonder if, for example, task scheduling is different in 1.2 and you have more partitions across more workers and so are loading more copies more slowly into your singletons.

On Jan 21, 2015 7:13 AM, "Fengyun RAO" <raofeng...@gmail.com> wrote:

The LogParser instance is not serializable, and thus cannot be a broadcast variable.

What's worse, it contains an LRU cache, which is essential to performance, and which we would like to share among all the tasks on the same node.

If that is the case, what's the recommended way to share a variable among all the tasks within the same executor?
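For reference, the executor-local singleton pattern being discussed can be sketched roughly like this; the parsing logic and cache value types below are simplified placeholders, not the actual LogParser:

```scala
import java.util.{Collections, LinkedHashMap, Map => JMap}

// A per-JVM (i.e. per-executor) singleton: the companion object is
// initialized at most once in each executor process, and every task
// running there shares the same instance -- nothing is serialized into
// task closures.
class LogParser {
  private val maxEntries = 10000

  // LRU cache shared by all tasks in this executor's JVM; synchronized
  // because tasks run concurrently on multiple threads.
  private val cache: JMap[String, Seq[(String, String)]] =
    Collections.synchronizedMap(
      new LinkedHashMap[String, Seq[(String, String)]](maxEntries, 0.75f, true) {
        override def removeEldestEntry(
            eldest: JMap.Entry[String, Seq[(String, String)]]): Boolean =
          size() > maxEntries
      })

  def parse(line: String): Seq[(String, String)] = {
    val cached = cache.get(line)
    if (cached != null) cached
    else {
      val parsed = Seq(("line", line)) // placeholder for real parsing
      cache.put(line, parsed)
      parsed
    }
  }
}

object LogParser {
  // `lazy val` gives thread-safe, once-per-JVM initialization on first use.
  lazy val instance: LogParser = new LogParser()

  def parseLine(line: String): Seq[(String, String)] = instance.parse(line)
}
```

Because the object lives outside any closure, tasks reach it by class name rather than by serialization, which is what makes the shared LRU cache possible at all.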


2015-01-21 15:04 GMT+08:00 Davies Liu <dav...@databricks.com>:
Maybe some change related to closure serialization causes LogParser to no
longer be a singleton, so it is initialized for every task.

Could you change it to a Broadcast?
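For context, a broadcast variable only works for serializable state; a minimal sketch of the suggestion, assuming the parser's lookup state could be extracted into a serializable map (all names here are illustrative, not the actual program):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits (Spark 1.2 era)

// Sketch of the broadcast suggestion: ship a serializable lookup table
// to each executor once, instead of re-creating it per task. This only
// works if the state is serializable, which is exactly the problem with
// the real LogParser.
object BroadcastSketch {
  def countMatches(sc: SparkContext): scala.collection.Map[String, Int] = {
    val patterns = Map("ERROR" -> 1, "WARN" -> 2) // illustrative state
    val bcPatterns = sc.broadcast(patterns)       // sent once per executor

    sc.parallelize(Seq("ERROR foo", "WARN bar", "ERROR baz"))
      .flatMap(line => bcPatterns.value.keys.filter(line.contains).map(k => (k, 1)))
      .reduceByKey(_ + _)
      .collectAsMap()
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("broadcast-sketch").setMaster("local[*]"))
    println(countMatches(sc))
    sc.stop()
  }
}
```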

On Tue, Jan 20, 2015 at 10:39 PM, Fengyun RAO <raofeng...@gmail.com> wrote:
> We are currently migrating from Spark 1.1 to Spark 1.2, but found the
> program 3x slower, with nothing else changed.
> Note: our program on Spark 1.1 has successfully processed a whole year of
> data, quite stably.
>
> the main script is as below
>
> sc.textFile(inputPath)
> .flatMap(line => LogParser.parseLine(line))
> .groupByKey(new HashPartitioner(numPartitions))
> .mapPartitionsWithIndex(...)
> .foreach(_ => {})
>
> where LogParser is a singleton which may take some time to initialize and
> is shared across the executor.
>
> the flatMap stage is 3x slower.
>
> We tried to change spark.shuffle.manager back to hash, and
> spark.shuffle.blockTransferService back to nio, but didn’t help.
>
> Can somebody explain possible causes, or suggest what we should test or
> change to find out?

 
