Re: spark 1.2 three times slower than spark 1.1, why?

Fengyun RAO Wed, 21 Jan 2015 00:42:17 -0800

Maybe you mean a different spark-submit script?

We also use the same spark-submit script, and thus the same memory, cores,
etc. configuration.


2015-01-21 15:45 GMT+08:00 Sean Owen <so...@cloudera.com>:

> I don't know of any reason to think the singleton pattern doesn't work or
> works differently. I wonder if, for example, task scheduling is different
> in 1.2 and you have more partitions across more workers and so are loading
> more copies more slowly into your singletons.
> On Jan 21, 2015 7:13 AM, "Fengyun RAO" <raofeng...@gmail.com> wrote:
>
>> The LogParser instance is not serializable, and thus cannot be a
>> broadcast.
>>
>> What’s worse, it contains an LRU cache, which is essential to
>> performance, and which we would like to share among all the tasks on the
>> same node.
>>
>> If that is the case, what’s the recommended way to share a variable among
>> all the tasks within the same executor?
>> 
>>
>> 2015-01-21 15:04 GMT+08:00 Davies Liu <dav...@databricks.com>:
>>
>>> Maybe some change related to closure serialization means LogParser is
>>> no longer a singleton, so it is initialized for every task.
>>>
>>> Could you change it to a Broadcast?
>>>
>>> On Tue, Jan 20, 2015 at 10:39 PM, Fengyun RAO <raofeng...@gmail.com>
>>> wrote:
>>> > Currently we are migrating from Spark 1.1 to Spark 1.2, but found the
>>> > program 3x slower, with nothing else changed.
>>> > Note: our program on Spark 1.1 has successfully processed a whole
>>> > year of data and is quite stable.
>>> >
>>> > the main script is as below
>>> >
>>> > sc.textFile(inputPath)
>>> > .flatMap(line => LogParser.parseLine(line))
>>> > .groupByKey(new HashPartitioner(numPartitions))
>>> > .mapPartitionsWithIndex(...)
>>> > .foreach(_ => {})
>>> >
>>> > where LogParser is a singleton which may take some time to initialize
>>> > and is shared across the executor.
>>> >
>>> > the flatMap stage is 3x slower.
>>> >
>>> > We tried changing spark.shuffle.manager back to hash, and
>>> > spark.shuffle.blockTransferService back to nio, but it didn’t help.
>>> >
>>> > Could somebody explain possible causes, or what we should test or change
>>> > to find out?
>>>
>>
>>
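
For reference, the per-executor singleton discussed in this thread can be
sketched roughly as follows. This is only an illustration under stated
assumptions, not the actual LogParser: ExpensiveParser, LogParserHolder,
initCount, normalize and the "key rest-of-line" input format are all
hypothetical names and choices made up for the example. It relies on a Scala
object being initialized once per JVM (so once per executor, provided every
task loads it through the same class loader), and on java.util.LinkedHashMap
in access order for the LRU cache.

```scala
import java.util.{Collections, LinkedHashMap, Map => JMap}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for the real LogParser: costly to build, not Serializable.
class ExpensiveParser(maxCacheEntries: Int) {
  // Access-ordered LinkedHashMap that evicts the least recently used entry,
  // wrapped so that concurrent tasks in one JVM can share it safely.
  private val cache: JMap[String, String] = Collections.synchronizedMap(
    new LinkedHashMap[String, String](16, 0.75f, true) {
      override def removeEldestEntry(eldest: JMap.Entry[String, String]): Boolean =
        size() > maxCacheEntries
    })

  // Placeholder (assumption) for whatever lookup makes the cache worthwhile.
  private def normalize(key: String): String = key.trim.toLowerCase

  // Parses "key rest-of-line" into zero or one (key, value) pairs.
  def parseLine(line: String): Seq[(String, String)] =
    line.split(" ", 2) match {
      case Array(rawKey, value) =>
        var key = cache.get(rawKey)
        if (key == null) {
          key = normalize(rawKey)
          cache.put(rawKey, key)
        }
        Seq((key, value))
      case _ => Seq.empty
    }
}

object LogParserHolder {
  // Counts constructions so that repeated initialization shows up in testing.
  val initCount = new AtomicInteger(0)

  // lazy val: built on first use, thread-safely, once per JVM.
  lazy val parser: ExpensiveParser = {
    initCount.incrementAndGet()
    new ExpensiveParser(maxCacheEntries = 10000)
  }
}

// In the job, refer to the object from inside the closure so that nothing
// non-serializable is captured:
//   sc.textFile(inputPath).flatMap(line => LogParserHolder.parser.parseLine(line))
```

Logging initCount from the tasks is one possible way to check whether the
parser is being rebuilt more often than once per executor, which is the
suspicion raised above.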
