Is it possible to re-run your job with spark.eventLog.enabled set to true,
and send the resulting logs to the list? Those have more per-task
information that can help diagnose this.
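
For example (a minimal sketch, assuming you construct your own SparkContext;
the event-log directory below is an assumption, use a path that exists on
your cluster):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "hdfs:///tmp/spark-events") // hypothetical path
    val sc = new SparkContext(conf)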

-Kay

On Wed, Jan 21, 2015 at 1:57 AM, Fengyun RAO <raofeng...@gmail.com> wrote:

> btw: Shuffle Write (11 GB) means 11 GB per executor; for each task, it's
> ~40 MB.
>
>
> 2015-01-21 17:53 GMT+08:00 Fengyun RAO <raofeng...@gmail.com>:
>
>> I don't know how to debug a distributed application; any tools or
>> suggestions?
>>
>> but from spark web UI,
>>
>> The GC time (~0.1 s) and Shuffle Write (11 GB) are similar for spark 1.1
>> and 1.2.
>> There is no Shuffle Read and no Spill.
>> The only difference is Duration:
>>
>>              Min    25th pct   Median   75th pct   Max
>> spark 1.2    4 s    37 s       45 s     53 s       1.9 min
>> spark 1.1    2 s    17 s       18 s     18 s       34 s
>>
>> 2015-01-21 16:56 GMT+08:00 Sean Owen <so...@cloudera.com>:
>>
>>> I mean that if, for some reason, you had tasks running on 10 machines now
>>> instead of 3, you would have more than 3 times the read load on your
>>> source of data all at once. The same applies if you made more executors
>>> per machine. But from your additional info it does not sound like this is
>>> the case. I think you need more debugging to pinpoint what is slower.
>>> On Jan 21, 2015 9:30 AM, "Fengyun RAO" <raofeng...@gmail.com> wrote:
>>>
>>>> thanks, Sean.
>>>>
>>>> I don't quite understand "you have *more* partitions across *more*
>>>> workers".
>>>>
>>>> It's within the same cluster, with the same data, so I'd expect the same
>>>> partitions and the same workers.
>>>>
>>>> We switched from spark 1.1 to 1.2, and it became 3x slower.
>>>>
>>>> (We upgraded from CDH 5.2.1 to CDH 5.3, hence spark 1.1 to 1.2, and
>>>> found the problem.
>>>> Then we installed a standalone spark 1.1, stopped the 1.2, ran the same
>>>> script, and it was 3x faster.
>>>> Stopping 1.1 and starting 1.2 made it 3x slower again.)
>>>>
>>>>
>>>> 2015-01-21 15:45 GMT+08:00 Sean Owen <so...@cloudera.com>:
>>>>
>>>>> I don't know of any reason to think the singleton pattern doesn't work
>>>>> or works differently. I wonder if, for example, task scheduling is
>>>>> different in 1.2 and you have more partitions across more workers and so
>>>>> are loading more copies more slowly into your singletons.
>>>>> On Jan 21, 2015 7:13 AM, "Fengyun RAO" <raofeng...@gmail.com> wrote:
>>>>>
>>>>>> The LogParser instance is not serializable, and thus cannot be a
>>>>>> broadcast.
>>>>>>
>>>>>> What's worse, it contains an LRU cache, which is essential to
>>>>>> performance and which we would like to share among all the tasks on
>>>>>> the same node.
>>>>>>
>>>>>> If that is the case, what's the recommended way to share a variable
>>>>>> among all the tasks within the same executor?
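>>>>>>
>>>>>> For context, our current approach is roughly the following (a minimal
>>>>>> sketch, not our actual code; the cache type and capacity are
>>>>>> assumptions). A Scala object is initialized once per JVM, so all tasks
>>>>>> in the same executor share it:
>>>>>>
>>>>>>     import java.util.{LinkedHashMap, Map => JMap}
>>>>>>
>>>>>>     object LogParser {
>>>>>>       private val maxEntries = 10000 // hypothetical capacity
>>>>>>
>>>>>>       // accessOrder = true makes LinkedHashMap evict in LRU order.
>>>>>>       private lazy val cache =
>>>>>>         new LinkedHashMap[String, String](maxEntries, 0.75f, true) {
>>>>>>           override def removeEldestEntry(e: JMap.Entry[String, String]) =
>>>>>>             size() > maxEntries
>>>>>>         }
>>>>>>
>>>>>>       def parseLine(line: String): Seq[(String, String)] =
>>>>>>         // LinkedHashMap is not thread-safe, so guard concurrent tasks.
>>>>>>         cache.synchronized {
>>>>>>           // parsing logic elided; consult/populate `cache` here
>>>>>>           Seq.empty
>>>>>>         }
>>>>>>     }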
>>>>>>
>>>>>> 2015-01-21 15:04 GMT+08:00 Davies Liu <dav...@databricks.com>:
>>>>>>
>>>>>>> Maybe some change related to serializing the closure caused LogParser
>>>>>>> to no longer be a singleton, so that it is initialized for every task.
>>>>>>>
>>>>>>> Could you change it to a Broadcast?
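>>>>>>>
>>>>>>> Something like this, perhaps (a sketch, assuming LogParser could be
>>>>>>> made Serializable; the constructor is hypothetical):
>>>>>>>
>>>>>>>     val parserBc = sc.broadcast(new LogParser())
>>>>>>>     sc.textFile(inputPath)
>>>>>>>       .flatMap(line => parserBc.value.parseLine(line))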
>>>>>>>
>>>>>>> On Tue, Jan 20, 2015 at 10:39 PM, Fengyun RAO <raofeng...@gmail.com>
>>>>>>> wrote:
>>>>>>> > Currently we are migrating from spark 1.1 to spark 1.2, but we
>>>>>>> > found the program runs 3x slower, with nothing else changed.
>>>>>>> > Note: our program on spark 1.1 has successfully processed a whole
>>>>>>> > year's data, quite stably.
>>>>>>> >
>>>>>>> > The main script is as below:
>>>>>>> >
>>>>>>> > sc.textFile(inputPath)
>>>>>>> >   .flatMap(line => LogParser.parseLine(line))
>>>>>>> >   .groupByKey(new HashPartitioner(numPartitions))
>>>>>>> >   .mapPartitionsWithIndex(...)
>>>>>>> >   .foreach(_ => {})
>>>>>>> >
>>>>>>> > where LogParser is a singleton that may take some time to
>>>>>>> > initialize and is shared across the executor.
>>>>>>> >
>>>>>>> > The flatMap stage is 3x slower.
>>>>>>> >
>>>>>>> > We tried changing spark.shuffle.manager back to hash, and
>>>>>>> > spark.shuffle.blockTransferService back to nio, but it didn't help.
>>>>>>> >
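>>>>>>> > For reference, here is roughly how we set them (a sketch, not our
>>>>>>> > exact code):
>>>>>>> >
>>>>>>> >     import org.apache.spark.SparkConf
>>>>>>> >
>>>>>>> >     val conf = new SparkConf()
>>>>>>> >       .set("spark.shuffle.manager", "hash")
>>>>>>> >       .set("spark.shuffle.blockTransferService", "nio")
>>>>>>> >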
>>>>>>> > Could somebody explain possible causes, or suggest what we should
>>>>>>> > test or change to find it out?
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>
>
