Is there a recommended performance test for sort based shuffle? Something
similar to terasort on Hadoop. I couldn't find one on the spark-perf code
base.
https://github.com/databricks/spark-perf
--
Kannan
According to hive documentation, "sort by" is supposed to order the results
for each reducer. So if we set a single reducer, then the results should be
sorted, right? But this is not happening. Any idea why? Looks like the
settings I am using to restrict the number of reducers is not having an
effe
I am working on SPARK-1529. I ran into an issue with my change, where the
same shuffle file was being reused across 2 jobs. Please note this only
happens when I use a hard coded location to use for shuffle files, say
"/tmp". It does not happen with normal code path that uses DiskBlockManager
to pic
you run the same job twice, the shuffle dependency as well as shuffle id
> is different, so the shuffle file name which is combined by
> (shuffleId+mapId+reduceId) will be changed, so there's no name conflict
> even in the same directory as I know.
>
> Thanks
> Jerry
>
>
not globally unique,
>> their full paths should be unique due to these unique per-application
>> subdirectories. Have you observed an instance where this isn't the case?
>>
>> - Josh
>>
>> On Tue, Mar 24, 2015 at 11:04 PM, Kannan Rajah
>> wrote:
&
>> Jerry
>>
>>
>>
>> -Original Message-
>> From: Cheng Lian [mailto:lian.cs@gmail.com]
>> Sent: Wednesday, March 25, 2015 7:40 PM
>> To: Saisai Shao; Kannan Rajah
>> Cc: dev@spark.apache.org
>> Subject: Re: Unders
DiskStore.getBytes uses memory mapped files if the length is more than a
configured limit. This code path is used during map side shuffle in
ExternalSorter. I want to know if its possible for the length to exceed the
limit in the case of shuffle. The reason I ask is in the case of Hadoop,
each map
files. I suppose the
>> comment
>> is a little better in FileSegmentManagedBuffer:
>>
>>
>> https://github.com/apache/spark/blob/master/network/common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java#L62
>>
>> On Tue, Apr 14,