This was supposedly fixed
in newer versions of Hadoop but I haven't verified it.

-Kay

> -- Forwarded message --
> From: Andrew Ash
> Date: Tue, Sep 30, 2014 at 1:33 PM
> Subject: Re: Short Circuit Local Reads
> To: Matei Zaharia
> Cc: "user@spark.apache.org" , G
Hi Gary,

I gave this a shot on a test cluster of CDH4.7 and actually saw a
regression in performance when running the numbers. Have you done any
benchmarking? Below are my numbers:

Experimental method:
1. Write 14GB of data to HDFS via [1]
2. Read data multiple times via [2]
*Experiment 1:
I'm pretty sure it does help, though I don't have any numbers for it. In any
case, Spark will automatically benefit from this if you link it to a version of
HDFS that contains this.
Matei
On September 17, 2014 at 5:15:47 AM, Gary Malouf (malouf.g...@gmail.com) wrote:
Cloudera had a blog post a