Re: Spark to utilize HDFS's mmap caching

Marcelo Vanzin Mon, 12 May 2014 18:17:28 -0700

Is that true? I believe the API Chanwit is talking about requires
explicitly asking for files to be cached in HDFS.

Spark automatically benefits from the kernel's page cache (i.e. if
some block is in the kernel's page cache, it will be read more
quickly). But the explicit HDFS cache is a different thing; Spark
applications that want to use it would have to explicitly call the
respective HDFS APIs.
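
As a rough, untested sketch of what such an explicit call might look
like from Scala against the Hadoop 2.3 client: the pool name
"spark-pool" and the path "/data/input.txt" are made up for
illustration, and creating a pool may need HDFS admin privileges on
your cluster.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.hdfs.DistributedFileSystem
import org.apache.hadoop.hdfs.protocol.{CacheDirectiveInfo, CachePoolInfo}

object CacheInputExample {
  def main(args: Array[String]): Unit = {
    // Picks up core-site.xml / hdfs-site.xml from the classpath.
    val conf = new Configuration()

    FileSystem.get(conf) match {
      case dfs: DistributedFileSystem =>
        // Hypothetical pool name; this call fails if the pool already exists.
        dfs.addCachePool(new CachePoolInfo("spark-pool"))

        // Ask the NameNode to have DataNodes cache this (hypothetical) path.
        val directive = new CacheDirectiveInfo.Builder()
          .setPath(new Path("/data/input.txt"))
          .setPool("spark-pool")
          .build()

        val id = dfs.addCacheDirective(directive)
        println(s"Added cache directive $id")

      case other =>
        // The default file system is not HDFS, so there is nothing to cache.
        println(s"Not an HDFS file system: ${other.getUri}")
    }
  }
}
```

This only registers the directive with HDFS. A later sc.textFile() on
the same path still goes through the normal HDFS client read path.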

On Sun, May 11, 2014 at 11:04 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> Yes, Spark goes through the standard HDFS client and will automatically 
> benefit from this.
>
> Matei
>
> On May 8, 2014, at 4:43 AM, Chanwit Kaewkasi <chan...@gmail.com> wrote:
>
>> Hi all,
>>
>> Can Spark (0.9.x) utilize the caching feature in HDFS 2.3 via
>> sc.textFile() and other HDFS-related APIs?
>>
>> http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html
>>
>> Best regards,
>>
>> -chanwit
>>
>> --
>> Chanwit Kaewkasi
>> linkedin.com/in/chanwit
>

-- 
Marcelo
