Re: Short Circuit Local Reads

2014-10-01 Thread Colin McCabe
This was supposedly fixed >> in newer versions of Hadoop but I haven't verified it. >> >> -Kay >> >>> >>> >>> -- Forwarded message -- >>> From: Andrew Ash >>> Date: Tue, Sep 30, 2014 at 1:33 PM >>> Subje

Re: Short Circuit Local Reads

2014-09-30 Thread Andrew Ash
> fixed in newer versions of Hadoop but I haven't verified it. > > -Kay > > >> >> -- Forwarded message ------ >> From: Andrew Ash >> Date: Tue, Sep 30, 2014 at 1:33 PM >> Subject: Re: Short Circuit Local Reads >> To: Matei Zaharia

Re: Short Circuit Local Reads

2014-09-30 Thread Kay Ousterhout
y fixed in newer versions of Hadoop but I haven't verified it. -Kay > > -- Forwarded message -- > From: Andrew Ash > Date: Tue, Sep 30, 2014 at 1:33 PM > Subject: Re: Short Circuit Local Reads > To: Matei Zaharia > Cc: "user@spark.apache.org" , G

Re: Short Circuit Local Reads

2014-09-30 Thread Andrew Ash
Hi Gary, I gave this a shot on a test cluster of CDH4.7 and actually saw a regression in performance when running the numbers. Have you done any benchmarking? Below are my numbers: Experimental method: 1. Write 14GB of data to HDFS via [1] 2. Read data multiple times via [2] *Experiment 1:

Re: Short Circuit Local Reads

2014-09-17 Thread Matei Zaharia
I'm pretty sure it does help, though I don't have any numbers for it. In any case, Spark will automatically benefit from this if you link it to a version of HDFS that contains this. Matei On September 17, 2014 at 5:15:47 AM, Gary Malouf (malouf.g...@gmail.com) wrote: Cloudera had a blog post a