It depends on which operation you are doing. If you run a .count() on a
Parquet file, I think it might not download the entire file, since the row
counts are kept in the file's footer metadata. If you run a .count() on a
plain text file, it might pull the entire file.
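
To illustrate the difference, here is a rough sketch in plain Python (not
Spark, and not the real Parquet format). It assumes a made-up toy layout in
which the file ends with an 8-byte footer holding the row count and a magic
marker, loosely modelled on how Parquet keeps its metadata at the end of the
file. The names CountingReader, make_toy_file, count_from_footer and
count_by_scanning are all hypothetical. It only shows how many bytes each
approach has to read when ranged reads are available; it says nothing about
what s3n actually does.

```python
import struct


class CountingReader:
    """Serves byte ranges from an in-memory 'file' and tallies bytes read."""

    def __init__(self, data: bytes):
        self._data = data
        self.bytes_read = 0

    def size(self) -> int:
        return len(self._data)

    def read_range(self, offset: int, length: int) -> bytes:
        chunk = self._data[offset:offset + length]
        self.bytes_read += len(chunk)
        return chunk


FOOTER_LEN = 8  # toy footer: 4-byte little-endian row count + 4-byte magic


def make_toy_file(rows):
    """Build the toy file: newline-terminated rows followed by the footer."""
    body = b"".join(r.encode("utf-8") + b"\n" for r in rows)
    footer = struct.pack("<I", len(rows)) + b"TOY1"
    return body + footer


def count_from_footer(reader):
    """Columnar-style count: read only the footer at the end of the file."""
    tail = reader.read_range(reader.size() - FOOTER_LEN, FOOTER_LEN)
    if tail[4:] != b"TOY1":
        raise ValueError("not a toy file")
    (row_count,) = struct.unpack("<I", tail[:4])
    return row_count


def count_by_scanning(reader):
    """Text-style count: read the whole body and count line terminators."""
    body = reader.read_range(0, reader.size() - FOOTER_LEN)
    return body.count(b"\n")


if __name__ == "__main__":
    data = make_toy_file([f"row-{i}" for i in range(1000)])

    footer_reader = CountingReader(data)
    scan_reader = CountingReader(data)

    print(count_from_footer(footer_reader), footer_reader.bytes_read)
    print(count_by_scanning(scan_reader), scan_reader.bytes_read)
```

If the file system underneath cannot serve a ranged read and fetches the
whole object instead, the footer-only approach loses its advantage, which
would match the slowdown described in the quoted message below.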

Thanks
Best Regards

On Sat, Aug 8, 2015 at 3:12 AM, Akshat Aranya <aara...@gmail.com> wrote:

> Hi,
>
> I've been trying to track down some problems with Spark reads being very
> slow with s3n:// URIs (NativeS3FileSystem).  After some digging around, I
> realized that this file system implementation fetches the entire file,
> which isn't really a Spark problem, but it really slows down things when
> trying to just read headers from a Parquet file or just creating partitions
> in the RDD.  Is this something that others have observed before, or am I
> doing something wrong?
>
> Thanks,
> Akshat
>
