It depends on which operation you are doing. If you do a .count() on a Parquet file, I think it might not download the entire file, but a .count() on a plain text file will probably pull the entire file, since every line has to be read to be counted.
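To illustrate why a Parquet reader does not necessarily need the whole file: the Parquet format keeps its metadata (including per-row-group row counts) in a footer at the end of the file, followed by a 4-byte little-endian footer length and the magic bytes PAR1. Below is a rough sketch in plain Python, not Spark code, of how a reader could locate that footer with a single seek. The function name `footer_byte_range` and the synthetic byte string are made up for the example; it does not parse the footer itself, and whether a given file system (such as s3n) can serve such a seek without fetching everything is a separate question.

```python
import io
import struct

PARQUET_MAGIC = b"PAR1"


def footer_byte_range(f, file_size):
    """Return (start, end) offsets of the footer metadata in a Parquet file.

    Hypothetical helper for illustration: `f` is any seekable binary file
    object and `file_size` is its total length in bytes.
    """
    # The last 8 bytes are: 4-byte little-endian footer length + b"PAR1".
    f.seek(file_size - 8)
    tail = f.read(8)
    if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing trailing PAR1 magic; not a Parquet file?")
    (footer_len,) = struct.unpack("<I", tail[:4])
    end = file_size - 8
    start = end - footer_len
    if start < len(PARQUET_MAGIC):
        raise ValueError("footer length is larger than the file allows")
    return start, end


# Synthetic stand-in for a Parquet file (NOT real Parquet data): leading
# magic, 10,000 bytes of "column data", a 100-byte "footer", then the
# footer length and the trailing magic.
fake_footer = b"x" * 100
fake_file = (
    PARQUET_MAGIC
    + b"\0" * 10_000
    + fake_footer
    + struct.pack("<I", len(fake_footer))
    + PARQUET_MAGIC
)

start, end = footer_byte_range(io.BytesIO(fake_file), len(fake_file))
bytes_needed = (end - start) + 8  # footer plus the 8-byte tail
print(f"footer at [{start}, {end}); {bytes_needed} of {len(fake_file)} bytes needed")
```

A text file has no such index, so counting lines means scanning all of it.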
Thanks
Best Regards

On Sat, Aug 8, 2015 at 3:12 AM, Akshat Aranya <aara...@gmail.com> wrote:

> Hi,
>
> I've been trying to track down some problems with Spark reads being very
> slow with s3n:// URIs (NativeS3FileSystem). After some digging around, I
> realized that this file system implementation fetches the entire file,
> which isn't really a Spark problem, but it really slows down things when
> trying to just read headers from a Parquet file or just creating partitions
> in the RDD. Is this something that others have observed before, or am I
> doing something wrong?
>
> Thanks,
> Akshat