Re: Spark 1.4.0: read.df() causes excessive IO

2015-06-30 Thread Exie
Just to add to this, here's some more info:

    val myDF = hiveContext.read.parquet("s3n://myBucket/myPath/")

Produces these...

    2015-07-01 03:25:50,450 INFO [pool-14-thread-4] (org.apache.hadoop.fs.s3native.NativeS3FileSystem) - Opening 's3n://myBucket/myPath/part-r-00339.parquet' for reading

That

Spark 1.4.0: read.df() causes excessive IO

2015-06-29 Thread Exie
Hi Folks,

I just stepped up from 1.3.1 to 1.4.0. The most notable difference for me so far is the DataFrame reader/writer.

Previously:

    val myData = hiveContext.load("s3n://someBucket/somePath/","parquet")

Now:

    val myData = hiveContext.read.parquet("s3n://someBucket/somePath")

Using the ori