I wonder if this could be a side effect of SPARK-3928. Does it work if you end the path with *.parquet?
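
Something along these lines is what I have in mind. It is an untested sketch: the bucket and path are the ones from your message, `monthGlob` is just a helper name made up for illustration, and I am assuming `hiveContext` is already in scope (e.g. in spark-shell) and that each day sits one directory level below the month, as in your listing.

```scala
// Hypothetical helper (not part of Spark): builds a glob that matches the
// Parquet part files of every day folder under one month.
def monthGlob(base: String, year: String, month: String): String =
  s"$base/$year/$month/*/*.parquet"

val path = monthGlob("s3n://myBucket/myPath", "2014", "07")

// Assumes hiveContext is defined as in the original snippet.
// The glob skips _SUCCESS, _metadata and _common_metadata, since none end in .parquet.
val myDataFrame = hiveContext.read.parquet(path)
```

If that returns rows while the plain ".../2014/07" path does not, it would at least narrow the problem down to how 1.4.0 treats the directory path.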

-------- Original message --------
From: Exie <tfind...@prodevelop.com.au>
Date: 06/30/2015 9:20 PM (GMT-05:00)
To: user@spark.apache.org
Subject: 1.4.0

So I was delighted with Spark 1.3.1 using Parquet 1.6.0, which would "partition" data into folders. So I set up some Parquet data partitioned by date. This enabled us to reference a single day/month/year, minimizing how much data was scanned.

e.g.:

val myDataFrame = hiveContext.read.parquet("s3n://myBucket/myPath/2014/07/01")

or

val myDataFrame = hiveContext.read.parquet("s3n://myBucket/myPath/2014/07")

However, since upgrading to Spark 1.4.0 it doesn't seem to be working the same way.

The first line works; in the "01" folder are all the normal files:

2015-06-02 20:01         0 s3://myBucket/myPath/2014/07/01/_SUCCESS
2015-06-02 20:01      2066 s3://myBucket/myPath/2014/07/01/_common_metadata
2015-06-02 20:01   1077190 s3://myBucket/myPath/2014/07/01/_metadata
2015-06-02 19:57    119933 s3://myBucket/myPath/2014/07/01/part-r-00001.parquet
2015-06-02 19:57     48478 s3://myBucket/myPath/2014/07/01/part-r-00002.parquet
2015-06-02 19:57    576878 s3://myBucket/myPath/2014/07/01/part-r-00003.parquet
...

But if I now use the second line above, to read in all days, it comes back empty.

Is there an option I can set somewhere to fix this?