RE: 1.4.0

yana Tue, 30 Jun 2015 18:40:39 -0700

I wonder if this could be a side effect of SPARK-3928. Does ending the path
with *.parquet work?
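
Something along these lines is what I have in mind. This is an untested sketch: it assumes a spark-shell session where `sc` already exists, and that the day folders sit directly under the month folder, as in the listing quoted below. The glob pattern itself is my guess at how to apply the suggestion to the month-level path, not a confirmed fix.

```scala
// Assumes a spark-shell session where `sc` (the SparkContext) already exists.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Glob over every day folder under 2014/07 and match only the data files,
// so _SUCCESS, _metadata and _common_metadata are not part of the path.
// The bucket and path are the ones from the original message.
val monthPath = "s3n://myBucket/myPath/2014/07/*/*.parquet"
val myDataFrame = hiveContext.read.parquet(monthPath)

// Quick check that rows actually came back.
println(myDataFrame.count())
```

If the count is non-zero with the glob but zero with the bare month path, that would point at how 1.4.0 resolves directories rather than at the data itself.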


-------- Original message --------
From: Exie <tfind...@prodevelop.com.au>
Date: 06/30/2015 9:20 PM (GMT-05:00)
To: user@spark.apache.org
Subject: 1.4.0

So I was delighted with Spark 1.3.1 using Parquet 1.6.0, which would
"partition" data into folders. So I set up some Parquet data partitioned by
date. This enabled us to reference a single day/month/year, minimizing how
much data was scanned.

eg:
val myDataFrame = hiveContext.read.parquet("s3n://myBucket/myPath/2014/07/01")
or
val myDataFrame = hiveContext.read.parquet("s3n://myBucket/myPath/2014/07")

However, since upgrading to Spark 1.4.0 it doesn't seem to work the same
way.
The first line works; the "01" folder contains all the normal files:
2015-06-02 20:01         0   s3://myBucket/myPath/2014/07/01/_SUCCESS
2015-06-02 20:01      2066   s3://myBucket/myPath/2014/07/01/_common_metadata
2015-06-02 20:01   1077190   s3://myBucket/myPath/2014/07/01/_metadata
2015-06-02 19:57    119933   s3://myBucket/myPath/2014/07/01/part-r-00001.parquet
2015-06-02 19:57     48478   s3://myBucket/myPath/2014/07/01/part-r-00002.parquet
2015-06-02 19:57    576878   s3://myBucket/myPath/2014/07/01/part-r-00003.parquet

... but if I now use the second line above to read in all days, it comes
back empty.

Is there an option I can set somewhere to fix this?



