Thanks for sharing both the problem and the solution! Best, moon
On Tue, Jun 23, 2015 at 10:45 PM Wush Wu <w...@bridgewell.com> wrote:

> Dear all,
>
> I found the reason.
>
> After enabling "spark.sql.parquet.useDataSourceApi" in the sqlContext,
> partition discovery for parquet works correctly.
>
> Example code:
>
> ```
> sqlContext.setConf("spark.sql.parquet.useDataSourceApi", "true")
> val ecrtb20150622 =
>   sqlContext.parquetFile("hdfs:///bwlogs/beta/archive/EC.RTB/_year=2015/_month=06/_day=22")
> ```
>
> Hope this helps others in the future.
>
> Best,
> Wush
>
> 2015-06-23 10:00 GMT+08:00 Wush Wu <w...@bridgewell.com>:
>
>> Dear all,
>>
>> Today we tried to load a partitioned parquet file as instructed in
>> <https://spark.apache.org/docs/1.3.1/sql-programming-guide.html#partition-discovery>:
>>
>> ```
>> sqlContext.parquetFile("hdfs:///bwlogs/beta/archive/EC.Buy/_year=2015/_month=06/_day=11")
>> ```
>>
>> but we got `java.lang.IllegalArgumentException: Could not find Parquet
>> metadata at path
>> hdfs://bwhdfscluster/bwlogs/beta/archive/EC.Buy/_year=2015/_month=06/_day=11`
>>
>> However, if I create a HiveContext myself:
>>
>> ```
>> val hc = new org.apache.spark.sql.hive.HiveContext(sc)
>> hc.parquetFile("hdfs:///bwlogs/beta/archive/EC.Buy/_year=2015/_month=06/_day=11")
>> ```
>>
>> it works.
>>
>> Is this a bug? Or did I make a mistake in configuring my HDFS cluster?
>>
>> Thanks,
>> Wush
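For anyone finding this thread later: the two workarounds from the messages above can be combined into one sketch. This is not a definitive fix, just the poster's two observed workarounds on Spark 1.3.x, written out together; `sc` is assumed to be an existing SparkContext, and the HDFS paths are the poster's own examples, so substitute your own.

```
// Sketch of the two workarounds discussed in this thread (Spark 1.3.x).
// Assumes an existing SparkContext `sc`.

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new SQLContext(sc)

// Workaround 1: enable the data-source API on the plain SQLContext so that
// partition discovery (_year=.../_month=.../_day=...) is applied when
// reading Parquet.
sqlContext.setConf("spark.sql.parquet.useDataSourceApi", "true")
val rtb = sqlContext.parquetFile(
  "hdfs:///bwlogs/beta/archive/EC.RTB/_year=2015/_month=06/_day=22")

// Workaround 2: use a HiveContext instead, which in the poster's setup
// read the partitioned layout correctly without extra configuration.
val hc = new HiveContext(sc)
val buy = hc.parquetFile(
  "hdfs:///bwlogs/beta/archive/EC.Buy/_year=2015/_month=06/_day=11")
```

Note that `parquetFile` was deprecated in later Spark releases in favour of `sqlContext.read.parquet(...)`, so this applies mainly to the 1.3.x line discussed here.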