Dear all,

I found the reason.

After enabling "spark.sql.parquet.useDataSourceApi" on the sqlContext, partition discovery for Parquet works correctly. (As far as I can tell, this flag makes parquetFile go through the new data sources API, which is where partition discovery is implemented.)

example code:

```
sqlContext.setConf("spark.sql.parquet.useDataSourceApi", "true")
val ecrtb20150622 = sqlContext.parquetFile("hdfs:///bwlogs/beta/archive/EC.RTB/_year=2015/_month=06/_day=22")
```
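
As a quick sanity check, the discovered partition columns should show up in the schema. A minimal sketch, assuming the directory layout above (the filter on `_day` is just an illustration):

```
// _year, _month, and _day should appear as partition columns
// alongside the data columns read from the Parquet files.
ecrtb20150622.printSchema()

// Filter on a discovered partition column; count() forces the scan.
ecrtb20150622.filter(ecrtb20150622("_day") === 22).count()
```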

Hope this might help others in the future.

Best,
Wush

2015-06-23 10:00 GMT+08:00 Wush Wu <w...@bridgewell.com>:

> Dear all,
>
> Today we tried to load a partitioned Parquet file as instructed in <
> https://spark.apache.org/docs/1.3.1/sql-programming-guide.html#partition-discovery>:
>
> ```
> sqlContext.parquetFile("hdfs:///bwlogs/beta/archive/EC.Buy/_year=2015/_month=06/_day=11")
> ```
>
> but we got `java.lang.IllegalArgumentException: Could not find Parquet
> metadata at path
> hdfs://bwhdfscluster/bwlogs/beta/archive/EC.Buy/_year=2015/_month=06/_day=11`
>
> However, if I create a HiveContext myself:
>
> ```
> val hc = new org.apache.spark.sql.hive.HiveContext(sc)
> hc.parquetFile("hdfs:///bwlogs/beta/archive/EC.Buy/_year=2015/_month=06/_day=11")
> ```
>
> It works.
>
> Is this a bug? Or did I make a mistake in configuring my HDFS cluster?
>
> Thanks,
> Wush
>
