Thanks for sharing both the problem and the solution! Best, moon
On Tue, Jun 23, 2015 at 10:45 PM Wush Wu <w...@bridgewell.com> wrote:

> Dear all,
>
> I found the reason.
>
> After enabling "spark.sql.parquet.useDataSourceApi" in the sqlContext,
> partition discovery for parquet works correctly.
>
> Example code:
>
> ```
> sqlContext.setConf("spark.sql.parquet.useDataSourceApi", "true")
> val ecrtb20150622 =
>   sqlContext.parquetFile("hdfs:///bwlogs/beta/archive/EC.RTB/_year=2015/_month=06/_day=22")
> ```
>
> Hope this helps others in the future.
>
> Best,
> Wush
>
> 2015-06-23 10:00 GMT+08:00 Wush Wu <w...@bridgewell.com>:
>
>> Dear all,
>>
>> Today we tried to load a partitioned parquet file as instructed in
>> <https://spark.apache.org/docs/1.3.1/sql-programming-guide.html#partition-discovery>:
>>
>> ```
>> sqlContext.parquetFile("hdfs:///bwlogs/beta/archive/EC.Buy/_year=2015/_month=06/_day=11")
>> ```
>>
>> but we got `java.lang.IllegalArgumentException: Could not find Parquet
>> metadata at path
>> hdfs://bwhdfscluster/bwlogs/beta/archive/EC.Buy/_year=2015/_month=06/_day=11`
>>
>> However, if I create a HiveContext myself:
>>
>> ```
>> val hc = new org.apache.spark.sql.hive.HiveContext(sc)
>> hc.parquetFile("hdfs:///bwlogs/beta/archive/EC.Buy/_year=2015/_month=06/_day=11")
>> ```
>>
>> it works.
>>
>> Is this a bug? Or did I make a mistake in configuring my HDFS cluster?
>>
>> Thanks,
>> Wush
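For anyone finding this thread later: the two workarounds from the messages above can be combined into one sketch. This is not a definitive fix, just the poster's two observed workarounds on Spark 1.3.x, written out together; `sc` is assumed to be an existing SparkContext, and the HDFS paths are the poster's own examples, so substitute your own.

```
// Sketch of the two workarounds discussed in this thread (Spark 1.3.x).
// Assumes an existing SparkContext `sc`.

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new SQLContext(sc)

// Workaround 1: enable the data-source API on the plain SQLContext so that
// partition discovery (_year=.../_month=.../_day=...) is applied when
// reading Parquet.
sqlContext.setConf("spark.sql.parquet.useDataSourceApi", "true")
val rtb = sqlContext.parquetFile(
  "hdfs:///bwlogs/beta/archive/EC.RTB/_year=2015/_month=06/_day=22")

// Workaround 2: use a HiveContext instead, which in the poster's setup
// read the partitioned layout correctly without extra configuration.
val hc = new HiveContext(sc)
val buy = hc.parquetFile(
  "hdfs:///bwlogs/beta/archive/EC.Buy/_year=2015/_month=06/_day=11")
```

Note that `parquetFile` was deprecated in later Spark releases in favour of `sqlContext.read.parquet(...)`, so this applies mainly to the 1.3.x line discussed here.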