Creating the external table does not by itself register the partitions that are 
already on disk; you must add them to the Hive metastore with something like 
"alter table your_table add if not exists partition (country='us');".
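
For the directory layout that Spark's partitionBy() produces (one 
country=<value> subdirectory per partition), a rough sketch for the table in 
this thread would look like the following, assuming 'us' is one of your 
country values:

    alter table person5 add if not exists
      partition (country='us')
      location '/user/Ananda/people.parquet/country=us';

You would repeat this (or script it) for each country value on disk.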

Alternatively, you can run 'msck repair table your_table;' to have Hive scan 
the table's location and register any partitions it finds there.
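
Applied to the table in this thread, the sequence would be something like 
this (the 'show partitions' step is just to verify the repair picked the 
partitions up):

    msck repair table person5;
    show partitions person5;
    select * from person5 limit 10;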

I would recommend reviewing the Hive documentation on partitions.

M



> On Nov 12, 2015, at 6:38 AM, Chandra Mohan, Ananda Vel Murugan 
> <ananda.muru...@honeywell.com> wrote:
> 
> Hi,
>  
> I am using Spark 1.5.1.
>  
> I have slightly modified this example to create a partitioned Parquet file:
> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQL.java
>  
> Instead of this line
>  
> schemaPeople.write().parquet("people.parquet");
>  
> I use this line
>  
> schemaPeople.write().partitionBy("country").parquet("/user/Ananda/people.parquet");
>  
> I have also updated the Person class to add a country attribute, and updated 
> my input file accordingly.
>  
> When I run this code in Spark, it seems to work. I can see the partition 
> folders, and the Parquet files inside them, in HDFS where I store this 
> Parquet output.
>  
> But when I create an external table in Hive, it does not work. When I run 
> “select * from person5”, it returns no rows.
>  
> This is how I create the table
>  
> CREATE EXTERNAL TABLE person5(name string, age int, city string)
> PARTITIONED BY (country string)
> STORED AS PARQUET
> LOCATION '/user/ananda/people.parquet/';
>  
> When I create a non partitioned table, it works fine.
>  
> Please help if you have any idea.
>  
> Regards,
> Anand.C
