Has anyone tried this? Please help me if you have any knowledge of this kind of
use case.
 
 
From: yogesh.keshe...@outlook.com
To: user@hive.apache.org
Subject: Dynamic partitioned parquet tables
Date: Fri, 9 Oct 2015 11:20:57 +0530




 Hello,
 
I have a question regarding Parquet tables. We have POS data that we want to store
with one partition per day. We Sqoop the data into an external table in text file
format and then insert it into an external table partitioned by date; due to some
requirements, we want to keep these files as Parquet files. The average file size
per day is around 2 MB. I know that Parquet is not meant for lots of small files,
but we want to keep it that way.

The problem is with the initial historical data load, where we try to create the
partitions dynamically: no matter how much memory I allocate, the job keeps failing
with memory errors. After some research I found that by turning on
"set hive.optimize.sort.dynamic.partition = true" we could create the dynamic
partitions. However, this is taking longer than we expected. Is there any way we
can boost the performance? Also, even with that property turned on, when we try to
create dynamic partitions for multiple years of data at a time we again run into a
heap error. How can we handle this problem? Please help us.
 
Thanks in advance!
 
Thank you,
Yogesh  
                                                                                
  
