Hi Dean,

Thanks for the input!

I guess I have to go the tedious/hard way :P


Regards,
Omkar Joshi


From: Dean Wampler [mailto:deanwamp...@gmail.com]
Sent: Friday, July 26, 2013 4:26 PM
To: user@hive.apache.org
Subject: Re: Hive - external (dynamically) partitioned table

Hive doesn't support dynamically-partitioned external tables, in part because 
the rationale for an external table is that the data already exists, or will 
exist, outside Hive's control. Sqoop can create internal tables directly as it 
imports, and I believe it handles internal, dynamically-partitioned tables (but 
I'm not certain...).
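
If you do try Sqoop's Hive integration, note that as far as I know its Hive 
options take a single, static partition value per run rather than partitioning 
dynamically. A sketch, reusing the connection details from your mail, with the 
partition key/value hard-coded as an assumption (you'd likely also need 
--columns to keep comp_code out of the data columns):

  # Untested sketch: one run per partition value; --hive-partition-value is static.
  sqoop import --connect jdbc:mysql://172.25.37.135/pros_olap2 \
    --username hadoopranch --password hadoopranch \
    --table nas_comps --where "comp_code = 'J'" \
    --hive-import --hive-table nas_comps \
    --hive-partition-key comp_code --hive-partition-value J \
    -m 1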

In general, for external, partitioned tables, you have to use the ALTER TABLE 
command to add the partitions:

ALTER TABLE table_name ADD PARTITION (partCol = 'value1') LOCATION 'loc1';

That's obviously tedious if you have a lot of them. There's a variant for 
adding many at once:

ALTER TABLE table_name ADD [IF NOT EXISTS]
  PARTITION partition_spec [LOCATION 'location1']
  PARTITION partition_spec [LOCATION 'location2'] ...

partition_spec:
  : (partition_col = partition_col_value, partition_col = partition_col_value, ...)
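
With your partitioning scheme that might look like this (the values and the 
key=value directory layout are assumptions):

  ALTER TABLE nas_comps ADD IF NOT EXISTS
    PARTITION (leg_id = 1, month = 1, comp_code = 'J')
      LOCATION '/pros/olap2/dataimports/nas_comps/leg_id=1/month=1/comp_code=J'
    PARTITION (leg_id = 1, month = 1, comp_code = 'Y')
      LOCATION '/pros/olap2/dataimports/nas_comps/leg_id=1/month=1/comp_code=Y';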

But you still have to specify every one of them.

I think the easiest solution is to use bash or another scripting language to 
generate the commands for you. Run "hadoop fs -lsr /root/path/of/table", then 
hack the output into a script for the ALTER TABLE commands.
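
Something along these lines (an untested sketch; it assumes the files already 
sit under Hive-style key=value directories such as 
.../leg_id=1/month=1/comp_code=J):

  # Emit one ALTER TABLE statement per leaf partition directory.
  # Assumes a layout of .../leg_id=<n>/month=<n>/comp_code=<c>.
  hadoop fs -lsr /pros/olap2/dataimports/nas_comps \
    | grep '^d' | awk '{print $NF}' | grep 'comp_code=' \
    | while read dir; do
        leg=$(echo "$dir"  | sed -n 's|.*leg_id=\([^/]*\).*|\1|p')
        mon=$(echo "$dir"  | sed -n 's|.*month=\([^/]*\).*|\1|p')
        comp=$(echo "$dir" | sed -n 's|.*comp_code=\([^/]*\).*|\1|p')
        echo "ALTER TABLE nas_comps ADD IF NOT EXISTS PARTITION (leg_id=$leg, month=$mon, comp_code='$comp') LOCATION '$dir';"
      done > add_partitions.hql

  hive -f add_partitions.hql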

HTH,
Dean

On Fri, Jul 26, 2013 at 6:00 AM, Omkar Joshi 
<omkar.jo...@lntinfotech.com> wrote:
I have a table in MySQL, viz. nas_comps:
select comp_code, count(leg_id) from nas_comps_01012011_31012011 n group by comp_code;
comp_code     count(leg_id)
'J'           20640
'Y'           39680
First, I imported the data onto HDFS (Hadoop version 1.0.2) using Sqoop:
sqoop import --connect jdbc:mysql://172.25.37.135/pros_olap2 \
  --username hadoopranch --password hadoopranch \
  --query "select * from nas_comps where dep_date between '2011-01-01' and '2011-01-10' AND \$CONDITIONS" \
  -m 1 --target-dir /pros/olap2/dataimports/nas_comps
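
(A variant of this import that would land each comp_code's rows in its own 
Hive-style partition directory, sketched here for reference and untested, would 
be:)

  # Untested sketch: one import per comp_code, splitting only on comp_code for brevity.
  for comp in J Y; do
    sqoop import --connect jdbc:mysql://172.25.37.135/pros_olap2 \
      --username hadoopranch --password hadoopranch \
      --query "select * from nas_comps where dep_date between '2011-01-01' and '2011-01-10' and comp_code = '$comp' AND \$CONDITIONS" \
      -m 1 \
      --target-dir /pros/olap2/dataimports/nas_comps/comp_code=$comp
  done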
Then, I created an external, partitioned Hive table:
/*shows the partitions on 'describe' but not 'show partitions'*/
create external table nas_comps (
  DS_NAME string, DEP_DATE string, CRR_CODE string, FLIGHT_NO string,
  ORGN string, DSTN string, PHYSICAL_CAP int, ADJUSTED_CAP int,
  CLOSED_CAP int)
PARTITIONED BY (LEG_ID int, month int, COMP_CODE string)
LOCATION '/pros/olap2/dataimports/nas_comps';
The partition columns are shown when described:
hive> describe extended nas_comps;
OK
ds_name string
dep_date        string
crr_code        string
flight_no       string
orgn    string
dstn    string
physical_cap    int
adjusted_cap    int
closed_cap      int
leg_id  int
month   int
comp_code       string

Detailed Table Information      Table(tableName:nas_comps, 
dbName:pros_olap2_optim, owner:hadoopranch, createTime:1374849456, 
lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:ds_name, type:string, 
comment:null), FieldSchema(name:dep_date, type:string, comment:null), 
FieldSchema(name:crr_code, type:string, comment:null), 
FieldSchema(name:flight_no, type:string, comment:null), FieldSchema(name:orgn, 
type:string, comment:null), FieldSchema(name:dstn, type:string, comment:null), 
FieldSchema(name:physical_cap, type:int, comment:null), 
FieldSchema(name:adjusted_cap, type:int, comment:null), 
FieldSchema(name:closed_cap, type:int, comment:null), FieldSchema(name:leg_id, 
type:int, comment:null), FieldSchema(name:month, type:int, comment:null), 
FieldSchema(name:comp_code, type:string, comment:null)], 
location:hdfs://172.25.37.21:54300/pros/olap2/dataimports/nas_comps,
 inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=1}), bucketCols:[], sortCols:[], 
parameters:{}), partitionKeys:[FieldSchema(name:leg_id, type:int, 
comment:null), FieldSchema(name:month, type:int, comment:null), 
FieldSchema(name:comp_code, type:string, comment:null)], 
parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1374849456}, 
viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE)
But I'm not sure whether the partitions were actually created, because:
hive> show partitions nas_comps;
OK
Time taken: 0.599 seconds


select count(1) from nas_comps;
returns 0 records
How do I create an external Hive table with dynamic partitions?




Regards,
Omkar Joshi





--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
