Re: Hive on EMR on S3 : Beginner

Pedro Figueiredo Fri, 24 Aug 2012 06:32:29 -0700

Hi,

On 24 Aug 2012, at 14:08, Ravi Shetye wrote:


> 
> Is this all I need to do to load the data?
> how will the system know what data will go into what partition?
> As I understand, the partition columns should be pseudo-columns and not part of 
> the actual data.

Sorry, I just copied and pasted your table definition; obviously the results 
table will be something else. The partition values will come from the SELECT 
statement.
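
As a rough sketch of what "partitions come from the select" can look like, here 
is a dynamic-partition insert. All table and column names (raw_logs, results, 
ad_id, clicks, log_date) are hypothetical and not taken from this thread; the 
two SET lines are the standard Hive settings for enabling dynamic partitioning.

```sql
-- Hypothetical names throughout: raw_logs, results, ad_id, clicks, log_date.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The partition column is declared in PARTITIONED BY, not in the column list.
CREATE TABLE results (
  ad_id  STRING,
  clicks BIGINT
)
PARTITIONED BY (log_date STRING);

-- The partition column goes last in the SELECT; Hive routes each row to the
-- partition that matches its log_date value.
INSERT OVERWRITE TABLE results PARTITION (log_date)
SELECT ad_id, COUNT(*) AS clicks, log_date
FROM raw_logs
GROUP BY ad_id, log_date;
```

The partition column is not stored in the data files of the results table; its 
value is encoded in the partition directory name.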

> 
> Also if I have to load just 2 of the files say 
> s3://logs/ad1date1.log.gz and 
> s3://logs/ad2date4.log.gz  how do I specify it.

You have to have them in a separate directory. You could have dailies:

s3://logs/date1/files.for.day1.gz
s3://logs/date2/files.for.day2.gz
etc.

If you create this table as a partitioned table (PARTITIONED BY (date STRING) 
LOCATION 's3n://logs/'), you can then filter on date = 'date1'. To register the 
partitions with the table, either recover them (ALTER TABLE ... RECOVER 
PARTITIONS on EMR) or add them statically (ALTER TABLE ... ADD PARTITION).
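
A minimal sketch of both options, assuming a hypothetical table named logs with 
a single hypothetical column line; only the bucket layout comes from this 
thread. The backticks around date are a precaution, since date is a reserved 
word in later Hive versions.

```sql
-- Hypothetical table and column names; the S3 layout is the one described above.
CREATE EXTERNAL TABLE logs (
  line STRING
)
PARTITIONED BY (`date` STRING)
LOCATION 's3n://logs/';

-- Option 1: add partitions statically, pointing each one at its directory.
ALTER TABLE logs ADD PARTITION (`date` = 'date1') LOCATION 's3n://logs/date1/';
ALTER TABLE logs ADD PARTITION (`date` = 'date2') LOCATION 's3n://logs/date2/';

-- Option 2 (EMR Hive): discover partitions from the directory names.
-- Assumption: directories use key=value naming, e.g. s3n://logs/date=date1/
ALTER TABLE logs RECOVER PARTITIONS;

-- Only the files under the matching partition directory are read.
SELECT COUNT(*) FROM logs WHERE `date` = 'date1';
```

With directories named date1, date2, etc. (no date= prefix), the static form 
with an explicit LOCATION is likely the one that applies.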

Cheers,

Pedro

Pedro Figueiredo
Skype: pfig.89clouds
http://89clouds.com/ - Big Data Consulting
