Hi,

On 24 Aug 2012, at 14:08, Ravi Shetye wrote:

> Is this all I need to do to load the data?
> How will the system know what data will go into what partition?
> As I understand it, the partition columns should be pseudo columns and not part of
> the actual data.

Sorry, I just copied and pasted your table definition; obviously the results table will be something else. The partitions will come from the select statement.

> Also, if I have to load just 2 of the files, say
> s3://logs/ad1date1.log.gz and
> s3://logs/ad2date4.log.gz, how do I specify it?

You have to have them in a separate directory. You could have dailies:

s3://logs/date1/files.for.day1.gz
s3://logs/date2/files.for.day2.gz

etc. If you create this table as a partitioned table (PARTITIONED BY (date STRING) LOCATION 's3n://logs/'), you can then filter on date = 'date1'. To get the partitions into the table, do a recover or add the partitions statically.

Cheers,

Pedro

Pedro Figueiredo
Skype: pfig.89clouds
http://89clouds.com/ - Big Data Consulting
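
For illustration, here is a minimal HiveQL sketch of the partitioned layout described above. It is only a sketch under stated assumptions: the table names logs and results, the single line column, and the partition column name dt are all hypothetical (the real table definition is not shown in this message), and dt is used instead of date only because date can clash with the DATE keyword in later Hive releases.

```sql
-- Hypothetical external table over the daily directories; the real
-- column list is not shown in this thread, so one STRING column stands in.
CREATE EXTERNAL TABLE logs (line STRING)
PARTITIONED BY (dt STRING)
LOCATION 's3n://logs/';

-- Add the partitions statically: one statement per daily directory.
ALTER TABLE logs ADD PARTITION (dt = 'date1') LOCATION 's3n://logs/date1/';
ALTER TABLE logs ADD PARTITION (dt = 'date2') LOCATION 's3n://logs/date2/';

-- Filtering on the partition column restricts the scan to that directory.
SELECT COUNT(*) FROM logs WHERE dt = 'date1';

-- Loading a (hypothetical) partitioned results table, where the partition
-- value comes from the last column of the SELECT (dynamic partitioning).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

CREATE TABLE results (line STRING) PARTITIONED BY (dt STRING);

INSERT OVERWRITE TABLE results PARTITION (dt)
SELECT line, dt FROM logs;
```

With the static ADD PARTITION form, the directory name does not have to match the partition value, because each location is given explicitly. The trade-off is that every new day needs its own statement.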