Thanks Prashanth. But that means I have to fire one "alter table <tname> add partition" query for every date sub-directory I have inside '/abc/xyz'. Although that doesn't seem unreasonable, it would have been simpler if Hive could automatically detect the arrival of new data. There was a similar example of this, but it doesn't seem to work with partitions: http://www.simon-fortelny.com/?p=137
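
For what it's worth, the closest thing I have found to automatic detection is MSCK REPAIR TABLE, which asks Hive to scan the table's location and add any partition directories the metastore does not know about yet. The catch, as far as I understand it, is that it only recognizes directories named in key=value form, so the MR job would have to write to /abc/xyz/insertdate=2008-01-01 rather than /abc/xyz/2008-01-01. A rough, untested sketch (assuming a Hive build that supports MSCK):

    -- External table rooted at the directory the MR job writes into.
    CREATE EXTERNAL TABLE IF NOT EXISTS tablename (.......)
      PARTITIONED BY (insertdate string)
      LOCATION '/abc/xyz';

    -- Adds every insertdate=... sub-directory that exists on HDFS but is
    -- not yet registered in the metastore; run after each MR load.
    MSCK REPAIR TABLE tablename;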
Thanks,
Aniket

On Wed, Jul 6, 2011 at 9:17 AM, Prashanth R <r.prasha...@gmail.com> wrote:

> Hey Aniket,
>
> Well, I don't think there is a way to insert data the way you described in
> your second command. However, you could have a cron job that invokes a
> script that keeps changing the insertdate, and you could point it at a
> directory that contains nothing but the files (with data) to be loaded
> into Hive.
>
> Let me know.
>
> - Prashanth
>
> On Tue, Jul 5, 2011 at 2:22 PM, Aniket Mokashi <aniket...@gmail.com> wrote:
>
>> Hi,
>>
>> I would like Hive to detect the partitions automatically as the directory
>> gets updated with new data (by an MR job). Is it possible to do away with
>> the "alter table tablename add partition (insertdate='2008-01-01')
>> LOCATION 's3n://' or 'hdfs://<path>/abc/xyz/'" command every time I get a
>> new partition? Can I have
>>
>> CREATE EXTERNAL TABLE IF NOT EXISTS tablename (.......) partitioned by
>> (insertdate string) LOCATION '/abc/xyz';
>>
>> and have Hive scan through all available partitions (the sub-directories
>> inside /abc/xyz)?
>>
>> Thanks,
>> Aniket
>>
>> On Fri, Jul 1, 2011 at 4:39 PM, Aniket Mokashi <aniket...@gmail.com> wrote:
>>
>>> Thanks Prashanth,
>>>
>>> select Count(*) from segmentation_data where (dt='2011-07-01');
>>>
>>> java.io.IOException: Not a file:
>>> hdfs://hadoop01:9000/data_feed/sophia/segmentation_data/1970-01-01
>>>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:206)
>>>     at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:261)
>>>
>>> I am not sure why it looks for the year 1970!
>>> Also, I am assuming I have to add all the partitions manually, but that
>>> seems reasonable.
>>>
>>> Thanks,
>>> Aniket
>>>
>>> On Fri, Jul 1, 2011 at 4:11 PM, Prashanth R <r.prasha...@gmail.com> wrote:
>>>
>>>> Pasting an example here:
>>>>
>>>> CREATE EXTERNAL TABLE IF NOT EXISTS tablename (.......) partitioned by
>>>> (insertdate string) ROW FORMAT SERDE
>>>> 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';
>>>>
>>>> alter table tablename add partition (insertdate='2008-01-01') LOCATION
>>>> 's3n://' or 'hdfs://<path>/abc/xyz/'
>>>>
>>>> - Prashanth
>>>>
>>>> On Fri, Jul 1, 2011 at 3:57 PM, Aniket Mokashi <aniket...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have data on HDFS that is already stored in directories by date, for
>>>>> example /abc/xyz/yyyy-mm-d1 and /abc/xyz/yyyy-mm-d2. How do I create
>>>>> an external table with the date as the partition key, pointing at the
>>>>> data in these directories?
>>>>> Please advise.
>>>>>
>>>>> Thanks,
>>>>> Aniket
>>>>
>>>> --
>>>> - Prash
>>>
>>> --
>>> "...:::Aniket:::... Quetzalco@tl"
>>
>> --
>> "...:::Aniket:::... Quetzalco@tl"
>
> --
> - Prash

--
"...:::Aniket:::... Quetzalco@tl"
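
P.S. If I do end up adding partitions explicitly, the ADD PARTITION syntax appears to accept more than one partition spec per statement (I haven't verified this on our Hive version), which would at least batch the DDL. An untested sketch, with placeholder dates and paths:

    -- Register several date directories in a single statement.
    ALTER TABLE tablename ADD
      PARTITION (insertdate='2008-01-01') LOCATION '/abc/xyz/2008-01-01'
      PARTITION (insertdate='2008-01-02') LOCATION '/abc/xyz/2008-01-02';

A cron script, as Prashanth suggested, could generate a statement like this from a directory listing (hadoop fs -ls /abc/xyz).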