Thanks Prashanth. But that means I have to fire one "alter table <tname> add partition" query for every date sub-directory I have inside '/abc/xyz'. Although that doesn't seem unreasonable, it would have been simpler if Hive could automatically detect the arrival of new data. There was a similar example of this, but it doesn't seem to work with partitions: http://www.simon-fortelny.com/?p=137
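
For what it's worth, the closest thing I have found to automatic detection is MSCK REPAIR TABLE, which asks Hive to scan the table's location and add any partition directories the metastore does not know about yet. The catch, as far as I understand it, is that it only recognizes directories named in key=value form, so the MR job would have to write to /abc/xyz/insertdate=2008-01-01 rather than /abc/xyz/2008-01-01. A rough, untested sketch (assuming a Hive build that supports MSCK):

    -- External table rooted at the directory the MR job writes into.
    CREATE EXTERNAL TABLE IF NOT EXISTS tablename (.......)
      PARTITIONED BY (insertdate string)
      LOCATION '/abc/xyz';

    -- Adds every insertdate=... sub-directory that exists on HDFS but is
    -- not yet registered in the metastore; run after each MR load.
    MSCK REPAIR TABLE tablename;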
Thanks,
Aniket

On Wed, Jul 6, 2011 at 9:17 AM, Prashanth R <r.prasha...@gmail.com> wrote:

> Hey Aniket,
>
> Well, I don't think there is a way to insert data the way you described in
> your second command. However, you could have a cron job that invokes a
> script that keeps changing the insertdate, and you could point it at a
> directory that contains nothing but the files (with data) to be loaded
> into Hive.
>
> Let me know.
>
> - Prashanth
>
> On Tue, Jul 5, 2011 at 2:22 PM, Aniket Mokashi <aniket...@gmail.com> wrote:
>
>> Hi,
>>
>> I would like Hive to detect the partitions automatically as the directory
>> gets updated with new data (by an MR job). Is it possible to do away with
>> the "alter table tablename add partition (insertdate='2008-01-01')
>> LOCATION 's3n://' or 'hdfs://<path>/abc/xyz/'" command every time I get a
>> new partition? Can I have
>>
>> CREATE EXTERNAL TABLE IF NOT EXISTS tablename (.......) partitioned by
>> (insertdate string) LOCATION '/abc/xyz';
>>
>> and have Hive scan through all available partitions (the sub-directories
>> inside /abc/xyz)?
>>
>> Thanks,
>> Aniket
>>
>> On Fri, Jul 1, 2011 at 4:39 PM, Aniket Mokashi <aniket...@gmail.com> wrote:
>>
>>> Thanks Prashanth,
>>>
>>> select Count(*) from segmentation_data where (dt='2011-07-01');
>>>
>>> java.io.IOException: Not a file:
>>> hdfs://hadoop01:9000/data_feed/sophia/segmentation_data/1970-01-01
>>>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:206)
>>>     at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:261)
>>>
>>> I am not sure why it looks for the year 1970!
>>> Also, I am assuming I have to add all the partitions manually, but that
>>> seems reasonable.
>>>
>>> Thanks,
>>> Aniket
>>>
>>> On Fri, Jul 1, 2011 at 4:11 PM, Prashanth R <r.prasha...@gmail.com> wrote:
>>>
>>>> Pasting an example here:
>>>>
>>>> CREATE EXTERNAL TABLE IF NOT EXISTS tablename (.......) partitioned by
>>>> (insertdate string) ROW FORMAT SERDE
>>>> 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';
>>>>
>>>> alter table tablename add partition (insertdate='2008-01-01') LOCATION
>>>> 's3n://' or 'hdfs://<path>/abc/xyz/'
>>>>
>>>> - Prashanth
>>>>
>>>> On Fri, Jul 1, 2011 at 3:57 PM, Aniket Mokashi <aniket...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have data on HDFS that is already stored in directories by date, for
>>>>> example /abc/xyz/yyyy-mm-d1 and /abc/xyz/yyyy-mm-d2. How do I create
>>>>> an external table with the date as the partition key, pointing at the
>>>>> data in these directories?
>>>>> Please advise.
>>>>>
>>>>> Thanks,
>>>>> Aniket
>>>>
>>>> --
>>>> - Prash
>>>
>>> --
>>> "...:::Aniket:::... Quetzalco@tl"
>>
>> --
>> "...:::Aniket:::... Quetzalco@tl"
>
> --
> - Prash

--
"...:::Aniket:::... Quetzalco@tl"
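
P.S. If I do end up adding partitions explicitly, the ADD PARTITION syntax appears to accept more than one partition spec per statement (I haven't verified this on our Hive version), which would at least batch the DDL. An untested sketch, with placeholder dates and paths:

    -- Register several date directories in a single statement.
    ALTER TABLE tablename ADD
      PARTITION (insertdate='2008-01-01') LOCATION '/abc/xyz/2008-01-01'
      PARTITION (insertdate='2008-01-02') LOCATION '/abc/xyz/2008-01-02';

A cron script, as Prashanth suggested, could generate a statement like this from a directory listing (hadoop fs -ls /abc/xyz).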