No indexing in hive.

On Sunday, May 13, 2012, Ranjith wrote:

> Indexes can be built on tables managed by hive. For external tables I do
> not believe that to be true. Please feel free to correct me if I am wrong.
>
> Thanks,
> Ranjith
>
> On May 12, 2012, at 9:24 PM, Nanda Vijaydev <nanda.vijay...@gmail.com>
> wrote:
>
> In Hive, the raw data is in HDFS and there is a metadata layer that
> defines the structure of the raw data. A table is usually a reference to
> metadata, probably in a MySQL server, and it contains a reference to the
> location of the data in HDFS, the type of delimiter or SerDe to use, and
> so on.
> 1. With Hive managed tables, when you drop a table, both the metadata in
> MySQL and the raw data on the cluster are deleted.
> 2. With external tables, when you drop a table, only the metadata is
> deleted and the raw data continues to exist on the cluster.
>
>
> On Thu, May 10, 2012 at 3:02 PM, David Kulp <dk...@fiksu.com> wrote:
>
>> It's simpler than this.  All files look the same -- and are often very
>> simple delimited text -- whether managed or external.  The only difference
>> is that the files associated with a managed table are dropped when the
>> table is dropped and files that are loaded into a managed table are moved
>> into hive's private path.  External tables never move or remove files.
>> Performance is the same.
>>
>> On May 10, 2012, at 5:52 PM, kulkarni.swar...@gmail.com wrote:
>>
>> > I am pretty new to hive and was trying to clearly understand the
>> > difference between a managed and an external table.
>> >
>> > As my current understanding stands, a managed table is a table whose
>> > data is completely owned by hive, whereas an external table is usually
>> > created to have a hive frontend for data managed in external systems. I
>> > would suppose this means that a query on an external table goes out to
>> > fetch data from the given external table, deserializes it according to
>> > the given/suitable SerDe, and then shows the output of the query in hive
>> > format.
>> >
>> > So does this mean that the cost of using external tables is much higher
>> > than the native ones? Or is there some caching that comes into play that
>> > I am not seeing right now?
>> >
>> > Thanks for the help.
>> >
>> > --
>> > Swarnim
>>
>>
>
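
A minimal HiveQL sketch of the managed versus external behaviour described
in the quoted messages. The table names, columns, delimiter, and HDFS paths
are hypothetical and chosen only for illustration; nothing here comes from
the original posters' setups.

```sql
-- Managed table: Hive owns the files under its warehouse directory.
CREATE TABLE managed_logs (id INT, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- LOAD DATA INPATH moves (does not copy) the HDFS file into the
-- managed table's directory. '/staging/logs.tsv' is a made-up path.
LOAD DATA INPATH '/staging/logs.tsv' INTO TABLE managed_logs;

-- External table: Hive stores only metadata that points at an
-- existing directory. '/data/logs' is a made-up path.
CREATE EXTERNAL TABLE external_logs (id INT, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/logs';

-- Dropping the managed table removes the metadata and its data files.
DROP TABLE managed_logs;

-- Dropping the external table removes only the metadata;
-- the files under '/data/logs' stay in HDFS.
DROP TABLE external_logs;
```

Both tables are queried with the same SELECT syntax and read the same kind
of delimited files, which is consistent with the point above that
performance is the same.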

-- 

Raja Thiruvathuru
