Good info, Edward. Thanks.

Thanks,
Ranjith

On May 13, 2012, at 2:33 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> The original design docs say you cannot build indexes on external tables, but 
> I tried it in 0.8.x and confirmed that you can.
> 
> On Sunday, May 13, 2012, Ranjith <ranjith.raghunath...@gmail.com> wrote:
> > Indexes can be built on tables managed by Hive. For external tables I do 
> > not believe that to be true. Please feel free to correct me if I am wrong.
> >
> > Thanks,
> > Ranjith
> > On May 12, 2012, at 9:24 PM, Nanda Vijaydev <nanda.vijay...@gmail.com> 
> > wrote:
> >
> > In Hive, the raw data is in HDFS and there is a metadata layer that defines 
> > the structure of the raw data. A table is usually a reference to metadata, 
> > probably in a MySQL server, and it contains a reference to the location of 
> > the data in HDFS, the type of delimiter or SerDe to use, and so on.
> > 1. With Hive-managed tables, when you drop a table, both the metadata in 
> > MySQL and the raw data on the cluster get deleted.
> > 2. With external tables, when you drop a table, just the metadata gets 
> > deleted and the raw data continues to exist on the cluster.
> >  
> > On Thu, May 10, 2012 at 3:02 PM, David Kulp <dk...@fiksu.com> wrote:
> >>
> >> It's simpler than this.  All files look the same -- and are often very 
> >> simple delimited text -- whether managed or external.  The only difference 
> >> is that the files associated with a managed table are dropped when the 
> >> table is dropped and files that are loaded into a managed table are moved 
> >> into hive's private path.  External tables never move or remove files.  
> >> Performance is the same.
> >>
> >> On May 10, 2012, at 5:52 PM, kulkarni.swar...@gmail.com wrote:
> >>
> >> > I am pretty new to hive and was trying to clearly understand the 
> >> > difference between a managed and an external table.
> >> >
> >> > As my current understanding stands, a managed table is a table whose 
> >> > data is completely owned by Hive, whereas an external table is usually 
> >> > created to have a Hive frontend for data managed in external 
> >> > systems. I would suppose this means that a query on an external 
> >> > table goes out to fetch data from the given external table, deserializes 
> >> > it according to the given/suitable SerDe, and then shows the output of 
> >> > the query in Hive format.
> >> >
> >> > So does this mean that the cost of using external tables is much higher 
> >> > than that of the native ones? Or is there some caching that comes into 
> >> > play that I am not seeing right now?
> >> >
> >> > Thanks for the help.
> >> >
> >> > --
> >> > Swarnim
> >>
> >
> >
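The drop behaviour described in the thread (managed tables lose both metadata and data, external tables lose only metadata) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyMetastore` and `demo` are hypothetical names, a local temporary directory stands in for HDFS, and a Python dict stands in for the MySQL-backed metastore. It is not Hive's actual implementation.

```python
import os
import shutil
import tempfile


class ToyMetastore:
    """Hypothetical stand-in for the metastore: table name -> (location, is_external)."""

    def __init__(self):
        self.tables = {}

    def create_table(self, name, location, external=False):
        # Register metadata only; the data files live at `location`.
        os.makedirs(location, exist_ok=True)
        self.tables[name] = (location, external)

    def drop_table(self, name):
        # Metadata is always removed.
        location, external = self.tables.pop(name)
        # Data files are removed only for managed (non-external) tables.
        if not external:
            shutil.rmtree(location)


def demo():
    # A local temp directory plays the role of the cluster filesystem (assumption).
    root = tempfile.mkdtemp()
    store = ToyMetastore()
    managed_dir = os.path.join(root, "warehouse", "managed_t")
    external_dir = os.path.join(root, "elsewhere", "external_t")
    store.create_table("managed_t", managed_dir)
    store.create_table("external_t", external_dir, external=True)

    # The files look the same in both cases: simple delimited text.
    for directory in (managed_dir, external_dir):
        with open(os.path.join(directory, "part-00000"), "w") as handle:
            handle.write("a\tb\n")

    store.drop_table("managed_t")
    store.drop_table("external_t")

    result = {
        "tables_left": sorted(store.tables),
        "managed_data_exists": os.path.exists(managed_dir),
        "external_data_exists": os.path.exists(external_dir),
    }
    shutil.rmtree(root)  # clean up the temp directory
    return result
```

Calling `demo()` drops both tables and reports which data directories survived; in this model only the external table's files remain, and reading either table before the drop goes through the same file path, which matches the point that performance is the same.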
