In Hive, the raw data lives in HDFS, and a metadata layer defines the
structure of that raw data. A table is essentially a reference to metadata,
often kept in a MySQL server, which records the location of the data in
HDFS, the delimiter or SerDe to use, and so on.
1. With Hive managed tables, when you drop a table, both the metadata in
MySQL and the raw data on the cluster are deleted.
2. With external tables, when you drop a table, only the metadata is
deleted; the raw data continues to exist on the cluster.
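
To make the two drop behaviours concrete, here is a toy model in Python. It
is only a sketch and not how Hive is implemented: ToyMetastore, its methods
and the table names are all made up for illustration, a dict stands in for
the MySQL-backed metadata, and local temp directories stand in for HDFS.

```python
import shutil
import tempfile
from pathlib import Path


class ToyMetastore:
    """Hypothetical stand-in for the metastore: name -> (location, is_external)."""

    def __init__(self, warehouse: Path):
        self.warehouse = warehouse  # plays the role of Hive's private path
        self.tables = {}

    def create_managed(self, name: str) -> Path:
        # Managed table: the data directory lives under the warehouse.
        location = self.warehouse / name
        location.mkdir(parents=True)
        self.tables[name] = (location, False)
        return location

    def create_external(self, name: str, location: Path) -> None:
        # External table: only record where the existing data already is.
        self.tables[name] = (Path(location), True)

    def drop(self, name: str) -> None:
        # The metadata entry is always removed.
        location, is_external = self.tables.pop(name)
        # The files are removed only for managed tables.
        if not is_external:
            shutil.rmtree(location)


def demo():
    root = Path(tempfile.mkdtemp())
    try:
        store = ToyMetastore(root / "warehouse")

        managed_dir = store.create_managed("managed_t")
        (managed_dir / "part-0.txt").write_text("a\tb\n")

        external_dir = root / "external_data"
        external_dir.mkdir()
        (external_dir / "part-0.txt").write_text("a\tb\n")
        store.create_external("external_t", external_dir)

        store.drop("managed_t")
        store.drop("external_t")

        # (managed files still there?, external files still there?, metadata left)
        return managed_dir.exists(), external_dir.exists(), dict(store.tables)
    finally:
        shutil.rmtree(root)  # clean up the temp directories
```

Calling demo() drops both toy tables: the managed table's directory is gone,
the external directory is untouched, and neither table has a metadata entry
any more.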


On Thu, May 10, 2012 at 3:02 PM, David Kulp <dk...@fiksu.com> wrote:

> It's simpler than this.  All files look the same -- and are often very
> simple delimited text -- whether managed or external.  The only difference
> is that the files associated with a managed table are dropped when the
> table is dropped and files that are loaded into a managed table are moved
> into hive's private path.  External tables never move or remove files.
>  Performance is the same.
>
> On May 10, 2012, at 5:52 PM, kulkarni.swar...@gmail.com wrote:
>
> > I am pretty new to hive and was trying to clearly understand the
> > difference between a managed and an external table.
> >
> > As my current understanding stands, a managed table is a table whose
> > data is completely owned by hive whereas an external table is usually
> > created to have a hive frontend for the data managed in external
> > systems. I would suppose this would mean that a query on an external
> > table goes out to fetch data from the given external table, deserialize
> > according to the given/suitable SerDe and then show the output of the
> > query in hive format.
> >
> > So does this mean that cost of using external tables is much higher than
> > the native ones? Or is there some caching that comes into play that I am
> > not seeing right now.
> >
> > Thanks for the help.
> >
> > --
> > Swarnim
>
>
