It's simpler than this. The files look the same, and are often just simple delimited text, whether the table is managed or external. The only differences are that the files belonging to a managed table are deleted when the table is dropped, and that files loaded into a managed table are moved into Hive's private path. With an external table, creating the table does not move the files and dropping it does not delete them. Query performance is the same, because both kinds of table are read through the same SerDe and there is no extra fetch step for external data.
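
To make the difference concrete, here is a rough HiveQL sketch. The table names, the single-column schema, and the HDFS paths are made up for illustration, and it assumes a default setup where Hive's private path is the warehouse directory (hive.metastore.warehouse.dir, /user/hive/warehouse unless configured otherwise).

```sql
-- Managed table: Hive owns the files.
-- (Table name, schema and paths are hypothetical.)
CREATE TABLE managed_logs (line STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- LOAD DATA INPATH moves the file from its HDFS location
-- into the table's directory under the warehouse path.
LOAD DATA INPATH '/data/incoming/logs.tsv' INTO TABLE managed_logs;

-- Dropping a managed table removes the metadata AND the files.
DROP TABLE managed_logs;

-- External table: Hive only records where the files live.
CREATE EXTERNAL TABLE external_logs (line STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/shared/logs';

-- Dropping an external table removes only the metadata;
-- everything under /data/shared/logs stays in place.
DROP TABLE external_logs;
```

A SELECT against either table goes through the same read path, which is why there is no cost difference to worry about.
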
On May 10, 2012, at 5:52 PM, kulkarni.swar...@gmail.com wrote:

> I am pretty new to hive and was trying to clearly understand the difference
> between a managed and an external table.
>
> As my current understanding stands, a managed table is a table whose data is
> completely owned by hive whereas an external table is usually created to have
> a hive frontend for the data managed in external systems. I would suppose this
> would mean that a query on an external table goes out to fetch data from the
> given external table, deserialize according to the given/suitable SerDe and
> then show the output of the query in hive format.
>
> So does this mean that cost of using external tables is much higher than the
> native ones? Or is there some caching that comes into play that I am not
> seeing right now.
>
> Thanks for the help.
>
> --
> Swarnim