Just to clarify, the statement in quotes was made by the author of the article:
"We can access all objects from Hive data warehouse with HiveQL which leverages the map-reduce architecture in background for data retrieval and transformation and this results in latency."

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 1 March 2016 at 11:33, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> I have not heard much about Impala lately. I saw an article on LinkedIn
> titled
>
> "Apache Hive Or Cloudera Impala? What is Best for me?"
>
> "We can access all objects from Hive data warehouse with HiveQL which
> leverages the map-reduce architecture in background for data retrieval and
> transformation and this results in latency."
>
> My response was:
>
> This statement is no longer valid, as you now have a choice of three
> engines: MR, Spark and Tez. I have not used Impala myself, as I don't think
> there is a need for it when Hive on Spark, or Spark using the Hive
> metastore, provides whatever is needed. Hive is for data warehousing and
> does what it says on the tin. Please also bear in mind that Hive offers ORC
> storage files that provide storage index capabilities, further optimizing
> queries with additional stats at the file, stripe and row group levels.
>
> Anyway, the question is: with Hive on Spark or Spark using the Hive
> metastore, what can we not achieve that we can achieve with Impala?