Just to clarify, the statement in quotes below was made by the author of the
article, not by me:

"We can access all objects from Hive data warehouse with HiveQL which
leverages the map-reduce architecture in background for data retrieval and
transformation and this results in latency."
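
For reference, the engine behind HiveQL is a per-session setting
(hive.execution.engine, with values mr, tez or spark) rather than something
fixed to map-reduce. Below is a minimal sketch in Python of how a session
script could select it. The helper names engine_statement and session_script
and the table name sales are hypothetical and used for illustration only; the
sketch only builds the HiveQL text and does not connect to Hive.

```python
# Values accepted by Hive's hive.execution.engine property.
HIVE_ENGINES = ("mr", "tez", "spark")


def engine_statement(engine: str) -> str:
    """Return the HiveQL SET statement that selects an execution engine."""
    name = engine.strip().lower()
    if name not in HIVE_ENGINES:
        raise ValueError(f"unknown Hive execution engine: {engine!r}")
    return f"SET hive.execution.engine={name};"


def session_script(engine: str, query: str) -> str:
    """Prefix a query with the engine selection (hypothetical helper)."""
    body = query.strip().rstrip(";")
    return f"{engine_statement(engine)}\n{body};"


if __name__ == "__main__":
    # 'sales' is a made-up table name.
    print(session_script("spark", "SELECT COUNT(*) FROM sales"))
```

The resulting text could be passed to beeline or the Hive CLI; whether the
tez or spark value actually works depends on how the cluster is set up.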

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 1 March 2016 at 11:33, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> I have not heard much about Impala lately. I saw an article on LinkedIn titled
>
> "Apache Hive Or Cloudera Impala? What is Best for me?"
>
> "We can access all objects from Hive data warehouse with HiveQL which
> leverages the map-reduce architecture in background for data retrieval and
> transformation and this results in latency."
>
> My response was:
>
> This statement is no longer valid, as you now have a choice of three
> execution engines: MR, Spark and Tez. I have not used Impala myself, as I
> don't think there is a need for it when Hive on Spark, or Spark using the
> Hive metastore, provides whatever is needed. Hive is for data warehousing
> and does what it says on the tin. Please also bear in mind that Hive offers
> ORC storage files that provide storage index capabilities, further
> optimizing queries with additional stats at file, stripe and row group
> levels.
>
> Anyway, the question is: with Hive on Spark, or Spark using the Hive
> metastore, what can we not achieve that we can achieve with Impala?
