Re: Hive and Impala

Edward Capriolo Tue, 01 Mar 2016 14:22:11 -0800

My knocks on Impala (not intended to be a post knocking Impala).

Impala really has not delivered on the complex types that Hive has (after
promising them for quite a while). It also only works with the 'blessed'
input formats: Parquet, Avro, text.

Impala is very annoying to work with. In my version, if you create a
partition in Hive, Impala does not see it. You have to run "refresh".
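
To make that concrete, here is a minimal sketch of the two statements
involved, written as a small Python helper that only builds the SQL strings.
The table and partition names are hypothetical, and actually submitting the
statements to Hive and Impala is left out on purpose.

```python
def hive_add_partition(table, partition):
    """Build the Hive DDL that adds a partition (to be run on the Hive side)."""
    spec = ", ".join(f"{key}='{value}'" for key, value in partition.items())
    return f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({spec})"


def impala_refresh(table):
    """Build the Impala statement that reloads the metadata for one table."""
    return f"REFRESH {table}"


# Hypothetical table and partition, for illustration only.
statements = [
    hive_add_partition("logs.events", {"dt": "2016-03-01"}),
    impala_refresh("logs.events"),
]
for statement in statements:
    print(statement)
```

The point is only the ordering: the partition is added through Hive first,
and Impala needs the explicit REFRESH before it will see the new partition.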

In Impala I do not have all the UDFs that Hive has, like percentile, etc.
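
For anyone unsure what kind of aggregate I mean, a percentile over a small
list can be sketched in plain Python using linear interpolation between the
two nearest ranks. This is only an illustration; I am assuming, not
claiming, that it matches Hive's percentile in every case.

```python
import math


def percentile(values, p):
    """Return the p-th percentile (0 <= p <= 1) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted list.
    position = (len(ordered) - 1) * p
    lower = math.floor(position)
    upper = math.ceil(position)
    if lower == upper:
        return float(ordered[lower])
    fraction = position - lower
    # Interpolate between the two neighbouring values.
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction
```

When an aggregate like this is missing from the engine, the choice is
between pulling the rows out and computing it client-side, as above, or
running the query somewhere that already has the function.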

Impala is fast. Many data-analyst / data-scientist types can't wait
10 seconds for a query, so when I need to produce something for them I make
sure the data has no complex types and uses a table type that Impala
understands.

But for my own work I still work primarily in Hive, because I do not want to
deal with all the things that Impala does not have (or might have), and when
I need something special like my own UDFs it is easier to whip up the
solution in Hive.

Having worked with M$ SQL Server and Vertica, Impala is on par with them,
but I don't think of it like I think of Hive. To me it just feels like a
Vertica that I can cheat loading sometimes because it is backed by HDFS.

Hive is something different: I am making pipelines, I am transforming data,
doing streaming, writing custom UDFs, querying JSON directly. It is !=
Impala.
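
As a rough illustration of what "querying JSON directly" buys you, here is a
plain-Python sketch of pulling one field out of raw JSON rows by a dotted
path. In Hive the built-in get_json_object does this job with a path like
'$.user.name'. The helper below is hypothetical and is not how Hive
implements it, and the sample rows are made up.

```python
import json


def extract_json_field(raw, path):
    """Walk a dotted path such as 'user.name' through one JSON document."""
    try:
        node = json.loads(raw)
    except ValueError:
        return None  # malformed row: treat it like a NULL instead of failing
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None  # missing field: also treated like a NULL
        node = node[key]
    return node


# Made-up sample rows: one valid document and one broken line.
rows = ['{"user": {"name": "ed"}, "clicks": 3}', "not json"]
print([extract_json_field(row, "user.name") for row in rows])
```

The useful part is that bad rows come back as missing values rather than
killing the whole job, which matters when the input is raw logs.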

::random message of the day::

On Tue, Mar 1, 2016 at 4:38 PM, Ashok Kumar <ashok34...@yahoo.com> wrote:

>
> Dr Mich,
>
> My two cents here.
>
> I don't have direct experience of Impala, but in my humble opinion I share
> your view that Hive provides the best metastore of all Big Data systems.
> Looking around, almost every product in one form or another uses Hive code
> somewhere. My colleagues inform me that Hive is one of the most stable Big
> Data products.
>
> With the capabilities of Spark on Hive and Hive on Spark or Tez plus of
> course MR, there is really little need for many other products in the same
> space. It is good to keep things simple.
>
> Warmest
>
>
> On Tuesday, 1 March 2016, 11:33, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>
> I have not heard much about Impala lately. I saw an article on LinkedIn titled
>
> "Apache Hive Or Cloudera Impala? What is Best for me?"
>
> "We can access all objects from Hive data warehouse with HiveQL which
> leverages the map-reduce architecture in background for data retrieval and
> transformation and this results in latency."
>
> My response was
>
> This statement is no longer valid, as you now have a choice of three
> engines: MR, Spark and Tez. I have not used Impala myself, as I don't think
> there is a need for it, with Hive on Spark or Spark using the Hive metastore
> providing whatever is needed. Hive is for data warehousing and does what it
> says on the tin. Please also bear in mind that Hive offers the ORC storage
> format, which provides storage index capabilities that further optimize
> queries with additional stats at file, stripe and row group levels.
>
> Anyway, the question is: with Hive on Spark or Spark using the Hive
> metastore, what can we not achieve that we can achieve with Impala?
>
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
>
>
>
