My knocks on Impala (not intended to be a post knocking Impala): Impala really has not delivered on the complex types that Hive has (after promising them for quite a while), and it only works with the 'blessed' input formats: Parquet, Avro and text.
It is very annoying to work with Impala. In my version, if you create a partition in Hive, Impala does not see it; you have to run "refresh". In Impala I also do not have all the UDFs that Hive has, like percentile. Impala is fast, though. Many data-analyst / data-scientist types can't wait 10 seconds for a query, so when I need to produce something for them I make sure the data has no complex types and uses a table type that Impala understands. But for my own work I still work primarily in Hive, because I do not want to deal with all the things that Impala does not have (or might have), and when I need something special, like my own UDFs, it is easier to whip up the solution in Hive.

Having worked with M$ SQL Server and Vertica, I find Impala on par with them, but I don't think of it the way I think of Hive. To me it just feels like a Vertica that I can sometimes cheat at loading, because it is backed by HDFS. Hive is something different: I am making pipelines, transforming data, doing streaming, writing custom UDFs and querying JSON directly. Hive != Impala.

On Tue, Mar 1, 2016 at 4:38 PM, Ashok Kumar <ashok34...@yahoo.com> wrote:

> Dr Mich,
>
> My two cents here.
>
> I don't have direct experience of Impala, but in my humble opinion I share your view that Hive provides the best metastore of all Big Data systems. Looking around, almost every product in one shape or form uses Hive code somewhere. My colleagues inform me that Hive is one of the most stable Big Data products.
>
> With the capabilities of Spark on Hive and Hive on Spark or Tez, plus of course MR, there is really little need for many other products in the same space. It is good to keep things simple.
>
> Warmest
>
> On Tuesday, 1 March 2016, 11:33, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> > I have not heard much about Impala lately. I saw an article on LinkedIn titled
> > "Apache Hive Or Cloudera Impala? What is Best for me?"
> >
> > "We can access all objects from Hive data warehouse with HiveQL which leverages the map-reduce architecture in background for data retrieval and transformation and this results in latency."
> >
> > My response was:
> >
> > This statement is no longer valid, as you now have a choice of three engines: MR, Spark and Tez. I have not used Impala myself, as I don't think there is a need for it when Hive on Spark, or Spark using the Hive metastore, provides whatever is needed. Hive is for data warehousing and does what it says on the tin. Please also bear in mind that Hive offers ORC storage files, which provide storage indexes that further optimize queries with additional statistics at the file, stripe and row-group levels.
> >
> > Anyway, the question is: with Hive on Spark or Spark using the Hive metastore, what can we not achieve that we can achieve with Impala?
> >
> > Dr Mich Talebzadeh
> >
> > LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >
> > http://talebzadehmich.wordpress.com