I am afraid your points (as the original thread owner) are manifestly
misleading, or at best half-baked. Given a set of parameters one can argue
from any angle: why use Spark and not Flink, why use this and not that.
These are circular arguments.


   - Hive can use Spark as its execution engine with excellent results
   compared to map-reduce. That does not mean map-reduce is out of the
   picture: Hive can also use Tez+LLAP as its execution engine (the engine
   is chosen via the hive.execution.engine setting). I think this shows how
   versatile Hive is.
   - Transactional support was added to Hive for ORC tables.
   - No transactional support with Spark SQL yet, whether on ORC tables or
   any other format
   - Locking and concurrency (as used by Hive) from a Spark application
   running a HiveContext: I am not convinced this works with Spark SQL
   - Spark as yet does not have a Cost Based Optimizer (CBO).
   - Spark has a complete fork of Hive inside it. *Spark SQL is a subset
   of Hive SQL*
   - Hive was billed as a Data Warehouse (DW) on HDFS.
   - Hive is the most versatile and capable of the many SQL or SQL-like
   ways of accessing data on Hadoop.
   - You can set up a copy of your RDBMS table in Hive in no time and use
   Sqoop to get the table data into the Hive table in practically one
   command. For many, this is the great attraction of Hive.
   - Ability to do real-time analytics on Hive by sending real-time
   transactional changes from RDBMS tables to Hive via existing
   replication technologies. This is very handy. Today, organizations are
   struggling to achieve real-time integration between RDBMS silos and Big
   Data. Fast decision-making depends on real-time data movement that
   allows businesses to gather data from multiple locations into Big Data
   as well as conventional data warehouses.
   - Spark Thrift Server is basically the Hive Thrift Server (HiveServer2)
   and would not exist without it (see the second sketch after this list)
   - Without Hive and HiveContext there would be no Spark SQL (see the
   first sketch after this list)
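
To make the HiveContext point concrete, here is a minimal sketch (Spark
1.6-era API, current at the time of this thread, run from spark-shell where
sc is already provided; the table name sales.transactions is just a
placeholder, not from this thread):

import org.apache.spark.sql.hive.HiveContext

// HiveContext reads table definitions from the Hive metastore and parses
// the query with the Hive dialect that Spark bundles internally.
val hc = new HiveContext(sc)
hc.sql("SELECT count(*) FROM sales.transactions").show()

In Spark 2.0 the same is done through SparkSession.builder().enableHiveSupport(),
but the dependency on the Hive classes remains.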

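And a sketch of the Thrift Server point: because the Spark Thrift Server is
derived from HiveServer2, you connect to it with the ordinary Hive JDBC
driver. Host, port, user and table below are placeholders, and the Hive JDBC
jar is assumed to be on the classpath:

import java.sql.DriverManager

// The Spark Thrift Server speaks the HiveServer2 protocol, so a standard
// jdbc:hive2:// URL and the Hive JDBC driver are all that is needed.
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "hduser", "")
val stmt = conn.createStatement()
val rs   = stmt.executeQuery("SELECT count(*) FROM sales.transactions")
while (rs.next()) println(rs.getLong(1))
rs.close(); stmt.close(); conn.close()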

I am a fan of Spark and use it extensively. However, you have to consider
the use case when talking about a product.

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 8 August 2016 at 02:49, 理 <wwl...@126.com> wrote:

> In my opinion, supporting multiple engines is not an advantage but the
> reverse: it disperses the development effort.
> Consider activity: Spark SQL supports the whole of TPC-DS without
> modifying the syntax, but Hive cannot.
> Consider the technology: DAG execution, vectorization, etc. Spark SQL
> also has these, and its code seems more efficient.
>
>
> regards
> On 08/08/2016 08:48, Will Du <will...@gmail.com> wrote:
>
> First, Hive supports different engines. Look forward to its dynamic
> engine switching.
> Second, look forward to the 3rd generation of Hadoop; map-reduce on
> memory will fill the gap.
>
> Thanks,
> Will
>
> On 7 August 2016, at 20:27, 理 <wwl...@126.com> wrote:
>
> hi,
>   Spark SQL is improving so fast, and Hive and Spark SQL are similar, so
> will Hive lose out or not?
>
> regards
>
