I am afraid the points made by the original thread owner are misleading or at best half-baked. Given a set of parameters, one can argue from any angle: why use Spark and not Flink, why use this and not that. These are circular arguments.
- Hive can use Spark as its execution engine with excellent results compared to map-reduce. That does not mean map-reduce is out of the picture; Hive can also use Tez + LLAP as its execution engine. I think this shows how versatile Hive is.
- Transactional (ACID) support was added to Hive for ORC tables (a short DDL sketch is at the end of this message).
- There is no transactional support with Spark SQL on ORC tables yet, nor on any other store.
- Locking and concurrency (as used by Hive) work with a Spark app running a Hive context. I am not convinced this works with plain Spark SQL.
- Spark does not yet have a Cost Based Optimizer (CBO).
- Spark carries a complete fork of Hive inside it. *Spark SQL is a sub-set of Hive SQL.*
- Hive was billed as a Data Warehouse (DW) on HDFS.
- Hive is the most versatile and capable of the many SQL or SQL-like ways of accessing data on Hadoop.
- You can set up a copy of your RDBMS table in Hive in no time and use Sqoop to load the table data into the Hive table in practically one command. For many this is the great attraction of Hive.
- You can do real-time analytics on Hive by sending real-time transactional changes from RDBMS tables to Hive via the existing replication technologies. This is very handy. Today, organizations are struggling to achieve real-time integration between RDBMS silos and Big Data. Fast decision-making depends on real-time data movement that allows businesses to gather data from multiple locations into Big Data as well as conventional data warehouses.
- The Spark Thrift Server is basically the Hive Thrift Server and would not exist without it (a JDBC sketch against the Thrift Server is at the end of this message).
- Without Hive and HiveContext there would be no Spark SQL (a HiveContext sketch is at the end of this message).

I am a fan of Spark and use it extensively. However, you have to consider the use case when talking about a product.

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On 8 August 2016 at 02:49, 理 <wwl...@126.com> wrote:

> In my opinion, multiple engines are not an advantage but the reverse: they disperse the development effort.
> Consider the activity: Spark SQL supports all of TPC-DS without modifying the syntax, but Hive cannot.
> Consider the technology: DAG execution, vectorization, etc. Spark SQL has these as well, and its code seems more efficient.
>
> Regards
>
> On 08/08/2016 08:48, Will Du <will...@gmail.com> wrote:
>
> First, Hive supports different engines. Look forward to its dynamic engine switching.
> Second, look forward to Hadoop 3rd generation; map-reduce in memory will fill the gap.
>
> Thanks,
> Will
>
> On 7 August 2016, at 20:27, 理 <wwl...@126.com> wrote:
>
> Hi,
> Spark SQL is improving so fast, and Hive and Spark SQL are similar, so will Hive lose out or not?
>
> Regards
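PS. To back up a few of the points above, here are some minimal sketches. They are written against the Spark 1.x / Hive 1.x APIs of the day, and the table, host and user names (sales, payments, hduser, the hosts) are purely hypothetical placeholders, so treat them as illustrations rather than tested code.

First, the HiveContext point: Spark SQL reaches existing Hive tables through a HiveContext built on top of the Hive metastore, which is the sense in which there would be no Spark SQL without Hive. A minimal Scala sketch, assuming hive-site.xml is on the classpath and a Hive table called sales exists:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HiveContextSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("HiveContextSketch"))

        // HiveContext picks up hive-site.xml and talks to the existing Hive
        // metastore, so Hive tables become visible to Spark SQL.
        val hiveContext = new HiveContext(sc)

        // "sales" is a hypothetical Hive table used purely for illustration.
        val df = hiveContext.sql(
          "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")
        df.show()

        sc.stop()
      }
    }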
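Second, the transactional (ACID) point. In Hive 1.x/2.x an ACID table has to be stored as ORC, bucketed and flagged with the transactional table property, with DbTxnManager enabled on the server side; none of this is available through Spark SQL. A sketch, again with hypothetical host, user and table names, going through the standard Hive JDBC driver:

    import java.sql.DriverManager

    object HiveAcidSketch {
      def main(args: Array[String]): Unit = {
        // HiveServer2 endpoint; host, port, database and user are placeholders.
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        val conn = DriverManager.getConnection(
          "jdbc:hive2://hiveserver2-host:10000/default", "hduser", "")
        val stmt = conn.createStatement()

        // ORC storage, bucketing and 'transactional'='true' are what Hive's
        // ACID support expects on the table side.
        stmt.execute(
          """CREATE TABLE IF NOT EXISTS payments (
            |  id     BIGINT,
            |  amount DECIMAL(10,2)
            |) CLUSTERED BY (id) INTO 4 BUCKETS
            |STORED AS ORC
            |TBLPROPERTIES ('transactional'='true')""".stripMargin)

        // Row-level changes that Spark SQL cannot run against such a table.
        stmt.execute("INSERT INTO TABLE payments VALUES (1, 10.50)")
        stmt.execute("UPDATE payments SET amount = 11.00 WHERE id = 1")

        stmt.close(); conn.close()
      }
    }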
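Third, the Thrift Server point. The Spark Thrift Server implements the HiveServer2 protocol, so the ordinary Hive JDBC driver and a jdbc:hive2:// URL are all you need to talk to it from a JDBC client; that is exactly the sense in which it would not exist without the Hive Thrift Server. A sketch, with placeholder host, port and table:

    import java.sql.DriverManager

    object SparkThriftSketch {
      def main(args: Array[String]): Unit = {
        // The Spark Thrift Server speaks the HiveServer2 wire protocol, so the
        // plain Hive JDBC driver is used; host and port are placeholders.
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        val conn = DriverManager.getConnection(
          "jdbc:hive2://spark-thrift-host:10001/default", "hduser", "")
        val stmt = conn.createStatement()

        // "sales" is the same hypothetical table as in the first sketch.
        val rs = stmt.executeQuery("SELECT COUNT(*) FROM sales")
        while (rs.next()) println(s"row count = ${rs.getLong(1)}")

        rs.close(); stmt.close(); conn.close()
      }
    }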