I think this is a rather simplistic view. All these tools do their computation in memory in the end. For certain types of computation and usage patterns it makes sense to keep the data in memory. For example, most machine-learning approaches need to feed the same data into several iterative calculations; this is what Spark has been designed for. A second pattern is aggregations/precalculations that just pass over the data in memory once; here Hive+Tez, and to a limited extent Spark, can help. The third pattern, where users interactively query the data, i.e. many concurrent users querying the same or similar data very frequently, is addressed by Hive on Tez + LLAP, Hive on Tez + Ignite, or Spark + Ignite (and there are other tools).
So it is important to understand what your users want to do. Then, there is a lot of benchmark data on the web to compare. However, I always recommend generating or using data yourself that matches the data your company is actually using. Keep in mind also that time is needed to convert this data into an efficient format.

> On 10 Feb 2017, at 20:36, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
>
> Folks,
>
> I'm embarking on a project to build a POC around Spark SQL. I was wondering
> if anyone has experience comparing Spark SQL with Hive or interactive Hive,
> and has data points on the types of queries suited for each. I am naively
> assuming that Spark SQL will beat Hive in all queries, given that computations
> are mostly done in memory, but I want to hear some more data points on
> queries that may be problematic in Spark SQL. Also, are there debugging tools
> people ordinarily use with Spark SQL to troubleshoot perf-related issues?
>
> I look forward to hearing from the community.
>
> Regards