Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
I just read further notes on LLAP. As Gopal explained, LLAP has more to it than just in-memory, and I quote Gopal: "... LLAP is designed to be hammered by multiple user sessions running different queries, designed to automate the cache eviction & selection process. There's no user visible explicit

Re: Any way in hive to have functionality like SQL Server collation on Case sensitivity

2016-07-12 Thread Mahender Sarangam
Thanks Dudu. I would like to know how others deal with case insensitivity in their projects. Is everyone converting to toLower() or toUpper() in their joins? Is there any setting applied at the Hive Server level which gets reflected in all the queries? /MS On 5/25/2016 9:05 AM, Markovitz, Dudu wrote:
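A minimal sketch of the normalization workaround discussed above, assuming two hypothetical tables t1 and t2 joined on a string key (Hive's built-in functions are lower()/upper(); as far as I know there is no server-level collation setting that makes string comparisons case-insensitive globally):

    -- case-insensitive join by normalizing both keys (hypothetical tables t1, t2)
    SELECT t1.id, t2.val
    FROM t1
    JOIN t2
      ON lower(t1.name) = lower(t2.name);

Note that wrapping the join key in a function can defeat optimizations that rely on the raw column (e.g. bucketed joins), so normalizing the column once at load time is often the cheaper option.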

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
Thanks Alan. Point taken. In mitigation, there are members of the Spark forum who have shown interest in using Hive directly, and I quote one: "Did you have any benchmark for using Spark as backend engine for Hive vs using Spark thrift server (and run spark code for hive queries)? We are using later

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Alan Gates
> On Jul 11, 2016, at 16:22, Mich Talebzadeh wrote: > > > • If I add LLAP, will that be more efficient in terms of memory usage > compared to Hive or not? Will it keep the data in memory for reuse or not. > Yes, this is exactly what LLAP does. It keeps a cache of hot data (hot co
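For reference, a minimal sketch of how LLAP is typically enabled for a session, assuming Hive 2.x with Tez as the execution engine and an LLAP daemon already running in the cluster (the property names are standard hive.* settings, not something specific to this thread):

    set hive.execution.engine=tez;
    set hive.llap.execution.mode=all;   -- push work into the LLAP daemons where possible

    -- The daemon-side cache is sized separately (e.g. hive.llap.io.memory.size);
    -- eviction and cache selection are automatic, with no per-query knobs,
    -- which matches Gopal's description quoted earlier in the thread.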

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
I guess that is what DAG adds up to with Tez.

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Marcin Tustin
More like 2x than 10x as I recall. On Tue, Jul 12, 2016 at 9:39 AM, Mich Talebzadeh wrote: > thanks Marcin. > > What Is your guesstimate on the order of "faster" please? > > Cheers

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
Thanks Marcin. What is your guesstimate on the order of "faster", please? Cheers

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Marcin Tustin
Quick note - my experience (no benchmarks) is that Tez without LLAP (we're still not on hive 2) is faster than MR by some way. I haven't dug into why that might be. On Tue, Jul 12, 2016 at 9:19 AM, Mich Talebzadeh wrote: > sorry I completely miss your points > > I was NOT talking about Exadata.
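A simple way to reproduce this kind of comparison, assuming a Hive build that still ships the MR engine and using the dummy_parquet table mentioned later in the thread as a stand-in; the only change between runs is the execution engine, so the reported elapsed times can be compared directly:

    -- same query, two engines; compare the reported elapsed times
    set hive.execution.engine=mr;
    select max(id) from dummy_parquet;

    set hive.execution.engine=tez;
    select max(id) from dummy_parquet;

    -- with Hive on Spark configured, the same switch applies:
    -- set hive.execution.engine=spark;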

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
Sorry, I completely missed your points. I was NOT talking about Exadata. I was comparing Oracle 12c caching with that of Oracle TimesTen. No one mentioned Exadata here, nor storage indexes etc. So if Tez is not MR with DAG, could you give me an example of how it works? No opinions but relevant to

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Jörn Franke
I think the comparison between the Oracle RDBMS and Oracle TimesTen is not so good. There are times when Oracle's in-memory database is slower than the RDBMS (especially in the case of Exadata) due to the issue that in-memory - as in Spark - means everything is in memory and everything is always pro

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
I suggest that you try it for yourself then.

RE: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Markovitz, Dudu
The principles are very clear, and if our use case were a complex one composed of many stages, I would expect performance benefits from the Spark engine. Since our use case is a simple one and most of the work here is just reading the files, I don’t see how we can explain the performance differen

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
That is only a plan, not what the execution engine is doing. As I stated before, Spark uses DAG + in-memory computing; MR is serial, on disk. The key here is the execution, or rather the execution engine. In general, the standard MapReduce as I know it reads the data from HDFS, applies the map-reduce algorithm

RE: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Markovitz, Dudu
I don’t see how this explains the time differences. Dudu From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] Sent: Tuesday, July 12, 2016 10:56 AM To: user Cc: user @spark Subject: Re: Using Spark on Hive with Hive also using Spark as its execution engine This the whole idea. Spark uses

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Mich Talebzadeh
This is the whole idea. Spark uses DAG + IM; MR is classic. This is for Hive on Spark:

hive> explain select max(id) from dummy_parquet;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 1
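The plan above is for the Spark engine; the corresponding plan under MapReduce can be generated for a side-by-side comparison by switching engines before re-running the explain (a sketch against the same table, assuming the MR engine is still available in this Hive build):

    set hive.execution.engine=mr;
    explain select max(id) from dummy_parquet;

    -- for this simple aggregation both engines do a scan, a partial aggregate and a
    -- final aggregate; the contrast drawn in this thread shows up on multi-stage
    -- queries, where chained MR jobs materialise intermediate results to HDFS
    -- while Spark keeps the stages within one DAG.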

RE: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Markovitz, Dudu
This is a simple task – Read the files, find the local max value and combine the results (find the global max value). How do you explain the differences in the results? Spark reads the files and finds a local max 10X (+) faster than MR? Can you please attach the execution plan? Thanks Dudu F