RE: Running Spark-sql on Hive metastore

2016-01-31 Thread Mich Talebzadeh
Thanks Jeff. Much like an RDBMS that caches data, I believe the same happens in Spark with in-memory operations. I ran each job three times to reduce the impact of physical I/O. This is mentioned below (three runs). I agree with you that this is only a test with two clusters but essenti

Re: Running Spark-sql on Hive metastore

2016-01-31 Thread Xuefu Zhang
For Hive on Spark, there is a startup cost. The second run should be faster. More importantly, it looks like you have 18 map tasks but your cluster only runs two of them at a time. Thus, your cluster effectively has only two-way parallelism. If you configure your cluster to give more capaci
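As a rough illustration of the parallelism point above, the sketch below counts how many sequential waves are needed to run 18 map tasks when only two run at a time. It assumes all tasks take about the same time and ignores the startup cost; the slot counts other than 2 are hypothetical values, not figures from the thread.

```python
import math


def waves(tasks: int, slots: int) -> int:
    """Return the number of sequential rounds needed to run `tasks`
    when at most `slots` of them can run concurrently."""
    if tasks < 0 or slots <= 0:
        raise ValueError("tasks must be >= 0 and slots must be > 0")
    return math.ceil(tasks / slots)


MAP_TASKS = 18  # number of map tasks mentioned in the thread

# 2 is the concurrency observed in the thread; 6 and 18 are hypothetical.
for slots in (2, 6, 18):
    print(f"{slots} concurrent tasks -> {waves(MAP_TASKS, slots)} wave(s)")
```

With two slots the 18 tasks need nine waves, which is why giving the cluster more capacity should shorten the run.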

Running Spark-sql on Hive metastore

2016-01-31 Thread Mich Talebzadeh
Hi, Environment: Spark 1.5.2 on Hive 1.2.1; Hive 1.2.1 on Spark 1.3.1; Oracle Release 11.2.0.1.0; Hadoop 2.6. I am running spark-sql using Hive metastore and I am pleasantly surprised by the speed with which Spark performs certain queries on Hive tables. I import

Difference between hive.mapjoin.smalltable.filesize and hive.auto.convert.join.noconditionaltask.size

2016-01-31 Thread Jim Green
Sharing an article about the difference between hive.mapjoin.smalltable.filesize and hive.auto.convert.join.noconditionaltask.size. Both of them can control the behavior of map join, but they are not the same. http://www.openkb.info/2016/01/difference-between-hivemapjoinsmalltabl.html -- Thanks, www.openkb.info

Re: Importing Oracle data into Hive

2016-01-31 Thread Ashok Kumar
Thank you Mich and Jorn for your help. Very useful indeed. On Sunday, 31 January 2016, 13:43, Mich Talebzadeh wrote:

RE: Importing Oracle data into Hive

2016-01-31 Thread Mich Talebzadeh
You will need to have the Oracle Database 11g JDBC driver, ojdbc6.jar, installed in $SQOOP_HOME/lib. You can download it from here. The approach I prefer is to let Sqoop import it as a text file to a staging table and then inse
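A minimal sketch of the staging approach described above, assuming Sqoop 1.4.x: it only builds the `sqoop import` command line and a follow-up HiveQL statement as strings, without running anything. The JDBC URL, user, password file, and table names are all hypothetical placeholders, and the ORC target table is assumed to exist already.

```python
import shlex
from typing import List


def sqoop_import_cmd(jdbc_url: str, user: str, password_file: str,
                     oracle_table: str, staging_table: str,
                     mappers: int = 4) -> List[str]:
    """Build argv for a Sqoop import of an Oracle table into a Hive
    staging table stored as plain text."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "--password-file", password_file,
        "--table", oracle_table,
        "--hive-import",
        "--hive-table", staging_table,
        "--hive-overwrite",  # replace last week's staging data
        # With more than one mapper Sqoop needs a primary key or --split-by.
        "--num-mappers", str(mappers),
    ]


def load_orc_sql(staging_table: str, orc_table: str) -> str:
    """HiveQL that copies the staged text rows into an existing ORC table."""
    return f"INSERT OVERWRITE TABLE {orc_table} SELECT * FROM {staging_table}"


# All values below are hypothetical placeholders.
cmd = sqoop_import_cmd(
    jdbc_url="jdbc:oracle:thin:@//oracle-host:1521/ORCL",
    user="scott",
    password_file="/user/hduser/.oracle_password",
    oracle_table="SALES",
    staging_table="sales_staging",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
print(load_orc_sql("sales_staging", "sales_orc"))
```

For a weekly job, the printed command and statement could be placed in a scheduled script; keeping them as two steps means a failed import never touches the ORC table.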

Re: Importing Oracle data into Hive

2016-01-31 Thread Jörn Franke
Well, you can create an empty Hive table in ORC format and use --hive-overwrite in Sqoop. Alternatively, you can use --hive-import and set hive.default.fileformat. I recommend defining the schema properly on the command line, because Sqoop's detection of formats is based on JDBC (Java) types which is no
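One way to define the schema on the command line is Sqoop's `--map-column-hive` option, which overrides the Hive type chosen for named columns. The sketch below only assembles that argument; the column names and types are hypothetical, and whether this is the option the author had in mind is an assumption.

```python
from typing import Dict, List


def map_column_hive_args(types: Dict[str, str]) -> List[str]:
    """Turn {column: hive_type} into the argv fragment for Sqoop's
    --map-column-hive option; return nothing if no overrides are given."""
    if not types:
        return []
    mapping = ",".join(f"{col}={hive_type}" for col, hive_type in types.items())
    return ["--map-column-hive", mapping]


# Hypothetical columns of the Oracle source table.
overrides = map_column_hive_args({"ID": "BIGINT", "CREATED": "TIMESTAMP"})
print(overrides)
```

The returned list can be appended to an existing `sqoop import` argument list.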

Re: Importing Oracle data into Hive

2016-01-31 Thread Ashok Kumar
Thanks, Can sqoop create this table as ORC in Hive? On Sunday, 31 January 2016, 13:13, Ashok Kumar wrote: Thanks. Can sqoop create this table as ORC in Hive? On Sunday, 31 January 2016, 13:11, Nitin Pawar wrote: check sqoop On Sun, Jan 31, 2016 at 6:36 PM, Ashok Kumar wrot

Re: Importing Oracle data into Hive

2016-01-31 Thread Nitin Pawar
check sqoop On Sun, Jan 31, 2016 at 6:36 PM, Ashok Kumar wrote: > Hi, > > What is the easiest method of importing data from an Oracle 11g table to > Hive please? This will be a weekly periodic job. The source table has 20 > million rows. > > I am running Hive 1.2.1 > > regards > > > -- Niti

Importing Oracle data into Hive

2016-01-31 Thread Ashok Kumar
Hi, What is the easiest method of importing data from an Oracle 11g table to Hive please? This will be a weekly periodic job. The source table has 20 million rows. I am running Hive 1.2.1. Regards