Well, I had to write Scala code and compile it with Maven to make it work.

It is still running. The good thing is that, as I expected, it is doing a Direct Path Read
(as opposed to a Conventional Path Read) from the source Oracle database.

+----------------------------------------------------------------------------------------------------+
| What object is causing the highest resource wait (from V$ACTIVE_SESSION_HISTORY and dba_objects)    |
+----------------------------------------------------------------------------------------------------+
Object Name                    Type       Event                                                Total Wait Time/ms
------------------------------ ---------- ---------------------------------------------------- ------------------
DUMMY                          TABLE                                                                             3
DUMMY                          TABLE      direct path read                                                      56

Well, it is a billion-row table loaded from the DataFrame into a temp table. The code
then creates the Hive ORC table in the Hive database and populates it from the
temp table.
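
In outline the flow is something like the sketch below (a minimal sketch against
the Spark 1.6 HiveContext API, not the actual code). The JDBC URL and table names
are the ones from the sqoop command quoted further down the thread; the password,
partition bounds and number of partitions are placeholders.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object OracleToHiveOrc {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("OracleToHiveOrc"))
    val hc = new HiveContext(sc)

    // Parallel JDBC read of the source table, split on the numeric ID column.
    // A full scan over JDBC like this is what shows up on the Oracle side as
    // direct path reads in V$ACTIVE_SESSION_HISTORY.
    val df = hc.read.format("jdbc").options(Map(
      "url"             -> "jdbc:oracle:thin:@rhes564:1521:mydb12",
      "driver"          -> "oracle.jdbc.OracleDriver",
      "dbtable"         -> "scratchpad.dummy",
      "user"            -> "scratchpad",
      "password"        -> "xxxxx",        // placeholder
      "partitionColumn" -> "ID",
      "lowerBound"      -> "1",            // assumed ID range
      "upperBound"      -> "1000000000",   // assumed ID range
      "numPartitions"   -> "10"            // assumed parallelism
    )).load()

    // Stage the DataFrame as a temporary table visible to HiveContext SQL.
    df.registerTempTable("tmp")

    // Create the ORC table in the Hive database and populate it from the
    // temporary table (Hive CTAS does both in one statement).
    hc.sql("CREATE TABLE oraclehadoop.dummy STORED AS ORC AS SELECT * FROM tmp")

    sc.stop()
  }
}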

See how it goes.


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 30 April 2016 at 15:24, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Yes, I was thinking of that: use Spark to load JDBC data from Oracle and
> flush it into an ORC table in Hive.
>
> Now I am using Spark 1.6.1 and, as I recall, the JDBC driver throws an
> error (I raised a thread for it).
>
> This was working under Spark 1.5.2.
>
> Cheers
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 30 April 2016 at 15:20, Marcin Tustin <mtus...@handybook.com> wrote:
>
>> No, the execution engines are not in general interchangeable. The Hive
>> project uses an abstraction layer to be able to plug in different execution
>> engines. I don't know whether Sqoop uses Hive code, or whether it uses an old
>> version, or what.
>>
>> As with many things in the Hadoop world, if you want to know whether there's
>> something undocumented, your best bet is to look at the source code.
>>
>> My suggestion would be to (1) make sure you're executing somewhere close
>> to the data, i.e. on NodeManagers colocated with DataNodes; (2) profile to
>> make sure the slowness really is where you think it is; and (3) if you really
>> can't get the speed you need, try writing a small Spark job to do the
>> export. Newer versions of Spark seem faster.
>>
>>
>> On Sat, Apr 30, 2016 at 10:05 AM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi Marcin,
>>>
>>> It is the speed, really: the speed at which data is ingested into Hive.
>>>
>>> Sqoop is two-stage, as I understand it:
>>>
>>>
>>>    1. Take the data out of the RDBMS via JDBC and put it in an external
>>>    HDFS file
>>>    2. Read that file and insert it into a Hive table
>>>
>>>  The issue is the second part. In general I use Hive 2 with the Spark 1.3.1
>>> engine to put data into the Hive table. I wondered whether there was such a
>>> parameter in Sqoop to use the Spark engine.
>>>
>>> Well, I gather this is easier said than done. I am importing a 1-billion-row
>>> table from Oracle:
>>>
>>> sqoop import --connect "jdbc:oracle:thin:@rhes564:1521:mydb12"
>>> --username scratchpad -P \
>>>         --query "select * from scratchpad.dummy where \
>>>         \$CONDITIONS" \
>>>         --split-by ID \
>>>         --hive-import  --hive-table "oraclehadoop.dummy" --target-dir
>>> "dummy"
>>>
>>>
>>> Now, the fact that I have set hive.execution.engine=spark in hive-site.xml
>>> does not matter. Sqoop seems to set hive.execution.engine=mr internally
>>> anyway.
>>>
>>> Maybe there should be an option --hive-execution-engine='mr/tez/spark'
>>> etc. in the above command?
>>>
>>> Cheers,
>>>
>>> Mich
>>>
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 30 April 2016 at 14:51, Marcin Tustin <mtus...@handybook.com> wrote:
>>>
>>>> They're not simply interchangeable. Sqoop is written to use MapReduce.
>>>>
>>>> I actually implemented my own replacement for sqoop-export in Spark,
>>>> which was extremely simple. It wasn't any faster, because the bottleneck
>>>> was the receiving database.
>>>>
>>>> Is your motivation here speed? Or correctness?
>>>>
>>>> On Sat, Apr 30, 2016 at 8:45 AM, Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> What is the simplest way of making sqoop import use the Spark engine, as
>>>>> opposed to the default MapReduce, when putting data into a Hive table? I did
>>>>> not see any parameter for this in the Sqoop command-line documentation.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>
>>>>>
>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>
>>>>>
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
