This is happening with Sqoop, and also when putting data into an HBase table
from the command line.


Sqoop 1.4.6
Hadoop 2.7.3
Hive 2.0.1

I am still getting this error when using Sqoop to pull data from a simple
table in Oracle.


sqoop import --connect "jdbc:oracle:thin:@rhes564:1521:mydb12" --username sh -P \
     --query "select * from sh.sales where \$CONDITIONS" \
     --split-by prod_id \
     --target-dir "sales"

Note that the import itself completes and puts the data in the directory on
HDFS, and only then throws this error:


2016-09-21 19:10:31,447 [myid:] - INFO  [main:Job@1317] - Running job: job_1474455325627_0041
2016-09-21 19:10:39,696 [myid:] - INFO  [main:Job@1338] - Job job_1474455325627_0041 running in uber mode : false
2016-09-21 19:10:39,707 [myid:] - INFO  [main:Job@1345] -  map 0% reduce 0%
2016-09-21 19:10:53,844 [myid:] - INFO  [main:Job@1345] -  map 25% reduce 0%
2016-09-21 19:11:01,903 [myid:] - INFO  [main:Job@1345] -  map 50% reduce 0%
2016-09-21 19:11:05,924 [myid:] - INFO  [main:Job@1345] -  map 75% reduce 0%
2016-09-21 19:11:13,966 [myid:] - INFO  [main:Job@1345] -  map 100% reduce 0%
2016-09-21 19:11:14,977 [myid:] - INFO  [main:Job@1356] - Job job_1474455325627_0041 completed successfully
2016-09-21 19:11:15,138 [myid:] - ERROR [main:ImportTool@607] - Imported Failed: No enum constant org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_MAPS
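
For what it's worth, "No enum constant" is the message java.lang.Enum.valueOf
throws, so presumably the Hadoop jars on Sqoop's runtime classpath predate
this counter. A minimal sketch of that failure mode in Scala (illustrative
only; EnumCheck is not part of Sqoop):

object EnumCheck {
  // Succeeds against Hadoop jars that define the counter (reportedly 2.3+);
  // against older jars Enum.valueOf throws
  // java.lang.IllegalArgumentException: No enum constant
  //   org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_MAPS
  def main(args: Array[String]): Unit = {
    val c = org.apache.hadoop.mapreduce.JobCounter.valueOf("MB_MILLIS_MAPS")
    println(s"Found counter: $c")
  }
}

That would also explain why the MapReduce job completes and only the final
counter retrieval fails.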

Any ideas?


Thanks





Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 25 August 2016 at 11:51, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Hi,
>
> I am using Hadoop 2.6
>
> hduser@rhes564: /home/hduser/dba/bin> hadoop version
> Hadoop 2.6.0
>
> Thanks
>
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
>
> On 25 August 2016 at 11:48, Bhaskar Dutta <bhas...@gmail.com> wrote:
>
>> This constant was added in Hadoop 2.3. Maybe you are using an older
>> version?
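>>
>> You can confirm which Hadoop your client actually links against with
>> something like this (a sketch; it just prints the version of the
>> hadoop-common jar on the classpath, which normally matches the MapReduce
>> client jars shipped with it):
>>
>> object HadoopVersionCheck {
>>   // Reports the Hadoop version found on the runtime classpath, which can
>>   // differ from the version installed on the cluster.
>>   def main(args: Array[String]): Unit =
>>     println(org.apache.hadoop.util.VersionInfo.getVersion)
>> }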
>>
>> ~bhaskar
>>
>> On Thu, Aug 25, 2016 at 3:04 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Actually I started using Spark to import data from an RDBMS (in this case
>>> Oracle) after upgrading to Hive 2, having run an import like the one below:
>>>
>>> sqoop import --connect "jdbc:oracle:thin:@rhes564:1521:mydb12" --username scratchpad -P \
>>>      --query "select * from scratchpad.dummy2 where \$CONDITIONS" \
>>>      --split-by ID \
>>>      --hive-import --hive-table "test.dumy2" --target-dir "/tmp/dummy2" --direct
>>>
>>> This gets the data into HDFS and then throws this error
>>>
>>> ERROR [main] tool.ImportTool: Imported Failed: No enum constant
>>> org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_MAPS
>>>
>>> I can easily get the data into Hive from the file on HDFS, or dig into
>>> the problem (Spark 2, Hive 2, Hadoop 2.6, Sqoop 1.4.5), but I find Spark
>>> trouble-free, as below:
>>>
>>> val df = HiveContext.read.format("jdbc").options(
>>>   Map("url" -> dbURL,
>>>     "dbtable" -> "scratchpad.dummy",
>>>     "partitionColumn" -> partitionColumnName,
>>>     "lowerBound" -> lowerBoundValue,
>>>     "upperBound" -> upperBoundValue,
>>>     "numPartitions" -> numPartitionsValue,
>>>     "user" -> dbUserName,
>>>     "password" -> dbPassword)).load
>>>
>>> It does work, opens parallel connections to Oracle DB and creates DF
>>> with the specified number of partitions.
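>>>
>>> As a rough sketch of what it does with the bounds (the exact clauses are
>>> Spark internals, so treat the values and predicates below as illustrative
>>> only):
>>>
>>> val lower = 1L            // lowerBoundValue
>>> val upper = 100L          // upperBoundValue
>>> val numPartitions = 4     // numPartitionsValue
>>> val stride = (upper - lower) / numPartitions
>>> // Roughly one WHERE predicate per partition, each read over its own
>>> // connection; first partition also picks up NULLs, last is open-ended.
>>> val predicates = (0 until numPartitions).map { i =>
>>>   val lo = lower + i * stride
>>>   if (i == 0) s"ID < ${lo + stride} OR ID IS NULL"
>>>   else if (i == numPartitions - 1) s"ID >= $lo"
>>>   else s"ID >= $lo AND ID < ${lo + stride}"
>>> }
>>> predicates.foreach(println)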
>>>
>>> One thing I am not sure about, and have not tried, is whether Spark
>>> supports direct mode yet.
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> On 25 August 2016 at 09:07, Bhaskar Dutta <bhas...@gmail.com> wrote:
>>>
>>>> Which RDBMS are you using here, and what is the data volume and
>>>> frequency of pulling data off the RDBMS?
>>>> Specifying these would help in giving better answers.
>>>>
>>>> Sqoop has direct-mode (non-JDBC) support for Postgres, MySQL and
>>>> Oracle, so you can use it for better performance if you are using one of
>>>> these databases.
>>>>
>>>> And don't forget that Sqoop can load data directly into Parquet or
>>>> Avro (I think direct mode is not supported in that case).
>>>> Also you can use Kite SDK with Sqoop to manage/transform datasets,
>>>> perform schema evolution and such.
>>>>
>>>> ~bhaskar
>>>>
>>>>
>>>> On Thu, Aug 25, 2016 at 3:09 AM, Venkata Penikalapati <
>>>> mail.venkatakart...@gmail.com> wrote:
>>>>
>>>>> Team,
>>>>> Please help me in choosing between Sqoop and Spark JDBC to fetch data
>>>>> from an RDBMS. Sqoop has a lot of optimizations for fetching data; does
>>>>> Spark JDBC have those as well?
>>>>>
>>>>> I'm performing some analytics using Spark on data that resides in an
>>>>> RDBMS.
>>>>>
>>>>> Please guide me with this.
>>>>>
>>>>>
>>>>> Thanks
>>>>> Venkata Karthik P
>>>>>
>>>>>
>>>>
>>>
>>
>
