Hi,

I am using Hadoop 2.6

hduser@rhes564: /home/hduser/dba/bin> hadoop version
Hadoop 2.6.0

Thanks






Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 25 August 2016 at 11:48, Bhaskar Dutta <bhas...@gmail.com> wrote:

> This constant was added in Hadoop 2.3. Maybe you are using an older
> version?
>
> ~bhaskar
>
> On Thu, Aug 25, 2016 at 3:04 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Actually I started using Spark to import data from RDBMS (in this case
>> Oracle) after upgrading to Hive 2, running an import like below
>>
>> sqoop import --connect "jdbc:oracle:thin:@rhes564:1521:mydb12" \
>>   --username scratchpad -P \
>>   --query "select * from scratchpad.dummy2 where \$CONDITIONS" \
>>   --split-by ID \
>>   --hive-import --hive-table "test.dumy2" --target-dir "/tmp/dummy2" \
>>   --direct
>>
>> This gets the data into HDFS and then throws this error
>>
>> ERROR [main] tool.ImportTool: Imported Failed: No enum constant
>> org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_MAPS
>>
>> I can easily get the data into Hive from the file on HDFS, or dig into
>> the problem (Spark 2, Hive 2, Hadoop 2.6, Sqoop 1.4.5), but I find Spark
>> trouble-free, as below
>>
>> val df = HiveContext.read.format("jdbc").options(
>>   Map("url" -> dbURL,
>>       "dbtable" -> "scratchpad.dummy2",
>>       "partitionColumn" -> partitionColumnName,
>>       "lowerBound" -> lowerBoundValue,
>>       "upperBound" -> upperBoundValue,
>>       "numPartitions" -> numPartitionsValue,
>>       "user" -> dbUserName,
>>       "password" -> dbPassword)).load
>>
>> It does work, opens parallel connections to Oracle DB and creates DF with
>> the specified number of partitions.
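A minimal plain-Scala sketch of how those parallel connections come about: the JDBC source splits the [lowerBound, upperBound] range of the partition column into numPartitions WHERE clauses, one per parallel query. This is an illustrative approximation, not Spark's exact implementation, and `rangePredicates` is a hypothetical helper name:

```scala
// Hypothetical helper approximating how Spark's JDBC source derives one
// range predicate per partition; each predicate drives one parallel query.
def rangePredicates(col: String, lower: Long, upper: Long, numPartitions: Int): Seq[String] = {
  val stride = (upper - lower) / numPartitions
  (0 until numPartitions).map { i =>
    val lo = lower + i * stride
    val hi = lo + stride
    if (numPartitions == 1) "1 = 1"                  // single partition: read everything
    else if (i == 0) s"$col < $hi OR $col IS NULL"   // first range is open at the low end
    else if (i == numPartitions - 1) s"$col >= $lo"  // last range is open at the high end
    else s"$col >= $lo AND $col < $hi"
  }
}
```

For example, rangePredicates("ID", 0L, 100L, 4) produces "ID < 25 OR ID IS NULL", "ID >= 25 AND ID < 50", "ID >= 50 AND ID < 75" and "ID >= 75"; the open-ended first and last ranges make sure rows outside the stated bounds (or with a NULL partition column) are still read.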
>>
>> One thing I am not sure about, and have not tried, is whether Spark
>> supports direct mode yet.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>>
>> On 25 August 2016 at 09:07, Bhaskar Dutta <bhas...@gmail.com> wrote:
>>
>>> Which RDBMS are you using here, and what is the data volume and
>>> frequency of pulling data off the RDBMS?
>>> Specifying these would help in giving better answers.
>>>
>>> Sqoop has a direct mode (non-JDBC) support for Postgres, MySQL and
>>> Oracle, so you can use that for better performance if using one of these
>>> databases.
>>>
>>> And don't forget that Sqoop can load data directly into Parquet or
>>> Avro (I think direct mode is not supported in this case).
>>> Also you can use Kite SDK with Sqoop to manage/transform datasets,
>>> perform schema evolution and such.
>>>
>>> ~bhaskar
>>>
>>>
>>> On Thu, Aug 25, 2016 at 3:09 AM, Venkata Penikalapati <
>>> mail.venkatakart...@gmail.com> wrote:
>>>
>>>> Team,
>>>> Please help me choose between Sqoop and Spark JDBC for fetching data
>>>> from an RDBMS. Sqoop has a lot of optimizations for fetching data; does
>>>> Spark JDBC also have those?
>>>>
>>>> I'm performing some analytics using Spark on data that is residing in
>>>> an RDBMS.
>>>>
>>>> Please guide me with this.
>>>>
>>>>
>>>> Thanks
>>>> Venkata Karthik P
>>>>
>>>>
>>>
>>
>
