This constant (JobCounter.MB_MILLIS_MAPS) was added in Hadoop 2.3. Maybe you are using an older version?
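Since you mention Hadoop 2.6, the missing constant is probably coming from older Hadoop jars on Sqoop's own classpath rather than from the cluster itself. A quick way to check which Hadoop actually gets resolved is something like the following (a minimal, untested sketch using the standard VersionInfo and JobCounter APIs):

    import org.apache.hadoop.util.VersionInfo
    import org.apache.hadoop.mapreduce.JobCounter
    import scala.util.Try

    object HadoopClasspathCheck {
      def main(args: Array[String]): Unit = {
        // Which hadoop-common was actually resolved on the classpath
        println(s"Hadoop version: ${VersionInfo.getVersion}")
        // The constant Sqoop trips over; it exists only in Hadoop >= 2.3
        val hasCounter = Try(JobCounter.valueOf("MB_MILLIS_MAPS")).isSuccess
        println(s"JobCounter.MB_MILLIS_MAPS present: $hasCounter")
      }
    }

If that prints false, the MapReduce classes on the classpath predate 2.3.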
~bhaskar

On Thu, Aug 25, 2016 at 3:04 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Actually I started using Spark to import data from an RDBMS (in this case
> Oracle) after upgrading to Hive 2, running an import like the one below:
>
> sqoop import --connect "jdbc:oracle:thin:@rhes564:1521:mydb12" --username scratchpad -P \
>   --query "select * from scratchpad.dummy2 where \$CONDITIONS" \
>   --split-by ID \
>   --hive-import --hive-table "test.dumy2" --target-dir "/tmp/dummy2" --direct
>
> This gets the data into HDFS and then throws this error:
>
> ERROR [main] tool.ImportTool: Imported Failed: No enum constant
> org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_MAPS
>
> I could easily get the data into Hive from the file on HDFS, or dig into
> the problem (Spark 2, Hive 2, Hadoop 2.6, Sqoop 1.4.5), but I find Spark
> trouble-free, like below:
>
> val df = HiveContext.read.format("jdbc").options(
>   Map("url" -> dbURL,
>     "dbtable" -> "scratchpad.dummy",
>     "partitionColumn" -> partitionColumnName,
>     "lowerBound" -> lowerBoundValue,
>     "upperBound" -> upperBoundValue,
>     "numPartitions" -> numPartitionsValue,
>     "user" -> dbUserName,
>     "password" -> dbPassword)).load
>
> It does work: it opens parallel connections to the Oracle DB and creates a
> DataFrame with the specified number of partitions.
>
> One thing I am not sure about, and have not tried, is whether Spark
> supports direct mode yet.
>
> HTH
>
> Dr Mich Talebzadeh
> http://talebzadehmich.wordpress.com
>
> On 25 August 2016 at 09:07, Bhaskar Dutta <bhas...@gmail.com> wrote:
>
>> Which RDBMS are you using here, and what are the data volume and
>> frequency of pulling data off the RDBMS? Specifying these would help in
>> giving better answers.
>>
>> Sqoop has direct (non-JDBC) mode support for Postgres, MySQL and Oracle,
>> so you can use that for better performance if you are using one of these
>> databases.
>>
>> And don't forget that Sqoop can load data directly into Parquet or Avro
>> (I think direct mode is not supported in this case). You can also use the
>> Kite SDK with Sqoop to manage/transform datasets, perform schema
>> evolution and such.
>>
>> ~bhaskar
>>
>> On Thu, Aug 25, 2016 at 3:09 AM, Venkata Penikalapati <
>> mail.venkatakart...@gmail.com> wrote:
>>
>>> Team,
>>> Please help me in choosing between Sqoop and Spark JDBC to fetch data
>>> from an RDBMS. Sqoop has a lot of optimizations for fetching data; does
>>> Spark JDBC also have those?
>>>
>>> I'm performing some analytics using Spark on data that resides in an
>>> RDBMS.
>>>
>>> Please guide me with this.
>>>
>>> Thanks
>>> Venkata Karthik P
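For reference, here is the same partitioned read, plus the Hive load that the Sqoop job was doing, written against the Spark 2 SparkSession API rather than the older HiveContext. This is a minimal sketch, not a tested recipe: the URL, table, bounds and credentials are placeholders lifted from Mich's example, and it assumes the Oracle JDBC driver is on the classpath.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object OracleToHive {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("oracle-to-hive")
          .enableHiveSupport() // required for saveAsTable against the Hive metastore
          .getOrCreate()

        // Partitioned read: Spark opens numPartitions connections, each
        // fetching a slice of the ID range between lowerBound and upperBound.
        val df = spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@rhes564:1521:mydb12")
          .option("dbtable", "scratchpad.dummy2")
          .option("partitionColumn", "ID")
          .option("lowerBound", "1")        // placeholder bound
          .option("upperBound", "1000000")  // placeholder bound
          .option("numPartitions", "4")
          .option("user", "scratchpad")
          .option("password", "...")        // placeholder
          .load()

        // Replaces the separate "copy from HDFS into Hive" step entirely.
        df.write.mode(SaveMode.Overwrite).saveAsTable("test.dummy2")
      }
    }

Note that lowerBound and upperBound only control how the ID range is split across partitions; rows outside that range are still read, just by the edge partitions.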