Re: Sqoop on Spark

Mich Talebzadeh Wed, 06 Apr 2016 08:17:55 -0700

Yes, JDBC is another option. You need to be aware of some type-conversion
issues: Spark does like CHAR types, so your best bet is to do the conversion
(for example with to_char) when fetching the data from Oracle itself.


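// Assumes a spark-shell session (Spark 1.4+), where sc is already defined
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
val _ORACLEserver = "jdbc:oracle:thin:@rhes564:1521:mydb"
val _username = "sh"
val _password = "xxxx"
// The subquery runs in Oracle, so CHANNEL_ID is already a string when Spark reads it
val c = hiveContext.read.format("jdbc").options(
  Map("url" -> _ORACLEserver,
      "dbtable" -> "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID, CHANNEL_DESC FROM sh.channels)",
      "user" -> _username,
      "password" -> _password)).load()
// Expose the DataFrame to SQL as the temp table t_c
c.registerTempTable("t_c")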


Then put the data from the t_c temp table into your Hive table.
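
For the use case in this thread (adding Oracle data to existing Hive tables), that last step might look roughly like the sketch below. The helper appendSql and the target table name test.channels are hypothetical, and the sketch assumes the Hive table already exists with two string columns matching CHANNEL_ID and CHANNEL_DESC.

```scala
// Hypothetical helper (not from the thread): builds the HiveQL that appends
// the rows exposed by a temp table to an existing Hive table.
def appendSql(targetTable: String, sourceTable: String): String =
  s"INSERT INTO TABLE $targetTable SELECT CHANNEL_ID, CHANNEL_DESC FROM $sourceTable"

// "test.channels" is an assumed, pre-existing Hive table; t_c is the temp table above.
val stmt = appendSql("test.channels", "t_c")
println(stmt)
```

In the same spark-shell session, c.sqlContext.sql(stmt) would run the statement on the context that t_c was registered with, so the temp table is visible to it.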

HTH






Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 6 April 2016 at 10:34, Jorge Sánchez <jorgesg1...@gmail.com> wrote:

> Ayan,
>
> There was a talk at Spark Summit:
> https://spark-summit.org/2015/events/Sqoop-on-Spark-for-Data-Ingestion/
> Apparently they had a lot of problems and the project seems to have been abandoned.
>
> If you just have to do simple ingestion of a full table or a simple query,
> just use Sqoop as suggested by Mich, but if your use case requires further
> transformation of the data, I'd suggest you try Spark connecting to Oracle
> using JDBC and then working with the data as a DataFrame.
>
> Regards.
>
> 2016-04-06 6:59 GMT+01:00 ayan guha <guha.a...@gmail.com>:
>
>> Thanks, guys, for the feedback.
>>
>> On Wed, Apr 6, 2016 at 3:44 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>>> I do not think you can be more resource-efficient. In the end you have
>>> to store the data on HDFS anyway. Building something like Sqoop takes a lot
>>> of development effort, especially for error handling.
>>> You could create a ticket with the Sqoop developers to support Spark as an
>>> execution engine; it may be less effort to plug it in there.
>>> If your cluster is loaded, you may want to add more machines
>>> or improve the existing programs.
>>>
>>> On 06 Apr 2016, at 07:33, ayan guha <guha.a...@gmail.com> wrote:
>>>
>>> One of the reasons I have in mind is to avoid MapReduce applications
>>> completely during ingestion, if possible. Also, I could then use a Spark
>>> standalone cluster to ingest, even if my Hadoop cluster is heavily loaded.
>>> What do you guys think?
>>>
>>> On Wed, Apr 6, 2016 at 3:13 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>>
>>>> Why do you want to reimplement something which is already there?
>>>>
>>>> On 06 Apr 2016, at 06:47, ayan guha <guha.a...@gmail.com> wrote:
>>>>
>>>> Hi
>>>>
>>>> Thanks for the reply. My use case is to query ~40 tables from Oracle (using
>>>> indexes, incremental loads only) and add the data to existing Hive tables.
>>>> Also, it would be good to have an option to create the Hive tables, driven
>>>> by job-specific configuration.
>>>>
>>>> What do you think?
>>>>
>>>> Best
>>>> Ayan
>>>>
>>>> On Wed, Apr 6, 2016 at 2:30 PM, Takeshi Yamamuro <linguin....@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> It depends on your use case for Sqoop.
>>>>> What is it like?
>>>>>
>>>>> // maropu
>>>>>
>>>>> On Wed, Apr 6, 2016 at 1:26 PM, ayan guha <guha.a...@gmail.com> wrote:
>>>>>
>>>>>> Hi All
>>>>>>
>>>>>> Asking for opinions: is it possible/advisable to use Spark to replace what
>>>>>> Sqoop does? Are there any existing projects along similar lines?
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Ayan Guha
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ---
>>>>> Takeshi Yamamuro
>>>>>
