Shyam,
As Mark said - if we boost the parallelism with Spark, we can match the
performance of Sqoop or do better.
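
A minimal sketch of what that boosted parallelism can look like with Spark's
standard JDBC source; the URL, table, and column names here are assumptions,
not from the thread:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("OracleParallelRead").getOrCreate()

// Spark opens one JDBC connection per partition, so numPartitions plays the
// same role as Sqoop's --num-mappers on the read side.
val df = spark.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // assumed connection URL
  .option("dbtable", "SCOTT.EMP")                        // assumed source table
  .option("user", "scott")
  .option("password", "tiger")
  .option("partitionColumn", "EMPNO")  // numeric, ideally indexed / primary key
  .option("lowerBound", "1")
  .option("upperBound", "100000")
  .option("numPartitions", "8")        // 8 parallel connections to Oracle
  .load()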
On Tue, Sep 3, 2019 at 6:35 PM Shyam P wrote:
> J Franke,
> Leave alone sqoop , I am just asking about spark in ETL of Oracle ...?
>
> Thanks,
> Shyam
>
This I would not say. The only “issue” with Spark is that you need to build
some functionality on top which is available in Sqoop out of the box,
especially for import processes and if you need to define a lot of them.
> On 03.09.2019 at 09:30, Shyam P wrote:
Hi Mich,
A lot of people say that Spark does not have a proven record in migrating
data from Oracle the way Sqoop has, at least in production.
Please correct me if I am wrong, and suggest how to deal with shuffling when
using groupBy?
Thanks,
Shyam
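
On the shuffle question, a minimal sketch of two common mitigations, assuming
a DataFrame df already read from Oracle (e.g. as in the sketch earlier in the
thread) and illustrative column names:

import org.apache.spark.sql.functions._

// Size the shuffle to the data instead of the default 200 partitions.
spark.conf.set("spark.sql.shuffle.partitions", "400")

// Built-in aggregate functions do partial (map-side) aggregation before the
// shuffle, so prefer them over collecting whole groups yourself.
val totals = df
  .groupBy("customer_id")                 // assumed grouping key
  .agg(sum("amount").as("total_amount"))  // assumed measure column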
On Sat, Aug 31, 2019 at 12:17 PM Mich Talebzadeh wrote:
Hey Chetan,
How many database connections are you anticipating in this job? Is this for
every row in the dataframe?
Kind regards
Chris
On Mon., 2 Sep. 2019, 9:11 pm Chetan Khatri,
wrote:
> Hi Chris, Thanks for the email. You're right. but it's like Sqoop job gets
> launched based on dataframe
Hi,
Just to clarify, is a JDBC connection to an RDBMS from Spark slow?
This one reads from an Oracle table with 4 connections in parallel, assuming
there is a primary key on the Oracle table:
//
// Get minID first
//
val minID = HiveContext.read.format("jdbc").options(Map("url" ->
_ORACLE
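
A hedged reconstruction of the same pattern with a modern SparkSession; the
URL, credentials, and table name below are placeholders, not the originals:

// Placeholders standing in for the truncated connection settings.
val _ORACLEurl = "jdbc:oracle:thin:@//dbhost:1521/ORCL"
val _username  = "scratchpad"
val _password  = "changeme"

// Get the ID bounds first via a pushed-down subquery.
val bounds = spark.read.format("jdbc")
  .options(Map(
    "url"      -> _ORACLEurl,
    "dbtable"  -> "(SELECT MIN(ID) AS minID, MAX(ID) AS maxID FROM scratchpad.dummy)",
    "user"     -> _username,
    "password" -> _password))
  .load()
  .collect()(0)

// Oracle NUMBER surfaces as DecimalType, hence getDecimal.
val minID = bounds.getDecimal(0).longValue()
val maxID = bounds.getDecimal(1).longValue()

// Read the table with 4 connections in parallel, split on the primary key.
val oracleDF = spark.read.format("jdbc")
  .option("url", _ORACLEurl)
  .option("dbtable", "scratchpad.dummy")
  .option("user", _username)
  .option("password", _password)
  .option("partitionColumn", "ID")
  .option("lowerBound", minID.toString)
  .option("upperBound", maxID.toString)
  .option("numPartitions", "4")
  .load()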
Hi Mich, a JDBC connection, which is similar to what Sqoop uses, takes time
and could not achieve parallelism.
On Sat, Aug 31, 2019 at 12:17 PM Mich Talebzadeh
wrote:
> Spark is an excellent ETL tool to lift data from source and put it in
> target. Spark uses JDBC connection similar to Sqoop. I don't see the need
Hi Chris, Thanks for the email. You're right, but the Sqoop job gets
launched based on DataFrame values in the Spark job. Certainly it can be
isolated and broken out.
On Sat, Aug 31, 2019 at 8:07 AM Chris Teoh wrote:
> I'd say this is an uncommon approach, could you use a workflow/scheduling
> system to call Sqoop outside of Spark? ...
Spark is an excellent ETL tool to lift data from source and put it in
target. Spark uses JDBC connection similar to Sqoop. I don't see the need
for Sqoop with Spark here.
Where are the source (Oracle, MSSQL, etc.) and target (Hive?) here?
HTH
Dr Mich Talebzadeh
I'd say this is an uncommon approach, could you use a workflow/scheduling
system to call Sqoop outside of Spark? Spark is usually multiprocess
distributed, so putting this Sqoop job in the Spark code seems to imply
you want to run Sqoop first, then Spark. If you're really insistent on
this, call
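
If it really must be called from inside the job, a sketch of shelling out to
the sqoop CLI and failing on a non-zero exit code; the argument values below
are illustrative, not from the thread:

import scala.sys.process._

// Run the sqoop CLI as a child process; `!` blocks and returns its exit code.
val sqoopArgs = Seq(
  "sqoop", "import",
  "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL", // assumed URL
  "--table", "SCOTT.EMP",                              // assumed table
  "--target-dir", "/data/emp",                         // assumed HDFS dir
  "--num-mappers", "4")

val exitCode = sqoopArgs.!
if (exitCode != 0)
  throw new RuntimeException(s"Sqoop import failed with exit code $exitCode")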
Sorry,
I call the Sqoop job from the above function. Can you help me resolve this?
Thanks
On Fri, Aug 30, 2019 at 1:31 AM Chetan Khatri
wrote:
> Hi Users,
> I am launching a Sqoop job from Spark job and would like to FAIL Spark job
> if Sqoop job fails.
>
> def executeSqoopOriginal(serverName: Strin
Hi Users,
I am launching a Sqoop job from Spark job and would like to FAIL Spark job
if Sqoop job fails.
def executeSqoopOriginal(serverName: String, schemaName: String,
                         username: String, password: String,
                         query: String, splitBy: String, fetchSize: Int,
                         numMappers: Int, targetDir:
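
One hedged way to make the failure propagate: Sqoop's Java entry point
org.apache.sqoop.Sqoop.runTool returns an exit code rather than calling
System.exit, so a wrapper can throw when it is non-zero. The JDBC URL format
and the exact argument list below are assumptions:

import org.apache.hadoop.conf.Configuration
import org.apache.sqoop.Sqoop

def executeSqoop(serverName: String, schemaName: String, username: String,
                 password: String, query: String, splitBy: String,
                 fetchSize: Int, numMappers: Int, targetDir: String): Unit = {
  // NOTE: with --query, Sqoop requires the literal WHERE $CONDITIONS
  // placeholder inside the SQL text passed in `query`.
  val args = Array(
    "import",
    "--connect", s"jdbc:sqlserver://$serverName;databaseName=$schemaName", // assumed URL format
    "--username", username,
    "--password", password,
    "--query", query,
    "--split-by", splitBy,
    "--fetch-size", fetchSize.toString,
    "--num-mappers", numMappers.toString,
    "--target-dir", targetDir)

  // runTool returns the Sqoop exit code, so a non-zero result can be turned
  // into an exception that fails the enclosing Spark job.
  val exitCode = Sqoop.runTool(args, new Configuration())
  if (exitCode != 0)
    throw new RuntimeException(s"Sqoop job failed with exit code $exitCode")
}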