Re: Control Sqoop job from Spark job

2019-10-17 Thread Chetan Khatri
Shyam, as Mark said - if we boost the parallelism with Spark we can reach the performance of Sqoop, or better.

Re: Control Sqoop job from Spark job

2019-09-03 Thread Shyam P
J Franke, leaving Sqoop aside - I am just asking about Spark for ETL of Oracle data. Thanks, Shyam

Re: Control Sqoop job from Spark job

2019-09-03 Thread Jörn Franke
I would not say that. The only “issue” with Spark is that you need to build some functionality on top that is available in Sqoop out of the box, especially for import processes and if you need to define a lot of them.

Re: Control Sqoop job from Spark job

2019-09-03 Thread Shyam P
Hi Mich, a lot of people say that Spark does not have as proven a record of migrating data from Oracle as Sqoop has, at least in production. Please correct me if I am wrong, and suggest how to deal with shuffling when dealing with groupBy? Thanks, Shyam
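On the groupBy question, the usual first step is to aggregate with the DataFrame API (which does map-side partial aggregation before the shuffle, unlike RDD groupByKey) and to tune spark.sql.shuffle.partitions to the data volume. A minimal sketch, with a hypothetical input path and column names:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.sum

    val spark = SparkSession.builder.appName("groupby-sketch").getOrCreate()
    // Tune the shuffle partition count to the data volume (default is 200)
    spark.conf.set("spark.sql.shuffle.partitions", "400")

    val df = spark.read.parquet("/path/to/input") // hypothetical source
    // DataFrame aggregation combines rows per partition before shuffling,
    // so far less data moves across the network than with RDD groupByKey
    val totals = df.groupBy("customer_id").agg(sum("amount").as("total_amount"))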

Re: Control Sqoop job from Spark job

2019-09-02 Thread Chris Teoh
Hey Chetan, how many database connections are you anticipating in this job? Is this for every row in the dataframe? Kind regards, Chris

Re: Control Sqoop job from Spark job

2019-09-02 Thread Mich Talebzadeh
Hi, just to clarify: is the JDBC connection to an RDBMS from Spark slow? This example reads from an Oracle table with 4 connections in parallel, assuming there is a primary key on the Oracle table: // // Get minID first // val minID = HiveContext.read.format("jdbc").options(Map("url" -> _ORACLE
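Reconstructing that pattern end to end - read the key bounds over one connection, then read the table using Spark's standard JDBC partitioning options - a sketch with hypothetical URL, table, and credential values:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("jdbc-parallel-read").getOrCreate()
    val url = "jdbc:oracle:thin:@//dbhost:1521/ORCL" // hypothetical

    // Get the primary-key bounds first, over a single connection
    val bounds = spark.read.format("jdbc")
      .option("url", url)
      .option("dbtable", "(SELECT MIN(id) AS lo, MAX(id) AS hi FROM myschema.mytable) b")
      .option("user", "scott").option("password", "tiger")
      .load().collect()(0)

    // Then read the table itself with 4 partitions, i.e. 4 parallel connections
    val df = spark.read.format("jdbc")
      .option("url", url)
      .option("dbtable", "myschema.mytable")
      .option("user", "scott").option("password", "tiger")
      .option("partitionColumn", "id")
      .option("lowerBound", bounds.getDecimal(0).toString)
      .option("upperBound", bounds.getDecimal(1).toString)
      .option("numPartitions", "4")
      .option("fetchsize", "10000")
      .load()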

Re: Control Sqoop job from Spark job

2019-09-02 Thread Chetan Khatri
Hi Mich, a JDBC connection is similar to what Sqoop uses, but it takes time and I could not get parallelism out of it.

Re: Control Sqoop job from Spark job

2019-09-02 Thread Chetan Khatri
Hi Chris, thanks for the email. You're right, but the Sqoop job gets launched based on DataFrame values in the Spark job. Certainly it can be isolated and broken out - see the sketch below.
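One way to make that pattern concrete is to collect the driving values on the driver and launch one Sqoop invocation per value, keeping the external jobs isolated from Spark's own execution. A minimal sketch, assuming the values form a small list and using a hypothetical runSqoopFor helper that returns the Sqoop exit code:

    // Collect the driving values to the driver; this should be a small
    // list of table names or partitions, not row-scale data
    val targets = df.select("table_name").distinct().collect().map(_.getString(0))

    targets.foreach { t =>
      val exit = runSqoopFor(t) // hypothetical: wraps "sqoop import" for t
      if (exit != 0) sys.error(s"Sqoop failed for $t with exit code $exit")
    }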

Re: Control Sqoop job from Spark job

2019-08-30 Thread Mich Talebzadeh
Spark is an excellent ETL tool to lift data from a source and put it in a target. Spark uses a JDBC connection similar to Sqoop's. I don't see the need for Sqoop alongside Spark here. What are the source (Oracle, MSSQL, etc.) and the target (Hive?) here? HTH, Dr Mich Talebzadeh

Re: Control Sqoop job from Spark job

2019-08-30 Thread Chris Teoh
I'd say this is an uncommon approach; could you use a workflow/scheduling system to call Sqoop outside of Spark? Spark is usually multiprocess and distributed, so putting this Sqoop job in the Spark code seems to imply you want to run Sqoop first, then Spark. If you're really insistent on this, call ...
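If it has to stay inside the JVM, Sqoop does expose a programmatic entry point whose return code can be checked directly. A minimal sketch, assuming the Sqoop client jar is on the classpath, with hypothetical connection arguments:

    import org.apache.sqoop.Sqoop

    val args = Array(
      "import",
      "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL", // hypothetical
      "--username", "scott",
      "--password", "tiger",
      "--table", "MYTABLE",
      "--target-dir", "/data/mytable",
      "--num-mappers", "4")

    // Sqoop.runTool returns 0 on success and non-zero on failure
    val exitCode = Sqoop.runTool(args)
    if (exitCode != 0)
      throw new RuntimeException(s"Sqoop import failed with exit code $exitCode")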

Re: Control Sqoop job from Spark job

2019-08-29 Thread Chetan Khatri
Sorry, I call the Sqoop job from the above function. Can you help me resolve this? Thanks

Control Sqoop job from Spark job

2019-08-29 Thread Chetan Khatri
Hi users, I am launching a Sqoop job from a Spark job and would like to FAIL the Spark job if the Sqoop job fails.

    def executeSqoopOriginal(serverName: String, schemaName: String, username: String, password: String, query: String, splitBy: String, fetchSize: Int, numMappers: Int, targetDir:
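One way to get that failure propagation is to run the sqoop binary as a child process and throw when its exit code is non-zero, which fails the Spark driver. A minimal sketch (hypothetical arguments; the real executeSqoopOriginal signature above is truncated):

    import scala.sys.process._

    def executeSqoop(args: Seq[String]): Unit = {
      // Runs "sqoop import ..." and streams its output to this process;
      // .! blocks and returns the child's exit code
      val exitCode = Process("sqoop" +: args).!
      if (exitCode != 0)
        throw new RuntimeException(s"Sqoop failed with exit code $exitCode, failing the Spark job")
    }

    // Hypothetical usage
    executeSqoop(Seq("import",
      "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
      "--table", "MYTABLE", "--target-dir", "/data/mytable"))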