Hi,

Just to clarify, are you saying that the JDBC connection from Spark to the RDBMS is slow?
This one reads from an Oracle table with 4 connections in parallel, assuming there is a primary key (here the ID column) on the Oracle table.

//
// Get the min and max of ID first
//
val minID = HiveContext.read.format("jdbc").options(
  Map("url" -> _ORACLEserver,
      "dbtable" -> "(SELECT cast(MIN(ID) AS INT) AS minID FROM scratchpad.dummy)",
      "user" -> _username,
      "password" -> _password)).load().collect.apply(0).getDecimal(0).toString

val maxID = HiveContext.read.format("jdbc").options(
  Map("url" -> _ORACLEserver,
      "dbtable" -> "(SELECT cast(MAX(ID) AS INT) AS maxID FROM scratchpad.dummy)",
      "user" -> _username,
      "password" -> _password)).load().collect.apply(0).getDecimal(0).toString

// Read the table itself, partitioned on ID across 4 parallel connections
val s = HiveContext.read.format("jdbc").options(
  Map("url" -> _ORACLEserver,
      "dbtable" -> "(SELECT ID, CLUSTERED, SCATTERED, RANDOMISED, RANDOM_STRING, SMALL_VC, PADDING FROM scratchpad.dummy)",
      "partitionColumn" -> "ID",
      "lowerBound" -> minID,
      "upperBound" -> maxID,
      "numPartitions" -> "4",
      "user" -> _username,
      "password" -> _password)).load

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.


On Mon, 2 Sep 2019 at 12:12, Chetan Khatri <chetan.opensou...@gmail.com> wrote:

> Hi Mich, the JDBC connection, which is similar to Sqoop, takes time and
> cannot do parallelism.
>
> On Sat, Aug 31, 2019 at 12:17 PM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Spark is an excellent ETL tool to lift data from the source and put it in
>> the target. Spark uses a JDBC connection similar to Sqoop's. I don't see
>> the need for Sqoop with Spark here.
>>
>> Where are the source (Oracle, MSSQL, etc.) and target (Hive?) here?
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> On Thu, 29 Aug 2019 at 21:01, Chetan Khatri <chetan.opensou...@gmail.com>
>> wrote:
>>
>>> Hi Users,
>>> I am launching a Sqoop job from a Spark job and would like to FAIL the
>>> Spark job if the Sqoop job fails.
>>>
>>> def executeSqoopOriginal(serverName: String, schemaName: String, username: String, password: String,
>>>                          query: String, splitBy: String, fetchSize: Int,
>>>                          numMappers: Int, targetDir: String, jobName: String, dateColumns: String) = {
>>>
>>>   val connectionString = "jdbc:sqlserver://" + serverName + ";" + "databaseName=" + schemaName
>>>   var parameters = Array("import")
>>>   parameters = parameters :+ "-Dmapreduce.job.user.classpath.first=true"
>>>   parameters = parameters :+ "--connect"
>>>   parameters = parameters :+ connectionString
>>>   parameters = parameters :+ "--mapreduce-job-name"
>>>   parameters = parameters :+ jobName
>>>   parameters = parameters :+ "--username"
>>>   parameters = parameters :+ username
>>>   parameters = parameters :+ "--password"
>>>   parameters = parameters :+ password
>>>   parameters = parameters :+ "--hadoop-mapred-home"
>>>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop-mapreduce/"
>>>   parameters = parameters :+ "--hadoop-home"
>>>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop/"
>>>   parameters = parameters :+ "--query"
>>>   parameters = parameters :+ query
>>>   parameters = parameters :+ "--split-by"
>>>   parameters = parameters :+ splitBy
>>>   parameters = parameters :+ "--fetch-size"
>>>   parameters = parameters :+ fetchSize.toString
>>>   parameters = parameters :+ "--num-mappers"
>>>   parameters = parameters :+ numMappers.toString
>>>   if (dateColumns.length() > 0) {
>>>     parameters = parameters :+ "--map-column-java"
>>>     parameters = parameters :+ dateColumns
>>>   }
>>>   parameters = parameters :+ "--target-dir"
>>>   parameters = parameters :+ targetDir
>>>   parameters = parameters :+ "--delete-target-dir"
>>>   parameters = parameters :+ "--as-avrodatafile"
>>>
>>> }
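
On the question of making the Spark job fail when the Sqoop job fails: one rough sketch, assuming the Sqoop client jar and Hadoop configuration are on the driver classpath (the helper name runSqoopOrFail below is only illustrative), is to run the import in-process through org.apache.sqoop.Sqoop.runTool, which returns the exit code, and throw when it is non-zero so the Spark application itself fails:

import org.apache.hadoop.conf.Configuration
import org.apache.sqoop.Sqoop

// Run the Sqoop import in-process and propagate any failure to the Spark job.
// "parameters" is the Array[String] built in executeSqoopOriginal above.
def runSqoopOrFail(parameters: Array[String]): Unit = {
  val exitCode = Sqoop.runTool(parameters, new Configuration())
  if (exitCode != 0) {
    // Throwing here aborts the Spark application instead of letting it carry on silently.
    throw new RuntimeException(s"Sqoop job failed with exit code $exitCode")
  }
}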
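
And on the Spark-only route, once the DataFrame s above has been loaded over JDBC with 4 partitions, it can be written straight into the Hive target. A minimal sketch, where scratchpad.dummy_hive is only an example table name and Hive support is assumed on the context:

// Persist the JDBC-sourced DataFrame into a Hive table in one step.
s.write.mode("overwrite").saveAsTable("scratchpad.dummy_hive")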