Hi,

This sounds like a problem introduced in Spark 1.6.1.

Objective: Use a JDBC connection in the Spark shell to get data from an
RDBMS table (in this case Oracle)

Results: The JDBC connection is made OK, but fetching the data fails with
the error:

ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage
2.0 (TID 11, rhes564): java.lang.IllegalStateException: Did not find
registered driver with class oracle.jdbc.OracleDriver

Details

*Spark 1.6.1*

1) Create a simple JDBC connection in spark-shell with the Oracle JDBC jar
loaded as below:

spark-shell --master spark://50.140.197.217:7077 --jars
/home/hduser/jars/ojdbc6.jar

scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
HiveContext: org.apache.spark.sql.hive.HiveContext =
org.apache.spark.sql.hive.HiveContext@500bde5b
scala> var _ORACLEserver : String = "jdbc:oracle:thin:@rhes564:1521:mydb12"
_ORACLEserver: String = jdbc:oracle:thin:@rhes564:1521:mydb12
scala> var _username : String = "sh"
_username: String = sh
scala> var _password : String = "xxxxx"
_password: String = xxxxx
scala> val c = HiveContext.load("jdbc",
     | Map("url" -> _ORACLEserver,
     | "dbtable" -> "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID,
CHANNEL_DESC FROM sh.channels)",
     | "user" -> _username,
     | "password" -> _password))
warning: there were 1 deprecation warning(s); re-run with -deprecation for
details
c: org.apache.spark.sql.DataFrame = [CHANNEL_ID: string, CHANNEL_DESC:
string]

This works:

scala> c.printSchema
root
 |-- CHANNEL_ID: string (nullable = true)
 |-- CHANNEL_DESC: string (nullable = false)

This fails:

scala> c.first
16/05/01 10:06:13 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4
times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage
2.0 (TID 11, rhes564): java.lang.IllegalStateException: Did not find
registered driver with class oracle.jdbc.OracleDriver

In Spark 1.5.2 it works:

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.
scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
HiveContext: org.apache.spark.sql.hive.HiveContext =
org.apache.spark.sql.hive.HiveContext@e87c4cf
scala> var _ORACLEserver : String = "jdbc:oracle:thin:@rhes564:1521:mydb12"
_ORACLEserver: String = jdbc:oracle:thin:@rhes564:1521:mydb12
scala> var _username : String = "sh"
_username: String = sh
scala> var _password : String = "sh"
_password: String = sh
scala> val c = HiveContext.load("jdbc",
     | Map("url" -> _ORACLEserver,
     | "dbtable" -> "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID,
CHANNEL_DESC FROM sh.channels)",
     | "user" -> _username,
     | "password" -> _password))
warning: there were 1 deprecation warning(s); re-run with -deprecation for
details
c: org.apache.spark.sql.DataFrame = [CHANNEL_ID: string, CHANNEL_DESC:
string]
scala> c.printSchema
root
 |-- CHANNEL_ID: string (nullable = true)
 |-- CHANNEL_DESC: string (nullable = false)

This works in Spark 1.5.2 but fails in Spark 1.6.1:

scala> c.first
res1: org.apache.spark.sql.Row = [3,Direct Sales]

The work-around for now is to use Maven or sbt to build a jar file and run
it with spark-submit, which is not really ideal.
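Since the error says the executors did not find a registered driver, one
thing that might be worth trying (I have not verified this on 1.6.1) is
naming the driver class explicitly via the "driver" option in the options
map, so Spark registers it on the executors rather than relying on
DriverManager picking it up from the jar:

scala> val c = HiveContext.load("jdbc",
     | Map("url" -> _ORACLEserver,
     | "driver" -> "oracle.jdbc.OracleDriver",
     | "dbtable" -> "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID,
CHANNEL_DESC FROM sh.channels)",
     | "user" -> _username,
     | "password" -> _password))

and/or putting the jar on the executor classpath directly when starting
the shell:

spark-shell --master spark://50.140.197.217:7077 \
  --jars /home/hduser/jars/ojdbc6.jar \
  --conf spark.executor.extraClassPath=/home/hduser/jars/ojdbc6.jar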


Dr Mich Talebzadeh



LinkedIn
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com
