Hi,

This sounds like a problem introduced in spark-shell in Spark 1.6.1.
Objective: use a JDBC connection in spark-shell to get data from an RDBMS table (in this case Oracle).

Result: the JDBC connection is made OK, but the collect fails with the error

ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 11, rhes564): java.lang.IllegalStateException: Did not find registered driver with class oracle.jdbc.OracleDriver

Details

*Spark 1.6.1*

1) Create a simple JDBC connection in spark-shell, with the Oracle jar file loaded as below:

spark-shell --master spark://50.140.197.217:7077 --jars /home/hduser/jars/ojdbc6.jar

scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
HiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@500bde5b

scala> var _ORACLEserver : String = "jdbc:oracle:thin:@rhes564:1521:mydb12"
_ORACLEserver: String = jdbc:oracle:thin:@rhes564:1521:mydb12

scala> var _username : String = "sh"
_username: String = sh

scala> var _password : String = "xxxxx"
_password: String = xxxxx

scala> val c = HiveContext.load("jdbc",
     | Map("url" -> _ORACLEserver,
     | "dbtable" -> "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID, CHANNEL_DESC FROM sh.channels)",
     | "user" -> _username,
     | "password" -> _password))
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
c: org.apache.spark.sql.DataFrame = [CHANNEL_ID: string, CHANNEL_DESC: string]

*This works*

scala> c.printSchema
root
 |-- CHANNEL_ID: string (nullable = true)
 |-- CHANNEL_DESC: string (nullable = false)

*This fails*

scala> c.first
16/05/01 10:06:13 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 11, rhes564): java.lang.IllegalStateException: Did not find registered driver with
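Note also the deprecation warning on HiveContext.load above. One thing that may be worth trying (a sketch only, I have not verified it against 1.6.1) is the non-deprecated DataFrameReader API together with an explicit "driver" option, so the executors are told the driver class directly rather than relying on DriverManager auto-registration of the jar:

```scala
// Sketch, not verified on 1.6.1: same query as above, but via the
// DataFrameReader API (HiveContext.read) instead of the deprecated
// HiveContext.load, and with the JDBC "driver" option set explicitly
// so executors know which driver class to register.
val c = HiveContext.read
  .format("jdbc")
  .option("url", _ORACLEserver)
  .option("dbtable", "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID, CHANNEL_DESC FROM sh.channels)")
  .option("user", _username)
  .option("password", _password)
  .option("driver", "oracle.jdbc.OracleDriver")  // explicit driver class
  .load()
```

This still needs the ojdbc6.jar on the classpath (--jars as above); the "driver" option only names the class to register, it does not ship the jar.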
class oracle.jdbc.OracleDriver

*In Spark 1.5.2 it works*

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.

scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
HiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@e87c4cf

scala> var _ORACLEserver : String = "jdbc:oracle:thin:@rhes564:1521:mydb12"
_ORACLEserver: String = jdbc:oracle:thin:@rhes564:1521:mydb12

scala> var _username : String = "sh"
_username: String = sh

scala> var _password : String = "sh"
_password: String = sh

scala> val c = HiveContext.load("jdbc",
     | Map("url" -> _ORACLEserver,
     | "dbtable" -> "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID, CHANNEL_DESC FROM sh.channels)",
     | "user" -> _username,
     | "password" -> _password))
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
c: org.apache.spark.sql.DataFrame = [CHANNEL_ID: string, CHANNEL_DESC: string]

scala> c.printSchema
root
 |-- CHANNEL_ID: string (nullable = true)
 |-- CHANNEL_DESC: string (nullable = false)

*This works in Spark 1.5.2 but fails in Spark 1.6.1*

scala> c.first
res1: org.apache.spark.sql.Row = [3,Direct Sales]

The workaround for now is to use Maven or sbt to create a jar file and run it with spark-submit, which is not really ideal.

Dr Mich Talebzadeh

LinkedIn
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com