Is sc there a SparkContext or a JavaSparkContext? The compilation error seems to indicate the former, but JdbcRDD.create expects the latter.
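If it is unclear which overloads the Spark jar you compile against actually provides, a quick reflection check settles it. The sketch below is a hypothetical helper (ListOverloads and listStaticOverloads are made-up names, not Spark APIs). It lists every public static method with a given name, along with its parameter types. Java can call JdbcRDD.create as a static method because Scala emits static forwarders on a class for the methods of its companion object, so those forwarders should show up here if the jar has them.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ListOverloads {

    // Hypothetical helper (not a Spark API): lists every public static method
    // named methodName on className as "name(paramType, ...)", sorted.
    // Returns an empty list if the class cannot be loaded.
    static List<String> listStaticOverloads(String className, String methodName) {
        List<String> result = new ArrayList<>();
        try {
            // initialize = false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, ListOverloads.class.getClassLoader());
            for (Method m : cls.getMethods()) { // public methods of the class, static ones included
                if (m.getName().equals(methodName) && Modifier.isStatic(m.getModifiers())) {
                    String params = Arrays.stream(m.getParameterTypes())
                            .map(Class::getName)
                            .collect(Collectors.joining(", "));
                    result.add(methodName + "(" + params + ")");
                }
            }
        } catch (ClassNotFoundException | LinkageError e) {
            // The class, or something it depends on (e.g. scala-library), is not loadable
            return new ArrayList<>();
        }
        result.sort(null); // alphabetical order for stable output
        return result;
    }

    public static void main(String[] args) {
        String className = (args.length > 0) ? args[0] : "org.apache.spark.rdd.JdbcRDD";
        String methodName = (args.length > 1) ? args[1] : "create";
        List<String> overloads = listStaticOverloads(className, methodName);
        if (overloads.isEmpty()) {
            System.out.println("No public static " + methodName + " on " + className
                    + " (class missing, not loadable, or method absent)");
        }
        overloads.forEach(System.out::println);
    }
}
```

Run it with the same Spark jar on the classpath as your build; with no arguments it checks org.apache.spark.rdd.JdbcRDD and create. If the listed overloads take org.apache.spark.api.java.JavaSparkContext as their first parameter, wrapping the existing context with new JavaSparkContext(sc) before the call should get past the "no suitable method found" error. If nothing is listed, the Spark version on the classpath may not include the Java-friendly create methods at all.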
On Wed, Feb 18, 2015 at 12:30 PM, Dmitry Goldenberg <[email protected]> wrote:

> I have tried that as well; I get a compile error:
>
> [ERROR] ...SparkProto.java:[105,39] error: no suitable method found for
> create(SparkContext,<anonymous ConnectionFactory>,String,int,int,int,<anonymous Function<ResultSet,Integer>>)
>
> The code is a copy and paste:
>
> JavaRDD<Integer> jdbcRDD = JdbcRDD.create(
>     sc,
>     new JdbcRDD.ConnectionFactory() {
>       public Connection getConnection() throws SQLException {
>         return DriverManager.getConnection("jdbc:derby:target/JavaJdbcRDDSuiteDb");
>       }
>     },
>     "SELECT DATA FROM FOO WHERE ? <= ID AND ID <= ?",
>     1, 100, 1,
>     new Function<ResultSet, Integer>() {
>       public Integer call(ResultSet r) throws Exception {
>         return r.getInt(1);
>       }
>     }
> );
>
> The other thing I've tried was to define a static class locally for
> GetConnection and use the JdbcRDD constructor. This got around the
> compile issues but blew up at runtime with "NoClassDefFoundError:
> scala/runtime/AbstractFunction0"!
>
> JdbcRDD<Row> jdbcRDD = new JdbcRDD<Row>(
>     sc,
>     (AbstractFunction0<Connection>) new DbConn(), // had to cast or a compile error
>     SQL_QUERY,
>     0L,
>     1000L,
>     10,
>     new MapRow(),
>     ROW_CLASS_TAG);
> // DbConn is defined as public static class DbConn extends
> // AbstractFunction0<Connection> implements Serializable
>
> On Wed, Feb 18, 2015 at 1:20 PM, Cody Koeninger <[email protected]>
> wrote:
>
>> That test I linked
>>
>> https://github.com/apache/spark/blob/v1.2.1/core/src/test/java/org/apache/spark/JavaJdbcRDDSuite.java#L90
>>
>> is calling a static method JdbcRDD.create, not new JdbcRDD. Is that what
>> you tried doing?
>>
>> On Wed, Feb 18, 2015 at 12:00 PM, Dmitry Goldenberg <
>> [email protected]> wrote:
>>
>>> Thanks, Cody. Yes, I originally started off by looking at that, but I get
>>> a compile error if I try and use that approach: constructor JdbcRDD in
>>> class JdbcRDD<T> cannot be applied to given types.
>>> Not to mention that JavaJdbcRDDSuite somehow manages to not pass in the
>>> class tag (the last argument).
>>>
>>> I wonder if it's a JDK version issue; I'm using 1.7.
>>>
>>> So I've got this, which doesn't compile:
>>>
>>> JdbcRDD<Row> jdbcRDD = new JdbcRDD<Row>(
>>>     new SparkContext(conf),
>>>     new JdbcRDD.ConnectionFactory() {
>>>       public Connection getConnection() throws SQLException {
>>>         Connection conn = null;
>>>         try {
>>>           Class.forName(JDBC_DRIVER);
>>>           conn = DriverManager.getConnection(JDBC_URL, JDBC_USER, JDBC_PASSWORD);
>>>         } catch (ClassNotFoundException ex) {
>>>           throw new RuntimeException("Error while loading JDBC driver.", ex);
>>>         }
>>>         return conn;
>>>       }
>>>     },
>>>     "SELECT * FROM EMPLOYEES",
>>>     0L,
>>>     1000L,
>>>     10,
>>>     new Function<ResultSet, Row>() {
>>>       public Row call(ResultSet r) throws Exception {
>>>         return null; // have some actual logic here...
>>>       }
>>>     },
>>>     scala.reflect.ClassManifestFactory$.MODULE$.fromClass(Row.class));
>>>
>>> The other approach was mimicking the DbConnection class from this post:
>>> http://www.sparkexpert.com/2015/01/02/load-database-data-into-spark-using-jdbcrdd-in-java/.
>>> It got around all of the compilation issues, but then I got the runtime
>>> error where Spark wouldn't recognize the db connection class as a
>>> scala.Function0.
>>>
>>> On Wed, Feb 18, 2015 at 12:37 PM, Cody Koeninger <[email protected]>
>>> wrote:
>>>
>>>> Take a look at
>>>>
>>>> https://github.com/apache/spark/blob/v1.2.1/core/src/test/java/org/apache/spark/JavaJdbcRDDSuite.java
>>>>
>>>> On Wed, Feb 18, 2015 at 11:14 AM, dgoldenberg <[email protected]>
>>>> wrote:
>>>>
>>>>> I'm reading data from a database using JdbcRDD, in Java, and I have an
>>>>> implementation of Function0<Connection> whose instance I supply as the
>>>>> 'getConnection' parameter into the JdbcRDD constructor. Compiles fine.
>>>>>
>>>>> The definition of the class/function is as follows:
>>>>>
>>>>> public class GetDbConnection extends AbstractFunction0<Connection>
>>>>>     implements Serializable
>>>>>
>>>>> where scala.runtime.AbstractFunction0 extends scala.Function0.
>>>>>
>>>>> At runtime, I get an exception as below. Does anyone have an idea as to
>>>>> how to resolve this/work around it? Thanks.
>>>>>
>>>>> I'm running Spark 1.2.1 built for Hadoop 2.4.
>>>>>
>>>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due
>>>>> to stage failure: Task 3 in stage 0.0 failed 1 times, most recent failure:
>>>>> Lost task 3.0 in stage 0.0 (TID 3, localhost): java.lang.ClassCastException:
>>>>> cannot assign instance of com.kona.motivis.spark.proto.GetDbConnection to
>>>>> field org.apache.spark.rdd.JdbcRDD.org$apache$spark$rdd$JdbcRDD$$getConnection
>>>>> of type scala.Function0 in instance of org.apache.spark.rdd.JdbcRDD
>>>>>     at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
>>>>>     at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
>>>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
>>>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
>>>>>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
>>>>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
>>>>>     at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>     at java.lang.Thread.run(Thread.java:744)
>>>>>
>>>>> Driver stacktrace:
>>>>>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
>>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
>>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
>>>>>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
>>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
>>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
>>>>>     at scala.Option.foreach(Option.scala:236)
>>>>>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
>>>>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
>>>>>     at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
>>>>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>>     at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>>>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
>>>>>     at akka.dispatch.Mailbox.run(Mailbox.scala:220)
>>>>>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>>>>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/JdbcRDD-ClassCastException-with-scala-Function0-tp21707.html
>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
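Both runtime errors in this thread come down to how the Scala runtime classes are being loaded. The first is the NoClassDefFoundError for scala/runtime/AbstractFunction0. The second is the ClassCastException when assigning GetDbConnection, which extends AbstractFunction0, to a field of type scala.Function0. One plausible explanation for the second, which the thread does not confirm, is that two copies of scala.Function0 from different jars or class loaders are involved. The sketch below is a hypothetical diagnostic (WhereIsClass and locate are made-up names, not Spark APIs) that reports which jar or directory each named class is loaded from.

```java
import java.net.URL;
import java.security.CodeSource;

public class WhereIsClass {

    // Hypothetical diagnostic helper (not a Spark API): reports the jar or
    // directory a class is loaded from, as seen by the current thread's
    // context class loader (falling back to this class's own loader).
    static String locate(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = WhereIsClass.class.getClassLoader();
        }
        try {
            // initialize = false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            // JDK core classes typically report no code source at all
            return (location == null) ? "bootstrap/unknown" : location.toString();
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, e.g. when a superclass
            // such as scala.runtime.AbstractFunction0 cannot be found
            return "NOT FOUND: " + e;
        }
    }

    public static void main(String[] args) {
        // Defaults to the classes named in the errors above; pass fully
        // qualified names as arguments to check other classes instead.
        String[] names = (args.length > 0) ? args : new String[] {
                "scala.Function0",
                "scala.runtime.AbstractFunction0",
                "org.apache.spark.rdd.JdbcRDD"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it on the driver with the same classpath as the job, passing the names you want checked (for example scala.Function0 and com.kona.motivis.spark.proto.GetDbConnection). A NOT FOUND for scala.Function0, or a scala-library jar you did not expect (such as one bundled inside the application jar next to the one Spark ships with), would be consistent with the errors above. The check only covers the JVM and class loader it runs in.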
