+dev

---------- Forwarded message ----------
From: 刘虓 <ipf...@gmail.com>
Date: 2017-08-27 1:02 GMT+08:00
Subject: Re: spark dataframe jdbc Amazon RDS problem
To: user <u...@spark.apache.org>
My code is here:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
mysql_jdbc_url = 'mydb/test'
table = "test"
props = {"user": "myname", "password": 'mypassword'}
df = spark.read.jdbc(mysql_jdbc_url, table, properties=props)
df.printSchema()
wtf = df.collect()
for i in wtf:
    print i

2017-08-27 1:00 GMT+08:00 刘虓 <ipf...@gmail.com>:
> Hi all,
> I came across this problem yesterday: I was using a DataFrame to read
> from an Amazon RDS MySQL table, and this exception came up:
>
> java.sql.SQLException: Invalid value for getLong() - 'id'
>
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:964)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:897)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:886)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:860)
> at com.mysql.jdbc.ResultSetImpl.getLong(ResultSetImpl.java:2688)
> at com.mysql.jdbc.ResultSetImpl.getLong(ResultSetImpl.java:2650)
> at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.getNext(JDBCRDD.scala:447)
> at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.hasNext(JDBCRDD.scala:544)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
> at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:117)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:112)
> at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
> at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.to(SerDeUtil.scala:112)
> at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
> at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.toBuffer(SerDeUtil.scala:112)
> at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
> at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.toArray(SerDeUtil.scala:112)
> at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
> at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:912)
> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)
>
> Obviously the literal column name 'id' seems to be showing up in the
> results instead of the column's values.
>
> Has anybody seen this before?
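For comparison, this is the shape of the call I would expect to work against
RDS, with the JDBC URL written out in full. This is only a sketch: the
hostname, port, and database name below are placeholders, not my real
endpoint, and it assumes the MySQL Connector/J jar is on the classpath.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Fully-qualified MySQL JDBC URL; host, port, and database are placeholders.
mysql_jdbc_url = "jdbc:mysql://mydb.example.rds.amazonaws.com:3306/test"
props = {
    "user": "myname",
    "password": "mypassword",
    # Explicit driver class; assumes MySQL Connector/J is available.
    "driver": "com.mysql.jdbc.Driver",
}

df = spark.read.jdbc(mysql_jdbc_url, "test", properties=props)
df.printSchema()
df.show()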