Hi,
I am seeing a weird error that suddenly popped up in my unit tests. I
have a couple of HDFS files in JSON format, and my test basically
creates a JsonRDD and then issues a very simple SQL query over it. This
used to work fine, but now I suddenly get:
15:58:49.039 [Executor task launch worker-1] ERROR executor.Executor -
Exception in task 1.0 in stage 29.0 (TID 117)
scala.MatchError: 14452800566866169008 (of class java.math.BigInteger)
at org.apache.spark.sql.json.JsonRDD$.toLong(JsonRDD.scala:282)
at org.apache.spark.sql.json.JsonRDD$.enforceCorrectType(JsonRDD.scala:353)
at org.apache.spark.sql.json.JsonRDD$$anonfun$org$apache$spark$sql$json$JsonRDD$$asRow$1$$anonfun$apply$12.apply(JsonRDD.scala:381)
at scala.Option.map(Option.scala:145)
at org.apache.spark.sql.json.JsonRDD$$anonfun$org$apache$spark$sql$json$JsonRDD$$asRow$1.apply(JsonRDD.scala:380)
at org.apache.spark.sql.json.JsonRDD$$anonfun$org$apache$spark$sql$json$JsonRDD$$asRow$1.apply(JsonRDD.scala:365)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.sql.json.JsonRDD$.org$apache$spark$sql$json$JsonRDD$$asRow(JsonRDD.scala:365)
at org.apache.spark.sql.json.JsonRDD$$anonfun$jsonStringToRow$1.apply(JsonRDD.scala:38)
at org.apache.spark.sql.json.JsonRDD$$anonfun$jsonStringToRow$1.apply(JsonRDD.scala:38)
...
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
The stack trace contains none of my classes, so it's a bit hard to track
down where this starts.
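Since the trace does not say which record is at fault, one way to narrow it down is to scan the raw JSON lines for integer literals that do not fit in a Long. The literal in the error, 14452800566866169008, is larger than Long.MaxValue (9223372036854775807), which would explain why it arrives as a BigInteger. Below is a minimal sketch of such a scan; the object name FindOversizedNumbers and the helper oversized are hypothetical, and the regex is deliberately crude (it also matches digits inside strings and the parts of decimals).

```scala
// Hypothetical helper, not part of Spark: flags integer literals that do not fit in a Long.
object FindOversizedNumbers {
  // Optional sign followed by digits. Crude on purpose: digits inside
  // string values and the parts of decimal numbers will match as well.
  private val IntegerLiteral = """-?\d+""".r

  // Returns every integer literal on the line that is outside the Long range.
  def oversized(line: String): Seq[BigInt] =
    IntegerLiteral.findAllIn(line).map(BigInt(_)).filterNot(_.isValidLong).toSeq

  def main(args: Array[String]): Unit = {
    // Made-up sample records; only the large literal is taken from the error message.
    val sample = Seq(
      """{"id": 42, "name": "ok"}""",
      """{"id": 14452800566866169008, "name": "too big"}"""
    )
    sample.zipWithIndex.foreach { case (line, idx) =>
      val hits = oversized(line)
      if (hits.nonEmpty) println(s"line ${idx + 1}: ${hits.mkString(", ")}")
    }
  }
}
```

The same predicate could presumably be applied to the lines of the HDFS files before they are handed to the JSON reader, to list or drop the offending records.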
The code of JsonRDD.toLong is in fact:

    private def toLong(value: Any): Long = {
      value match {
        case value: java.lang.Integer => value.asInstanceOf[Int].toLong
        case value: java.lang.Long => value.asInstanceOf[Long]
      }
    }

so if value is a BigInteger, no case matches and toLong throws the
MatchError shown above. Now I'm wondering where the BigInteger comes from
(I haven't touched this component in a while, nor upgraded Spark etc.), but
in particular I would like to know how to work around this.
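
To illustrate the gap, here is a minimal sketch of what a more defensive conversion could look like. It is not Spark's code and cannot be dropped into JsonRDD (toLong is private there); the names ToLongSketch and toLongOption are hypothetical. It adds a BigInteger case that succeeds only when the value fits in a Long, plus a catch-all so that an unexpected type yields None instead of a MatchError.

```scala
import java.math.BigInteger

// Hypothetical stand-alone variant of the toLong shown above; not Spark's implementation.
object ToLongSketch {
  private val MinLong = BigInteger.valueOf(Long.MinValue)
  private val MaxLong = BigInteger.valueOf(Long.MaxValue)

  // Returns Some(long) when the value can be represented as a Long, None otherwise.
  def toLongOption(value: Any): Option[Long] = value match {
    case v: java.lang.Integer => Some(v.longValue())
    case v: java.lang.Long    => Some(v.longValue())
    // A BigInteger is accepted only if it lies within the Long range.
    case v: BigInteger if v.compareTo(MinLong) >= 0 && v.compareTo(MaxLong) <= 0 =>
      Some(v.longValue())
    // Anything else, including out-of-range BigIntegers, is reported as None.
    case _ => None
  }
}
```

With this variant the value from the error message maps to None, because 14452800566866169008 cannot be stored in a Long at all. A workaround on the data side would therefore still be needed, for example filtering such records out or storing that field as a string.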
Thanks
Tobias