https://issues.apache.org/jira/browse/SPARK-13531
On Sat, Feb 27, 2016 at 3:49 AM, Reynold Xin <r...@databricks.com> wrote:
> Can you file a JIRA ticket?
>
> On Friday, February 26, 2016, Koert Kuipers <ko...@tresata.com> wrote:
>> dataframe df1:
>> schema:
>> StructType(StructField(x,IntegerType,true))
>> explain:
>> == Physical Plan ==
>> MapPartitions <function1>, obj#135: object, [if (input[0, object].isNullAt) null else input[0, object].get AS x#128]
>> +- MapPartitions <function1>, createexternalrow(if (isnull(x#9)) null else x#9), [input[0, object] AS obj#135]
>>    +- WholeStageCodegen
>>       :  +- Project [_1#8 AS x#9]
>>       :     +- Scan ExistingRDD[_1#8]
>> show:
>> +---+
>> |  x|
>> +---+
>> |  2|
>> |  3|
>> +---+
>>
>> dataframe df2:
>> schema:
>> StructType(StructField(x,IntegerType,true), StructField(y,StringType,true))
>> explain:
>> == Physical Plan ==
>> MapPartitions <function1>, createexternalrow(x#2, if (isnull(y#3)) null else y#3.toString), [if (input[0, object].isNullAt) null else input[0, object].get AS x#130,if (input[0, object].isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, object].get, true) AS y#131]
>> +- WholeStageCodegen
>>    :  +- Project [_1#0 AS x#2,_2#1 AS y#3]
>>    :     +- Scan ExistingRDD[_1#0,_2#1]
>> show:
>> +---+---+
>> |  x|  y|
>> +---+---+
>> |  1|  1|
>> |  2|  2|
>> |  3|  3|
>> +---+---+
>>
>> i run:
>> df1.join(df2, Seq("x")).show
>>
>> i get:
>> java.lang.UnsupportedOperationException: No size estimation available for objects.
>>   at org.apache.spark.sql.types.ObjectType.defaultSize(ObjectType.scala:41)
>>   at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
>>   at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>>   at scala.collection.immutable.List.foreach(List.scala:381)
>>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>>   at scala.collection.immutable.List.map(List.scala:285)
>>   at org.apache.spark.sql.catalyst.plans.logical.UnaryNode.statistics(LogicalPlan.scala:323)
>>   at org.apache.spark.sql.execution.SparkStrategies$CanBroadcast$.unapply(SparkStrategies.scala:87)
>>
>> Not sure what changed; this ran about a week ago without issues (in our internal unit tests). It is fully reproducible, but when I tried to minimize the issue I could not reproduce it by just creating DataFrames in the REPL with the same contents, so it probably has something to do with the way these are created (from Row objects and StructTypes).
>>
>> best, koert
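For context, a minimal sketch of the setup described in the report, reconstructed from the schemas and physical plans above. The data values, transformation, and object names here are assumptions, not the reporter's actual code; it assumes a Spark build from this era (early 2016 master, where a functional `map` on a DataFrame introduces the `MapPartitions`/`ObjectType` nodes visible in the plans, whose statistics the broadcast-join check then fails to compute):

```scala
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object JoinRepro {
  // Hypothetical reproduction; requires a live SQLContext (e.g. from a REPL).
  def run(sqlContext: SQLContext): Unit = {
    val sc = sqlContext.sparkContext

    // df1: built from Row objects with an explicit StructType, then passed
    // through a functional map. The map is what puts MapPartitions nodes
    // with ObjectType output into the plan, as in the explain() above.
    val schema1 = StructType(Seq(StructField("x", IntegerType, nullable = true)))
    val df1 = sqlContext
      .createDataFrame(sc.parallelize(Seq(Row(1), Row(2))), schema1)
      .map(r => r.getInt(0) + 1) // assumed transformation producing 2, 3
      .toDF("x")

    // df2: two columns, likewise created from Rows and a StructType.
    val schema2 = StructType(Seq(
      StructField("x", IntegerType, nullable = true),
      StructField("y", StringType, nullable = true)))
    val df2 = sqlContext.createDataFrame(
      sc.parallelize(Seq(Row(1, "1"), Row(2, "2"), Row(3, "3"))), schema2)

    // The join planner's CanBroadcast extractor asks the logical plan for
    // size statistics; with ObjectType in the plan, defaultSize throws
    // "No size estimation available for objects."
    df1.join(df2, Seq("x")).show()
  }
}
```

This also explains why the reporter could not reproduce it with plain DataFrames in the REPL: without the intervening `map`, no `ObjectType` appears in the plan and size estimation succeeds.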