Can you file a JIRA ticket?

On Friday, February 26, 2016, Koert Kuipers <ko...@tresata.com> wrote:
> dataframe df1:
> schema:
> StructType(StructField(x,IntegerType,true))
> explain:
> == Physical Plan ==
> MapPartitions <function1>, obj#135: object, [if (input[0, object].isNullAt) null else input[0, object].get AS x#128]
> +- MapPartitions <function1>, createexternalrow(if (isnull(x#9)) null else x#9), [input[0, object] AS obj#135]
>    +- WholeStageCodegen
>    :  +- Project [_1#8 AS x#9]
>    :     +- Scan ExistingRDD[_1#8]
> show:
> +---+
> |  x|
> +---+
> |  2|
> |  3|
> +---+
>
>
> dataframe df2:
> schema:
> StructType(StructField(x,IntegerType,true), StructField(y,StringType,true))
> explain:
> == Physical Plan ==
> MapPartitions <function1>, createexternalrow(x#2, if (isnull(y#3)) null else y#3.toString), [if (input[0, object].isNullAt) null else input[0, object].get AS x#130,if (input[0, object].isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, object].get, true) AS y#131]
> +- WholeStageCodegen
>    :  +- Project [_1#0 AS x#2,_2#1 AS y#3]
>    :     +- Scan ExistingRDD[_1#0,_2#1]
> show:
> +---+---+
> |  x|  y|
> +---+---+
> |  1|  1|
> |  2|  2|
> |  3|  3|
> +---+---+
>
>
> i run:
> df1.join(df2, Seq("x")).show
>
> i get:
> java.lang.UnsupportedOperationException: No size estimation available for objects.
>     at org.apache.spark.sql.types.ObjectType.defaultSize(ObjectType.scala:41)
>     at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
>     at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>     at scala.collection.immutable.List.foreach(List.scala:381)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>     at scala.collection.immutable.List.map(List.scala:285)
>     at org.apache.spark.sql.catalyst.plans.logical.UnaryNode.statistics(LogicalPlan.scala:323)
>     at org.apache.spark.sql.execution.SparkStrategies$CanBroadcast$.unapply(SparkStrategies.scala:87)
>
> not sure what changed; this ran about a week ago without issues (in our
> internal unit tests). it is fully reproducible, however when i tried to
> minimize the issue i could not reproduce it by just creating data frames in
> the repl with the same contents, so it probably has something to do with
> the way these are created (from Row objects and StructTypes).
>
> best, koert
>
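For the ticket, something along these lines might serve as a starting point for a standalone reproduction. This is a hypothetical sketch, not the original test code: the session setup, object names, and row values are assumptions pieced together from the schemas, the `show` output, and the `MapPartitions`/`createexternalrow` nodes in the plans above (which suggest a typed `map` was applied after `createDataFrame` from `Row` objects and a `StructType`).

```scala
// Hypothetical reproduction sketch (assumed from the plans above, not the
// reporter's actual code). Requires a Spark runtime on the classpath.
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object JoinRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("repro").getOrCreate()
    import spark.implicits._

    // df1: built from Row objects and an explicit StructType, then passed
    // through a typed map -- the round-trip through external objects is what
    // produces the MapPartitions / obj#... nodes seen in the physical plan
    val schema1 = StructType(Seq(StructField("x", IntegerType, nullable = true)))
    val rows1 = spark.sparkContext.parallelize(Seq(Row(2), Row(3)))
    val df1 = spark.createDataFrame(rows1, schema1)
      .map(r => r.getInt(0))
      .toDF("x")

    // df2: same construction pattern, two columns
    val schema2 = StructType(Seq(
      StructField("x", IntegerType, nullable = true),
      StructField("y", StringType, nullable = true)))
    val rows2 = spark.sparkContext.parallelize(Seq(Row(1, "1"), Row(2, "2"), Row(3, "3")))
    val df2 = spark.createDataFrame(rows2, schema2)
      .map(r => (r.getInt(0), r.getString(1)))
      .toDF("x", "y")

    // the join planner calls statistics() on each side to decide whether to
    // broadcast; per the stack trace, that size estimation is what fails when
    // the plan still contains ObjectType output
    df1.join(df2, Seq("x")).show()
  }
}
```

The stack trace points at `SparkStrategies$CanBroadcast$.unapply` calling `UnaryNode.statistics`, which in turn calls `ObjectType.defaultSize`; so any pair of dataframes whose plans retain object-typed intermediate nodes at join-planning time should, if this sketch captures the issue, hit the same `UnsupportedOperationException`.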