If you want to find which commit caused it, try out the "git bisect" command.
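For reference, a typical bisect session against the Spark tree might look like the following (the good ref is a placeholder for whatever commit your tests last passed on):

    git bisect start
    git bisect bad HEAD
    git bisect good <last-known-good-sha>
    # build and run the failing test at each step, then tell git the result:
    git bisect good    # or: git bisect bad
    # repeat until git prints the first bad commit, then clean up:
    git bisect reset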
On Sat, Feb 27, 2016 at 11:06 AM Koert Kuipers <ko...@tresata.com> wrote:

> https://issues.apache.org/jira/browse/SPARK-13531
>
> On Sat, Feb 27, 2016 at 3:49 AM, Reynold Xin <r...@databricks.com> wrote:
>
>> Can you file a JIRA ticket?
>>
>> On Friday, February 26, 2016, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> dataframe df1:
>>> schema:
>>> StructType(StructField(x,IntegerType,true))
>>> explain:
>>> == Physical Plan ==
>>> MapPartitions <function1>, obj#135: object, [if (input[0, object].isNullAt) null else input[0, object].get AS x#128]
>>> +- MapPartitions <function1>, createexternalrow(if (isnull(x#9)) null else x#9), [input[0, object] AS obj#135]
>>>    +- WholeStageCodegen
>>>       :  +- Project [_1#8 AS x#9]
>>>       :     +- Scan ExistingRDD[_1#8]
>>> show:
>>> +---+
>>> |  x|
>>> +---+
>>> |  2|
>>> |  3|
>>> +---+
>>>
>>> dataframe df2:
>>> schema:
>>> StructType(StructField(x,IntegerType,true), StructField(y,StringType,true))
>>> explain:
>>> == Physical Plan ==
>>> MapPartitions <function1>, createexternalrow(x#2, if (isnull(y#3)) null else y#3.toString), [if (input[0, object].isNullAt) null else input[0, object].get AS x#130,if (input[0, object].isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, object].get, true) AS y#131]
>>> +- WholeStageCodegen
>>>    :  +- Project [_1#0 AS x#2,_2#1 AS y#3]
>>>    :     +- Scan ExistingRDD[_1#0,_2#1]
>>> show:
>>> +---+---+
>>> |  x|  y|
>>> +---+---+
>>> |  1|  1|
>>> |  2|  2|
>>> |  3|  3|
>>> +---+---+
>>>
>>> I run:
>>> df1.join(df2, Seq("x")).show
>>>
>>> I get:
>>> java.lang.UnsupportedOperationException: No size estimation available for objects.
>>>   at org.apache.spark.sql.types.ObjectType.defaultSize(ObjectType.scala:41)
>>>   at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
>>>   at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
>>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>>>   at scala.collection.immutable.List.foreach(List.scala:381)
>>>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>>>   at scala.collection.immutable.List.map(List.scala:285)
>>>   at org.apache.spark.sql.catalyst.plans.logical.UnaryNode.statistics(LogicalPlan.scala:323)
>>>   at org.apache.spark.sql.execution.SparkStrategies$CanBroadcast$.unapply(SparkStrategies.scala:87)
>>>
>>> Not sure what changed; this ran without issues about a week ago (in our internal unit tests). It is fully reproducible. However, when I tried to minimize the issue I could not reproduce it by just creating data frames in the REPL with the same contents, so it probably has something to do with the way these are created (from Row objects and StructTypes).
>>>
>>> best, koert
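For anyone trying to reproduce this: below is a minimal Scala sketch of the kind of pipeline described above (DataFrames built from Row objects and explicit StructTypes, plus a row-level map, which is what produces the MapPartitions/object nodes in the explain output). It is written against the Spark 2.0-style API; the names and the mapped function are hypothetical, and per the note above this exact snippet is not guaranteed to trigger the exception, which the stack trace shows being thrown from ObjectType.defaultSize while the planner sizes the plan for a possible broadcast join.

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.catalyst.encoders.RowEncoder
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    // schemas match the ones printed above
    val schema1 = StructType(Seq(StructField("x", IntegerType, nullable = true)))
    val schema2 = StructType(Seq(
      StructField("x", IntegerType, nullable = true),
      StructField("y", StringType, nullable = true)))

    // DataFrames created from Row objects and StructTypes, as in the report
    val base1 = spark.createDataFrame(
      spark.sparkContext.parallelize(Seq(Row(1), Row(2))), schema1)
    val df2 = spark.createDataFrame(
      spark.sparkContext.parallelize(Seq(Row(1, "1"), Row(2, "2"), Row(3, "3"))), schema2)

    // a row-level map introduces the MapPartitions / object nodes seen in the
    // explain output; the actual transformation in the unit tests is unknown
    val df1 = base1.map(row => Row(row.getInt(0) + 1))(RowEncoder(schema1))

    // on an affected build this fails while the planner estimates sizes for a
    // potential broadcast join:
    df1.join(df2, Seq("x")).show()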