Can you file a JIRA ticket?

On Friday, February 26, 2016, Koert Kuipers <ko...@tresata.com> wrote:
> dataframe df1:
> schema:
> StructType(StructField(x,IntegerType,true))
> explain:
> == Physical Plan ==
> MapPartitions <function1>, obj#135: object, [if (input[0, object].isNullAt) null else input[0, object].get AS x#128]
> +- MapPartitions <function1>, createexternalrow(if (isnull(x#9)) null else x#9), [input[0, object] AS obj#135]
>    +- WholeStageCodegen
>    :  +- Project [_1#8 AS x#9]
>    :     +- Scan ExistingRDD[_1#8]
> show:
> +---+
> |  x|
> +---+
> |  2|
> |  3|
> +---+
>
>
> dataframe df2:
> schema:
> StructType(StructField(x,IntegerType,true), StructField(y,StringType,true))
> explain:
> == Physical Plan ==
> MapPartitions <function1>, createexternalrow(x#2, if (isnull(y#3)) null else y#3.toString), [if (input[0, object].isNullAt) null else input[0, object].get AS x#130,if (input[0, object].isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, object].get, true) AS y#131]
> +- WholeStageCodegen
>    :  +- Project [_1#0 AS x#2,_2#1 AS y#3]
>    :     +- Scan ExistingRDD[_1#0,_2#1]
> show:
> +---+---+
> |  x|  y|
> +---+---+
> |  1|  1|
> |  2|  2|
> |  3|  3|
> +---+---+
>
>
> i run:
> df1.join(df2, Seq("x")).show
>
> i get:
> java.lang.UnsupportedOperationException: No size estimation available for objects.
>     at org.apache.spark.sql.types.ObjectType.defaultSize(ObjectType.scala:41)
>     at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
>     at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>     at scala.collection.immutable.List.foreach(List.scala:381)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>     at scala.collection.immutable.List.map(List.scala:285)
>     at org.apache.spark.sql.catalyst.plans.logical.UnaryNode.statistics(LogicalPlan.scala:323)
>     at org.apache.spark.sql.execution.SparkStrategies$CanBroadcast$.unapply(SparkStrategies.scala:87)
>
> not sure what changed; this ran about a week ago without issues (in our
> internal unit tests). it is fully reproducible, however when i tried to
> minimize the issue i could not reproduce it by just creating data frames in
> the repl with the same contents, so it probably has something to do with
> the way these are created (from Row objects and StructTypes).
>
> best, koert
>
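For the ticket, something along these lines might serve as a starting point for a standalone reproduction. This is a hypothetical sketch, not the original test code: the session setup, object names, and row values are assumptions pieced together from the schemas, the `show` output, and the `MapPartitions`/`createexternalrow` nodes in the plans above (which suggest a typed `map` was applied after `createDataFrame` from `Row` objects and a `StructType`).

```scala
// Hypothetical reproduction sketch (assumed from the plans above, not the
// reporter's actual code). Requires a Spark runtime on the classpath.
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object JoinRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("repro").getOrCreate()
    import spark.implicits._

    // df1: built from Row objects and an explicit StructType, then passed
    // through a typed map -- the round-trip through external objects is what
    // produces the MapPartitions / obj#... nodes seen in the physical plan
    val schema1 = StructType(Seq(StructField("x", IntegerType, nullable = true)))
    val rows1 = spark.sparkContext.parallelize(Seq(Row(2), Row(3)))
    val df1 = spark.createDataFrame(rows1, schema1)
      .map(r => r.getInt(0))
      .toDF("x")

    // df2: same construction pattern, two columns
    val schema2 = StructType(Seq(
      StructField("x", IntegerType, nullable = true),
      StructField("y", StringType, nullable = true)))
    val rows2 = spark.sparkContext.parallelize(Seq(Row(1, "1"), Row(2, "2"), Row(3, "3")))
    val df2 = spark.createDataFrame(rows2, schema2)
      .map(r => (r.getInt(0), r.getString(1)))
      .toDF("x", "y")

    // the join planner calls statistics() on each side to decide whether to
    // broadcast; per the stack trace, that size estimation is what fails when
    // the plan still contains ObjectType output
    df1.join(df2, Seq("x")).show()
  }
}
```

The stack trace points at `SparkStrategies$CanBroadcast$.unapply` calling `UnaryNode.statistics`, which in turn calls `ObjectType.defaultSize`; so any pair of dataframes whose plans retain object-typed intermediate nodes at join-planning time should, if this sketch captures the issue, hit the same `UnsupportedOperationException`.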