https://issues.apache.org/jira/browse/SPARK-13531
On Sat, Feb 27, 2016 at 3:49 AM, Reynold Xin <r...@databricks.com> wrote:
> Can you file a JIRA ticket?
>
> On Friday, February 26, 2016, Koert Kuipers <ko...@tresata.com> wrote:
>> dataframe df1:
>> schema:
>> StructType(StructField(x,IntegerType,true))
>> explain:
>> == Physical Plan ==
>> MapPartitions <function1>, obj#135: object, [if (input[0, object].isNullAt) null else input[0, object].get AS x#128]
>> +- MapPartitions <function1>, createexternalrow(if (isnull(x#9)) null else x#9), [input[0, object] AS obj#135]
>>    +- WholeStageCodegen
>>       :  +- Project [_1#8 AS x#9]
>>       :     +- Scan ExistingRDD[_1#8]
>> show:
>> +---+
>> |  x|
>> +---+
>> |  2|
>> |  3|
>> +---+
>>
>> dataframe df2:
>> schema:
>> StructType(StructField(x,IntegerType,true), StructField(y,StringType,true))
>> explain:
>> == Physical Plan ==
>> MapPartitions <function1>, createexternalrow(x#2, if (isnull(y#3)) null else y#3.toString), [if (input[0, object].isNullAt) null else input[0, object].get AS x#130,if (input[0, object].isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, object].get, true) AS y#131]
>> +- WholeStageCodegen
>>    :  +- Project [_1#0 AS x#2,_2#1 AS y#3]
>>    :     +- Scan ExistingRDD[_1#0,_2#1]
>> show:
>> +---+---+
>> |  x|  y|
>> +---+---+
>> |  1|  1|
>> |  2|  2|
>> |  3|  3|
>> +---+---+
>>
>> i run:
>> df1.join(df2, Seq("x")).show
>>
>> i get:
>> java.lang.UnsupportedOperationException: No size estimation available for objects.
>>   at org.apache.spark.sql.types.ObjectType.defaultSize(ObjectType.scala:41)
>>   at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
>>   at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>>   at scala.collection.immutable.List.foreach(List.scala:381)
>>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>>   at scala.collection.immutable.List.map(List.scala:285)
>>   at org.apache.spark.sql.catalyst.plans.logical.UnaryNode.statistics(LogicalPlan.scala:323)
>>   at org.apache.spark.sql.execution.SparkStrategies$CanBroadcast$.unapply(SparkStrategies.scala:87)
>>
>> Not sure what changed; this ran about a week ago without issues (in our internal unit tests). It is fully reproducible, but when I tried to minimize the issue I could not reproduce it by just creating DataFrames in the REPL with the same contents, so it probably has something to do with the way these are created (from Row objects and StructTypes).
>>
>> best, koert
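For context, a minimal sketch of the setup described in the report, reconstructed from the schemas and physical plans above. The data values, transformation, and object names here are assumptions, not the reporter's actual code; it assumes a Spark build from this era (early 2016 master, where a functional `map` on a DataFrame introduces the `MapPartitions`/`ObjectType` nodes visible in the plans, whose statistics the broadcast-join check then fails to compute):

```scala
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object JoinRepro {
  // Hypothetical reproduction; requires a live SQLContext (e.g. from a REPL).
  def run(sqlContext: SQLContext): Unit = {
    val sc = sqlContext.sparkContext

    // df1: built from Row objects with an explicit StructType, then passed
    // through a functional map. The map is what puts MapPartitions nodes
    // with ObjectType output into the plan, as in the explain() above.
    val schema1 = StructType(Seq(StructField("x", IntegerType, nullable = true)))
    val df1 = sqlContext
      .createDataFrame(sc.parallelize(Seq(Row(1), Row(2))), schema1)
      .map(r => r.getInt(0) + 1) // assumed transformation producing 2, 3
      .toDF("x")

    // df2: two columns, likewise created from Rows and a StructType.
    val schema2 = StructType(Seq(
      StructField("x", IntegerType, nullable = true),
      StructField("y", StringType, nullable = true)))
    val df2 = sqlContext.createDataFrame(
      sc.parallelize(Seq(Row(1, "1"), Row(2, "2"), Row(3, "3"))), schema2)

    // The join planner's CanBroadcast extractor asks the logical plan for
    // size statistics; with ObjectType in the plan, defaultSize throws
    // "No size estimation available for objects."
    df1.join(df2, Seq("x")).show()
  }
}
```

This also explains why the reporter could not reproduce it with plain DataFrames in the REPL: without the intervening `map`, no `ObjectType` appears in the plan and size estimation succeeds.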