If you want to find which commit caused it, try out the "git bisect" command.
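For reference, a typical bisect session against the Spark tree might look like the following (the good ref is a placeholder for whatever commit your tests last passed on):

    git bisect start
    git bisect bad HEAD
    git bisect good <last-known-good-sha>
    # build and run the failing test at each step, then tell git the result:
    git bisect good    # or: git bisect bad
    # repeat until git prints the first bad commit, then clean up:
    git bisect reset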
On Sat, Feb 27, 2016 at 11:06 AM Koert Kuipers <ko...@tresata.com> wrote:

> https://issues.apache.org/jira/browse/SPARK-13531
>
> On Sat, Feb 27, 2016 at 3:49 AM, Reynold Xin <r...@databricks.com> wrote:
>
>> Can you file a JIRA ticket?
>>
>> On Friday, February 26, 2016, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> dataframe df1:
>>> schema:
>>> StructType(StructField(x,IntegerType,true))
>>> explain:
>>> == Physical Plan ==
>>> MapPartitions <function1>, obj#135: object, [if (input[0, object].isNullAt) null else input[0, object].get AS x#128]
>>> +- MapPartitions <function1>, createexternalrow(if (isnull(x#9)) null else x#9), [input[0, object] AS obj#135]
>>>    +- WholeStageCodegen
>>>       :  +- Project [_1#8 AS x#9]
>>>       :     +- Scan ExistingRDD[_1#8]
>>> show:
>>> +---+
>>> |  x|
>>> +---+
>>> |  2|
>>> |  3|
>>> +---+
>>>
>>> dataframe df2:
>>> schema:
>>> StructType(StructField(x,IntegerType,true), StructField(y,StringType,true))
>>> explain:
>>> == Physical Plan ==
>>> MapPartitions <function1>, createexternalrow(x#2, if (isnull(y#3)) null else y#3.toString), [if (input[0, object].isNullAt) null else input[0, object].get AS x#130,if (input[0, object].isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, object].get, true) AS y#131]
>>> +- WholeStageCodegen
>>>    :  +- Project [_1#0 AS x#2,_2#1 AS y#3]
>>>    :     +- Scan ExistingRDD[_1#0,_2#1]
>>> show:
>>> +---+---+
>>> |  x|  y|
>>> +---+---+
>>> |  1|  1|
>>> |  2|  2|
>>> |  3|  3|
>>> +---+---+
>>>
>>> I run:
>>> df1.join(df2, Seq("x")).show
>>>
>>> I get:
>>> java.lang.UnsupportedOperationException: No size estimation available for objects.
>>>   at org.apache.spark.sql.types.ObjectType.defaultSize(ObjectType.scala:41)
>>>   at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
>>>   at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
>>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>>>   at scala.collection.immutable.List.foreach(List.scala:381)
>>>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>>>   at scala.collection.immutable.List.map(List.scala:285)
>>>   at org.apache.spark.sql.catalyst.plans.logical.UnaryNode.statistics(LogicalPlan.scala:323)
>>>   at org.apache.spark.sql.execution.SparkStrategies$CanBroadcast$.unapply(SparkStrategies.scala:87)
>>>
>>> Not sure what changed; this ran without issues about a week ago (in our internal unit tests). It is fully reproducible. However, when I tried to minimize the issue I could not reproduce it by just creating data frames in the REPL with the same contents, so it probably has something to do with the way these are created (from Row objects and StructTypes).
>>>
>>> best, koert
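For anyone trying to reproduce this: below is a minimal Scala sketch of the kind of pipeline described above (DataFrames built from Row objects and explicit StructTypes, plus a row-level map, which is what produces the MapPartitions/object nodes in the explain output). It is written against the Spark 2.0-style API; the names and the mapped function are hypothetical, and per the note above this exact snippet is not guaranteed to trigger the exception, which the stack trace shows being thrown from ObjectType.defaultSize while the planner sizes the plan for a possible broadcast join.

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.catalyst.encoders.RowEncoder
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    // schemas match the ones printed above
    val schema1 = StructType(Seq(StructField("x", IntegerType, nullable = true)))
    val schema2 = StructType(Seq(
      StructField("x", IntegerType, nullable = true),
      StructField("y", StringType, nullable = true)))

    // DataFrames created from Row objects and StructTypes, as in the report
    val base1 = spark.createDataFrame(
      spark.sparkContext.parallelize(Seq(Row(1), Row(2))), schema1)
    val df2 = spark.createDataFrame(
      spark.sparkContext.parallelize(Seq(Row(1, "1"), Row(2, "2"), Row(3, "3"))), schema2)

    // a row-level map introduces the MapPartitions / object nodes seen in the
    // explain output; the actual transformation in the unit tests is unknown
    val df1 = base1.map(row => Row(row.getInt(0) + 1))(RowEncoder(schema1))

    // on an affected build this fails while the planner estimates sizes for a
    // potential broadcast join:
    df1.join(df2, Seq("x")).show()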