jingz-db commented on code in PR #49488:
URL: https://github.com/apache/spark/pull/49488#discussion_r1950010855


##########
sql/connect/common/src/main/scala/org/apache/spark/sql/connect/KeyValueGroupedDataset.scala:
##########
@@ -526,6 +553,71 @@ private class KeyValueGroupedDatasetImpl[K, V, IK, IV](
     }
   }
 
+  override protected[sql] def transformWithStateHelper[U: Encoder, S: Encoder](
+      statefulProcessor: StatefulProcessor[K, V, U],
+      timeMode: TimeMode,
+      outputMode: OutputMode,
+      initialState: Option[sql.KeyValueGroupedDataset[K, S]] = None,
+      eventTimeColumnName: String = ""): Dataset[U] = {
+    val outputEncoder = agnosticEncoderFor[U]
+    val stateEncoder = agnosticEncoderFor[S]
+
+    val inputEncoders: Seq[AgnosticEncoder[_]] = Seq(kEncoder, stateEncoder, ivEncoder)
+    val dummyGroupingFunc = SparkUserDefinedFunction(
+      function = UdfUtils.noOp[K, U](),
+      inputEncoders = inputEncoders,
+      outputEncoder = outputEncoder)
+    val udf = toExpr(
+      dummyGroupingFunc.apply(
+        inputEncoders.map(_ => col("*")): _*)).getCommonInlineUserDefinedFunction
+
+    val initialStateImpl = if (initialState.isDefined) {
+      assert(initialState.get.isInstanceOf[KeyValueGroupedDatasetImpl[K, S, _, _]])

Review Comment:
   > I'd argue that the ClassCastException provides the user with more information than an assert that fails without an explanation.

   > That's why I'm asking to see whether this can be triggered by Spark's bug or users' bug. If this is former, this is not a huge problem as long as we don't lose debuggability on this (I don't expect users to debug on their own). If this is latter, this should be definitely classified as error class.

   It looks like `ClassCastException` would provide enough information in this case, so there is no need to let the assert swallow useful hints. I am removing the assert here and letting the cast throw `ClassCastException`. This can only be triggered by a Spark bug, not a user bug.

   I am also going to remove the assertion inside `FlatMapGroupsWithState`, which does a similar check.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: reviews-h...@spark.apache.org
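
The trade-off discussed in the review comment above (a bare `assert` versus letting the cast fail on its own) can be illustrated with a small standalone sketch. The class names `GroupedDataset`, `ConnectGroupedDataset`, and `OtherGroupedDataset` and the helper `describeFailure` are hypothetical stand-ins invented for this example, not Spark classes, and the snippet is not the code from the PR. It only shows, under those assumptions, which exception type each variant surfaces when the wrong implementation is passed in.

```scala
// Hypothetical stand-ins for the real dataset classes (assumption, not Spark API).
trait GroupedDataset
final class ConnectGroupedDataset extends GroupedDataset
final class OtherGroupedDataset extends GroupedDataset

object AssertVsCast {
  // Variant 1: guard with a bare assert before casting.
  // On a mismatch this fails with an AssertionError that carries no detail
  // about which class was actually received.
  def viaAssert(ds: GroupedDataset): ConnectGroupedDataset = {
    assert(ds.isInstanceOf[ConnectGroupedDataset])
    ds.asInstanceOf[ConnectGroupedDataset]
  }

  // Variant 2: cast directly.
  // On a mismatch the JVM throws a ClassCastException whose message
  // names the classes involved.
  def viaCast(ds: GroupedDataset): ConnectGroupedDataset =
    ds.asInstanceOf[ConnectGroupedDataset]

  // Evaluates `body` and returns "<exception class>: <message>",
  // or "no failure" if nothing was thrown.
  def describeFailure(body: => Any): String =
    try { body; "no failure" }
    catch { case t: Throwable => s"${t.getClass.getName}: ${t.getMessage}" }

  def main(args: Array[String]): Unit = {
    val wrong = new OtherGroupedDataset
    println(describeFailure(viaAssert(wrong)))
    println(describeFailure(viaCast(wrong)))
  }
}
```

Note that with generic classes such as `KeyValueGroupedDatasetImpl[K, S, _, _]`, type erasure means that either variant can only check the runtime class, not the type arguments, so the direct cast loses no checking relative to the assert.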