yhuang-db commented on code in PR #51505:
URL: https://github.com/apache/spark/pull/51505#discussion_r2430820005
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxTopKAggregates.scala:
##########
@@ -317,8 +322,51 @@ object ApproxTopK {
def getSketchStateDataType(itemDataType: DataType): StructType =
StructType(
StructField("sketch", BinaryType, nullable = false) ::
+ StructField("maxItemsTracked", IntegerType, nullable = false) ::
StructField("itemDataType", itemDataType) ::
- StructField("maxItemsTracked", IntegerType, nullable = false) :: Nil)
+ StructField("itemDataTypeDDL", StringType, nullable = false) :: Nil)
+
+ def dataTypeToDDL(dataType: DataType): String = dataType match {
+ case _: StringType =>
+ // Hide collation information in DDL format
Review Comment:
Yes. IIUC, this test runs for every expression and asserts that, for string
utf8Binary and string utf8Lcase inputs, the expression either produces the same
output or throws the same exception.
If I force the collation to be emitted in toDDL/fromDDL,
approx_top_k_accumulate produces different outputs for the two collations and
the assertion fails:
> ArraySeq("{[04 01 0a 03 03 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 0c 00 00 00 64 75 6d
6d 79 20 73 74 72 69 6e 67], 5, null, item string collate utf8_binary not
null}") did not equal ArraySeq("{[04 01 0a 03 03 00 00 00 01 00 00 00 00 00 00
00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 0c
00 00 00 64 75 6d 6d 79 20 73 74 72 69 6e 67], 5, null, item string collate
utf8_lcase not null}")
If I simply use `StructField("item", dataType, nullable = false).toDDL` for
strings, approx_top_k_accumulate still produces different outputs and fails the
assertion, because the DDL omits the collation for utf8_binary but includes it
for utf8_lcase:
> ArraySeq("{[04 01 0a 03 03 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 0c 00 00 00 64 75 6d
6d 79 20 73 74 72 69 6e 67], 5, null, item string not null}") did not equal
ArraySeq("{[04 01 0a 03 03 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 0c 00 00 00 64 75 6d 6d
79 20 73 74 72 69 6e 67], 5, null, item string collate utf8_lcase not null}")
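To illustrate why hiding the collation makes the two outputs match, here is a
minimal, self-contained sketch. It does not use the real Catalyst classes:
`DataType`, `StringType(collation)`, `IntegerType` and `DdlSketch` are
simplified stand-ins invented for this example, and the exact strings returned
are an assumption modeled on the `item string not null` form shown in the
output above.

```scala
// Hypothetical stand-ins for Spark's types (NOT the real Catalyst classes).
sealed trait DataType
final case class StringType(collation: String) extends DataType
case object IntegerType extends DataType

object DdlSketch {
  // Render the item type as a single-field DDL string.
  // For strings the collation is deliberately dropped, so every collation
  // maps to the same DDL text (assumed form: "item string not null").
  def dataTypeToDDL(dataType: DataType): String = dataType match {
    case _: StringType => "item string not null"
    case IntegerType   => "item int not null"
  }

  def main(args: Array[String]): Unit = {
    val binary = dataTypeToDDL(StringType("UTF8_BINARY"))
    val lcase  = dataTypeToDDL(StringType("UTF8_LCASE"))
    // Both collations yield identical DDL, so a test that compares the
    // outputs for the two collations would see no difference here.
    assert(binary == lcase)
    println(binary)
  }
}
```

With the collation hidden this way, the `itemDataTypeDDL` field would be the
same for both collations, which is what the assertion in the test expects.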
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]