dawidwys commented on PR #24526: URL: https://github.com/apache/flink/pull/24526#issuecomment-2014857862
Before I review the code let's settle on the behaviour first. @MartijnVisser What is your opinion on how the function should behave? Especially in the context of https://github.com/apache/flink/pull/23173#discussion_r1491044219 and handling duplicates.

What should be the output of: `[1, 1, 1, 2] INTERSECT [1, 1, 2]`?

1. [1,2] - Spark/Databricks/Presto
2. [1,1,2] - Snowflake
3. [1, 1, 1, 2] - (as far as I can tell the current behaviour of the PR)

* [Snowflake](https://docs.snowflake.com/en/sql-reference/functions/array_intersection#usage-notes) has multi-set semantics.
  > If one array has N copies of a value, and the other array has M copies of the same value, then the number of copies in the returned array is the smaller of N or M. For example, if N is 4 and M is 2, then the returned value contains 2 copies.
* [Databricks](https://docs.databricks.com/en/sql/language-manual/functions/array_intersect.html#returns) deduplicates result ([Spark](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.array_intersect.html) I presume has the same behaviour)
  > An ARRAY of matching type to array1 with no duplicates and elements contained in both array1 and array2.
* [Presto](https://prestodb.io/docs/current/functions/array.html#array_intersect) does the same as Spark:
  > Returns an array of the elements in the intersection of x and y, without duplicates.
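
To make the three candidate behaviours concrete, here is a minimal Python sketch. It is only an illustration, not Flink code and not how any of the engines above implement it: the function names are hypothetical, elements are assumed to be hashable, and keeping the element order of the first array is an assumption of this sketch rather than something the quoted docs promise.

```python
from collections import Counter


def intersect_distinct(a, b):
    """Option 1: set semantics, each common value appears once."""
    in_b = set(b)
    seen = set()
    out = []
    for x in a:
        # Keep the first occurrence of every value that also exists in b.
        if x in in_b and x not in seen:
            seen.add(x)
            out.append(x)
    return out


def intersect_multiset(a, b):
    """Option 2: multiset semantics, min(N, M) copies of each value."""
    remaining = Counter(b)  # how many copies of each value b can still "match"
    out = []
    for x in a:
        if remaining[x] > 0:
            remaining[x] -= 1
            out.append(x)
    return out


def intersect_keep_left(a, b):
    """Option 3: keep every element of a that occurs anywhere in b."""
    in_b = set(b)
    return [x for x in a if x in in_b]


a, b = [1, 1, 1, 2], [1, 1, 2]
print(intersect_distinct(a, b))   # [1, 2]
print(intersect_multiset(a, b))   # [1, 1, 2]
print(intersect_keep_left(a, b))  # [1, 1, 1, 2]
```

Note that option 3 is the only one that is not symmetric in its arguments with respect to counts: swapping the inputs gives `[1, 1, 2]` instead of `[1, 1, 1, 2]`.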