dawidwys commented on PR #24526:
URL: https://github.com/apache/flink/pull/24526#issuecomment-2014857862

   Before I review the code, let's settle on the behaviour first.
   
   @MartijnVisser What is your opinion on how the function should behave, especially in the context of https://github.com/apache/flink/pull/23173#discussion_r1491044219 and the handling of duplicates?
   
   What should be the output of: `[1, 1, 1, 2] INTERSECT [1, 1, 2]`?
   1. `[1, 2]` - Spark/Databricks/Presto
   2. `[1, 1, 2]` - Snowflake
   3. `[1, 1, 1, 2]` - (as far as I can tell, the current behaviour of the PR)
   
   * [Snowflake](https://docs.snowflake.com/en/sql-reference/functions/array_intersection#usage-notes) has multi-set semantics.
   > If one array has N copies of a value, and the other array has M copies of the same value, then the number of copies in the returned array is the smaller of N or M. For example, if N is 4 and M is 2, then the returned value contains 2 copies.
   
   * [Databricks](https://docs.databricks.com/en/sql/language-manual/functions/array_intersect.html#returns) deduplicates the result ([Spark](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.array_intersect.html), I presume, has the same behaviour).
   > An ARRAY of matching type to array1 with no duplicates and elements contained in both array1 and array2.
   
   * [Presto](https://prestodb.io/docs/current/functions/array.html#array_intersect) does the same as Spark:
   > Returns an array of the elements in the intersection of x and y, without duplicates.
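
To make the three options concrete, here is a minimal Python sketch of each semantics. It is only an illustration, not the PR's code or any engine's implementation: the function names are hypothetical, and preserving the order of the first array is an assumption that the engines above may not guarantee.

```python
from collections import Counter


def intersect_dedup(a, b):
    """Option 1 (set semantics): each common value appears once.

    Assumption: the result follows the order of first occurrence in `a`.
    """
    in_b = set(b)
    seen = set()
    out = []
    for x in a:
        if x in in_b and x not in seen:
            seen.add(x)
            out.append(x)
    return out


def intersect_multiset(a, b):
    """Option 2 (multi-set semantics): a value with N copies in `a` and
    M copies in `b` appears min(N, M) times in the result."""
    remaining = Counter(b)  # copies of each value still available in `b`
    out = []
    for x in a:
        if remaining[x] > 0:
            remaining[x] -= 1
            out.append(x)
    return out


def intersect_keep_left(a, b):
    """Option 3: keep every element of `a` that occurs anywhere in `b`,
    without deduplicating."""
    in_b = set(b)
    return [x for x in a if x in in_b]


left, right = [1, 1, 1, 2], [1, 1, 2]
print(intersect_dedup(left, right))      # [1, 2]
print(intersect_multiset(left, right))   # [1, 1, 2]
print(intersect_keep_left(left, right))  # [1, 1, 1, 2]
```

Note that option 3 is not symmetric: swapping the arguments gives `[1, 1, 2]`, whereas options 1 and 2 return the same multiset of values regardless of argument order.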

