[ https://issues.apache.org/jira/browse/SPARK-51475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17935525#comment-17935525 ]

Robert Joseph Evans commented on SPARK-51475:
---------------------------------------------

I want to add that if we wrap each element in a struct with one field, or in 
an array with one element, the behavior is as expected and -0.0 is considered 
equal to 0.0.
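
A minimal sketch of what I mean, reusing the repro pattern from the 
description but with SQL literals instead of createDataFrame (outputs 
omitted; the struct/array wrapping is the only change):
{quote}>>> # wrap each double in a one-field struct: -0.0 and 0.0 dedupe as expected
>>> spark.sql("select array_distinct(array(struct(0.0D), struct(-0.0D), struct(6.0D)))").show(truncate=False)
>>> # same with one-element arrays
>>> spark.sql("select array_distinct(array(array(0.0D), array(-0.0D), array(6.0D)))").show(truncate=False)
{quote}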

> ArrayDistinct Producing Inconsistent Behavior For -0.0 and +0.0
> ---------------------------------------------------------------
>
>                 Key: SPARK-51475
>                 URL: https://issues.apache.org/jira/browse/SPARK-51475
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0, 3.4.4, 3.5.5
>            Reporter: Warrick He
>            Priority: Major
>              Labels: correctness
>
> This impacts array_distinct. It was tested on Spark versions 3.5.5, 3.5.0, 
> and 3.4.4, but it likely affects all versions.
> Problem: inconsistent behavior for 0.0 and -0.0. See below (run on 3.5.5).
> I'm not sure what the desired behavior is: does Spark want to follow the 
> IEEE standard and treat them as equal, keeping only -0.0 or 0.0, or should 
> it consider them distinct?
> {quote}>>> spark.createDataFrame([([0.0, 6.0, -0.0],)], 
> ['values']).createOrReplaceTempView("tab")
> >>> spark.sql("select array_distinct(values) from tab").show()
> +----------------------+
> |array_distinct(values)|
> +----------------------+
> |            [0.0, 6.0]|
> +----------------------+
>  
> >>> spark.createDataFrame([([0.0, -0.0, 6.0],)], 
> ['values']).createOrReplaceTempView("tab")
> >>> spark.sql("select array_distinct(values) from tab").show()
> +----------------------+
> |array_distinct(values)|
> +----------------------+
> |      [0.0, -0.0, 6.0]|
> +----------------------+
> {quote}
> This issue could be related to the implementation of OpenHashSet.
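> For reference, the two values compare equal under IEEE 754 but differ in 
> their sign bit, so a hash set keyed on the raw double bits (my assumption 
> about how OpenHashSet could end up distinguishing them) would keep both. A 
> quick plain-Python illustration:
> {quote}>>> import struct
> >>> 0.0 == -0.0  # IEEE 754 comparison: equal
> True
> >>> struct.pack('<d', 0.0) == struct.pack('<d', -0.0)  # raw bit patterns differ
> False
> {quote}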


