[ https://issues.apache.org/jira/browse/SPARK-51475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17935525#comment-17935525 ]
Robert Joseph Evans commented on SPARK-51475:
---------------------------------------------

I want to add that if we instead make it an array of structs with one item in it, or an array of arrays with one item in it, the behavior is as expected and -0.0 is considered equal to 0.0.

> ArrayDistinct Producing Inconsistent Behavior For -0.0 and +0.0
> ---------------------------------------------------------------
>
>                 Key: SPARK-51475
>                 URL: https://issues.apache.org/jira/browse/SPARK-51475
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0, 3.4.4, 3.5.5
>            Reporter: Warrick He
>            Priority: Major
>              Labels: correctness
>
> This impacts array_distinct. It was tested on Spark versions 3.5.5, 3.5.0, and 3.4.4, but it likely affects all versions.
> Problem: inconsistent behavior for 0.0 and -0.0. See below (run on 3.5.5).
> I'm not sure what the desired behavior is: does Spark want to follow the IEEE 754 standard and treat them as equal, keeping only one of -0.0 or 0.0, or should it consider them distinct?
> {quote}>>> spark.createDataFrame([([0.0, 6.0, -0.0],)], ['values']).createOrReplaceTempView("tab")
> >>> spark.sql("select array_distinct(values) from tab").show()
> +----------------------+
> |array_distinct(values)|
> +----------------------+
> |            [0.0, 6.0]|
> +----------------------+
>
> >>> spark.createDataFrame([([0.0, -0.0, 6.0],)], ['values']).createOrReplaceTempView("tab")
> >>> spark.sql("select array_distinct(values) from tab").show()
> +----------------------+
> |array_distinct(values)|
> +----------------------+
> |      [0.0, -0.0, 6.0]|
> +----------------------+
> {quote}
> This issue could be related to the implementation of OpenHashSet.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
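[Editor's note] The kind of discrepancy reported here can be sketched in plain Python. This is only an illustration of the general mechanism, not Spark's actual code: IEEE 754 equality says -0.0 == 0.0, but the two values have different 64-bit patterns, so a hash structure keyed on raw bits (as a specialized primitive hash set such as OpenHashSet might be) would keep both while value-based deduplication collapses them.

```python
import struct

# IEEE 754 comparison treats the two zeros as equal, and Python's == agrees:
assert 0.0 == -0.0

# But their 64-bit IEEE 754 patterns differ (only the sign bit is set for -0.0):
def bits(d):
    """Raw big-endian 64-bit representation of a double."""
    return struct.pack('>d', d)

assert bits(0.0) != bits(-0.0)

values = [0.0, -0.0, 6.0]

# Deduplicating by value (==/hash) collapses -0.0 into 0.0:
by_value = list(dict.fromkeys(values))
print(by_value)        # [0.0, 6.0]

# Deduplicating by bit pattern keeps both zeros:
by_bits = {bits(d) for d in values}
print(len(by_bits))    # 3
```

A distinct operation is internally consistent as long as it picks one of these two equality notions and uses it everywhere; the report above suggests array_distinct mixes them depending on element order.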