Majid Hajiheidari created SPARK-46349:
-----------------------------------------

             Summary: Prevent SortOrder from Accepting Nested SortOrder Instances
                 Key: SPARK-46349
                 URL: https://issues.apache.org/jira/browse/SPARK-46349
             Project: Spark
          Issue Type: Bug
          Components: PySpark, SQL
    Affects Versions: 4.0.0
            Reporter: Majid Hajiheidari


Hello everyone,

This is my first contribution to the project. I welcome any feedback and edits 
to improve this pull request.

Currently, it's possible to create redundant sort expressions in both the Scala 
and Python APIs, leading to potentially incorrect and confusing SQL statements. 
For example:

Scala:
spark.range(10).orderBy($"id".desc.asc).show()
 
Python:
spark.range(10).orderBy(f.desc('id'), ascending=False).show()
 
Such usage generates SQL like "ORDER BY id DESC NULLS LAST DESC NULLS LAST", 
which results in non-descriptive error messages.

I have opened a pull request to address the issue. It introduces a constraint 
in the SortOrder class, ensuring that its child cannot be another instance of 
SortOrder. This change prevents the creation of nested, redundant sort 
expressions.
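
As a rough illustration of the constraint described above, here is a minimal 
pure-Python model. The Column and SortOrder classes below are hypothetical 
stand-ins, not Spark's Catalyst classes, and the error message is an 
assumption; the actual change is in the pull request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a plain column reference."""
    name: str


@dataclass(frozen=True)
class SortOrder:
    """Toy model of a sort expression; not Spark's actual SortOrder."""
    child: object
    ascending: bool = True

    def __post_init__(self):
        # Reject a child that is itself a sort expression, so that
        # something like desc(...).asc() can never be constructed.
        if isinstance(self.child, SortOrder):
            raise ValueError("SortOrder cannot wrap another SortOrder")


inner = SortOrder(Column("id"), ascending=False)  # a single ordering is fine

try:
    SortOrder(inner, ascending=True)  # nested ordering is rejected
    nested_rejected = False
except ValueError:
    nested_rejected = True
```

Checking at construction time means the invalid expression fails early, 
before any SQL is generated from it.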

Additionally, PySpark's DataFrame.sort accepts an ascending keyword argument 
that can conflict with expressions that already specify a sort order. I've 
added an exception handler to generate more descriptive error messages in such 
cases.
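
The kind of conflict described above might be detected along these lines. 
This is only a sketch in plain Python: SortExpr and resolve_sort are 
hypothetical names, not PySpark APIs, and the exact condition and message 
used in the pull request may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SortExpr:
    """Hypothetical stand-in for an already ordered column, e.g. desc('id')."""
    name: str
    ascending: bool


def resolve_sort(cols, ascending=None):
    """Toy resolver: combine column arguments with an optional ascending flag."""
    resolved = []
    for col in cols:
        if isinstance(col, SortExpr):
            # The column already carries a direction; an explicit flag
            # on top of it would produce a double ordering.
            if ascending is not None:
                raise ValueError(
                    f"Column '{col.name}' already has a sort direction; "
                    "do not also pass 'ascending'."
                )
            resolved.append(col)
        else:
            # Plain column name: apply the flag, defaulting to ascending.
            resolved.append(SortExpr(col, True if ascending is None else ascending))
    return resolved


plain = resolve_sort(["id"], ascending=False)

try:
    resolve_sort([SortExpr("id", False)], ascending=False)
    conflict_message = None
except ValueError as exc:
    conflict_message = str(exc)
```

In this sketch, a plain column name combined with the flag resolves normally, 
while an already ordered column combined with the flag raises an error that 
names the offending column.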

A test case has been added to verify that no double ordering occurs after this 
fix.

 

I look forward to your feedback and thank you for considering this contribution.



