Blizzara opened a new issue, #1052:
URL: https://github.com/apache/datafusion-comet/issues/1052

   ### Describe the bug
   
   DataFusion's `initcap` behaves differently from Spark's. Both "uppercase
   the first letter of each word and lowercase the rest", but they disagree on
   what a word is. Spark treats only whitespace (the space character ' ') as a
   word separator. DataFusion treats any character that is not ASCII
   alphanumeric as a separator. (DataFusion's code also fails to uppercase or
   lowercase non-ASCII characters. That doesn't show up as a separate problem,
   because it already treats those characters as separators in the first place.)
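   To make the difference concrete, here is a minimal Rust sketch of the
   word-boundary rule described above. It assumes the rule boils down to "a
   character starts a word if the previous character is not ASCII
   alphanumeric", with ASCII-only case mapping. `ascii_alnum_initcap` is a
   hypothetical name. This is a simplified model, not DataFusion's actual
   source, and the lowercase inputs are assumed for illustration.

   ```rust
   /// Hypothetical model of the DataFusion behavior described in this issue:
   /// a new word starts after any character that is not ASCII alphanumeric,
   /// and case mapping is ASCII-only, so non-ASCII letters pass through unchanged.
   fn ascii_alnum_initcap(input: &str) -> String {
       let mut out = String::with_capacity(input.len());
       // The start of the string behaves like "right after a separator".
       let mut prev_is_alnum = false;
       for c in input.chars() {
           if prev_is_alnum {
               out.push(c.to_ascii_lowercase());
           } else {
               // No-op for non-ASCII characters such as 'ä'.
               out.push(c.to_ascii_uppercase());
           }
           // 'ä', '-' and ' ' all count as separators under this rule.
           prev_is_alnum = c.is_ascii_alphanumeric();
       }
       out
   }

   fn main() {
       // Prints "James äHtäRi" and "Robert Rose-Smith", the same strings
       // as the mismatched rows in the test output.
       for name in ["james ähtäri", "robert rose-smith"] {
           println!("{}", ascii_alnum_initcap(name));
       }
   }
   ```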
   
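   https://github.com/apache/datafusion-comet/pull/1051 demonstrates the problem
   by adding two cases to the test: one with a dash and one with non-ASCII
   letters (from Finnish). In the output below, "Correct Answer" is Spark's
   result and "Spark Answer" is the result with Comet enabled; rows marked `!`
   differ.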
   
   ```
   == Results ==
   !== Correct Answer - 7 ==       == Spark Answer - 7 ==
    struct<initcap(name):string>   struct<initcap(name):string>
    [James Smith]                  [James Smith]
    [James Smith]                  [James Smith]
   ![James Ähtäri]                 [James äHtäRi]
    [Michael Rose]                 [Michael Rose]
    [Rames Rose]                   [Rames Rose]
   ![Robert Rose-smith]            [Robert Rose-Smith]
    [Robert Williams]              [Robert Williams]
   ```
   
   
   ### Steps to reproduce
   
   Call `initcap` with Comet enabled on an input containing characters that are
   neither ASCII alphanumeric nor whitespace, such as a dash or non-ASCII letters.
   
   ### Expected behavior
   
   `initcap` output should match Spark's.
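   A Spark-compatible version could look like the following minimal sketch. It
   assumes Spark's default `initcap` semantics: lowercase the whole string, then
   uppercase the first character and every character that follows a space
   (' '). `spark_like_initcap` is a hypothetical name. Rust's standard library
   has no title-case mapping, so `char::to_uppercase` is used as an
   approximation; it can differ from Java's `Character.toTitleCase` for a few
   characters such as 'ß'.

   ```rust
   /// Hypothetical Spark-compatible initcap: only ' ' separates words,
   /// and case mapping is Unicode-aware (so 'ä' becomes 'Ä' at a word start).
   fn spark_like_initcap(input: &str) -> String {
       // Lowercase everything first (Unicode-aware).
       let lowered = input.to_lowercase();
       let mut out = String::with_capacity(lowered.len());
       // The start of the string counts as a word start.
       let mut prev_is_space = true;
       for c in lowered.chars() {
           if prev_is_space {
               // Approximates title case; may expand to several chars (e.g. 'ß').
               out.extend(c.to_uppercase());
           } else {
               out.push(c);
           }
           // Only the space character starts a new word; '-' does not.
           prev_is_space = c == ' ';
       }
       out
   }

   fn main() {
       // Prints "James Ähtäri" and "Robert Rose-smith".
       for name in ["james ähtäri", "robert rose-smith"] {
           println!("{}", spark_like_initcap(name));
       }
   }
   ```

   Under this rule '-' no longer starts a new word, and 'ä' is case-mapped like
   any other letter.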
   
   ### Additional context
   
   _No response_