[ 
https://issues.apache.org/jira/browse/MNG-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17947329#comment-17947329
 ] 

Matthew Donoughe commented on MNG-8240:
---------------------------------------

The original problem cannot be reproduced because it was fixed in 
https://github.com/apache/maven/commit/688c9c8d011f07ae8cbb2285959a319922e98219.

However, non-ASCII digits are still being interpreted as numbers.

{noformat}
$ java -cp compat/maven-artifact/target/classes 
org.apache.maven.artifact.versioning.ComparableVersion ०००००००००००००००००००1 
0000000000000000000000002
Display parameters as parsed by Maven (in canonical form and as a list of 
tokens) and comparison result:
1. ०००००००००००००००००००1 -> 1; tokens: [1]
   ०००००००००००००००००००1 < 0000000000000000000000002
2. 0000000000000000000000002 -> 2; tokens: [2]
{noformat}

Worse, it's inconsistent:

{noformat}
$ java -cp compat/maven-artifact/target/classes 
org.apache.maven.artifact.versioning.ComparableVersion ०००००००००००००००००००० 
0000000000000000000000000
Display parameters as parsed by Maven (in canonical form and as a list of 
tokens) and comparison result:
1. ०००००००००००००००००००० -> ००००००००००००००००००००; tokens: [००००००००००००००००००००]
   ०००००००००००००००००००० > 0000000000000000000000000
2. 0000000000000000000000000 -> ; tokens: []
$ java -cp compat/maven-artifact/target/classes 
org.apache.maven.artifact.versioning.ComparableVersion ०००००००००००००००००००2 
0000000000000000000000001
Display parameters as parsed by Maven (in canonical form and as a list of 
tokens) and comparison result:
1. ०००००००००००००००००००2 -> 2; tokens: [2]
   ०००००००००००००००००००2 < 0000000000000000000000001
2. 0000000000000000000000001 -> 1; tokens: [1]
$ java -cp compat/maven-artifact/target/classes 
org.apache.maven.artifact.versioning.ComparableVersion a०००००००००००००००००००1 
a0000000000000000000000002
Display parameters as parsed by Maven (in canonical form and as a list of 
tokens) and comparison result:
1. a०००००००००००००००००००1 -> alpha1; tokens: [alpha1]
   a०००००००००००००००००००1 > a0000000000000000000000002
2. a0000000000000000000000002 -> alpha2; tokens: [alpha2]
{noformat}

I think the problem is that the ASCII 2 in ०००००००००००००००००००2 causes the 
whole string to be treated as a CombinationItem, and CombinationItem still 
uses the old logic, which accepts every Unicode BMP Nd class digit.
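
To illustrate why such a string can still end up as a number, here is a 
minimal, standalone sketch using only JDK calls. It does not use Maven's 
classes; NdDigitDemo, repeat and stripAsciiZeros are hypothetical names, and 
stripAsciiZeros is only a stand-in for a zero-stripping step that knows about 
ASCII '0' alone.

```java
import java.math.BigInteger;

class NdDigitDemo {
    // Builds a string of n copies of c (avoids String.repeat so it also compiles on Java 8).
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Hypothetical stand-in: strips leading ASCII '0' only, keeping the last character.
    static String stripAsciiZeros(String s) {
        int i = 0;
        while (i < s.length() - 1 && s.charAt(i) == '0') {
            i++;
        }
        return s.substring(i);
    }

    public static void main(String[] args) {
        char devanagariZero = '\u0966'; // DEVANAGARI DIGIT ZERO
        String mixed = repeat(devanagariZero, 19) + "1";
        String ascii = repeat('0', 24) + "2";

        System.out.println(Character.isDigit(devanagariZero));   // true
        System.out.println(Character.digit(devanagariZero, 10)); // 0
        System.out.println(stripAsciiZeros(mixed).length());     // 20
        System.out.println(stripAsciiZeros(ascii).length());     // 1
        System.out.println(new BigInteger(mixed));               // 1
    }
}
```

The JDK parsers accept the Devanagari zeros as digits, but an ASCII-only strip 
leaves all 20 characters in place, so any logic keyed on string length sees a 
much "bigger" number than the value 1 it actually holds.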

> ComparableVersion incorrectly handles leading Unicode Nd class zeros
> --------------------------------------------------------------------
>
>                 Key: MNG-8240
>                 URL: https://issues.apache.org/jira/browse/MNG-8240
>             Project: Maven
>          Issue Type: Bug
>         Environment: openjdk version "1.8.0_412"
> OpenJDK Runtime Environment (Temurin)(build 1.8.0_412-b08)
> OpenJDK 64-Bit Server VM (Temurin)(build 25.412-b08, mixed mode)
>            Reporter: Matthew Donoughe
>            Assignee: Elliotte Rusty Harold
>            Priority: Major
>
> ComparableVersion supports positive decimal numbers of unlimited size. As an 
> optimization, the length of the number (in UTF-16 code units) determines 
> whether the value is converted into an int, a long, or a BigInteger. As 
> another optimization, because the length of the value determines the data 
> type, an int always compares as smaller than a long, which always compares as 
> smaller than a BigInteger. Leading 0s are removed to avoid the case where 
> 00000000000000000001 > 2.
> However, it's specifically '0', DIGIT ZERO, 0x0030, that is being removed. 
> The code that segments the version string into items uses 
> [Character.isDigit|https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/Character.html#isDigit(char)],
>  which uses 
> [Character.getType|https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/Character.html#getType(char)]
>  to check for 
> [DECIMAL_DIGIT_NUMBER|https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/Character.html#DECIMAL_DIGIT_NUMBER]
>  corresponding to the [Unicode Nd 
> class|https://www.fileformat.info/info/unicode/category/Nd/list.htm], and 
> parsing into a number eventually uses 
> [Character.digit|https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/Character.html#digit(char,int)]
>  which likewise supports Unicode Nd class digits. This leads to the following 
> case:
> {noformat}
> java -jar 
> ~/.m2/repository/org/apache/maven/maven-artifact/3.9.4/maven-artifact-3.9.4.jar
>  ०००००००००००००००००००1 0000000000000000000000002
> Display parameters as parsed by Maven (in canonical form and as a list of 
> tokens) and comparison result:
> 1. ०००००००००००००००००००1 -> 1; tokens: [1]
>    ०००००००००००००००००००1 > 0000000000000000000000002
> 2. 0000000000000000000000002 -> 2; tokens: [2]{noformat}
> A 1 with 19 leading zeros is parsed as a BigInteger, and a 2 with 24 leading 
> zeros is parsed as an int, so 1 > 2. However, canonicalization still works 
> correctly, so if you canonicalize the versions before comparing them you get 
> 1 < 2 as expected. I don't know whether that's better or worse, because it 
> can make the ordering unstable.
> I guess the easy solution is to use Character.digit to check for int 0 
> instead of char '0'.
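
The fix suggested at the end of the issue description could be sketched 
roughly as follows. This is not Maven's actual code: LeadingZeroSketch and 
stripLeadingZeros are hypothetical names, and the sketch assumes the input has 
already been segmented into a run of digit characters.

```java
class LeadingZeroSketch {
    // Hypothetical helper: drops leading characters whose decimal value is 0
    // in any script, always keeping the final character.
    static String stripLeadingZeros(String digits) {
        int i = 0;
        while (i < digits.length() - 1 && Character.digit(digits.charAt(i), 10) == 0) {
            i++;
        }
        return digits.substring(i);
    }

    public static void main(String[] args) {
        String mixed = "\u0966\u0966\u09661"; // three Devanagari zeros, then ASCII 1
        System.out.println(stripLeadingZeros(mixed));  // 1
        System.out.println(stripLeadingZeros("0002")); // 2
        System.out.println(stripLeadingZeros("1002")); // 1002
    }
}
```

Comparing the numeric value from Character.digit rather than the char '0' 
means the remaining length reflects the magnitude of the number regardless of 
which script the zeros come from.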



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
