Dear core-libs developers,

I'd like to add the following overloads:

• Character.codePointCount(CharSequence seq)
• Character.codePointCount(char[] a)
• String.codePointCount(int beginIndex)
• StringBuffer.codePointCount()
• StringBuilder.codePointCount()

and have created a patch (https://github.com/openjdk/jdk/pull/26461).
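For reference, each proposed overload should return the same value as an 
existing range-based call over the whole input. The following is only a sketch 
written against today's API; the class ProposedOverloads and its static methods 
are hypothetical stand-ins, not the code in the PR.

```java
// Hypothetical helper: each method mirrors one proposed overload by
// delegating to the existing range-based API over the whole input.
final class ProposedOverloads {
    private ProposedOverloads() {}

    // Mirrors the proposed Character.codePointCount(CharSequence seq).
    static int codePointCount(CharSequence seq) {
        return Character.codePointCount(seq, 0, seq.length());
    }

    // Mirrors the proposed Character.codePointCount(char[] a).
    // The existing char[] overload takes (offset, count), not (begin, end).
    static int codePointCount(char[] a) {
        return Character.codePointCount(a, 0, a.length);
    }

    // Mirrors the proposed String.codePointCount(int beginIndex):
    // counts from beginIndex to the end of the string.
    static int codePointCount(String s, int beginIndex) {
        return s.codePointCount(beginIndex, s.length());
    }

    // Mirrors the proposed StringBuilder.codePointCount().
    static int codePointCount(StringBuilder sb) {
        return sb.codePointCount(0, sb.length());
    }

    // Mirrors the proposed StringBuffer.codePointCount().
    static int codePointCount(StringBuffer sb) {
        return sb.codePointCount(0, sb.length());
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600 (a surrogate pair), 'b'
        System.out.println(s.length());                          // 4 UTF-16 code units
        System.out.println(codePointCount(s));                   // 3 code points
        System.out.println(codePointCount(s.toCharArray()));     // 3
        System.out.println(codePointCount(s, 1));                // 2 (U+1F600 and 'b')
        System.out.println(codePointCount(new StringBuilder(s))); // 3
    }
}
```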

Why:

Similar overloads taking an explicit range (start and end indices, or an offset 
and a count for char[]) were already added by JSR 204 (JDK-4985217). They appear 
to have been designed with versatility as the priority. They make specifying the 
range mandatory, which has the following disadvantages:

1. The string expression has to be written twice, e.g. 
s.codePointCount(0, s.length()). Unlike C#, Java has no equivalent of extension 
methods.
2. Unnecessary boundary checks are performed.
3. Most userland code wants the number of code points in the entire string.
4. Some other languages can count code points with a single function call and 
no extra arguments (e.g. len() in Python 3).
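To illustrate point 1, here is a small sketch of the situation today when the 
string comes from a longer expression. The method displayName() is a 
hypothetical stand-in for any such expression; the stream-based variant is an 
alternative available today, not part of the proposal.

```java
class TwiceWrittenDemo {
    // Hypothetical input source standing in for an arbitrary expression.
    static String displayName() {
        return "  Caf\u00E9 \uD83C\uDF75  "; // "Café" + space + U+1F375, padded
    }

    static int countToday() {
        // The range-based API needs both the sequence and its length, so the
        // expression must be stored in a local (or evaluated twice).
        String name = displayName().strip();
        return name.codePointCount(0, name.length());
    }

    static int countViaStream() {
        // Single-expression alternative available today; it goes through an
        // IntStream instead of a plain counting loop.
        return (int) displayName().strip().codePoints().count();
    }

    public static void main(String[] args) {
        System.out.println(displayName().strip().length()); // 7 UTF-16 code units
        System.out.println(countToday());                   // 6 code points
        System.out.println(countViaStream());               // 6 code points
    }
}
```

With the proposed String.codePointCount(int beginIndex), countToday() could 
presumably be reduced to the single expression 
displayName().strip().codePointCount(0), close to len(s) in Python 3.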

Examples for point 3:

• VARCHAR in MySQL and PostgreSQL counts string length in code points. For 
example, VARCHAR(20) limits a value to 20 code points, not 20 UTF-16 code units 
(20 chars in Java).
• NIST Special Publication 800-63B stipulates that password length must be 
counted in code points. (Quote from 
https://pages.nist.gov/800-63-3/sp800-63b.html#-5112-memorized-secret-verifiers 
: "For purposes of the above length requirements, each Unicode code point SHALL 
be counted as a single character.")
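A minimal sketch of such length checks in Java today, assuming the database 
counts VARCHAR length in code points as described above. The limits 
VARCHAR_LIMIT and MIN_PASSWORD_LENGTH are hypothetical illustration values, not 
taken from any schema or from the NIST text.

```java
class CodePointLengthChecks {
    // Hypothetical limits for illustration only; real values depend on the
    // schema and the password policy in use.
    static final int VARCHAR_LIMIT = 20;
    static final int MIN_PASSWORD_LENGTH = 8;

    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // True if s fits a VARCHAR(20) column that counts code points.
    static boolean fitsVarchar(String s) {
        return codePoints(s) <= VARCHAR_LIMIT;
    }

    // True if the password meets the minimum length counted in code points;
    // password.length() would count UTF-16 code units instead.
    static boolean meetsMinLength(String password) {
        return codePoints(password) >= MIN_PASSWORD_LENGTH;
    }

    public static void main(String[] args) {
        String emoji20 = "\uD83D\uDE00".repeat(20); // 20 code points, 40 chars
        System.out.println(emoji20.length());       // 40
        System.out.println(fitsVarchar(emoji20));   // true

        String pw = "\uD83D\uDD11".repeat(4);       // 4 code points, 8 chars
        System.out.println(pw.length() >= MIN_PASSWORD_LENGTH); // true (wrong unit)
        System.out.println(meetsMinLength(pw));     // false
    }
}
```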

I would like to reach agreement on these changes and to learn what I need to do 
outside of GitHub (e.g. how to submit CSRs). If you have a GitHub account, it 
would be helpful if you could reply to the PR. If not, you can reply directly to 
this email.

Best Regards,

Tatsunori Uchino 
https://github.com/tats-u/
