Hi,
This seems like a reasonable idea.
For CharSequence, I would add them as default methods on CharSequence
and include the API Character.codePointCount(csq, begin, end).
The char array version will still need to be in Character.
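A rough sketch of what the default-method shape could look like. This is an illustration only, not code from the PR: CountableSequence and Wrapped are hypothetical stand-ins (CharSequence itself cannot be modified outside the JDK), and the defaults simply delegate to the existing Character.codePointCount(CharSequence, int, int).

```java
public class DefaultMethodSketch {
    // Hypothetical stand-in for CharSequence carrying the proposed defaults.
    public interface CountableSequence extends CharSequence {
        // No-argument form: counts code points in the whole sequence.
        default int codePointCount() {
            return Character.codePointCount(this, 0, length());
        }

        // Ranged form, mirroring Character.codePointCount(csq, begin, end).
        default int codePointCount(int beginIndex, int endIndex) {
            return Character.codePointCount(this, beginIndex, endIndex);
        }
    }

    // Minimal wrapper so the sketch can be exercised (hypothetical helper).
    public static final class Wrapped implements CountableSequence {
        private final String value;

        public Wrapped(String value) {
            this.value = value;
        }

        @Override public int length() { return value.length(); }
        @Override public char charAt(int index) { return value.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return new Wrapped(value.substring(start, end));
        }
        @Override public String toString() { return value; }
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (one surrogate pair), 'b'
        CountableSequence seq = new Wrapped("a\uD83D\uDE00b");
        System.out.println(seq.codePointCount());     // 3
        System.out.println(seq.codePointCount(0, 1)); // 1
    }
}
```

Implementations such as String could still override the defaults with faster versions.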
Regards, Roger
On 8/11/25 7:37 PM, Uchino Tatsunori wrote:
Dear Chen-san,
The beginIndex there is simply a mistake that should have been removed.
String.codePointCount() is the correct suggestion, as you can imagine.
I am sorry for the confusion.
Regards,
Tatsunori Uchino
2025/08/12 7:29 Chen Liang <chen.l.li...@oracle.com>:
Hi Uchino, I think your request is sensible in general.
Do you intend to require a beginIndex for the codePointCount for
String? I think a no-arg version suffices.
Also forwarding this to i18n-dev as it is the locale-related list.
P.S. When you reply, make sure you click "Reply all" so that all the
recipients of this mail get your reply. Otherwise, the reply is sent
only to me, and others on the list won't see it.
Regards, Chen
------------------------------------------------------------------------
*From:* core-libs-dev <core-libs-dev-r...@openjdk.org> on behalf
of Uchino Tatsunori <tat...@live.jp>
*Sent:* Monday, August 11, 2025 6:54 AM
*To:* core-libs-dev@openjdk.org <core-libs-dev@openjdk.org>
*Subject:* I'd like add no-argument overloads to CharSequence,
String, and StringBuilder (JDK-8364007)
Dear core-libs developers,
I'd like to add the following overloads:
• Character.codePointCount(CharSequence seq)
• Character.codePointCount(char[] a)
• String.codePointCount(int beginIndex)
• StringBuffer.codePointCount()
• StringBuilder.codePointCount()
and created a patch (https://github.com/openjdk/jdk/pull/26461).
Why:
Similar overloads that take start and end indices already exist,
introduced by JSR 204 (JDK-4985217). They appear to have been
designed with a priority on versatility. They make the
specification of indices mandatory and have the following
disadvantages:
1. The string expression has to be written twice. Unlike C#, Java
has no equivalent of extension methods.
2. Unnecessary bounds checks are mixed in.
3. Most userland code counts the code points in the entire string.
4. Some other languages can count the code points with a single
function call and no extra arguments (e.g. len() in Python 3).
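To illustrate points 1 and 3 with the API as it exists today, here is a small sketch. The class and method names are made up for this example. The stream-based variant avoids repeating the expression but goes through CharSequence.codePoints(), which is a different code path from the index-based method.

```java
public class CodePointCountDemo {
    // Existing API: the receiver is repeated just to supply the end index.
    public static int countWithExistingApi(String s) {
        return s.codePointCount(0, s.length());
    }

    // Alternative without repetition, at the cost of creating an IntStream.
    public static long countWithStream(CharSequence cs) {
        return cs.codePoints().count();
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (one surrogate pair), 'b'
        String s = "a\uD83D\uDE00b";
        System.out.println(s.length());              // 4 UTF-16 code units
        System.out.println(countWithExistingApi(s)); // 3 code points
        System.out.println(countWithStream(s));      // 3 code points
    }
}
```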
For 3., e.g.:
• VARCHAR in MySQL & PostgreSQL measures length in code points.
e.g. VARCHAR(20) means that the limit is 20 code points, not
20 UTF-16 code units (20 chars in Java).
• NIST Special Publication 800-63B stipulates that password
length must be counted in units of code points. (Quote from
https://pages.nist.gov/800-63-3/sp800-63b.html#-5112-memorized-secret-verifiers
: "For purposes of the above length requirements, each Unicode
code point SHALL be counted as a single character.")
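As a sketch of the kind of userland code these two cases lead to, using only the existing API: the class name, method names, and the minimum length of 8 are assumptions chosen for illustration, not values taken from the PR.

```java
public class CodePointLengthChecks {
    // Assumed minimum for illustration; adjust to the policy actually in force.
    public static final int MIN_PASSWORD_CODE_POINTS = 8;

    // Password length counted in code points, not in UTF-16 code units.
    public static boolean meetsMinimumLength(String password) {
        return password.codePointCount(0, password.length())
                >= MIN_PASSWORD_CODE_POINTS;
    }

    // Checks a value against a column limit such as VARCHAR(20),
    // where the limit is expressed in code points.
    public static boolean fitsColumn(String value, int maxCodePoints) {
        return value.codePointCount(0, value.length()) <= maxCodePoints;
    }

    public static void main(String[] args) {
        // 7 ASCII letters plus one supplementary character:
        // 9 chars, but 8 code points.
        String password = "abcdefg\uD83D\uDE00";
        System.out.println(password.length());            // 9
        System.out.println(meetsMinimumLength(password)); // true

        // 20 supplementary characters: 40 chars, but 20 code points.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20; i++) {
            sb.append("\uD83D\uDE00");
        }
        System.out.println(fitsColumn(sb.toString(), 20)); // true
    }
}
```

In both methods the string expression appears twice, which is the repetition a no-argument overload would remove.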
I would like to get agreement on these changes and would like to
know what I have to do outside of GitHub (e.g. how to submit CSRs).
If you have a GitHub account, it would be helpful if you could
reply to the PR. If not, you can reply directly to this email.
Best Regards,
Tatsunori Uchino
https://github.com/tats-u/