Re: I'd like add no-argument overloads to CharSequence, String, and StringBuilder (JDK-8364007)

Roger Riggs Wed, 20 Aug 2025 14:49:27 -0700

Hi,

This seems like a reasonable idea.
For CharSequence, I would add them as default methods on CharSequence
and include the API Character.codePointCount(csq, begin, end).
The char array version will still need to be in Character.
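A rough sketch of what default methods along these lines might look like. Since java.lang.CharSequence itself cannot be modified from user code, the interface CodePointSequence and the wrapper class Wrapped below are hypothetical stand-ins used only for illustration; the method names and signatures are assumptions, not the final API.

```java
// Hypothetical stand-in for CharSequence; the actual proposal would put
// these default methods on java.lang.CharSequence itself.
interface CodePointSequence extends CharSequence {

    // No-argument form: counts code points in the whole sequence.
    default int codePointCount() {
        return Character.codePointCount(this, 0, length());
    }

    // Ranged form: delegates to the existing static method in Character.
    default int codePointCount(int beginIndex, int endIndex) {
        return Character.codePointCount(this, beginIndex, endIndex);
    }
}

// Minimal hypothetical implementation backed by a String, for demonstration.
final class Wrapped implements CodePointSequence {
    private final String value;

    Wrapped(String value) {
        this.value = value;
    }

    @Override
    public int length() {
        return value.length();
    }

    @Override
    public char charAt(int index) {
        return value.charAt(index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return new Wrapped(value.substring(start, end));
    }

    @Override
    public String toString() {
        return value;
    }
}
```

With this sketch, new Wrapped("a\uD83D\uDE00b").codePointCount() counts the surrogate pair as one code point, while length() still reports UTF-16 code units.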


Regards, Roger

On 8/11/25 7:37 PM, Uchino Tatsunori wrote:

Dear Chen-san,

The beginIndex there is just a mistake that should have been removed.

String.codePointCount()

is the correct suggestion, as you can imagine. I am sorry for the confusion.


Regards,

Tatsunori Uchino

2025/08/12 7:29 Chen Liang <chen.l.li...@oracle.com>:

    Hi Uchino, I think your request is sensible in general.

    Do you intend to require a beginIndex for the codePointCount for
    String? I think a no-arg version suffices.

    Also forwarding this to i18n-dev as it is the locale-related list.

    P.S. When you reply, make sure you click "Reply all" so all the
    recipients of this current mail get your reply. Otherwise, the
    reply is only sent to me, and others on the list won't see your reply.

    Regards, Chen
    ------------------------------------------------------------------------
    *From:* core-libs-dev <core-libs-dev-r...@openjdk.org> on behalf
    of Uchino Tatsunori <tat...@live.jp>
    *Sent:* Monday, August 11, 2025 6:54 AM
    *To:* core-libs-dev@openjdk.org <core-libs-dev@openjdk.org>
    *Subject:* I'd like add no-argument overloads to CharSequence,
    String, and StringBuilder (JDK-8364007)
    Dear core-libs developers,

    I'd like to add the following overloads:

    • Character.codePointCount(CharSequence seq)
    • Character.codePointCount(char[] a)
    • String.codePointCount(int beginIndex)
    • StringBuffer.codePointCount()
    • StringBuilder.codePointCount()

    and created a patch (https://github.com/openjdk/jdk/pull/26461).

    Why:

    Similar overloads that take start and end indices already exist,
    added by JSR 204 (JDK-4985217). They seem to have been designed
    with a priority on versatility. They make the specification of
    indices mandatory, but have the following disadvantages:

    1. The string expression has to be written twice. Unlike C#, Java
    has no equivalent of extension methods.
    2. Unnecessary boundary checks are mixed in.
    3. Most userland code tries to calculate the number of code
    points in the entire string.
    4. Some other languages can count the number of code points in a
    single function without extra arguments (e.g. len() in Python 3).

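To make point 1 concrete, here is a small sketch using only existing JDK methods. The class name CodePointCountDemo and the static helper codePointCount are hypothetical; the helper is just the kind of wrapper user code tends to write today, not part of any proposed API.

```java
final class CodePointCountDemo {

    // Hypothetical helper that hides the repeated receiver expression.
    static int codePointCount(CharSequence seq) {
        return Character.codePointCount(seq, 0, seq.length());
    }

    public static void main(String[] args) {
        // "a", U+1F600 (one code point, two UTF-16 code units), "b"
        String s = "a\uD83D\uDE00b";
        StringBuilder sb = new StringBuilder(s);

        System.out.println(s.length());                        // 4 (UTF-16 code units)

        // Current idiom: the receiver must be written twice.
        System.out.println(s.codePointCount(0, s.length()));   // 3
        System.out.println(sb.codePointCount(0, sb.length())); // 3

        // Existing alternative on String: stream the code points and count them.
        System.out.println(s.codePoints().count());            // 3

        // With the helper, the expression appears only once.
        System.out.println(codePointCount(sb));                // 3
    }
}
```

The repetition becomes more noticeable when the receiver is a longer expression such as a chain of getter calls, which then has to be either duplicated or stored in a temporary variable.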
    For 3., e.g.:

    • VARCHAR in MySQL & PostgreSQL counts characters in units of
    code points. e.g. VARCHAR(20) means that the limit is
    20 code points, not 20 UTF-16 code units (20 chars in Java)
    • NIST Special Publication 800-63B stipulates that the password
    length must be counted in units of code points. (Quote from
    https://pages.nist.gov/800-63-3/sp800-63b.html#-5112-memorized-secret-verifiers
    : "For purposes of the above length requirements, each Unicode
    code point SHALL be counted as a single character.")

    I would like to get agreement on these changes and would like to
    know what I have to do outside of GitHub (e.g., how to submit CSRs).
    If you have a GitHub account, it would be helpful if you could
    reply to the PR. If not, you can reply directly to this email.

    Best Regards,

    Tatsunori Uchino
    https://github.com/tats-u/
