Re: CharsetEncoder.maxBytesPerChar()

2019-09-30 Thread Ulf Zibis
Hey Martin, great that you got my issue. The link you shared is an interesting basis for this discussion. Maybe in some places, e.g. in the "upfront specifications", the term "UTF-16 char" or "UTF-16 code unit" could additionally be helpful, and then determining "char" or "{@code char}" as a short

Re: CharsetEncoder.maxBytesPerChar()

2019-09-27 Thread Martin Buchholz
Like Ulf, I am sometimes annoyed by the use of the "character" misnomer throughout the API docs, and would support an effort to use "character" the way that unicode.org uses it. "char" no longer represents a Unicode character, but at least it provides a short clear name, in the Java language, for "

Re: CharsetEncoder.maxBytesPerChar()

2019-09-26 Thread mark . reinhold
2019/9/24 13:00:21 -0700, ulf.zi...@cosoco.de: > On 21.09.19 at 00:03, mark.reinh...@oracle.com wrote: >> To avoid this confusion, a more verbose specification might read: >> * Returns the maximum number of $otype$s that will be produced for each >> * $itype$ of input. This value may be u

Re: CharsetEncoder.maxBytesPerChar()

2019-09-24 Thread Ulf Zibis
On 21.09.19 at 00:03, mark.reinh...@oracle.com wrote: > To avoid this confusion, a more verbose specification might read: > * Returns the maximum number of $otype$s that will be produced for each > * $itype$ of input. This value may be used to compute the worst-case > size > * o
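
The proposed wording above says the value may be used to compute a worst-case output size. A minimal sketch of that use follows. The class and method names are illustrative and not taken from the thread, and the UTF-16 case assumes that the JDK's UTF-16 encoder writes a 2-byte byte-order mark before the first char, which is the situation the related BOM review appears to address.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Illustrative sketch; class and method names are hypothetical.
public class WorstCaseBufferDemo {

    // Worst-case output size in bytes for charCount chars (UTF-16 code units).
    static int worstCaseBytes(CharsetEncoder encoder, int charCount) {
        return (int) Math.ceil((double) charCount * encoder.maxBytesPerChar());
    }

    // Encodes input into a buffer of exactly the worst-case size and
    // returns the number of bytes actually written.
    static int encodedLength(Charset charset, String input) {
        CharsetEncoder encoder = charset.newEncoder();
        ByteBuffer out = ByteBuffer.allocate(worstCaseBytes(encoder, input.length()));
        try {
            CoderResult result = encoder.encode(CharBuffer.wrap(input), out, true);
            if (result.isUnderflow()) {
                result = encoder.flush(out);
            }
            if (!result.isUnderflow()) {
                // Overflow would mean the worst-case estimate was too small.
                result.throwException();
            }
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
        return out.position();
    }

    public static void main(String[] args) {
        for (Charset cs : new Charset[] {StandardCharsets.UTF_8, StandardCharsets.UTF_16}) {
            CharsetEncoder enc = cs.newEncoder();
            System.out.println(cs + ": maxBytesPerChar = " + enc.maxBytesPerChar()
                    + ", bytes for one char 'a' = " + encodedLength(cs, "a"));
        }
    }
}
```

If the estimate were too small for some charset, `encodedLength` would fail with a `BufferOverflowException` instead of silently truncating, which makes the sketch usable as a quick check of the documented guarantee.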

Re: [14] RFR: 8230531: API Doc for CharsetEncoder.maxBytesPerChar() should be clearer about BOMs

2019-09-24 Thread Alan Bateman
On 23/09/2019 21:45, naoto.s...@oracle.com wrote: Hello, Please review the fix to the following issue: https://bugs.openjdk.java.net/browse/JDK-8230531 Relevant CSR (in draft) and proposed changeset are located at: [CSR]: https://bugs.openjdk.java.net/browse/JDK-8231319 [changeset]: https://c

Re: [14] RFR: 8230531: API Doc for CharsetEncoder.maxBytesPerChar() should be clearer about BOMs

2019-09-23 Thread Martin Buchholz
LGTM On Mon, Sep 23, 2019 at 1:48 PM wrote: > Hello, > > Please review the fix to the following issue: > > https://bugs.openjdk.java.net/browse/JDK-8230531 > > Relevant CSR (in draft) and proposed changeset are located at: > > [CSR]: https://bugs.openjdk.java.net/browse/JDK-8231319 > [changeset]

[14] RFR: 8230531: API Doc for CharsetEncoder.maxBytesPerChar() should be clearer about BOMs

2019-09-23 Thread naoto . sato
Hello, Please review the fix to the following issue: https://bugs.openjdk.java.net/browse/JDK-8230531 Relevant CSR (in draft) and proposed changeset are located at: [CSR]: https://bugs.openjdk.java.net/browse/JDK-8231319 [changeset]: https://cr.openjdk.java.net/~naoto/8230531/webrev.00/ The p

Re: CharsetEncoder.maxBytesPerChar()

2019-09-20 Thread naoto . sato
Hi Mark, Thank you for the crystal clear explanation. I will go ahead and clarify the method description. Naoto On 9/20/19 3:03 PM, mark.reinh...@oracle.com wrote: 2019/9/20 13:25:38 -0700, naoto.s...@oracle.com: I am looking at the following bug: https://bugs.openjdk.java.net/browse/JDK-8

Re: CharsetEncoder.maxBytesPerChar()

2019-09-20 Thread mark . reinhold
2019/9/20 13:25:38 -0700, naoto.s...@oracle.com: > I am looking at the following bug: > > https://bugs.openjdk.java.net/browse/JDK-8230531 > > and hoping someone who is familiar with the encoder will clear things > up. As in the bug report, the method description reads: > > -- > Returns the ma

CharsetEncoder.maxBytesPerChar()

2019-09-20 Thread naoto . sato
Hello, I am looking at the following bug: https://bugs.openjdk.java.net/browse/JDK-8230531 and hoping someone who is familiar with the encoder will clear things up. As in the bug report, the method description reads: -- Returns the maximum number of bytes that will be produced for each cha

Re: RFR [8058875]: CharsetEncoder.maxBytesPerChar() should return 4 for UTF-8

2014-09-23 Thread Martin Buchholz
ate: Tue, 23 Sep 2014 11:37:07 +0400 > From: Ivan Gerasimov > To: Xueming Shen , Martin Buchholz > > Cc: nio-...@openjdk.java.net, core-libs-dev > > Subject: Re: RFR [8058875]: CharsetEncoder.maxBytesPerChar() should > return 4 for UTF-8 > Me

RE: RFR [8058875]: CharsetEncoder.maxBytesPerChar() should return 4 for UTF-8

2014-09-23 Thread Salter, Thomas A
2014 11:37:07 +0400 From: Ivan Gerasimov To: Xueming Shen , Martin Buchholz Cc: nio-...@openjdk.java.net, core-libs-dev Subject: Re: RFR [8058875]: CharsetEncoder.maxBytesPerChar() should return 4 for UTF-8 Message-ID: <54212323.5080...@oracle.com> Content-Type

Re: RFR [8058875]: CharsetEncoder.maxBytesPerChar() should return 4 for UTF-8

2014-09-23 Thread Ivan Gerasimov
Martin, Sherman, thanks for the clarification! Closing the bug as not a bug. The "character" in the nio Charset and CharDe/Encoder is specified as "sixteen-bit Unicode code unit", so it is reasonable to interpret the "character" in the "maximum number of bytes that will be produced for each charact
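
The point of this thread, that a "character" here is a sixteen-bit code unit and not a code point, can be checked with a small sketch. The class and method names are illustrative only. A code point outside the Basic Multilingual Plane takes 4 bytes in UTF-8, but it occupies two chars in a Java String, so it costs only 2 bytes per char.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch; class and method names are hypothetical.
public class Utf8PerCharDemo {

    // Number of bytes the UTF-8 charset produces for the whole string.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes produced divided by the number of chars (UTF-16 code units).
    static double bytesPerChar(String s) {
        return (double) utf8Length(s) / s.length();
    }

    public static void main(String[] args) {
        String euro = "\u20AC";         // U+20AC: one char, one code point
        String emoji = "\uD83D\uDE00";  // U+1F600: surrogate pair, two chars, one code point

        float max = StandardCharsets.UTF_8.newEncoder().maxBytesPerChar();
        System.out.println("maxBytesPerChar = " + max);
        for (String s : new String[] {euro, emoji}) {
            System.out.println(s.length() + " char(s), "
                    + s.codePointCount(0, s.length()) + " code point(s), "
                    + utf8Length(s) + " byte(s), "
                    + bytesPerChar(s) + " bytes per char");
        }
    }
}
```

The most expensive single char is one in the three-byte UTF-8 range, such as U+20AC, which is consistent with the value 3.0 discussed in the thread.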

Re: RFR [8058875]: CharsetEncoder.maxBytesPerChar() should return 4 for UTF-8

2014-09-22 Thread Martin Buchholz
34 PM, Mark Thomas wrote: > On 22/09/2014 22:23, Martin Buchholz wrote: > > I think you are mistaken. It's maxBytesPerChar, not maxBytesPerCodepoint! > > You are going to have to explain that some more. The Javadoc for > CharsetEncoder.maxBytesPerChar() is explicit: >

Re: RFR [8058875]: CharsetEncoder.maxBytesPerChar() should return 4 for UTF-8

2014-09-22 Thread Mark Thomas
On 22/09/2014 22:46, Xueming Shen wrote: > On 09/22/2014 01:14 PM, Ivan Gerasimov wrote: >> Hello! >> >> The UTF-8 encoding allows characters that are 4 bytes long. >> However, CharsetEncoder.maxBytesPerChar() currently returns 3.0, which >> is not always enough.

Re: RFR [8058875]: CharsetEncoder.maxBytesPerChar() should return 4 for UTF-8

2014-09-22 Thread Mark Thomas
On 22/09/2014 22:23, Martin Buchholz wrote: > I think you are mistaken. It's maxBytesPerChar, not maxBytesPerCodepoint! You are going to have to explain that some more. The Javadoc for CharsetEncoder.maxBytesPerChar() is explicit: Returns the maximum number of bytes that will be prod

Re: RFR [8058875]: CharsetEncoder.maxBytesPerChar() should return 4 for UTF-8

2014-09-22 Thread Xueming Shen
On 09/22/2014 01:14 PM, Ivan Gerasimov wrote: Hello! The UTF-8 encoding allows characters that are 4 bytes long. However, CharsetEncoder.maxBytesPerChar() currently returns 3.0, which is not always enough. Would you please review the simple fix for this issue? BUGURL: https

Re: RFR [8058875]: CharsetEncoder.maxBytesPerChar() should return 4 for UTF-8

2014-09-22 Thread Martin Buchholz
I think you are mistaken. It's maxBytesPerChar, not maxBytesPerCodepoint! changeset: 3116:b44704ce8a08 user: sherman date: 2010-11-19 12:58 -0800 6957230: CharsetEncoder.maxBytesPerChar() reports 4 for UTF-8; should be 3 Summary: changed utf-8's CharsetEncoder.maxBytes

RFR [8058875]: CharsetEncoder.maxBytesPerChar() should return 4 for UTF-8

2014-09-22 Thread Ivan Gerasimov
Hello! The UTF-8 encoding allows characters that are 4 bytes long. However, CharsetEncoder.maxBytesPerChar() currently returns 3.0, which is not always enough. Would you please review the simple fix for this issue? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058875 WEBREV: http

Re: Code review request 6957230: CharsetEncoder.maxBytesPerChar() reports 4 for UTF-8; should be 3

2010-11-25 Thread Ulf Zibis
No answer? :-( -Ulf On 19.11.2010 18:00, Ulf Zibis wrote: IMO, you should consequently also correct the javadoc according to the evaluation of bug 6957230. -Ulf On 19.11.2010 08:55, Xueming Shen wrote: Alan, Last time when Martin and I discussed this issue we agreed that the s

Re: Code review request 6957230: CharsetEncoder.maxBytesPerChar() reports 4 for UTF-8; should be 3

2010-11-19 Thread Ulf Zibis
IMO, you should consequently also correct the javadoc according to the evaluation of bug 6957230. -Ulf On 19.11.2010 08:55, Xueming Shen wrote: Alan, Last time when Martin and I discussed this issue we agreed that the submitter is right about this. (The "charset" is a mapping between

Re: Code review request 6957230: CharsetEncoder.maxBytesPerChar() reports 4 for UTF-8; should be 3

2010-11-19 Thread Alan Bateman
Xueming Shen wrote: Alan, Last time when Martin and I discussed this issue we agreed that the submitter is right about this. (The "charset" is a mapping between "a sequence of bytes" and a "sequence of sixteen-bit Unicode characters", so the character discussed here should be a utf-16 charac

Code review request 6957230: CharsetEncoder.maxBytesPerChar() reports 4 for UTF-8; should be 3

2010-11-19 Thread Xueming Shen
Alan, Last time when Martin and I discussed this issue we agreed that the submitter is right about this. (The "charset" is a mapping between "a sequence of bytes" and a "sequence of sixteen-bit Unicode characters", so the character discussed here should be a utf-16 character...) http://cr.op