Thanks for the explanation. Here are comments to your fix:
- Can you add some comments about the gist of the change to IBM943.c2b?
Summarizing the explanation below would be fine.
For the test case:
- Please wrap the @bug line appropriately.
- Those byte array/Strings can be "static final" outside the test method.
- Looks like the mapping is exactly the same with IBM943C for those code
points. Is that correct? If so, would you please add some comment as such?
- Typo: "round-tip" -> "round-trip"
Naoto
On 2/20/20 2:47 AM, Ichiroh Takiguchi wrote:
Hello Naoto.
I appreciate your comment.
The definition has not changed recently.
Applying the change will prevent the characters which are used on
Japanese PC from being converted to "?".
I checked IBM-943 definition and changelog.
Definitions and creation date are as follows:
(All definitions are valid)
03AF34B0.TPMAP120: May 20 1997 (Current OpenJDK)
34B003AF.RPMAP130: May 20 1997 (Proposed change, upward compatible)
34B003AF.RPMAP14A: Jul 29 1998 (Same as 34B003AF.RPMAP130)
34B003AF.RPMAP15A: Jan 8 2003 (Additionally, 13 characters are changed)
According to IBM943.map, OpenJDK JDK refers 03AF34B0.TPMAP120.
03AF34B0.TPMAP120 just has B2C conversion table only, no C2B definition.
34B003AF.RPMAP130 which has C2B definition was released on same date.
I assume C2B definition was not implemented at that time.
C2B definition for 34B003AF.RPMAP130 and 34B003AF.RPMAP14A are same,
only replacement character is changed.
34B003AF.RPMAP15A is the latest, but it's almost same as MS932.
If 34B003AF.RPMAP15A is applied, 03AF34B0.TPMAP14A is also required.
I'd like to add C2B definition without B2C definition because of
compatibility.
I don't want to apply 03AF34B0.TPMAP14A B2C definition.
So I'd like to apply 34B003AF.RPMAP130 definition.
Thanks,
Ichiroh Takiguchi
On 2020-02-19 07:33, naoto.s...@oracle.com wrote:
Hi Takiguchi-san,
Can you please elaborate the rationale for the change? It looks like
IBM943 chaset hasn't changed for a long time, at least from JDK8. Has
the mapping definition recently changed?
Naoto
On 2/17/20 10:23 AM, Ichiroh Takiguchi wrote:
Hello.
Could you review the fix ?
Bug: https://bugs.openjdk.java.net/browse/JDK-8235834
Change: https://cr.openjdk.java.net/~itakiguchi/8235834/webrev.00/
IBM-943 is for IBM Japanese PC encoding.
MS932 is for Microsoft Japanese Windows encoding.
IBM-943 charset encoder does not contain 5 compatible entries
compared to MS932 charset.
Thanks,
Ichiroh Takiguchi
IBM Japan, Ltd.