Hi All:
I am glad to know there is a 3.0 way of doing that, which is:
@Test
public void testEscapeXmlSupplementaryCharacters() {
    CharSequenceTranslator escapeXml =
        StringEscapeUtils.ESCAPE_XML.with(
            NumericEntityEscaper.between(0x7f, Integer.MAX_VALUE));
    assertEquals("Supplementary character must be represented using a single escape",
        "&#144308;", escapeXml.translate("\uD84C\uDFB4"));
}
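For readers without Commons Lang at hand, here is a rough, stdlib-only sketch of what that composed translator is meant to do. It escapes the five basic XML entities and writes every code point in an inclusive range as one decimal numeric entity. The class and the `escapeXml(input, low, high)` helper are hypothetical. This is not the Commons Lang source, and the entity set is only assumed to match `ESCAPE_XML`.

```java
import java.util.Map;

public class XmlEscapeSketch {

    // The five basic XML entities (assumed to match the set ESCAPE_XML handles).
    private static final Map<Integer, String> BASIC = Map.of(
            (int) '&', "&amp;",
            (int) '<', "&lt;",
            (int) '>', "&gt;",
            (int) '"', "&quot;",
            (int) '\'', "&apos;");

    /**
     * Hypothetical helper: escape the basic entities, then write every code
     * point in [low, high] (inclusive) as a single decimal numeric entity.
     */
    static String escapeXml(CharSequence input, int low, int high) {
        StringBuilder out = new StringBuilder();
        // codePoints() combines a valid surrogate pair into one int value.
        input.codePoints().forEach(cp -> {
            String entity = BASIC.get(cp);
            if (entity != null) {
                out.append(entity);
            } else if (cp >= low && cp <= high) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("\uD84C\uDFB4", 0x7f, Integer.MAX_VALUE));       // &#144308;
        System.out.println(escapeXml("a < b \uD84C\uDFB4", 0x7f, Integer.MAX_VALUE)); // a &lt; b &#144308;
    }
}
```

The important design choice is iterating by code point instead of by `char`, so the surrogate pair is never split into two escapes.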
But what about the test as it was originally written?
// Example from https://issues.apache.org/jira/browse/LANG-728
assertEquals("Supplementary character must be represented using a single escape",
    "&#144308;", StringEscapeUtils.escapeXml("\uD84C\uDFB4"));
// Example from http://www.w3.org/International/questions/qa-escapes
assertEquals("Supplementary character must be represented using a single escape",
    "&#144308;", StringEscapeUtils.escapeXml("\uD84C;\uDFB4;"));
It still fails.
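The single-escape assertion guards against a specific failure mode, and it can be reproduced without Commons Lang. The stdlib-only sketch below contrasts escaping each UTF-16 `char` separately with escaping each code point. Both helper names are hypothetical, and neither is meant to show how any Commons Lang version is implemented.

```java
public class SurrogateEscapeContrast {

    // Escapes each UTF-16 char on its own: a surrogate pair yields two entities,
    // which is exactly what the test's assertion message rules out.
    static String escapePerChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            out.append("&#").append((int) s.charAt(i)).append(';');
        }
        return out.toString();
    }

    // Escapes each code point: a surrogate pair yields a single entity.
    static String escapePerCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> out.append("&#").append(cp).append(';'));
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "\uD84C\uDFB4"; // one supplementary character, U+233B4
        System.out.println(escapePerChar(s));      // &#55372;&#57268;
        System.out.println(escapePerCodePoint(s)); // &#144308;
    }
}
```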
Shouldn't the API be changed to handle this case too? The W3C seems to say so
("you must use the single, code point value for that character") in:
* From http://www.w3.org/International/questions/qa-escapes
* </p>
* <blockquote>
* Supplementary characters are those Unicode characters that have code points higher than the characters in
* the Basic Multilingual Plane (BMP). In UTF-16 a supplementary character is encoded using two 16-bit surrogate code points from the
* BMP. Because of this, some people think that supplementary characters need to be represented using two escapes, but this is incorrect
* – you must use the single, code point value for that character. For example, use &#x233B4; rather than &#xD84C;&#xDFB4;.
* </blockquote>
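The W3C point rests on the fact that a surrogate pair is just the UTF-16 encoding of one code point. The sketch below spells out the standard UTF-16 decoding formula for the pair used in these tests and checks it against the JDK's `Character.toCodePoint`. The `decodePair` helper is a hypothetical name used only for illustration.

```java
public class SurrogatePairMath {

    // Standard UTF-16 decoding of a surrogate pair, written out by hand:
    // cp = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    static int decodePair(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        char high = '\uD84C';
        char low = '\uDFB4';
        int cp = decodePair(high, low);
        System.out.println(Integer.toHexString(cp).toUpperCase()); // 233B4
        System.out.println(cp);                                    // 144308
        // The JDK performs the same computation.
        System.out.println(cp == Character.toCodePoint(high, low)); // true
    }
}
```

So `&#x233B4;` (hex) and `&#144308;` (decimal) name the same character, and either is a valid single escape.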
Gary
-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Tuesday, July 19, 2011 0:58 AM
To: [email protected]
Subject: svn commit: r1148162 - /commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java
Author: bayard
Date: Tue Jul 19 04:58:03 2011
New Revision: 1148162
URL: http://svn.apache.org/viewvc?rev=1148162&view=rev
Log:
Updating unit test for LANG-728 to work with Lang 3.0 way of using escapeXml
with > 0x7f characters
Modified:
commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java
Modified: commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java
URL: http://svn.apache.org/viewvc/commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java?rev=1148162&r1=1148161&r2=1148162&view=diff
==============================================================================
--- commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java (original)
+++ commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java Tue Jul 19 04:58:03 2011
@@ -31,6 +31,9 @@ import org.apache.commons.io.IOUtils;
import org.junit.Ignore;
import org.junit.Test;
+import org.apache.commons.lang3.text.translate.CharSequenceTranslator;
+import org.apache.commons.lang3.text.translate.UnicodeEscaper;
+
/**
* Unit tests for {@link StringEscapeUtils}.
*
@@ -333,15 +336,13 @@ public class StringEscapeUtilsTest {
* @see <a href="http://www.w3.org/International/questions/qa-escapes">Using character escapes in markup and CSS</a>
* @see <a href="https://issues.apache.org/jira/browse/LANG-728">LANG-728</a>
*/
- @Ignore
@Test
public void testEscapeXmlSupplementaryCharacters() {
- // Example from https://issues.apache.org/jira/browse/LANG-728
- assertEquals("Supplementary character must be represented using a single escape", "&#144308;",
- StringEscapeUtils.escapeXml("\uD84C\uDFB4"));
- // Example from See http://www.w3.org/International/questions/qa-escapes
- assertEquals("Supplementary character must be represented using a single escape", "&#144308;",
- StringEscapeUtils.escapeXml("\uD84C;\uDFB4;"));
+ CharSequenceTranslator escapeXml =
+ StringEscapeUtils.ESCAPE_XML.with(
+ UnicodeEscaper.between(0x7f, Integer.MAX_VALUE) );
+
+ assertEquals("Supplementary character must be represented using a single escape", "\u233B4",
+ escapeXml.translate("\uD84C\uDFB4"));
}
// Tests issue #38569