So you're not saying that we have to escape > 0x7f (old behaviour), but that we have to escape any supplementary characters?
Hen

On Tue, Jul 19, 2011 at 7:28 AM, Gary Gregory <[email protected]> wrote:
> Hi All:
>
> I am glad to know there is a 3.0 way of doing that, which is:
>
>     @Test
>     public void testEscapeXmlSupplementaryCharacters() {
>         CharSequenceTranslator escapeXml =
>             StringEscapeUtils.ESCAPE_XML.with(
>                 NumericEntityEscaper.between(0x7f, Integer.MAX_VALUE) );
>
>         assertEquals("Supplementary character must be represented using a single escape", "&#144308;",
>                 escapeXml.translate("\uD84C\uDFB4"));
>
> but what about the test the way it was originally written?
>
>         // Example from https://issues.apache.org/jira/browse/LANG-728
>         assertEquals("Supplementary character must be represented using a single escape", "&#144308;",
>                 StringEscapeUtils.escapeXml("\uD84C\uDFB4"));
>         // Example from See http://www.w3.org/International/questions/qa-escapes
>         assertEquals("Supplementary character must be represented using a single escape", "&#144308;",
>                 StringEscapeUtils.escapeXml("\uD84C;\uDFB4;"));
>
> It still fails.
>
> Shouldn't the API be changed to work for this case too? The W3C seems to say
> so: "you must use the single, code point value for that character" in:
>
>  * From http://www.w3.org/International/questions/qa-escapes
>  * </p>
>  * <blockquote>
>  * Supplementary characters are those Unicode characters that have code points higher than the characters in
>  * the Basic Multilingual Plane (BMP). In UTF-16 a supplementary character is encoded using two 16-bit surrogate code points from the
>  * BMP. Because of this, some people think that supplementary characters need to be represented using two escapes, but this is incorrect
>  * – you must use the single, code point value for that character. For example, use &#x233B4; rather than &#xD84C;&#xDFB4;.
>  * </blockquote>
>
> Gary
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Tuesday, July 19, 2011 0:58 AM
> To: [email protected]
> Subject: svn commit: r1148162 - /commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java
>
> Author: bayard
> Date: Tue Jul 19 04:58:03 2011
> New Revision: 1148162
>
> URL: http://svn.apache.org/viewvc?rev=1148162&view=rev
> Log:
> Updating unit test for LANG-728 to work with Lang 3.0 way of using escapeXml with > 0x7f characters
>
> Modified:
>     commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java
>
> Modified: commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java
> URL: http://svn.apache.org/viewvc/commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java?rev=1148162&r1=1148161&r2=1148162&view=diff
> ==============================================================================
> --- commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java (original)
> +++ commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java Tue Jul 19 04:58:03 2011
> @@ -31,6 +31,9 @@ import org.apache.commons.io.IOUtils;
>  import org.junit.Ignore;
>  import org.junit.Test;
>
> +import org.apache.commons.lang3.text.translate.CharSequenceTranslator;
> +import org.apache.commons.lang3.text.translate.UnicodeEscaper;
> +
>  /**
>   * Unit tests for {@link StringEscapeUtils}.
>   *
> @@ -333,15 +336,13 @@ public class StringEscapeUtilsTest {
>       * @see <a href="http://www.w3.org/International/questions/qa-escapes">Using character escapes in markup and CSS</a>
>       * @see <a href="https://issues.apache.org/jira/browse/LANG-728">LANG-728</a>
>       */
> -    @Ignore
>      @Test
>      public void testEscapeXmlSupplementaryCharacters() {
> -        // Example from https://issues.apache.org/jira/browse/LANG-728
> -        assertEquals("Supplementary character must be represented using a single escape", "&#144308;",
> -                StringEscapeUtils.escapeXml("\uD84C\uDFB4"));
> -        // Example from See http://www.w3.org/International/questions/qa-escapes
> -        assertEquals("Supplementary character must be represented using a single escape", "&#144308;",
> -                StringEscapeUtils.escapeXml("\uD84C;\uDFB4;"));
> +        CharSequenceTranslator escapeXml =
> +            StringEscapeUtils.ESCAPE_XML.with(
> +                UnicodeEscaper.between(0x7f, Integer.MAX_VALUE) );
> +
> +        assertEquals("Supplementary character must be represented using a single escape", "\u233B4",
> +                escapeXml.translate("\uD84C\uDFB4"));
>      }
>
>      // Tests issue #38569
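
For reference, the "single escape" rule quoted from the W3C page can be illustrated without Commons Lang at all. The following is only a minimal sketch using plain JDK calls; the class name SupplementaryEscapeSketch and the helper escapeAbove7f are hypothetical names made up for this illustration and are not part of StringEscapeUtils or the translate package. It walks the string by code point rather than by char, so the surrogate pair \uD84C\uDFB4 is seen as the one code point U+233B4 (decimal 144308) and gets one entity.

```java
// Hypothetical illustration only; not the Commons Lang implementation.
final class SupplementaryEscapeSketch {

    // Escapes every code point above 0x7f as a single decimal numeric entity.
    // ASCII characters, including XML metacharacters, are copied unchanged here.
    static String escapeAbove7f(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            // codePointAt combines a valid surrogate pair into one code point.
            int codePoint = input.codePointAt(i);
            if (codePoint > 0x7f) {
                out.append("&#").append(codePoint).append(';');
            } else {
                out.append((char) codePoint);
            }
            // Advance by 2 chars for a supplementary code point, else by 1.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The surrogate pair from LANG-728 becomes one entity.
        System.out.println(escapeAbove7f("\uD84C\uDFB4")); // prints &#144308;
    }
}
```

Iterating by char instead of by code point would see 0xD84C and 0xDFB4 separately and emit two entities, which is the behaviour the W3C text calls incorrect.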
