On Sat, Feb 1, 2014 at 9:12 AM, Benedikt Ritter <brit...@apache.org> wrote:
> Hi,
>
> right now we have the following methods in StringEscapeUtils:
>
> escapeXml(String)
> escapeHtml3(String)
> escapeHtml4(String)
>
> These methods only escape the basic xml/html entities, though they may
> produce invalid XML/HTML. LANG-955 [1] proposes to add new methods that
> only produce valid XML, they should throw an exception if a character is
> encountered that cannot be displayed in XML (not even by escaping).
>

How does that address the problem mentioned earlier on the ML of needing
valid XML no matter what the input?

There are several tasks for the API(s):

- Escaping (implied by the API name)
- Dealing with non-XML chars:
  o Strip, or
  o Throw exception

The simplest solution using today's style would be:

escapeXml10(String text, boolean strip)
escapeXml11(String text, boolean strip)

strip true - strips
strip false - throws exception

What I am not sure about is why you would want an exception or what you'd
do with it. Are these 'bad chars' embeddable in a CDATA? If so, strip false
makes sense because we really cannot handle the text. But what would the
app then do with the exception? I am not sure that I want the extra logic.
Presumably, if I am not using JAXB then I am doing my own "looser" XML IO,
so I need to escape content... I wonder what JAXB does here...

>
> Since the set of valid characters differs between XML 1.0 and XML 1.1, we
> need two methods:
>
> escapeXml_1_0(String)
> escapeXml_1_1(String)
>

Yuck! Underscores are of last resort.

Simple alternatives:

escapeXml10
escapeXml11

escapeXmlV10
escapeXmlV11

Until we get to XML version 10, this will be fine.
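
To make the strip/throw question above concrete, here is a minimal sketch of what escapeXml10(String text, boolean strip) might look like. The class name, the helper isXml10Char and the choice of IllegalArgumentException are assumptions made for illustration; this is not the Commons Lang implementation. The valid ranges follow the Char production of the XML 1.0 recommendation.

```java
// Hypothetical sketch only; names and exception type are assumptions.
class XmlEscapeSketch {

    // Char production from XML 1.0:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // strip == true:  characters that XML 1.0 cannot represent are dropped.
    // strip == false: the first such character raises an exception.
    static String escapeXml10(String text, boolean strip) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);      // unpaired surrogates come back as-is
            i += Character.charCount(cp);
            if (!isXml10Char(cp)) {
                if (strip) {
                    continue;                  // silently drop the character
                }
                throw new IllegalArgumentException(
                        String.format("Character U+%04X is not allowed in XML 1.0", cp));
            }
            switch (cp) {                      // the five basic XML entities
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.appendCodePoint(cp);
            }
        }
        return sb.toString();
    }
}
```

With this sketch, escapeXml10("a\u0001<b", true) returns "a&lt;b", while the same call with strip set to false throws. An escapeXml11 variant would only need a different character predicate.
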

Precise alternatives:

escapeXml10_20081126 (the W3C REC for XML 1.0 *5th edition* is http://www.w3.org/TR/2008/REC-xml-20081126/)
escapeXml10_20060816 (the W3C REC for XML 1.0 *4th edition* is http://www.w3.org/TR/2006/REC-xml-20060816/)
escapeXml10_20040204 (the W3C REC for XML 1.0 *3rd edition* is http://www.w3.org/TR/2004/REC-xml-20040204/)

Or use an "E" or "e" for Edition instead of _:

escapeXml10E20081126
escapeXml10e20081126

Each edition may have several versions BTW.

>
> To clarify the behavior of the old method I've created LANG-963 [2]. The
> idea is to rename escapeXml(String) to escapeXmlEntities(String) and
> deprecate the old method.
>
> Now I'm tempted to rename the HTML counterparts as well leading to either
> of the following:
>
> escapeHtml3Entities(String)
> escapeHtml4Entities(String)
>
> or:
>
> escapeHtml_3_Entities(String)
> escapeHtml_4_Entities(String)
>
> or:
>
> escapeHtml_3_0_Entities(String)
> escapeHtml_4_0_Entities(String)
>
> I find none of the three very appealing, but for code symmetry we should
> change this as well. Which one would you prefer?
>
> Benedikt
>
> P.S.: I'm planning to redesign great parts of the API. The "static util"
> pattern is outdated and it is better to encode the information we're
> trying to express here via fluent API. My proposal for lang 4.0 would be:
>
> StringEscaping.escape(str).with(Escaping.HTML_4_0)
> StringEscaping.escape(str).with(Escaping.XML_ENTITIES)
>

Gross, don't force an API style on me, Java is verbose enough as it is. For
those in love with fluent APIs, you can provide a separate code path I
suppose. I'd rather not deal with it for low level util call sites. I am
not building an object model here. Now that Java 8 lambdas are here, the
style will change again.

>
> This way we don't have to encode everything into method names.

You still can use parameters. But first we need to decide on
strip/exception policies.

Gary

> I've created
> LANG-964 [3] for this.
>
> [1] https://issues.apache.org/jira/browse/LANG-955
> [2] https://issues.apache.org/jira/browse/LANG-963
> [3] https://issues.apache.org/jira/browse/LANG-964
>
> --
> http://people.apache.org/~britter/
> http://www.systemoutprintln.de/
> http://twitter.com/BenediktRitter
> http://github.com/britter
>

--
E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
Java Persistence with Hibernate, Second Edition <http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory