Hello Gary,
2014-02-01 Gary Gregory <[email protected]>:

> On Sat, Feb 1, 2014 at 9:12 AM, Benedikt Ritter <[email protected]>
> wrote:
>
> > Hi,
> >
> > right now we have the following methods in StringEscapeUtils:
> >
> > escapeXml(String)
> > escapeHtml3(String)
> > escapeHtml4(String)
> >
> > These methods only escape the basic xml/html entities, though they may
> > produce invalid XML/HTML. LANG-955 [1] proposes to add new methods that
> > only produce valid XML, they should throw an exception if a character is
> > encountered that cannot be displayed in XML (not even by escaping).
> >
>
> How does that the problem mentioned earlier on the ML of needing valid XML
> no matter what the input?
>

I don't understand that sentence, sorry :o)

> There are several tasks for the API(s):
>
> - Escaping (implied by the API name)
> - Dealing with non-XML chars:
>   o Strip, or
>   o Throw exception
>
> The simplest solution using today's style would be:
>
> escapeXml10(String text, boolean strip)
> escapeXml11(String text, boolean strip)
>
> strip true - strips
> strip false - throws exception
>

A boolean flag that controls whether a method throws an exception or not?
An exceptional situation is nothing that is configurable, imho.

> What I am not sure on is why you would want an exception or what you'd do
> with it.
>
> Are these 'bad chars' embeddable in a CDATA? If so, strip false makes sense
> because we really cannot handle the text. But what would the app then do
> with the exception? I am not sure that I want the extra logic. Presumably,
> if I am not using JAXB then I am doing my own "looser" XML IO, so I need to
> escape content... I wonder what JAXB does here...
>

As far as I know there is no way to embed the characters into XML. But I
may be wrong. I couldn't find anything about this in the spec [1]. So maybe
we should go with stripping?
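
To make the strip/throw question concrete, here is a rough sketch of what
the boolean variant could look like. It is only an illustration: the class
name XmlEscapeSketch and the helper isXml10Char are made up and are not
existing Commons Lang code, and the choice of IllegalArgumentException is
an assumption. The valid ranges follow the Char production of the XML 1.0
spec [1].

```java
// Hypothetical sketch, not the Commons Lang implementation.
final class XmlEscapeSketch {

    private XmlEscapeSketch() {
    }

    /** True if the code point matches the Char production of XML 1.0. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Escapes the five basic XML entities. Code points that are not valid
     * in XML 1.0 are dropped when strip is true; otherwise an exception
     * is thrown (exception type chosen for this sketch only).
     */
    static String escapeXml10(String text, boolean strip) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!isXml10Char(cp)) {
                if (strip) {
                    continue; // silently drop the invalid code point
                }
                throw new IllegalArgumentException(
                        String.format("Code point U+%04X is not allowed in XML 1.0", cp));
            }
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

With strip set to true a control character such as U+0001 simply
disappears from the output; with strip set to false the caller has to
catch the exception, which is exactly the extra logic being questioned
above.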

> > Since the set of valid characters differs between XML 1.0 and XML 1.1, we
> > need two methods:
> >
> > escapeXml_1_0(String)
> > escapeXml_1_1(String)
> >
>
> Yuck! Underscores are of last resort.
>
> Simple alternatives
>
> escapeXml10
> escapeXml11
> escapeXmlV10
> escapeXmlV11
>
> Until we get to XML version 10, this will be fine.
>
> Precise alternatives:
>
> escapeXml10_20081126 (the W3C REC for XML 1.0 *5th edition* is
> http://www.w3.org/TR/2008/REC-xml-20081126/)
> escapeXml10_20060816 (the W3C REC for XML 1.0 *4th edition* is
> http://www.w3.org/TR/2008/REC-xml-20060816/)
> escapeXml10_20040204 (the W3C REC for XML 1.0 *3rd edition* is
> http://www.w3.org/TR/2008/REC-xml-20040204/)
>
> Or use a "E" or "e" for Edition instead of _
> escapeXml10E20081126
> escapeXml10e20081126
>
> Each edition may have several versions BTW.
>

I guess we should keep it simple then and go with escapeXml10 and
escapeXml11.

> > To clarify the behavior of the old method I've created LANG-963 [2]. The
> > idea is to rename escapeXml(String) to escapeXmlEntities(String) and
> > deprecate the old method.
> >
> > Now I'm tempted to rename the HTML counterparts as well leading to either
> > of the following:
> >
> > escapeHtml3Entities(String)
> > escapeHtml4Entities(String)
> >
> > or:
> >
> > escapeHtml_3_Entities(String)
> > escapeHtml_4_Entities(String)
> >
> > or:
> >
> > escapeHtml_3_0_Entities(String)
> > escapeHtml_4_0_Entities(String)
> >
> > I find none of the three very appealing, but for code symmetry we should
> > change this as well. Which one would you prefer?
> >
> > Benedikt
> >
> > P.S.: I'm planning to redesign great parts of the API. The "static util"
> > pattern is outdated and it is better to encode the information we're
> > trying to express here via fluent API.
> > My proposal for lang 4.0 would be:
> >
> > StringEscaping.escape(str).with(Escaping.HTML_4_0)
> > StringEscaping.escape(str).with(Escaping.XML_ENTITIES)
> >
>
> Gross, don't force an API style on me, Java is verbose enough as it is. For
> those in love with fluent APIs, you can provide a separate code path I
> suppose. I'd rather not deal with it for low level util call sites. I am
> not building an object model here.
>
> Now that Java 8 lambdas are here, the style will change again.
>

I don't see why SuperUtils.staticMethod(param1, param2, param3, param4)
isn't forcing an API style, but doing things differently (or should I say
"less 1998 style" ;-) is. But let's discuss this when the time for 4.0
comes.

Benedikt

[1] http://www.w3.org/TR/2006/REC-xml-20060816/#charsets

> > This way we don't have to encode everything into method names.
>
> You still can use parameters. But first we need to decide on
> strip/exception policies.
>
> Gary
>
> > I've created
> > LANG-964 [3] for this.
> >
> > [1] https://issues.apache.org/jira/browse/LANG-955
> > [2] https://issues.apache.org/jira/browse/LANG-963
> > [3] https://issues.apache.org/jira/browse/LANG-964
> >
> > --
> > http://people.apache.org/~britter/
> > http://www.systemoutprintln.de/
> > http://twitter.com/BenediktRitter
> > http://github.com/britter
> >
>
> --
> E-Mail: [email protected] | [email protected]
> Java Persistence with Hibernate, Second Edition
> <http://www.manning.com/bauer3/>
> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
> Spring Batch in Action <http://www.manning.com/templier/>
> Blog: http://garygregory.wordpress.com
> Home: http://garygregory.com/
> Tweet! http://twitter.com/GaryGregory
>

--
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter
