Not sufficiently useful to include in commons. Chas
> On Feb 21, 2017, at 1:31 PM, Bhowmik, Bindul <bindulbhow...@gmail.com> wrote:
>
>> On Tue, Feb 21, 2017 at 7:55 AM, sebb <seb...@gmail.com> wrote:
>>> On 21 February 2017 at 12:40, Rob Tompkins <chtom...@apache.org> wrote:
>>>
>>>> On Feb 21, 2017, at 6:02 AM, sebb <seb...@gmail.com> wrote:
>>>>
>>>> On 21 February 2017 at 04:40, Sampanna Kahu <sampy...@gmail.com> wrote:
>>>>> Hi Guys,
>>>>> Very good points are being made above. Please allow me to add my two
>>>>> cents :-)
>>>>>
>>>>> What if the string contains syntactically valid HTML characters/tags and
>>>>> our aim is to prevent rendering these tags in the browser when this string
>>>>> is being served via a web application? Or prevent the execution of harmful
>>>>> embedded scripts when serving it? The 'escapeOnce' method could be useful
>>>>> here, right?
>>>>
>>>> I don't think so.
>>>>
>>>>> To explain better, let's consider an example of the specific use-case that
>>>>> I had in mind when building the 'escapeOnce' method:
>>>>> Consider the scenario of a simple RESTful web application where users can
>>>>> manipulate their text using simple CRUD operations. Let's assume that we do
>>>>> not have the 'escapeOnce' method yet.
>>>>> 1. A user comes and submits his string. We escape it and store it in our
>>>>> database. If the string had any HTML characters, they would have gotten
>>>>> escaped.
>>>>>
>>>>> 2. After some time, the same user fetches his string, adds some more HTML
>>>>> characters and submits it. At this point, although the escape method would
>>>>> correctly escape the freshly added HTML characters, it would escape the
>>>>> older escaped HTML characters again! (for example, &gt; would become
>>>>> &amp;gt;)
>>>>> And this effect gets magnified if step number 2 above is repeated.
>>>>
>>>> Of course, that is my point.
>>>>
>>>> Also remember that you want to show the original string to the user.
>>>> That's not possible in general if you use this approach.
>>>>
>>>> Suppose they originally entered
>>>>
>>>> "To code ampersand (&) in HTML, use '&amp;'"
>>>>
>>>> Using escapeOnce, this would become:
>>>>
>>>> "To code ampersand (&amp;) in HTML, use '&amp;'"
>>>>
>>>> You can either show that directly to the user, or use an unescapeOnce
>>>> and show them:
>>>>
>>>> "To code ampersand (&) in HTML, use '&'"
>
> I have had this use case in a project (enclosing XML/HTML content in an
> XML stream) and the expected output for escapeOnce in this case would
> be:
> "To code ampersand (&amp;) in HTML, use '&amp;amp;'"
>
> And similarly unescape once would generate back:
> "To code ampersand (&) in HTML, use '&amp;'"
>
> Just my two cents, as I have had to write this code.
>
>>>>
>>>> Neither makes any sense.
>>>>
>>>>> How do we solve the above problem without the 'escapeOnce' method?
>>>>
>>>> Store the raw string in the database and escape it just before display.
>>>>
>>>> If you are using Javascript, then use an approach such as this to escape
>>>> it:
>>>>
>>>> document.getElementById("whereItGoes").appendChild(document.createTextNode(unsafe_str));
>>>>
>>>> See:
>>>>
>>>> http://shebang.brandonmintern.com/foolproof-html-escaping-in-javascript/
>>>>
>>>> This has a good discussion of some of the problems.
>>>>
>>>> ==
>>>>
>>>> Sorry, but it's not possible in general to do what you want, because
>>>> one cannot reliably determine if a string has been escaped just from
>>>> looking at the string.
>>>
>>> Another thought occurred to me (again despite potential lack of value).
>>>
>>> We should be able to quickly verify if there are any escape strings in the
>>> string in question. A single application of unescape followed by checking
>>> string equality with the original input would yield a predicate on the
>>> existence of escapes present in the input in question.
>>
>> Again, what does unescape mean in this context?
>> Does it ignore incomplete escape sequences, or throw an error?
>>
>>> From there we could: (1) escape if no escapes were present in the original,
>>> or (2) throw an exception if there were escapes present in the original
>>> string.
>>> Again, this feels contrived, so I’m not really suggesting that we add it.
>>> I’m just playing with ideas here that could accomplish what Sampanna is
>>> going for.
>>
>> The request is impossible to fulfill reliably, and does not deserve to
>> be added to a Commons library.
>>
>> I don't know why this is still being discussed.
>>
>>> -Rob
>>>
>>>>
>>>> The most one can do is to sanitise the string by escaping anything
>>>> that is unescaped.
>>>> However that process corrupts the input - a browser won't display the
>>>> proper output.
>>>>
>>>>>> On 20 February 2017 at 21:40, sebb <seb...@gmail.com> wrote:
>>>>>>
>>>>>>> On 20 February 2017 at 15:36, Rob Tompkins <chtom...@apache.org> wrote:
>>>>>>>
>>>>>>>> On Feb 20, 2017, at 10:30 AM, sebb <seb...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On 20 February 2017 at 14:55, Rob Tompkins <chtom...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> On Feb 20, 2017, at 4:31 AM, sebb <seb...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 19 February 2017 at 14:29, Raymond DeCampo <r...@decampo.org> wrote:
>>>>>>>>>>> I am trying to see how having the proposed unescape() method leads
>>>>>>>>>>> to a useful escape method.
>>>>>>>>>>>
>>>>>>>>>>> E.g. clearly unescape("&amp;") would evaluate to "&". So would
>>>>>>>>>>> unescape("&amp;amp;"). That means the proposed escape() method
>>>>>>>>>>> would also have the same output for "&amp;" and "&amp;amp;".
>>>>>>>>>>>
>>>>>>>>>>> I think a better approach for an idempotent escape would be to just
>>>>>>>>>>> unescape the string once, and then run the traditional escape.
>>>>>>>>>>
>>>>>>>>>> That does not eliminate the problems, as you state below.
>>>>>>>>>>
>>>>>>>>>>> You will still have issues if the user intended to escape the
>>>>>>>>>>> string "&amp;" but you are never going to crack that without some
>>>>>>>>>>> kind of state saving.
>>>>>>>>>>
>>>>>>>>>> That is my exact point.
>>>>>>>>>>
>>>>>>>>>> Since it's not possible for the function to work reliably, we should
>>>>>>>>>> not mislead users by pretending that there is a magic method that
>>>>>>>>>> works.
>>>>>>>>>>
>>>>>>>>>>> Then, given that the functionality is available via two consecutive
>>>>>>>>>>> calls to existing methods, I would probably be disinclined to
>>>>>>>>>>> include it in the library.
>>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> I’m a (+1) for removal as well.
>>>>>>>>>
>>>>>>>>> Also, I didn’t mean for my example to sound like a proposal. I merely
>>>>>>>>> was trying to get to a potentially valuable stateless idempotent
>>>>>>>>> string escape function. Its contrivance is quite clear.
>>>>>>>>>
>>>>>>>>> Any other comments out there?
>>>>>>>>>
>>>>>>>>> We could provide a stateful escaper (that figures out how many escapes
>>>>>>>>> a string is in), or a method that returns the number of escapes in a
>>>>>>>>> string. Again, I’m not all that sure on the value of such methods.
>>>>>>>>
>>>>>>>> I don't think it's possible to work out the number of times a string
>>>>>>>> has been escaped.
>>>>>>>
>>>>>>> That may indeed be true, but it is possible to return the number of
>>>>>>> times unescape need be run before subsequent unescapes yield the same
>>>>>>> result.
>>>>>>
>>>>>> That in itself is potentially ambiguous.
>>>>>> Does the unescaper keep going until there are no valid escape
>>>>>> sequences left, or does it stop when there is at least one ampersand
>>>>>> which is not part of a valid escape sequence?
>>>>>>
>>>>>>> Again, I’m not sure if this is a valuable measure to concern ourselves
>>>>>>> with.
>>>>>>
>>>>>> I don't think it provides anything useful.
>>>>>>
>>>>>>>>
>>>>>>>> The most one can do is to determine if a string has not been escaped.
>>>>>>>> That would be the case where a string has one or more unescaped
>>>>>>>> characters in it.
>>>>>>>> For example "This & that" has obviously not been escaped.
>>>>>>>>
>>>>>>>> However if a string has no un-escaped characters in it, that does not
>>>>>>>> necessarily mean that it has already been escaped.
>>>>>>>> For example: "This &amp; that".
>>>>>>>> This might have been escaped - or it might not.
>>>>>>>
>>>>>>> Ah, I was using the definition of “having been escaped” to be that the
>>>>>>> string contains escape sequences.
>>>>>>>
>>>>>>>> For example it could be the answer to: "How does one code 'This &
>>>>>>>> that' in HTML?"
>>>>>>>>
>>>>>>>> The application has to keep track of the escape-state of the string.
>>>>>>>
>>>>>>> Definitely agreed with your definition of “having been escaped.”
>>>>>>>
>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> -Rob
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Feb 18, 2017 at 12:04 PM, Rob Tompkins <chtom...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> In preparation for the 1.0 release, I think we should address
>>>>>>>>>>>> Sebb's concern in TEXT-40 about the attempt to create "idempotent"
>>>>>>>>>>>> string escape methods. By idempotent I mean
>>>>>>>>>>>> someMethod("some string") =
>>>>>>>>>>>> someMethod(someMethod(someMethod(...someMethod("some string")))),
>>>>>>>>>>>> that is, a single application of a method is equal to any number of
>>>>>>>>>>>> applications of the method on the same input.
>>>>>>>>>>>>
>>>>>>>>>>>> Below I lay out a mechanism by which it is possible to write such
>>>>>>>>>>>> methods, but I don’t know the value in writing such methods. I'm
>>>>>>>>>>>> merely expressing that idempotency is a possibility.
>>>>>>>>>>>>
>>>>>>>>>>>> For string "un-escaping", I believe that we can write a method
>>>>>>>>>>>> that, indeed, is idempotent by simply running the un-escape method
>>>>>>>>>>>> the finite number of un-escapings to get to the point at which the
>>>>>>>>>>>> string remains unchanged between applications of the un-escaping
>>>>>>>>>>>> method. (I believe that I can write a proof that all un-escape
>>>>>>>>>>>> methods have such a point, if that is needed for the sake of
>>>>>>>>>>>> discussion).
>>>>>>>>>>>>
>>>>>>>>>>>> If indeed we can create an idempotent un-escape method, then we can
>>>>>>>>>>>> simply take that method, run it, and then run the escaping method
>>>>>>>>>>>> one time. If we always completely unescape and then escape once
>>>>>>>>>>>> then we do have an idempotent method.
>>>>>>>>>>>>
>>>>>>>>>>>> Such a method might not be all that valuable to the user though.
>>>>>>>>>>>> Furthermore, this just explains one way to create such an
>>>>>>>>>>>> idempotent method. Whether more, or more valuable, methods exist
>>>>>>>>>>>> would take some more thought.
>>>>>>>>>>>>
>>>>>>>>>>>> Anyone have any thoughts? My feeling is that it might be more
>>>>>>>>>>>> effort than it's worth to ensure that any string is only "singly
>>>>>>>>>>>> encoded.” Further, we probably should give a look at the
>>>>>>>>>>>> “escape_once” methods in StringEscapeUtils.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers
>>>>>>>>>>>> -Rob
>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>>>>>>>>>>> For additional commands, e-mail: dev-h...@commons.apache.org
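
For anyone reading this thread in the archive, the points about double escaping and the "unescape until nothing changes, then escape once" construction can be tried out with a small sketch. This is only an illustration under stated assumptions: the class EscapeOnceSketch and its methods escape, unescape, unescapeFully and escapeIdempotent are hypothetical names made up here, they handle just four entities, and they are not the Commons Text API or the escapeOnce code that was under discussion.

```java
// Hypothetical, self-contained sketch; not the Commons Text implementation.
final class EscapeOnceSketch {

    // Minimal escaper for &, <, > and " only (an assumption for brevity).
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Single left-to-right pass that undoes exactly one level of escaping.
    // Ampersands that do not start one of the four entities are copied as-is.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("&amp;", i)) { sb.append('&'); i += 5; }
            else if (s.startsWith("&lt;", i)) { sb.append('<'); i += 4; }
            else if (s.startsWith("&gt;", i)) { sb.append('>'); i += 4; }
            else if (s.startsWith("&quot;", i)) { sb.append('"'); i += 6; }
            else { sb.append(s.charAt(i)); i++; }
        }
        return sb.toString();
    }

    // Unescape repeatedly until the string stops changing. This terminates
    // because every pass that changes the string also shortens it.
    static String unescapeFully(String s) {
        String previous;
        do {
            previous = s;
            s = unescape(s);
        } while (!s.equals(previous));
        return s;
    }

    // The construction from the original post: fully unescape, escape once.
    static String escapeIdempotent(String s) {
        return escape(unescapeFully(s));
    }

    public static void main(String[] args) {
        // Plain escaping applied twice double-escapes.
        System.out.println(escape(escape("a > b")));   // a &amp;gt; b

        // The ampersand example: the user really typed the text "&amp;".
        String raw = "To code ampersand (&) in HTML, use '&amp;'";
        String once = escapeIdempotent(raw);
        System.out.println(once);
        // Idempotent, as claimed ...
        System.out.println(once.equals(escapeIdempotent(once)));   // true
        // ... but the user's original text cannot be recovered from it.
        System.out.println(unescape(once).equals(raw));            // false
        // Escaping the raw text once at display time round-trips.
        System.out.println(unescape(escape(raw)).equals(raw));     // true
    }
}
```

The idempotent variant is well defined, but it cannot tell a user who typed the literal text "&amp;" from one whose "&" had already been escaped. That is the state the application has to track itself by storing the raw string and escaping it only at display time.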