> On Nov 19, 2016, at 6:33 AM, Benedikt Ritter <brit...@apache.org> wrote:
>
> Hello Gary,
>
> Gary Gregory <garydgreg...@gmail.com> wrote on Sat., 19 Nov. 2016 at
> 01:07:
>
>> Just a thought:
>>
>> Does all the current (and future) string escaping code (XML, HTML, ...)
>> really belong in [lang]? Would it be more natural to have it in [text]?
>>
>
> My view on the whole thing currently is that we put stuff that is related
> to strings in Lang. Code that works on texts should go to Text. To me a
> text is more than just a string. A text contains words, which make up
> sentences, which in turn build paragraphs.
>
> Using this description, I'd argue that escaping belongs in lang and not
> in text, because it works on individual characters rather than on texts.

I think this is a difficult distinction to draw, because fundamentally any code that does substantial text processing necessarily operates on a character-by-character basis. I propose below a distinction more along the lines of potential usage.

>
> But this would also raise the question of whether the various edit distance
> algorithms work on texts or on strings. So maybe my distinction is not
> good at all.
>
> Do we need to better specify the scope of text?

I definitely agree with the sentiment that we should find a clear line of distinction between lang and text with regard to strings. The thoughts that spring to mind are more in terms of how the algorithms are to be used.

So let's consider the two extremes of the spectrum of string/word/text algorithms. On one hand, we have utilities like "StringUtils.isBlank(String s)", which is used ubiquitously in day-to-day code and is a foundational extension of Java. On the other hand, we have algorithms like natural language processing or statistical processing of words for analysis of biological sequences (two chapters in M. Lothaire's "Applied Combinatorics on Words").

The first extreme points towards day-to-day usage in any variety of Java application, whereas the other extreme points to applications specifically designed for string/word/text processing. I don't see folks in everyday usage wanting to find the edit distance between two strings unless they're writing something that specifically does text processing or something of that nature.

Now, clearly the problem with this distinction is the amount of grey area it leaves in figuring out what goes where, so I don't know if it's the right way to go. It was just the thought that came to mind. Any thoughts out there?

Cheers,
-Rob

>
> Benedikt
>
>>
>> Gary