> On 19 Nov 2016, at 15:38, Rob Tompkins <[email protected]> wrote:
>
>> On Nov 19, 2016, at 6:33 AM, Benedikt Ritter <[email protected]> wrote:
>>
>> Hello Gary,
>>
>> Gary Gregory <[email protected]> wrote on Sat, 19 Nov 2016 at
>> 01:07:
>>
>>> Just a thought:
>>>
>>> Does all the current (and future) string escaping code (XML, HTML, ...)
>>> really belong in [lang]? Would it be more natural to have it in [text]?
>>>
>>
>> My view on the whole thing currently is that we put stuff that is related
>> to strings in Lang. Code that works on texts should go to Text. To me a
>> text is more than just a string. A text contains words that make up
>> sentences, which in turn build paragraphs.
>>
>> Using this description, I'd argue that escaping belongs in Lang and not
>> in Text, because it works on individual characters rather than on texts.
>
> I think this is a difficult distinction to draw because fundamentally
> anything that does sufficient text processing necessarily operates on a
> character-by-character basis. I propose below a distinction more along the
> lines of potential usage.
>
>>
>> But this would also raise the question of whether the various edit
>> distance algorithms work on texts or on strings. So maybe my distinction
>> is not good at all.
>>
>> Do we need to better specify the scope of Text?
>
> I definitely agree with the sentiment that we should find a clear line of
> distinction between Lang and Text with regard to strings. Some thoughts that
> spring to mind are more in terms of how the algorithms are to be used.
>
> So let’s consider the two extremes of the spectrum of string/word/text
> algorithms. On one hand, we have utilities like “StringUtils.isBlank(String
> s)”, which is used ubiquitously in day-to-day code and is a foundational
> extension of Java. On the other hand, we have algorithms like natural
> language processing or statistical processing of words for analysis of
> biological sequences (two chapters in M. Lothaire’s “Applied Combinatorics
> on Words”). The first extreme seems to point towards day-to-day usage in any
> variety of Java applications, whereas the other extreme seems to point to an
> application that is specifically designed for string/word/text processing. I
> don’t see folks in everyday usage wanting to find the edit distance between
> two strings unless they’re writing something specifically doing text
> processing or something of that nature.
>
> Now clearly the problem with this distinction is the amount of grey area that
> it leaves in figuring out what goes where, so I don’t know if it’s the right
> way to go. It was just the thought that came to mind.
>
> Any thoughts out there?

I think you're on the right track here. Lang is supposed to plug the gaps in
Java's core packages. A certain amount of text manipulation is expected in
many applications, but once we get into the realms of statistical analysis or
fuzzy comparison methods, we've moved beyond that.

Perhaps a tongue-in-cheek definition would be: "if you had to consult a book
to write that, it belongs in Text".

Duncan

>
> Cheers,
> -Rob
>
>>
>> Benedikt
>>
>>>
>>> Gary
>>>
>>> --
>>> E-Mail: [email protected] | [email protected]
>>> Blog: http://garygregory.wordpress.com
>>> Home: http://garygregory.com/
>>> Tweet! http://twitter.com/GaryGregory

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
