> On Nov 19, 2016, at 6:33 AM, Benedikt Ritter <brit...@apache.org> wrote:
> 
> Hello Gary,
> 
> Gary Gregory <garydgreg...@gmail.com> wrote on Sat, 19 Nov 2016 at
> 01:07:
> 
>> Just a thought:
>> 
>> Does all the current (and future) string escaping code (XML, HTML, ...)
>> really belong in [lang]? Would it be more natural to have it in [text]?
>> 
> 
> My view on the whole thing currently is that we put stuff that is related
> to strings in Lang. Code that works on texts should go to Text. To me a
> text is more than just a string. A text contains words that make up
> sentences, which in turn build paragraphs.
> 
> Using this description, I'd argue that escaping belongs in lang and not
> in text, because it works on individual characters rather than on texts.

I think this is a difficult distinction to draw, because fundamentally any
sufficiently involved text processing operates character by character.
Below I propose a distinction based more on potential usage.

> 
> But this would also raise the question of whether the various edit distance
> algorithms work on texts or on strings. So maybe my distinction is not
> good at all.
> 
> Do we need to better specify the scope of text?

I definitely agree with the sentiment that we should find a clear line of
distinction between lang and text with regard to strings. The thoughts that
spring to mind are more about how the algorithms are meant to be used.

So let’s consider the two extremes of the spectrum of string/word/text
algorithms. At one end, we have utilities like “StringUtils.isBlank(String s)”,
which is used ubiquitously in day-to-day code and is a foundational
extension of Java. At the other end, we have algorithms like natural language
processing or statistical processing of words for the analysis of biological
sequences (two chapters in M. Lothaire’s “Applied Combinatorics on Words”).
The first extreme points towards day-to-day usage in any variety of Java
applications, whereas the other points to applications designed specifically
for string/word/text processing. I don’t see folks in everyday usage wanting
to find the edit distance between two strings unless they’re writing
something that specifically does text processing or something of that
nature.
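
To make the contrast concrete, here is a rough sketch of what the two ends
of that spectrum look like in code. Both methods are hand-written for
illustration and are not the Commons implementations; the class name
ScopeSketch is hypothetical, and the edit distance shown is the classic
Levenshtein variant, which is only one of the several distances under
discussion.

```java
public final class ScopeSketch {

    private ScopeSketch() {
    }

    // Everyday helper: true for null, empty, or whitespace-only input.
    // Illustrative stand-in for the kind of check an isBlank utility does.
    public static boolean isBlank(CharSequence cs) {
        if (cs == null) {
            return true;
        }
        for (int i = 0; i < cs.length(); i++) {
            if (!Character.isWhitespace(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Specialised algorithm: Levenshtein edit distance, computed with a
    // two-row dynamic programming table (insert, delete, substitute cost 1).
    public static int editDistance(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of reallocating
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(isBlank("   "));
        System.out.println(editDistance("kitten", "sitting"));
    }
}
```

The first method is the sort of guard that shows up in almost any
application; the second is something I would only expect to reach for when
the application itself is about comparing or analysing text.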

Now clearly the problem with this distinction is the amount of grey area that 
it leaves in figuring out what goes where, so I don’t know if it’s the right 
way to go. It was just the thought that came to mind.

Any thoughts out there?

Cheers,
-Rob

> 
> Benedikt
> 
> 
>> 
>> Gary
>> 
>> --
>> E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
>> Blog: http://garygregory.wordpress.com
>> Home: http://garygregory.com/
>> Tweet! http://twitter.com/GaryGregory
>> 


