> On 21 May 2017, at 19:43, Gary Gregory <garydgreg...@gmail.com> wrote:
> 
> Pardon the obvious but what is missing from methods like
> https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isLowerCase(char)
> 
> Gary


The WordUtils methods turn sentences into title case, which Java’s core 
libraries don’t offer. In fact, the core libraries make doing locale-sensitive 
title case conversions very difficult (see 
http://stackoverflow.com/questions/7360996/unicode-correct-title-case-in-java 
for example).
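
As a rough illustration of the gap (a sketch only: the class and method names 
below are made up for this mail and are not part of WordUtils), the core 
libraries offer a locale-aware String.toUpperCase(Locale) and a locale-blind 
Character.toTitleCase(int), but nothing that combines the two:

```java
import java.util.Locale;

// Hypothetical helper class, for illustration only.
final class FirstLetterCasing {

    // Locale-aware, but applies the UPPERCASE mapping to the first letter,
    // which is not always the same as the titlecase mapping.
    static String upperFirst(String word, Locale locale) {
        if (word.isEmpty()) {
            return word;
        }
        int end = word.offsetByCodePoints(0, 1);
        return word.substring(0, end).toUpperCase(locale) + word.substring(end);
    }

    // Applies the titlecase mapping, but Character.toTitleCase has no
    // Locale parameter, so language-specific rules are ignored.
    static String titleFirst(String word) {
        if (word.isEmpty()) {
            return word;
        }
        int first = word.codePointAt(0);
        return new StringBuilder()
                .appendCodePoint(Character.toTitleCase(first))
                .append(word.substring(Character.charCount(first)))
                .toString();
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // Turkish expects a dotted capital I (U+0130) here.
        System.out.println(upperFirst("istanbul", turkish));
        System.out.println(titleFirst("istanbul"));
        // U+01C6 is the "dz with caron" digraph; its titlecase form is U+01C5,
        // while its uppercase form is U+01C4.
        System.out.println(upperFirst("\u01C6ep", Locale.ROOT));
        System.out.println(titleFirst("\u01C6ep"));
    }
}
```

Neither helper is right in both cases: the first gets the Turkish dotted I but 
the wrong digraph form, and the second gets the digraph but a plain "I".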

Doing title casing correctly is quite a subtle art. We don’t even do it 
correctly for English at the moment: English convention would normally give 
“The Life of Reilly”, whereas we produce “The Life Of Reilly”. Other languages 
have completely different conventions or additional complexities.
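
To make the “Of” problem concrete, here is a minimal sketch of the 
capitalise-every-word approach. It is an approximation written for this mail, 
not the actual WordUtils source, and the class name is hypothetical:

```java
// Hypothetical class, for illustration only.
final class NaiveTitleCase {

    // Title-cases the first code point of every whitespace-separated word
    // and leaves all other characters untouched.
    static String capitalizeWords(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean startOfWord = true;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isWhitespace(cp)) {
                startOfWord = true;
                out.appendCodePoint(cp);
            } else if (startOfWord) {
                out.appendCodePoint(Character.toTitleCase(cp));
                startOfWord = false;
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints "The Life Of Reilly": the small word "of" is capitalised too.
        System.out.println(capitalizeWords("the life of reilly"));
    }
}
```

Getting “of” right would need a per-language list of small words plus rules 
for when they are capitalised anyway (for example at the start of a title), 
which is where the complexity starts.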


> 
> On May 21, 2017 5:06 AM, "Duncan Jones" <dun...@wortharead.com> wrote:
> 
>> Hi everyone,
>> 
>> I’ve found some time to continue breaking WordUtils into separate classes
>> (eschewing the “big collection of static methods” approach). However, as I
>> read more about case handling in Unicode, I realise how simplistic the
>> WordUtils methods are and how complex a full solution would need to be.
>> 
>> Section 5.18 of the Unicode specification [1] describes these
>> complexities. The main ones that bother me are:
>> 
>> 1. Title case conversions vary widely between different locales and
>> languages. I’m not clear whether any locale is satisfied by the current
>> simplistic implementation in WordUtils.capitalize(str). Supporting this
>> correctly would be a serious challenge.
>> 
>> 2. All types of case conversion may vary depending upon context/locale.
>> There are examples provided in [1] where the outcome is different in a
>> Turkish locale, or depends on whether the letter in question is followed
>> by another.
>> 
>> Does anyone have a suggestion for how to move forward with this work? I
>> see three options: 1] Admit defeat and avoid the case conversion mess
>> entirely. 2] Mimic the existing functionality, but document the
>> limitations. 3] Attempt to deliver a locale-dependent version, perhaps
>> still with limitations (or for certain languages).
>> 
>> I’m leaning towards 2, perhaps even calling the classes “SimpleX…”.
>> 
>> Thanks,
>> Duncan
>> 
>> 
>> [1] http://www.unicode.org/versions/Unicode9.0.0/ch05.pdf
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>> For additional commands, e-mail: dev-h...@commons.apache.org
>> 
>> 

