On May 21, 2017 10:56 PM, "Duncan Jones" <dun...@wortharead.com> wrote:
> On 21 May 2017, at 19:43, Gary Gregory <garydgreg...@gmail.com> wrote:
>
> Pardon the obvious but what is missing from methods like
> https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isLowerCase(char)
>
> Gary

The WordUtils methods turn sentences into title case, which Java’s core
libraries don’t offer. In fact, the core libraries make doing
locale-sensitive title case conversions very difficult (see
http://stackoverflow.com/questions/7360996/unicode-correct-title-case-in-java
for example).

Doing title casing correctly is quite a subtle art. We don’t even do it
correctly for English at the moment, which would normally capitalise “The
Life of Reilly” rather than “The Life Of Reilly”. Other languages have
completely different conventions or additional complexities.


I see. So the hard part is coming up with the rules.

Aside from that I could see creating an instance of a class
"TitleCaseConverter" or some such with a Locale through a factory method.
The factory can decide whether or not to create a Locale-specific subclass.
Maybe there are general rules that could be implemented in the parent class
or even driven off a locale-specific properties file... TBD ;-)

Gary

>
> On May 21, 2017 5:06 AM, "Duncan Jones" <dun...@wortharead.com> wrote:
>
>> Hi everyone,
>>
>> I’ve found some time to continue breaking WordUtils into separate classes
>> (eschewing the “big collection of static methods” approach). However, as I
>> read more about case handling in Unicode, I realise how simplistic the
>> WordUtils methods are and how complex a full solution would need to be.
>>
>> Section 5.18 of the Unicode specification [1] describes these
>> complexities. The main ones that bother me are:
>>
>> 1. Title case conversions vary widely between different locales and
>> languages. I’m not clear whether any locale is satisfied by the current
>> simplistic implementation in WordUtils.capitalize(str). Supporting this
>> correctly would be a serious challenge.
>>
>> 2. All types of case conversion may vary depending upon context/locale.
>> There are examples provided in [1] where the outcome is different in a
>> Turkish locale, or depends on whether the letter in question is followed
>> by another.
>>
>> Does anyone have a suggestion for how to move forward with this work? I
>> see three options: 1] Admit defeat and avoid the case conversion mess
>> entirely. 2] Mimic the existing functionality, but document the
>> limitations. 3] Attempt to deliver a locale-dependent version, perhaps
>> still with limitations (or for certain languages).
>>
>> I’m leaning towards 2, perhaps even calling the classes “SimpleX…”.
>>
>> Thanks,
>> Duncan
>>
>>
>> [1] http://www.unicode.org/versions/Unicode9.0.0/ch05.pdf
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>> For additional commands, e-mail: dev-h...@commons.apache.org
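
For illustration only, the idea floated in this thread of a "TitleCaseConverter" created with a Locale through a factory method could be sketched roughly as below. Everything except that class name is an assumption made for the sketch: the forLocale and convert methods, the EnglishTitleCaseConverter subclass and its short list of minor words are hypothetical and are not part of WordUtils or any released API. The generic rule uppercases the first code point with String.toUpperCase(Locale), which is locale-aware but is not true Unicode title casing (a leading "ß" would become "SS", for instance).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical parent class holding the generic, locale-aware rule. */
class TitleCaseConverter {
    protected final Locale locale;

    protected TitleCaseConverter(Locale locale) {
        this.locale = locale;
    }

    /** Factory method: returns a locale-specific subclass where one exists. */
    static TitleCaseConverter forLocale(Locale locale) {
        if ("en".equals(locale.getLanguage())) {
            return new EnglishTitleCaseConverter(locale);
        }
        return new TitleCaseConverter(locale);
    }

    /** Converts each space-separated word; word boundaries are deliberately naive. */
    String convert(String text) {
        String[] words = text.split(" ", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(convertWord(words[i], i == 0, i == words.length - 1));
        }
        return out.toString();
    }

    /** Generic rule: capitalise every word. Subclasses may override. */
    protected String convertWord(String word, boolean first, boolean last) {
        return capitalise(word);
    }

    /** Uppercases the first code point and lowercases the rest, using the locale. */
    protected String capitalise(String word) {
        if (word.isEmpty()) {
            return word;
        }
        int end = word.offsetByCodePoints(0, 1);
        return word.substring(0, end).toUpperCase(locale)
                + word.substring(end).toLowerCase(locale);
    }

    public static void main(String[] args) {
        // Prints: The Life of Reilly
        System.out.println(forLocale(Locale.ENGLISH).convert("the life of reilly"));
        // With a Turkish locale the first letter becomes U+0130 (dotted capital I).
        System.out.println(forLocale(Locale.forLanguageTag("tr")).convert("istanbul"));
    }
}

/** Hypothetical English subclass: minor words stay lowercase mid-title. */
class EnglishTitleCaseConverter extends TitleCaseConverter {
    // Illustrative list only; real style guides differ on which words count.
    private static final Set<String> MINOR_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "of", "in", "on", "to", "for", "or", "but", "at", "by"));

    EnglishTitleCaseConverter(Locale locale) {
        super(locale);
    }

    @Override
    protected String convertWord(String word, boolean first, boolean last) {
        String lower = word.toLowerCase(locale);
        if (!first && !last && MINOR_WORDS.contains(lower)) {
            return lower;
        }
        return capitalise(word);
    }
}
```

A locale without a dedicated subclass falls back to the parent rule, so the same input under, say, Locale.FRENCH would come out as "The Life Of Reilly". The minor-word list could equally be loaded from a locale-specific properties file, as suggested above; that part is left out here.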