Hello Lazarus-List,

Friday, December 3, 2010, 9:51:22 PM, you wrote:

>> Take a look at "Why Applications Fail With The Turkish
>> Language" at
>> http://www.i18nguy.com/unicode/turkish-i18n.htm

GF> There is no information about the language in a string, not even
GF> in a UnicodeString. So it is impossible to handle this point
GF> here.
GF> The uppercase/lowercase tables have been generated purely from
GF> the official Unicode character descriptions. Characters having
GF> "SMALL" in their description are replaced by the one having
GF> "CAPITAL" in that position and vice versa (only if the counterpart
GF> exists). You can't do more at this level. Please feel free to
GF> implement the functionality you mention; I'm sure it will be
GF> appreciated.

I'm not trying to criticize your work, just trying to raise the issue
before somebody starts complaining that a system behaves differently
when using OS functions than when using native Pascal ones.

GF> We are Pascal, not C. And in Pascal NULL is a valid character.

Once again, I'm not arguing against you.

GF> Once again, I have taken most of this from LCLProc, but I
GF> agree that improvements can be made here. But this was not the aim

That's why I'm pointing out that there are some anomalies here and
there in the code you are taking from elsewhere. No more, no less.

GF> On the other hand, there is a function called UTF8FixBroken to
GF> remove invalid sequences and code points. But it is not perfect
GF> either, because it is a C-style function.

UTF8FixBroken is "broken" :) It "fixes" invalid bytes by replacing
them with spaces, which is wrong; it does not detect all broken
strings; and, yes, it works on NULL-terminated strings :-? which is
quite strange.

If you wish to add it, I can send you my code for canonical
normalization of strings, but again, it relies on quite a big table
and it is country-agnostic.

--
Best regards,
 José

--
_______________________________________________
Lazarus mailing list
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
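GF's description of how the case tables were built (swap "SMALL" and "CAPITAL" in the Unicode character name, only if the counterpart exists) can be sketched in Python with the standard unicodedata module. This is an illustration, not the LCLProc code; `swap_case_by_name`, `upper_turkish` and `lower_turkish` are hypothetical helpers. It shows why no language-free table can satisfy Turkish: the name rule maps i to I like any other Latin letter, whereas Turkish expects İ (U+0130), and U+0131 LATIN SMALL LETTER DOTLESS I has no "CAPITAL" twin by name, so such a table would leave it unchanged. The Turkish variants assume the caller already knows the text is Turkish, which, as GF says, the string itself cannot tell you.

```python
import unicodedata


def swap_case_by_name(ch: str, to_upper: bool) -> str:
    """Hypothetical helper: find ch's case counterpart by swapping the first
    SMALL/CAPITAL word in its Unicode name; return ch if none exists."""
    try:
        name = unicodedata.name(ch)
    except ValueError:  # unnamed code points such as NUL have no counterpart
        return ch
    src, dst = ("SMALL", "CAPITAL") if to_upper else ("CAPITAL", "SMALL")
    words = name.split(" ")
    if src not in words:
        return ch
    words[words.index(src)] = dst
    try:
        return unicodedata.lookup(" ".join(words))
    except KeyError:  # no character carries the swapped name
        return ch


def upper_turkish(s: str) -> str:
    """Hypothetical Turkish-aware upper case: dotted i must become U+0130
    (LATIN CAPITAL LETTER I WITH DOT ABOVE); dotless U+0131 already maps
    to plain I under the default rules."""
    return s.replace("i", "\u0130").upper()


def lower_turkish(s: str) -> str:
    """Hypothetical Turkish-aware lower case: I -> U+0131 (dotless i) and
    U+0130 -> i, applied before the default mapping."""
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()


if __name__ == "__main__":
    print(swap_case_by_name("i", True))  # I, as for any other Latin letter
    print("istanbul".upper())            # ISTANBUL (language-neutral)
    print(upper_turkish("istanbul"))     # İSTANBUL (what Turkish expects)
    print(len("\u0130".lower()))         # 2: default maps İ to i + U+0307
```

The design choice here is to special-case only the dotted/dotless pair and defer everything else to the default Unicode mapping. Note that the default lower case of İ is two code points (i followed by U+0307 COMBINING DOT ABOVE), which a simple one-to-one table cannot express either.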

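On the UTF8FixBroken point, here is a minimal sketch (in Python rather than Pascal, and not the LCL implementation) of a validator that takes the length from the data instead of stopping at a NUL byte, and that checks the well-formed UTF-8 byte ranges given in chapter 3 of the Unicode Standard, so overlong forms, encoded surrogates, values above U+10FFFF and truncated sequences are all caught. `first_invalid_utf8` and `fix_broken_utf8` are hypothetical names; the repair step assumes U+FFFD REPLACEMENT CHARACTER is an acceptable substitute instead of a space, which is what Python's own decoder uses with errors="replace".

```python
def first_invalid_utf8(data: bytes) -> int:
    """Return the offset of the first ill-formed UTF-8 sequence, or -1.

    Uses the well-formed byte ranges from the Unicode Standard, so overlong
    forms, encoded UTF-16 surrogates, code points above U+10FFFF and
    truncated sequences are rejected. The length comes from len(data), so
    an embedded 0x00 is just another one-byte character.
    """
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # ASCII, including NUL
            i += 1
            continue
        # (number of continuation bytes, allowed range of the 2nd byte)
        if 0xC2 <= b <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif b == 0xE0:
            need, lo, hi = 2, 0xA0, 0xBF  # excludes overlong 3-byte forms
        elif b == 0xED:
            need, lo, hi = 2, 0x80, 0x9F  # excludes surrogates D800-DFFF
        elif 0xE1 <= b <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif b == 0xF0:
            need, lo, hi = 3, 0x90, 0xBF  # excludes overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b == 0xF4:
            need, lo, hi = 3, 0x80, 0x8F  # excludes code points > U+10FFFF
        else:
            return i                      # 80-C1 and F5-FF never start one
        if i + need >= n:
            return i                      # sequence truncated at end of data
        if not lo <= data[i + 1] <= hi:
            return i
        if any(not 0x80 <= c <= 0xBF for c in data[i + 2:i + need + 1]):
            return i
        i += need + 1
    return -1


def fix_broken_utf8(data: bytes) -> str:
    """Decode data, substituting U+FFFD for ill-formed sequences instead of
    spaces. Python's strict UTF-8 codec enforces the same byte ranges."""
    return data.decode("utf-8", errors="replace")
```

For example, `first_invalid_utf8(b"ab\xed\xa0\x80")` returns 2, the offset of the encoded surrogate, while `b"abc\x00def"` is reported as valid.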