At 2014-06-30 15:19:17 -0400, t...@sss.pgh.pa.us wrote:
>
> Anyway, this raises the question of whether the current patch is
> actually a desirable way to do things, or whether it would be better
> if the unaccenting rules were like "base-char accent-char" ->
> "base-char".

It might be useful to be able to write such rules, but it would be
highly impractical to do so instead of being able to single out
accent-chars for removal. In all the languages I'm familiar with that
use such accent-chars, any accent-char would form a valid combination
with nearly every base-char, unlike European languages where you don't
have to worry about k-umlaut, say. Also, a standalone accent-char would
always be meaningless.

(These accent-chars don't actually exist independently in the syllabary
that a Hindi speaker might learn in school: they're combining forms of
vowels and are treated differently from characters in practice.)

> Also, if there are any contexts where the right translation of an
> accent-char depends on the base-char, you couldn't do it with the
> patch as it stands.

I can't think of a satisfactory example at the moment, but that sounds
entirely plausible.

> It's not unlikely that we want this patch *and* an improvement that
> allows multi-character src strings

I think it's enough to apply just this patch, but I wouldn't object to
doing both if it were easy. It's not clear to me if that's true after a
quick glance at the code, but I'll look again when I'm properly awake.

> Lastly, I didn't especially like the coding details of either proposed
> patch, and rewrote it as attached.

:-)

-- Abhijit
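
P.S. To make the practicality argument concrete, here is a rough Python sketch comparing the two rule styles. It is only an illustration, not the unaccent code: the rule tables, the function names, and the choice of Devanagari consonants (U+0915 to U+0939) and dependent vowel signs (U+093E to U+094C) as "base-chars" and "accent-chars" are all assumptions made for the example.

```python
# Hypothetical illustration; none of these names exist in contrib/unaccent.

# Assumed "base-chars": Devanagari consonants KA..HA (U+0915..U+0939).
BASES = [chr(c) for c in range(0x0915, 0x093A)]
# Assumed "accent-chars": dependent vowel signs AA..AU (U+093E..U+094C).
ACCENTS = [chr(c) for c in range(0x093E, 0x094D)]

# Style 1: one rule per accent-char, mapping it to the empty string.
CHAR_RULES = {a: "" for a in ACCENTS}

# Style 2: one rule per "base-char accent-char" pair, mapping it to the base.
PAIR_RULES = {b + a: b for b in BASES for a in ACCENTS}


def strip_with_char_rules(text, rules=CHAR_RULES):
    """Apply single-character rules: drop every accent-char wherever it occurs."""
    return "".join(rules.get(ch, ch) for ch in text)


def strip_with_pair_rules(text, rules=PAIR_RULES):
    """Apply two-character rules, scanning left to right."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in rules:
            out.append(rules[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    word = "\u0915\u093f\u0924\u093e\u092c"  # KA, sign I, TA, sign AA, BA
    print(len(CHAR_RULES), "single-char rules")
    print(len(PAIR_RULES), "pair rules")
    print(strip_with_char_rules(word) == strip_with_pair_rules(word))
```

Both styles give the same result on ordinary words, but the pair table needs one entry for every base/accent combination, and it leaves a stray accent-char untouched where the single-character rule would remove it.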