Re: [HACKERS] PATCH: Allow empty targets in unaccent dictionary

Abhijit Menon-Sen Mon, 30 Jun 2014 13:11:37 -0700

At 2014-06-30 15:19:17 -0400, [email protected] wrote:
>
> Anyway, this raises the question of whether the current patch is
> actually a desirable way to do things, or whether it would be better
> if the unaccenting rules were like "base-char accent-char" ->
> "base-char".


It might be useful to be able to write such rules, but requiring them
in place of rules that simply single out accent-chars for removal would
be highly impractical.

In all the languages I'm familiar with that use such accent-chars, any
accent-char would form a valid combination with nearly every base-char,
unlike European languages where you don't have to worry about k-umlaut,
say. Also, a standalone accent-char would always be meaningless.

(These accent-chars don't actually exist independently in the syllabary
that a Hindi speaker might learn in school: they're combining forms of
vowels and are treated differently from characters in practice.)
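
To make the practicality argument concrete, here is a rough Python sketch
(not the unaccent code itself). It assumes that "accent-char" can be
approximated by the Unicode combining-mark categories, and it picks the
Devanagari ranges KA..HA and the AA..AU vowel signs purely for
illustration. The helper name strip_marks is hypothetical.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Drop every combining mark (Unicode categories Mn, Mc, Me).

    This mimics a rule set in which each accent-char maps to an empty
    target, independent of the base-char it is attached to.
    """
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("M")
    )


# Illustrative ranges only (an assumption, not an exhaustive inventory):
# Devanagari consonants KA..HA and dependent vowel signs AA..AU.
consonants = [chr(cp) for cp in range(0x0915, 0x093A)]
vowel_signs = [chr(cp) for cp in range(0x093E, 0x094D)]

# One rule per (base-char, accent-char) pair versus one rule per accent-char.
pair_rules = len(consonants) * len(vowel_signs)
mark_rules = len(vowel_signs)

print(pair_rules, mark_rules)
print(strip_marks("\u0915\u0940"))  # KA + vowel sign II
```

With these ranges the pair-based approach needs one rule for every
consonant/vowel-sign combination, while the removal approach needs only
one rule per vowel sign.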

> Also, if there are any contexts where the right translation of an
> accent-char depends on the base-char, you couldn't do it with the
> patch as it stands.

I can't think of a satisfactory example at the moment, but that sounds
entirely plausible.

> It's not unlikely that we want this patch *and* an improvement that
> allows multi-character src strings

I think it's enough to apply just this patch, but I wouldn't object to
doing both if it were easy. A quick glance at the code didn't tell me
whether it is, but I'll look again when I'm properly awake.
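
For the sake of discussion, this is roughly what rules with
multi-character src strings plus empty targets could look like, as a
Python sketch using greedy longest-match lookup. It says nothing about
how the unaccent code is or should be structured; the function name
apply_rules and the sample rules are made up for illustration.

```python
def apply_rules(text: str, rules: dict[str, str]) -> str:
    """Rewrite text using the longest matching src string at each position.

    rules maps a src string (one or more characters) to its target, which
    may be empty. Characters that match no rule are copied unchanged.
    """
    max_len = max(map(len, rules), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so that a "base-char accent-char"
        # rule takes precedence over a bare accent-char rule.
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in rules:
                out.append(rules[chunk])
                i += n
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical rules: drop a standalone combining diaeresis, but translate
# it differently when it follows "o".
rules = {"\u0308": "", "o\u0308": "oe"}
print(apply_rules("ko\u0308ln u\u0308", rules))
```

The single-character rule with an empty target covers the general case,
and a longer src string overrides it only where the translation depends
on the base-char.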

> Lastly, I didn't especially like the coding details of either proposed
> patch, and rewrote it as attached.

:-)

-- Abhijit


