Re: [Harbour] Stripping accents from a string.

Viktor Szakáts Fri, 30 Jan 2009 07:21:35 -0800

>
> This should be implemented in a different way.
> We will need a global Unicode fallback table which will work for any CP,
> looking for the corresponding character replacement in the destination CP,
> so it will be enough to make a translation between any used CP and an ASCII
> CP (we will have to introduce such a Unicode table where all characters
> which are not in range 32 <= x < 127 will be mapped to 0). Such a fallback
> table will also work with multibyte translations or, if necessary,
> will replace a single byte by a multibyte phonetic sequence. E.g. personally
> I'm using such a feature to transliterate Cyrillic texts to Latin characters.
>
> Meanwhile, if you need such functionality then you can simply introduce
> new CPs which will have only ASCII characters for a given language, with
> some suffix like NONE or ASCII, e.g.:
>   "PLNONE"
>      [...]
>      "AABCCDEEFGHIJKLLMNNOOPQRSSTUVWXYZZZ",
>      "aabccdeefghijkllmnnoopqrsstuvwxyzzz",
>      [...]
>
> and then to strip accented characters you can make translations between
>   "PL*" -> "PLNONE"


> You can do the same for any other language.
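
For reference, the positional translation suggested in the quote can be sketched in plain C. This is only an illustration, not Harbour's CP API: the function names are made up, the source strings assume the Polish alphabet encoded in ISO-8859-2, and the destination strings are the "PLNONE" ones quoted above.

```c
#include <stddef.h>

/* Polish alphabet as ISO-8859-2 bytes (assumed encoding), in the same
   order as the "PLNONE" strings, so position i maps to position i.
   Literals are split after each hex escape so that a following letter
   is not swallowed into the escape. */
static const char s_pl_upper[] =
   "A\xA1" "BC\xC6" "DE\xCA" "FGHIJKL\xA3" "MN\xD1" "O\xD3"
   "PQRS\xA6" "TUVWXYZ\xAC\xAF";
static const char s_pl_lower[] =
   "a\xB1" "bc\xE6" "de\xEA" "fghijkl\xB3" "mn\xF1" "o\xF3"
   "pqrs\xB6" "tuvwxyz\xBC\xBF";
static const char s_none_upper[] = "AABCCDEEFGHIJKLLMNNOOPQRSSTUVWXYZZZ";
static const char s_none_lower[] = "aabccdeefghijkllmnnoopqrsstuvwxyzzz";

/* Start from the identity mapping: every byte translates to itself. */
void strip_table_init(unsigned char table[256])
{
   size_t i;
   for (i = 0; i < 256; ++i)
      table[i] = (unsigned char) i;
}

/* Map src[i] to dst[i]; stops at the end of the shorter string. */
void strip_table_add(unsigned char table[256],
                     const char *src, const char *dst)
{
   size_t i;
   for (i = 0; src[i] != '\0' && dst[i] != '\0'; ++i)
      table[(unsigned char) src[i]] = (unsigned char) dst[i];
}

/* Translate a NUL-terminated string in place through the table. */
void strip_apply(char *text, const unsigned char table[256])
{
   for (; *text != '\0'; ++text)
      *text = (char) table[(unsigned char) *text];
}

/* Hypothetical convenience wrapper: "PLISO" -> "PLNONE" in one call. */
void pl_iso_strip(char *text)
{
   unsigned char table[256];
   strip_table_init(table);
   strip_table_add(table, s_pl_upper, s_none_upper);
   strip_table_add(table, s_pl_lower, s_none_lower);
   strip_apply(text, table);
}
```

Bytes that do not appear in the source strings pass through unchanged, so plain ASCII text is left alone. A single-byte table like this cannot express one-to-many replacements, which is the limit the Unicode-based approach discussed below would address.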


I'm doing this locally in a slightly different way. It needs two
simple functions and some minimal high-level management.
It works, but it's not generic and it's detached from core CP handling.

The question is really how to implement this properly as
part of Harbour.

Adding a cp??asc?.c (or cp??non?.c) file for every language
doesn't seem like the optimal solution, although it would indeed
solve this in the least intrusive way as part of core.


> The other solution is creating a map from the Unicode table to Latin letters
> and then using this map for translations. This can be done as a separate
> function and will also work for all languages.
> It could be a table indexed by Unicode U16 value with ASCII characters
> or (if you want to introduce multibyte translations for languages which
> do not have corresponding unaccented single characters in the Latin alphabet)
> with strings.


That U16 table will probably have to point to a structure which
would hold all this information, among other things.
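
To make the idea concrete, here is a rough C sketch of such a table. Everything in it is an assumption for illustration: the structure, the function names and the handful of sample entries are invented, the lookup is a binary search over a sorted array rather than a table directly indexed by U16 value, and unmapped characters are replaced by '?' instead of being mapped to 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One fallback entry; a real structure would likely carry more fields. */
typedef struct
{
   uint16_t     code;   /* Unicode value (BMP only) */
   const char * ascii;  /* ASCII replacement, may be longer than 1 char */
} FallbackEntry;

/* Sample entries only, sorted by code so that binary search works. */
static const FallbackEntry s_fallback[] = {
   { 0x00DF, "ss" },  /* LATIN SMALL LETTER SHARP S */
   { 0x00E9, "e"  },  /* LATIN SMALL LETTER E WITH ACUTE */
   { 0x00F3, "o"  },  /* LATIN SMALL LETTER O WITH ACUTE */
   { 0x0141, "L"  },  /* LATIN CAPITAL LETTER L WITH STROKE */
   { 0x0142, "l"  },  /* LATIN SMALL LETTER L WITH STROKE */
   { 0x017A, "z"  },  /* LATIN SMALL LETTER Z WITH ACUTE */
   { 0x0416, "Zh" }   /* CYRILLIC CAPITAL LETTER ZHE */
};

/* Returns the replacement string, or NULL when the code is not listed. */
static const char * fallback_lookup(uint16_t code)
{
   size_t lo = 0, hi = sizeof(s_fallback) / sizeof(s_fallback[0]);
   while (lo < hi)
   {
      size_t mid = lo + (hi - lo) / 2;
      if (s_fallback[mid].code == code)
         return s_fallback[mid].ascii;
      if (s_fallback[mid].code < code)
         lo = mid + 1;
      else
         hi = mid;
   }
   return NULL;
}

/* Converts len U16 values to a NUL-terminated ASCII string in dst.
   Stops early if the next replacement would not fit in dst_size.
   Returns the number of characters written, excluding the NUL. */
size_t u16_to_ascii(const uint16_t *src, size_t len,
                    char *dst, size_t dst_size)
{
   size_t out = 0, i;
   if (dst_size == 0)
      return 0;
   for (i = 0; i < len; ++i)
   {
      char one[2];
      const char *rep;
      size_t rep_len;
      if (src[i] >= 32 && src[i] < 127)
      {
         one[0] = (char) src[i];   /* printable ASCII is kept as is */
         one[1] = '\0';
         rep = one;
      }
      else
      {
         rep = fallback_lookup(src[i]);
         if (rep == NULL)
            rep = "?";             /* assumption: visible placeholder */
      }
      rep_len = strlen(rep);
      if (out + rep_len >= dst_size)
         break;
      memcpy(dst + out, rep, rep_len);
      out += rep_len;
   }
   dst[out] = '\0';
   return out;
}
```

Because the replacement is a string, one input character may expand to several output characters, so the caller has to size the destination buffer accordingly.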


> It will be the fallback table I want to introduce, limited to ASCII conversions.


Okay, I won't change anything then. What you describe is obviously better,
but it needs a revamp of the whole CP code, which goes beyond my
scope. Anyhow, let's keep this feature in mind when touching the
CP subsystem, because it seems to fit best there and it's
not very easy to replicate locally in a proper way, yet many developers
could benefit from it.

Thanks a lot for your feedback.

Brgds,
Viktor

_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour
