Re: [patch for 2.2] silence iconv warnings

Cyrille Artho Wed, 09 Apr 2014 16:05:23 -0700

Usually given names are not in a language dictionary, although many (translation) services have separate dictionaries for proper/given names.


We have two problems here:

(1) Language: I think most users are OK with proper names not being accepted by the spell checker (before learning them). However, other options such as "Ignore" should work, too.

(2) Encoding: Words containing characters that are not part of the normal character set of a given language should behave in the same way as words that do not. This includes "István", "Vološinov", etc. So we have to use UTF-8 to look up words.

When down-converting text to the character set of the target language, we can ignore non-convertible characters silently, but


        echo 'István' | iconv -c -f utf-8 -t ascii

yields "Istvn", which is not very useful.
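
To illustrate the difference between dropping characters and replacing them with something similar, here is a minimal Python sketch. The function names are hypothetical and not part of any spell checker; the folding step relies on Unicode NFKD decomposition, which only helps for letters that decompose into a base letter plus combining marks.

```python
import unicodedata

def drop_unconvertible(word: str, encoding: str = "ascii") -> str:
    """Mimic `iconv -c`: silently drop characters the target encoding lacks."""
    return word.encode(encoding, errors="ignore").decode(encoding)

def fold_for_lookup(word: str, encoding: str = "ascii") -> str:
    """Keep encodable characters; reduce the others to their base letter."""
    out = []
    for ch in word:
        try:
            ch.encode(encoding)
            out.append(ch)  # representable in the dictionary encoding
        except UnicodeEncodeError:
            # Split into base letter + combining marks, then drop whatever
            # the target encoding still cannot represent (the marks).
            decomposed = unicodedata.normalize("NFKD", ch)
            out.append(decomposed.encode(encoding, errors="ignore").decode(encoding))
    return "".join(out)

print(drop_unconvertible("Istv\u00e1n"))             # Istvn
print(fold_for_lookup("Istv\u00e1n"))                # Istvan
print(fold_for_lookup("Volo\u0161inov", "latin-1"))  # Volosinov
```

The folded form is not the correct spelling, so it would only be suitable as a lookup key, never as a replacement for the user's text.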

I think we have to use Unicode for all the given operations and (a) either risk a mismatch for each word that is not learned/ignored, or (b) up-convert words in the dictionary before they are matched. The latter solution implies that the dictionary tool supports this; does anyone know if that is the case (for at least one tool)?
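
A rough Python sketch of option (b), assuming the dictionary entries are available as byte strings in a legacy encoding. The helper names, the sample word list and the in-memory "ignored" set are all hypothetical; this does not reflect the API of any actual dictionary tool.

```python
import unicodedata

def load_dictionary(raw_words, encoding):
    """Up-convert dictionary entries from their legacy encoding to Unicode."""
    return {unicodedata.normalize("NFC", w.decode(encoding)) for w in raw_words}

def is_accepted(word, dictionary, ignored):
    """Match in Unicode, so ignored words need not fit the dictionary encoding."""
    key = unicodedata.normalize("NFC", word)
    return key in dictionary or key in ignored

# Hypothetical dictionary stored in ISO-8859-1, which has no "š".
raw = ["Stra\u00dfe".encode("iso-8859-1"), "\u00fcber".encode("iso-8859-1")]
dictionary = load_dictionary(raw, "iso-8859-1")
ignored = set()

print(is_accepted("\u00fcber", dictionary, ignored))       # True
print(is_accepted("Volo\u0161inov", dictionary, ignored))  # False
ignored.add(unicodedata.normalize("NFC", "Volo\u0161inov"))
print(is_accepted("Volo\u0161inov", dictionary, ignored))  # True
```

Because the comparison happens after decoding, "Ignore" and the personal dictionary behave the same for every word, whatever encoding the main dictionary uses.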


This is mixing languages with writing systems, IMHO. In fact, language
sometimes has an implication on the spelling of names (when it comes to
transliteration), but with rather surprising effects. For instance, the
Russian name Воло́шинов is usually written Vološinov in German, but
Voloshinov in English. Is "š" a "German" character?


I'm not a linguist and my knowledge about these things is limited. The
change of language is the only possibility I know of to get out of the
"broken" dictionary encoding scenario.

Also, I think that marking István as "Hungarian" makes the language
concept absurd.

More technically, I think it will be irritating for users that they
can add "István" to the personal dictionary, while "Ignore" and
"Ignore all" just won't work.


Yes, I agree.

With the given example "István" and having á in the dictionary encoding
the word is most probably marked as misspelled. But then it's possible to
Ignore it? Isn't there the option to silently discard the characters that
cannot be converted, or replace them with something similar for the
dictionary lookup? Not quite correct, I know - but perhaps the better
strategy for the user?

Stephan


--
Regards,
Cyrille Artho - http://artho.com/
Perilous to all of us are the devices of an art deeper than we
ourselves possess.
                -- Gandalf the Grey [Tolkien, "Lord of the Rings"]
