Re: normalization problem with `@anchor` targets

Gavin Smith Tue, 11 Feb 2025 14:07:28 -0800

On Fri, Feb 07, 2025 at 02:10:57PM +0000, Werner LEMBERG wrote:
> 
> >> My initial recalling was that the transliteration in file name was
> >> Karl demand.  [...]
> > 
> > It makes sense for Ukrainian or other non-Latin alphabet languages
> > but not so much for French or German, in my opinion.
> > 
> > German speakers tend to protest, in my experience, if the umlauts
> > are dropped from ü, ä and ö, preferring "ue", "ae" and "oe" to "u",
> > "a" and "o".
> > 
> > Likewise, Finnish speakers strongly protest if "ä" is rendered as
> > "ae".
> 
> And Japanese people are probably even offended because, as it
> currently happens, almost all Japanese characters are represented with
> Chinese syllables...


I didn't appreciate the point you were making initially and thought
you were making some joke about the origin of the Japanese writing
system.  However, I realise that was not the case.  It's the issue
discussed in the Text::Unidecode documentation:

  Another example: for hanzi/kanji/hanja, I have designed Unidecode to
  transliterate according to the value that that character has in Mandarin
  (otherwise Cantonese,...). Some users have complained that applying
  Unidecode to Japanese produces gibberish.

https://metacpan.org/pod/Text::Unidecode

So it seems to me that this transliteration produces bad results in many
cases and should be turned off by default.  With it off, users could at
least understand why sequences like __00f6 were occurring in output file
names, and they should have the option to use either transliteration or
UTF-8 file names, depending on their preferences.
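
To make the two choices concrete, here is a rough Python sketch.  It is
only an illustration and not the texi2any code: the function names, the
keep_utf8 switch and the "__" prefix are assumptions, chosen just to
reproduce a sequence like __00f6.

```python
def hex_escape(name: str, prefix: str = "__") -> str:
    """Replace each non-ASCII character with prefix + lowercase hex code point.

    The "__" prefix is an assumption made for illustration; the real
    escaping rules may differ.
    """
    out = []
    for ch in name:
        if ord(ch) < 128:
            out.append(ch)                        # plain ASCII is kept as-is
        else:
            out.append(f"{prefix}{ord(ch):04x}")  # e.g. U+00F6 -> __00f6
    return "".join(out)


def file_name(name: str, keep_utf8: bool) -> str:
    """Hypothetical switch between UTF-8 and hex-escaped file names."""
    return name if keep_utf8 else hex_escape(name)


print(file_name("Gröbner", keep_utf8=False))  # Gr__00f6bner
print(file_name("Gröbner", keep_utf8=True))   # Gröbner
```

The escaped form is ugly, but it is predictable and reversible, whereas
a transliteration has to pick one language's reading for each character.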
