[KBibTeX] [Bug 426856] File encoding is not always stored

Thomas Fischer Tue, 21 Nov 2023 12:36:03 -0800

https://bugs.kde.org/show_bug.cgi?id=426856


Thomas Fischer <fisc...@unix-ag.uni-kl.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Latest Commit|https://invent.kde.org/offi |4c2d0c3acdcfea0ab263a5ba688
                   |ce/kbibtex/commit/1e649222e |0ed22ebe7a6d8
                   |d54060eb561fcc5b70568ba7f60 |
                   |98fb                        |
   Version Fixed In|0.10                        |0.10.1

--- Comment #9 from Thomas Fischer <fisc...@unix-ag.uni-kl.de> ---
Comment #7 makes the important point here: probably every one of the BibTeX
files where you reported that KBibTeX switched from LaTeX encoding to UTF-8 did
so because KBibTeX did not know how to map a Unicode character to a LaTeX
equivalent. Thus it was falling back to UTF-8 encoding in order to preserve the
data, i.e. it is a feature, not a bug ;-)
The mapping has gaps for two reasons: First, it is manually crafted and simply does
not cover the thousands of characters and symbols that are in use. Second, for
some symbols, no clear mapping is possible. One particular example is the Greek
letter mu. Unicode knows U+00B5 (micro sign), U+03BC (Greek small letter mu),
U+1D6CD (Mathematical bold small mu), and possibly others. On the LaTeX side
you have \mu (in math mode), \textmu, \upmu, \muup, \textmugreek, and possibly
others.
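
The ambiguity can be seen from the Unicode side with a small Python sketch.
This is not KBibTeX code (which is C++); it only uses Python's standard
unicodedata module, and the helper name describe() is made up for illustration.
It lists the three code points named above together with their official names
and shows that compatibility normalization (NFKC) folds all of them into
U+03BC, which is one way to see why there is no obvious one-to-one LaTeX
counterpart for each of them:

```python
import unicodedata

# The three "mu" code points named in the comment above.
MU_CODE_POINTS = [0x00B5, 0x03BC, 0x1D6CD]


def describe(code_point):
    """Return (U+XXXX label, Unicode name, label of the NFKC-folded character)."""
    ch = chr(code_point)
    folded = unicodedata.normalize("NFKC", ch)
    return (f"U+{code_point:04X}", unicodedata.name(ch), f"U+{ord(folded):04X}")


if __name__ == "__main__":
    for cp in MU_CODE_POINTS:
        print(*describe(cp), sep="  ")
```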

Anyhow, I added U+00A0, U+2010, and U+202F to the manual mapping, as those were
mentioned earlier and seem most pressing. U+2010 and U+202F are
"unidirectional": they are mapped to a plain ASCII hyphen-minus and '\,',
respectively, and when decoded back to UTF-8 they stay an ASCII hyphen-minus or
become U+2009, respectively.
If you want more manual mappings added or existing ones updated, please let me
know, e.g. by commenting in this bug report and providing both the Unicode code
point and the corresponding LaTeX command.
The manual mapping is coded in src/io/encoderlatex.cpp, in case you want to
look at the technical details.

(In reply to nobodyinperson from comment #7)
> Thanks for bringing this up again. It's currently also a big pain point for
> me. It seems that KBibTeX v0.10.0 doesn't know how to encode some Unicode
> characters (non-breaking spaces, weird dashes, etc.) to LaTeX. Ran from the
> terminal, these are the errors for me (a location in the file would be
> helpful):
> 
> ```bash
> kbibtex.io: Don't know how to encode Unicode char "0x00a0"
> kbibtex.io: Don't know how to encode Unicode char "0x2010"
> kbibtex.io: Don't know how to encode Unicode char "0x2010"
> kbibtex.io: Don't know how to encode Unicode char "0x202f"
> ```
> 
> When I find-replace those characters in the file (in vim, do
> `:%s/\%u202f/ /g` and `:%s/\%u00a0/-/g` etc.), then KBibTeX is finally
> stable when saving the encoding again and stays at LaTeX encoding. 😮‍💨
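
Regarding the wish for a location in the file: a short script can list where
such characters sit. The following is only a sketch and not part of KBibTeX;
the function name find_unmapped() and its `mapped` parameter are made up, and
it simply treats every non-ASCII character that is not in `mapped` as one that
might not be encodable. The code point is formatted like in the log messages
quoted above.

```python
def find_unmapped(text, mapped=frozenset()):
    """Return (line, column, code point) for each non-ASCII char not in `mapped`.

    Line and column numbers are 1-based; the code point is formatted like in
    the log output, e.g. "0x00a0".
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127 and ch not in mapped:
                hits.append((line_no, col, f"0x{ord(ch):04x}"))
    return hits


if __name__ == "__main__":
    sample = "title = {a\u00a0b}\nyear = 2020\npages = {1\u20102}"
    for line_no, col, code in find_unmapped(sample):
        print(f"line {line_no}, column {col}: {code}")
```

For a real file, the text would be read with open(path, encoding="utf-8")
first, and characters known to be mapped (e.g. umlauts) can be passed in
`mapped` to keep the list short.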

