Update of bug #66287 (group groff):

                  Status:               Need Info => None
             Assigned to:                    barx => None
    _______________________________________________________

Follow-up Comment #2:

[comment #1 comment #1:]
> [comment #0 original submission:]
> > Unfortunately, preconv looks only at the first two lines of a file
> > for encoding information.
>
> Only if the file isn't seekable...

preconv looks at 0 lines if the file isn't seekable, and 2 lines if it
is.  Per its man page: "If the input stream is seekable, check the first
two input lines for a GNU Emacs file-local variable identifying the
character encoding."

Under no circumstances will preconv find the tag if it appears after the
first two lines.

> `preconv` is a preprocessor.... For it to behave as you desire,

I don't desire any change in preconv.  I merely desire to change shipped
groff files to give preconv a greater chance of getting the encoding
right.

> I think the status quo is the best we can do for shipped files
> without heavily refactoring preconv and potentially doing
> violence to the pipeline/filter concept.

Putting the "coding:" tag in the first two lines, where preconv will
find it, is a small change to two shipped files and no executables.

> Hmm, can't reproduce a problem here with _groff_ 1.23.0 or Git HEAD.

Ah, probably you have a uchardet library, which is preconv's next step
after checking the first two lines for an encoding tag.

> Can you do some experiments with `preconv -d` and see what it says?

Sure.
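Before the experiments, here is a rough way to check whether a file's
tag sits where preconv will look.  This is only a sketch: `tag_in_reach`
and `tag_line` are hypothetical helper names, not part of groff, and a
plain grep for "coding:" only approximates preconv's own tag parsing.

```shell
#!/bin/sh
# Hypothetical helper (not part of groff): succeed only if a "coding:"
# string appears within the first two lines of the file, the only region
# preconv's tag check examines according to its man page.
# Assumption: matching the bare string "coding:" is a rough stand-in for
# preconv's real parsing of the Emacs file-local variable.
tag_in_reach() {
    head -n 2 "$1" | grep -q 'coding:'
}

# Hypothetical helper: print the line number of the first "coding:"
# string in the file, or print nothing if there is none.
tag_line() {
    grep -n 'coding:' "$1" | head -n 1 | cut -d: -f1
}
```

Running `tag_in_reach FILE || tag_line FILE` on a shipped file reports
the line where the tag actually sits whenever it falls outside the first
two lines.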
On a UTF-8 terminal, absent uchardet, preconv guesses the wrong encoding
for groff_mmse.7.man:

$ fgrep 'coding: ' contrib/mm/groff_mmse.7.man
.\" coding: latin-1
$ echo $LC_CTYPE
en_US.utf8
$ preconv -d contrib/mm/groff_mmse.7.man > /dev/null
fallback encoding: 'UTF-8'
processing 'contrib/mm/groff_mmse.7.man'
no coding tag
could not detect encoding with uchardet
encoding used: 'UTF-8'
incomplete UTF-8 sequence(s) in input stream: replacing each such sequence with 0xFFFD
$ preconv --version
GNU preconv (groff) version 1.23.0.1624-4d251-dirty with iconv support and without uchardet support

And on a latin-1 terminal, it guesses the wrong encoding for
meintro_fr.me.in:

$ fgrep 'coding: ' doc/meintro_fr.me.in
.\" coding: utf-8
$ echo $LC_CTYPE
en_US.iso88591
$ preconv -d doc/meintro_fr.me.in > /dev/null
fallback encoding: 'ISO-8859-1'
processing 'doc/meintro_fr.me.in'
no coding tag
could not detect encoding with uchardet
encoding used: 'ISO-8859-1'

Putting the coding: tag at the tops of the files, following the examples
of the two .mom files I cited, fixes both of these.

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?66287>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/