At 2024-10-03T18:35:37-0400, Dave wrote:
> Follow-up Comment #2:
> [comment #1 comment #1:]
> > [comment #0 original submission:]
> > > Unfortunately, preconv looks only at the first two lines of a file
> > > for encoding information.
> >
> > Only if the file isn't seekable...
>
> preconv looks at 0 lines if the file isn't seekable, and 2 lines if it
> is. Per its man page: "If the input stream is seekable, check the
> first two input lines for a GNU Emacs file-local variable identifying
> the character encoding." Under no circumstances will preconv find the
> tag if it appears after the first two lines.

Hmm, right. Thanks for reminding me. I feel pulled in several
directions lately...

> I don't desire any change in preconv. I merely desire to change
> shipped groff files to give preconv a greater chance of getting the
> encoding right.

This is fine if it doesn't fool Emacs into ignoring the local variables
at the end of the file and making the overall file editing experience
_worse_ for people who _do_ have uchardet installed.

I reckon I'll test that.

> Putting the "coding:" tag in the first two lines, where preconv will
> find it, is a small change to two shipped files and no executables.
>
> > Hmm, can't reproduce a problem here with _groff_ 1.23.0 or Git HEAD.
>
> Ah, probably you have a uchardet library, which is preconv's next step
> after checking the first two lines for an encoding tag.

I assuredly do.

> > Can you do some experiments with `preconv -d` and see what it says?
>
> Sure. On a UTF-8 terminal, absent uchardet, preconv guesses the wrong
> encoding for groff_mmse.7.man:
>
> $ fgrep 'coding: ' contrib/mm/groff_mmse.7.man
> .\" coding: latin-1
> $ echo $LC_CTYPE
> en_US.utf8
> $ preconv -d contrib/mm/groff_mmse.7.man > /dev/null
> fallback encoding: 'UTF-8'
> processing 'contrib/mm/groff_mmse.7.man'
> no coding tag
> could not detect encoding with uchardet
> encoding used: 'UTF-8'
> incomplete UTF-8 sequence(s) in input stream: replacing each such sequence
> with 0xFFFD
> $ preconv --version
> GNU preconv (groff) version 1.23.0.1624-4d251-dirty with iconv support and
> without uchardet support
>
> And on a latin-1 terminal, it guesses the wrong encoding for
> meintro_fr.me.in:
>
> $ fgrep 'coding: ' doc/meintro_fr.me.in
> .\" coding: utf-8
> $ echo $LC_CTYPE
> en_US.iso88591
> $ preconv -d doc/meintro_fr.me.in > /dev/null
> fallback encoding: 'ISO-8859-1'
> processing 'doc/meintro_fr.me.in'
> no coding tag
> could not detect encoding with uchardet
> encoding used: 'ISO-8859-1'
>
> Putting the coding: tag at the tops of the files, following the
> examples of the two .mom files I cited, fixes both of these.

Hrm, yup. If that provokes GNU Emacs into bad ergonomics as noted above,
it may be time to migrate at least these two files to UTF-8 in the
source tree. That day is coming one way or the other...

> {savane: user = 108747; tracker = bugs; item = 66287}
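
For anyone following along, here is a rough sketch of why a tag in the
trailing local-variables block goes unseen while one at the top is found.
It is only an approximation of the two-line check the man page describes,
not preconv's actual parser; the names `TAG` and `find_coding_tag` and the
sample file contents are hypothetical, made up for illustration.

```python
import re

# Hypothetical approximation of an Emacs "-*- ... coding: NAME ... -*-" tag.
# preconv's real parser may differ in detail.
TAG = re.compile(r"-\*-.*?coding:\s*([A-Za-z0-9._-]+).*?-\*-")


def find_coding_tag(text, max_lines=2):
    """Return the encoding named in the first max_lines lines, or None."""
    for line in text.splitlines()[:max_lines]:
        match = TAG.search(line)
        if match:
            return match.group(1)
    return None


# Made-up sample: the tag sits in a local-variables block at the end.
tag_at_end = (
    '.TH example 7\n'
    '.SH Name\n'
    'text\n'
    '.\\" Local Variables:\n'
    '.\\" coding: latin-1\n'
    '.\\" End:\n'
)

# Made-up sample: the tag sits on the first line.
tag_at_top = (
    '.\\" -*- mode: nroff; coding: latin-1; -*-\n'
    '.TH example 7\n'
)

print(find_coding_tag(tag_at_end))
print(find_coding_tag(tag_at_top))
```

With only two lines inspected, the first sample yields no tag, so a
checker like this would have to fall through to whatever comes next
(uchardet, then the locale's fallback encoding, per the transcripts
above).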