Hi Dave,

At 2024-08-06T15:28:25-0500, Dave Kemper wrote:
> On Tue, Aug 6, 2024 at 1:34 PM G. Branden Robinson
> <g.branden.robin...@gmail.com> wrote:
> > The hyphenation language (`.hla`) and hyphenation mode (`.hy`) are
> > the same for these two scenarios.
>
> Yes, sloppy wording on my part. By "default hyphenation" I meant no
> aspect of it was changed by the input file. Command-line switches of
> course had an effect.

Understood.

> > Therefore these characters did not acquire nonzero hyphenation
> > codes, and therefore were not valid hyphenation breakpoints.
> >
> > Does this make sense?
>
> Yes. It makes me wonder about the wisdom of commit 0629380a9's move
> of the .hcode blocks. That is, I understand the reasoning for it you
> and Werner put forth, that the underlying groff design didn't
> contemplate a single run needing different languages' hyphenation
> support. But it also didn't quite rule it out.

We have been generating a document bearing this requirement since before
the 1.23.0 release -- groff-man-pages.{pdf,utf8.txt}. It switches from
English to Swedish and back to render groff_mmse(7).

You can observe the dance that we perform to achieve this in our "doc"
directory's Automake file.

https://git.savannah.gnu.org/cgit/groff.git/tree/doc/doc.am?h=1.23.0#n251

> But tying an initial hyphenation scheme to a language seems to at
> least tie it to the right thing at the outset, whereas tying it to an
> encoding perhaps doesn't.

There are two aspects to the hyphenation scheme, in this sense.

1. which characters are letters in the given character encoding
2. which letters behave exactly like other letters for hyphenation
   purposes in a given language

Point 1 is determined by the character encoding. Point 2 is too, in
part, for case-folding purposes.

The remainder of point 2 would cover situations like "hyphenate 'n' just
like 'ñ'", as Spanish hypothetically might. However, to date, this
remainder has never been addressed by groff's hyphenation support. It
could be--it just demands contributors with the requisite knowledge of
their language's hyphenation rules.

You may notice something unusual about "latin5.tmac" in Git HEAD:

.hcode İ i \" exceptional case; move to tr.tmac if we ever get one

...which, I'll grant, makes "point 1" more complicated again. Most
languages don't change the lettercase mapping rules. Most languages
aren't Turkish.
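
To illustrate why Turkish is the exceptional case for lettercase mapping,
here is a minimal sketch in Python (not groff). It only demonstrates the
language-neutral Unicode case mappings that Python's `str` methods apply;
the `TURKISH_LOWER` table and `turkish_lower` helper are hypothetical
names introduced for this illustration and say nothing about how groff
implements hyphenation codes.

```python
# Language-neutral Unicode case mappings, as applied by Python's str methods.
default_lower_I = "I".lower()             # LATIN CAPITAL LETTER I
default_lower_dotted = "\u0130".lower()   # LATIN CAPITAL LETTER I WITH DOT ABOVE
default_upper_dotless = "\u0131".upper()  # LATIN SMALL LETTER DOTLESS I

# Hypothetical table for this sketch: the two pairings Turkish
# orthography expects instead (I <-> dotless i, dotted I <-> i).
TURKISH_LOWER = {"I": "\u0131", "\u0130": "i"}

def turkish_lower(text):
    """Lowercase text, overriding the two Turkish-specific pairs."""
    return "".join(TURKISH_LOWER.get(ch, ch.lower()) for ch in text)
```

Under the default mapping, "I" lowercases to "i" and "İ" lowercases to
"i" followed by a combining dot above, so neither pairing matches what
Turkish expects. That mismatch is the kind of exception the `.hcode İ i`
line has to paper over.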

I guess I should add

.hcode I ı

too, huh?

> > If so, what I will do is make "en.tmac" `.mso latin1.tmac`.
>
> That will solve the problem for English. Are there other language
> files that will need it?

Every other groff localization file for a Western language -- almost --
`mso`s an encoding macro file already.

$ grep mso tmac/{cs,de,den,es,fr,it,ru,sv}.tmac | grep -v trans
tmac/cs.tmac:.mso latin2.tmac
tmac/de.tmac:.mso latin1.tmac
tmac/den.tmac:.do mso de.tmac
tmac/es.tmac:.mso latin9.tmac
tmac/fr.tmac:.mso latin9.tmac
tmac/ru.tmac:.mso koi8-r.tmac
tmac/sv.tmac:.mso latin1.tmac

I will therefore add

.mso latin1.tmac

to both "en.tmac" _and_ "it.tmac".

> Will some language files need other tmac/latin*.tmac sourced?

Yes, but they have them already, and in some cases for a long time.

$ git blame tmac/fr.tmac | grep 'mso.*latin'
fd7264f136 (Werner LEMBERG 2006-02-07 05:46:08 +0000 156) .mso latin9.tmac

Regards,
Branden