Follow-up Comment #23, bug #66919 (group groff):

Now we get to where my conceptual groundwork of comment #20 starts interacting with concrete examples.

[comment #19:]

> So I would say that this:
>
> $ printf '.hcode A A\n.pchar aA\n' | groff 2>&1 | grep -E '(char|hyph)'
> [snip]
>
> ...illustrates neither "generation" of a (presumptively new) hyphenation
> code

No character previously had a hyphenation code of 65, so groff has, conceptually, generated one. Your input above tells groff, "I want the character 'A' to be considered a potential hyphenation point, and to be considered not equivalent to any existing characters with hyphenation codes." This matches the typical user's (i.e., one who isn't peering into the implementation) understanding of what it means to generate a new hyphenation code.

Your own words, in fact, betray your view being colored by knowledge of formatter internals:

> It _was_ 97, and we knocked it back to 65.

It was, in no sense meaningful to a user, knocked _back_ to anything, as it never had 65 as a hyphenation code. You happen to know James Clark's initialization algorithm, but even that is arbitrary: Clark could just as easily have initialized "A" and "a" to the same value without going through his set-it-then-change-it routine, at which point the "back" in your sentence becomes an actual and not merely conceptual misstatement.

> I say that, for English, you cannot assume that `\[~o]` is
> going to behave just like `õ`. The latter is _not defined in the
> English alphabet_, so you can't rely on its hyphenation code having any
> particular value.

That's a fair statement. But even though I'm running groff with its default startup (English) files, the behavior I'm talking about in this ticket is in the formatter, not in any startup files. What I'm talking about has nothing to do with the input _language_ and everything to do with input _encoding_. (You'll notice that I'm not providing any sample input with any English words. The two words I've used, lanteronial and lanterõnial--and then only to work around the lack of .pchar in older groffs--aren't part of any language that I'm aware of. So I'm talking about general formatter behavior, independent of any language setting.)

In all other respects, groff treats \[~o] and the Latin-1 õ as the same character. If you want to make .hcode treat them as different, the documentation should clearly highlight this difference. But I suspect that, once you start trying to explain to readers why ".hcode \[~o] õ" and ".hcode õ õ" behave differently in Latin-1 input, you might start to think that's not actually such a wise thing for groff to do.

> But one thing's for sure, in the KOI8-R locale:
>
> .hcode \[~o] õ
>
> ...is **not** reflexive, because in that locale it really looks like
> this:
>
> .hcode \[~o] Т \" CYRILLIC CAPITAL LETTER TE

Right, the meaning of any character with the 8th bit set depends on the input encoding. That's true whether we're talking about .hcode or any other part of groff input. So this point is not really relevant to .hcode itself. My examples are all Latin-1 input, which I've tried to be clear about (in the comment #0 example, by running "file" on the input; in the comment #4 one, by stating the encoding before presenting the input file).

If groff offered a special character for CYRILLIC CAPITAL LETTER TE, I'm certain that using that with the above .hcode example in KOI8-R encoding would reveal the same behavior.

> I claim that, in _groff_, special characters _cannot_, and have never
> been able to, participate in reflexive hyphenation code assignments.

I crafted the example in comment #4 precisely to show the change between older groffs and the current one when using a special character in a reflexive hcode. Using the made-up word seemed easier to me than backporting .pchar, but if the latter is easy for you to do, it can show the effect of various .hcode requests directly.
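
To make the point about initialization concrete, here is a toy model in Python. It is only a sketch of the user-visible behavior described above, not groff's implementation: the dict, the function names, and the restriction to ASCII letters are all assumptions made for illustration. It shows that a set-it-then-change-it initialization and a direct one yield the same table, and that a reflexive request such as ".hcode A A" produces a code that no character held beforehand.

```python
# Toy model only: a dict from character to hyphenation code.
# This is NOT groff's implementation; every name here is hypothetical.
import string


def init_set_then_change():
    """Two passes: give each letter its own code, then fold uppercase."""
    table = {c: ord(c) for c in string.ascii_letters}
    for c in string.ascii_uppercase:
        table[c] = ord(c.lower())
    return table


def init_direct():
    """One pass: give each letter its lowercase code straight away."""
    return {c: ord(c.lower()) for c in string.ascii_letters}


def hcode_reflexive(table, c):
    """Model of a reflexive request such as '.hcode A A'."""
    table[c] = ord(c)


two_pass = init_set_then_change()
one_pass = init_direct()

# Snapshot of every code in use before any .hcode request is modeled.
codes_before = set(two_pass.values())

# Models '.hcode A A' applied to the two-pass table only.
hcode_reflexive(two_pass, "A")
```

In this model the two initial tables compare equal, 65 is absent from `codes_before`, and `two_pass["A"]` is 65 afterward while `one_pass["A"]` is still 97. From the user's side, 65 is a new code either way; only someone who knows about the intermediate step in the two-pass route would describe it as going "back".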

_______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?66919>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/