Follow-up Comment #23, bug #66919 (group groff):

Now we get to where my conceptual groundwork of comment #20 starts interacting
with concrete examples.

[comment #19:]
> So I would say that this:
> 
> $ printf '.hcode A A\n.pchar aA\n' | groff 2>&1 | grep -E '(char|hyph)'
> [snip]
> 
> ...illustrates neither "generation" of a (presumptively new) hyphenation
> code

No character previously had a hyphenation code of 65, so groff has,
conceptually, generated one.  Your input above tells groff, "I want the
character 'A' to be considered a potential hyphenation point, and to be
considered not equivalent to any existing characters with hyphenation codes."
This matches the typical user's (i.e., one who isn't peering into the
implementation) understanding of what it means to generate a new hyphenation
code.

Your own words, in fact, betray your view being colored by knowledge of
formatter internals:

> It _was_ 97, and we knocked it back to 65.

It was, in no sense meaningful to a user, knocked _back_ to anything, as it
never had 65 as a hyphenation code.  You happen to know James Clark's
initialization algorithm, but even that is arbitrary: Clark could just as
easily have initialized "A" and "a" to the same value without going through
his set-it-then-change-it routine, at which point the "back" in your sentence
becomes an actual and not merely conceptual misstatement.

> I say that, for English, you cannot assume that `\[~o]` is
> going to behave just like `õ`.  The latter is _not defined in the
> English alphabet_, so you can't rely on its hyphenation code having any
> particular value.

That's a fair statement.  But even though I'm running groff with its default
startup (English) files, the behavior I'm talking about in this ticket is in
the formatter, not in any startup files.  What I'm talking about has nothing
to do with the input _language_ and everything to do with input _encoding_.
(You'll notice that I'm not providing any sample input with any English words.
The two words I've used, lanteronial and lanterõnial--and then only to work
around the lack of .pchar in older groffs--aren't part of any language that
I'm aware of.  So I'm talking about general formatter behavior, independent of
any language setting.)

In all other respects, groff treats \[~o] and the Latin-1 õ as the same
character.  If you want to make .hcode treat them as different, the
documentation should clearly highlight this difference.

But I suspect that, once you start trying to explain to readers why ".hcode
\[~o] õ" and ".hcode õ õ" behave differently in Latin-1 input, you might
start to think that's not actually such a wise thing for groff to do.
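
For anyone who wants to compare the two requests side by side, here is a
minimal sketch modeled on the printf/groff pipeline quoted earlier.  It
assumes a groff new enough to provide .pchar and that groff reads its input
as Latin-1 by default; the file names are made up for illustration, and I
make no claim here about what the output looks like on any given version.

```shell
# Scratch directory for the two hypothetical test inputs.
dir=$(mktemp -d)

# \365 is the octal escape for byte 0xF5, which is õ in Latin-1.
# Case 1: special character on the left, 8-bit character on the right.
printf '.hcode \\[~o] \365\n.pchar \\[~o]\365\n' > "$dir/hcode-special.roff"
# Case 2: the 8-bit character on both sides.
printf '.hcode \365 \365\n.pchar \\[~o]\365\n' > "$dir/hcode-8bit.roff"

# Run only where groff is installed; "|| true" keeps an empty grep
# result from being treated as a failure.
if command -v groff >/dev/null 2>&1; then
  for f in "$dir/hcode-special.roff" "$dir/hcode-8bit.roff"; do
    echo "== $f"
    groff "$f" 2>&1 | grep -E '(char|hyph)' || true
  done
fi
```

Writing the byte with an octal escape keeps the test input Latin-1 no matter
what encoding the terminal or editor uses.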

> But one thing's for sure, in the KOI8-R locale:
> 
> .hcode \[~o] õ
> 
> ...is **not** reflexive, because in that locale it really looks like
> this:
> 
> .hcode \[~o] У \" CYRILLIC CAPITAL LETTER U

Right, the meaning of any character with the 8th bit set depends on the input
encoding.  That's true whether we're talking about .hcode or any other part of
groff input.  So this point is not really relevant to .hcode itself.

My examples are all Latin-1 input, which I've tried to be clear about (in the
comment #0 example, by running "file" on the input; in the comment #4 one, by
stating the encoding before presenting the input file).  If groff offered a
special character for CYRILLIC CAPITAL LETTER U, I'm certain that using that
with the above .hcode example in KOI8-R encoding would reveal the same
behavior.
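
The encoding dependence is easy to see outside groff entirely.  The sketch
below takes the single byte 0xF5 and converts it to UTF-8 twice, once
treating it as ISO 8859-1 and once as KOI8-R, then prints the resulting bytes
in hex.  It assumes an iconv that knows both encoding names; the variable
names are mine.

```shell
# The same input byte (octal 365 = 0xF5) under two input encodings,
# shown as the hex of its UTF-8 conversion.
latin1_hex=$(printf '\365' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')
koi8r_hex=$(printf '\365' | iconv -f KOI8-R -t UTF-8 | od -An -tx1 | tr -d ' \n')

printf 'ISO-8859-1: %s\nKOI8-R:     %s\n' "$latin1_hex" "$koi8r_hex"
```

The two results differ, which is all the point requires: the byte is the
same, and only the declared encoding decides which character it is.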

> I claim that, in _groff_, special characters _cannot_, and have never
> been able to, participate in reflexive hyphenation code assignments.

I crafted the example in comment #4 precisely to show the change between older
groffs and the current one when using a special character in a reflexive
hcode.  Using the made-up word seemed easier to me than backporting .pchar,
but if the latter is easy for you to do, it can show the effect of various
.hcode requests directly.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?66919>
