Re: [Groff] Character class query

Werner LEMBERG Mon, 02 Mar 2009 09:30:08 -0800

> I'm working on resurrecting Brian M. Carlson's work on character
> classes, and attempting to update it in the light of Werner's
> comments in the short thread on the subject in January 2008.


Great!

> I have a query about my planned design that I'd like to run past you
> first, though. After playing about with a few possibilities, I noted
> that, at least in theory, we would want to be able to apply several of
> the same sets of attributes to character classes as we do to individual
> groff entities: [...]

Yes.

> [...] it seems sensible to simply put character classes in the same
> symbol table as ordinary groff entities, and add character-range and
> class-nesting support to 'class charinfo'.

Good idea.  It's so simple that no one has had this idea before.

> Obviously a class that consisted of more than just a single
> character wouldn't have a Unicode codepoint or a glyph number or
> anything, and \[CJKprepunct] wouldn't produce any output, but
> '.cflags 2 \[CJKprepunct]' or whatever would be a sensible thing to
> write.

We could introduce a naming convention for character classes, say,
starting such names with a dot, including the word `class' in the
name, or something similar.  Since the list of groff entities is not
extensible, we have a broad range of possibilities.  We could even use
names similar to POSIX character ranges, e.g.,

  .char \C'[:digit:]' 0123456789
  abc\C'[:digit:]'abc

Note that entities with a `]' in their names can't be accessed with
\[...]; this might work as an additional protection against accidental
misuse.

> A simple initial implementation would essentially just change the
> accessor methods of 'class charinfo' to look through all registered
> character classes for ones that include the current character
> (intentionally vague here as I haven't yet worked out how to deal
> with ranges of Unicode codepoints that haven't been given entity
> indices).

This should probably support fall-back classes too, similar to the
current mechanism for ordinary entities.
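
As a rough illustration of the simple initial implementation discussed
above, here is a minimal C++ sketch of a linear scan over all registered
classes, with range and nesting support.  It is not groff's actual code:
`CharClass', `ClassTable', `contains', and `effective_flags' are
hypothetical names, the flag values are arbitrary, and fall-back classes
are not modelled at all.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the class-related part of 'class charinfo'.
struct CharClass {
    unsigned flags = 0;  // e.g. the value given to .cflags
    // Inclusive ranges of Unicode code points covered by this class.
    std::vector<std::pair<std::uint32_t, std::uint32_t>> ranges;
    // Names of other classes nested inside this one.
    std::vector<std::string> nested;
};

// Assumption: classes are keyed by entity name, as in a symbol table.
using ClassTable = std::map<std::string, CharClass>;

// True if code point c belongs to the named class, either directly
// through a range or indirectly through a nested class.  The depth
// limit guards against accidental cyclic nesting.
bool contains(const ClassTable &table, const std::string &name,
              std::uint32_t c, int depth = 0)
{
    if (depth > 16)
        return false;
    auto it = table.find(name);
    if (it == table.end())
        return false;
    for (const auto &r : it->second.ranges)
        if (r.first <= c && c <= r.second)
            return true;
    for (const auto &n : it->second.nested)
        if (contains(table, n, c, depth + 1))
            return true;
    return false;
}

// Linear scan over every registered class: start from the character's
// own flags and OR in the flags of each class that contains it.
unsigned effective_flags(const ClassTable &table, std::uint32_t c,
                         unsigned own_flags)
{
    unsigned flags = own_flags;
    for (const auto &entry : table)
        if (contains(table, entry.first, c))
            flags |= entry.second.flags;
    return flags;
}
```

With only a handful of classes registered, the cost of scanning the
whole table per character is small; a sorted range table or a
per-character cache could replace the scan later without changing the
interface.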

> For a small number of classes this ought to be perfectly adequate,
> and the lookups can be optimised later. My immediate needs (CJK
> support, of course!) only seem to require classes for
> no-break-before and no-break-after kinsoku shori, a general notion
> of "CJK character" so that we can adjust kerning between CJK and
> Latin characters,

This notion is also necessary to indicate that a break after the
current CJK character is allowed.

> and a class for double-width characters.

Not on the input side.

> I assume that the latter two would need to be done on the font side,

Exactly.

> BTW, in light of Werner's comments that glyphs are strictly an output
> notion, it isn't half confusing that 'class charinfo' is based on
> 'struct glyph' ...

Well, those names are historical, and while James Clark implemented
the character/glyph separation quite cleanly, he didn't pay much
attention to proper structure and class names.


    Werner
