Re: [Groff] Groff and Unicode code-point input.

Werner LEMBERG Sat, 14 May 2011 15:45:59 -0700

> I wonder if anybody knows the status of this:
> 
>  http://lists.gnu.org/archive/html/groff/2000-04/msg00036.html
> 
> In short, using \U'N' to input a Unicode codepoint N.


From groff.info:

   * A glyph for Unicode character U+XXXX[X[X]] which is not a
     composite character is named `uXXXX[X[X]]'.  X must be an
     uppercase hexadecimal digit.  Examples: `u1234', `u008E',
     `u12DB8'.  The largest Unicode value is 0x10FFFF.  There must be at
     least four `X' digits; if necessary, add leading zeroes (after the
     `u').  No zero padding is allowed for character codes greater than
     0xFFFF.  Surrogates (i.e., Unicode values greater than 0xFFFF
     represented with character codes from the surrogate area
     U+D800-U+DFFF) are not allowed too.

   [...]
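
As a rough illustration of the naming rule quoted above, here is a minimal
Python sketch. The function name `groff_glyph_name' is hypothetical (it is
not part of groff), and it only covers single, non-composite code points.

```python
def groff_glyph_name(code_point: int) -> str:
    """Return a groff glyph name such as 'u008E' for one Unicode code point.

    Hypothetical helper for illustration only; not part of groff.
    Composite characters are not handled here.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates are not allowed")
    # At least four uppercase hex digits; values above 0xFFFF
    # already have five or six digits, so they get no padding.
    return "u" + format(code_point, "04X")


if __name__ == "__main__":
    for cp in (0x1234, 0x8E, 0x12DB8):
        print(groff_glyph_name(cp))
```

The three values in the loop are the examples from the groff.info text.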

Note that this mechanism doesn't work for (printable) ASCII characters,
which you still have to enter as-is.  If your input is encoded in UTF-8,
every character that takes more than a single byte can be converted to
the \[uXXXX] form.
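
To make the conversion concrete, here is a small Python sketch of what doing
it by hand might look like.  The name `to_groff_escapes' is hypothetical, and
the code makes simplifying assumptions: the input is an already decoded
string, every code point below U+0080 is passed through unchanged, and each
remaining code point is emitted on its own (no composite glyph names).

```python
def to_groff_escapes(text: str) -> str:
    """Replace every non-ASCII character in text with a \\[uXXXX] escape.

    Hypothetical helper for illustration only; not part of groff.
    """
    pieces = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # One byte in UTF-8: ASCII stays as-is.
            pieces.append(ch)
        elif 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogates are not allowed")
        else:
            # At least four uppercase hex digits, no padding above 0xFFFF.
            pieces.append("\\[u" + format(cp, "04X") + "]")
    return "".join(pieces)


if __name__ == "__main__":
    # U+00EF (i with diaeresis) and U+20AC (euro sign), written as escapes
    # so the example does not depend on the encoding of this file.
    print(to_groff_escapes("na\u00efve \u20ac"))
```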

On the other hand, there is no longer a need to do this manually:
groff comes with `preconv', a preprocessor which can convert virtually
any encoding (using the `iconv' function) to \[uXXXX].


    Werner
