Re: [XeTeX] Unicode hyphens etc. and Xe(La)TeX

Roland Kuhn Mon, 01 Nov 2010 05:55:54 -0700

Generally, I recommend using the correct Unicode characters in the TeX source 
and then defining the behavior you want for them. In this case, this is fairly 
straightforward:


1) TeX inserts empty discretionaries after each occurrence of the \hyphenchar 
(a per-font property which is usually equal to `-), which takes care of your 
first point quite nicely.

2) The soft hyphen can be made active and defined to yield “\-” (the only 
drawback to this character is that it is not displayed very nicely inside 
Terminal on MacOS; it is written here as ^^ad, which TeX reads as U+00AD):
\catcode`\^^ad=\active
\def^^ad{\-}

3) The Unicode hyphen U+2010 ("2010 in TeX notation) can be made active and 
defined to yield “-” (ASCII hyphen), which is the right choice within TeX by 
construction:
\catcode`‐=\active
\def‐{-}

4) The non-breaking hyphen can also be made active and defined to yield 
“\hbox{-}” (the box prevents the discretionary after the ASCII hyphen from 
escaping; \nobreak does not help here, because the break would be taken at 
the discretionary itself):
\catcode`‑=\active
\def‑{\hbox{-}}
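One caveat with point 4, as far as I can tell: a bare \hbox does not start a 
paragraph, so a non-breaking hyphen at the very beginning of a paragraph would 
be appended to the vertical list instead of the line. A slightly more 
defensive sketch (my variation, not part of the recipe above) adds 
\leavevmode; in LaTeX, \mbox{-} amounts to the same thing:

```tex
% Sketch only: same activation as in point 4, plus \leavevmode so the
% definition also works as the first token of a paragraph.
\catcode`‑=\active
\def‑{\leavevmode\hbox{-}}
```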

Where those characters are encountered does not matter much in my experience, 
but you can always include macros for disabling these activations, akin to
\catcode`\^^ad=12
\catcode`‐=12
\catcode`‑=12
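The switching on and off could be wrapped up roughly as follows. This is only 
a sketch: the macro names \activatehyphens and \deactivatehyphens are made up 
for illustration, and the characters are given by their hexadecimal code 
points so that the invisible soft hyphen does not have to be typed. The 
definitions from points 2 to 4 must already have been made while the 
characters were active.

```tex
% Hypothetical helper macros (names are assumptions, not standard commands).
\def\activatehyphens{%
  \catcode"00AD=\active   % U+00AD SOFT HYPHEN
  \catcode"2010=\active   % U+2010 HYPHEN
  \catcode"2011=\active}  % U+2011 NON-BREAKING HYPHEN
\def\deactivatehyphens{%
  \catcode"00AD=12\relax
  \catcode"2010=12\relax
  \catcode"2011=12\relax}
```

Keep in mind that a catcode change only affects text that TeX has not yet 
read, so calling these inside the argument of another macro has no effect on 
the rest of that argument.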

Given these, you should be able to adapt the procedure to solve the case with 
the middle dots.
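For the middle dots, one possible adaptation is sketched below. It rests on 
assumptions of mine: that U+2027 HYPHENATION POINT is the dot after which a 
break is allowed, that U+00B7 MIDDLE DOT is left untouched so that nothing 
here introduces a break after it, and that the current font contains the 
U+00B7 glyph.

```tex
% Sketch only: print U+2027 as a middle dot followed by a legal breakpoint.
% \exhyphenpenalty is the penalty TeX charges for a break after an explicit
% hyphen; plain \allowbreak (i.e. \penalty0) would also do.
\catcode`‧=\active
\def‧{\char"00B7 \penalty\exhyphenpenalty}
```

The \penalty is what makes the break possible; without it (and without glue) 
TeX has no breakpoint directly after the dot.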

Regards,

Roland

On Oct 31, 2010, at 23:09 , BPJ wrote:

> I'm trying to find out if and how Xe(La)TeX does
> or can be made to treat the following characters
> differently from each other and/or in a 'smart' way:
> 
>       1) U+002D HYPHEN-MINUS
>       2) U+00AD SOFT HYPHEN
>       3) U+2010 HYPHEN
>       4) U+2011 NON-BREAKING HYPHEN
> 
> Specifically I'd like to get the correct behavior for
> Swedish so that a linebreak may occur after an ASCII hyphen
> but not after a Unicode non-breaking hyphen. While globally
> replacing every Unicode soft hyphen with \- is easy you
> cannot, unfortunately, globally replace every ASCII hyphen
> with some command which would do the right thing (whatever
> that command may be) as the ASCII hyphen may occur in
> command arguments which I've already inserted, and which are
> not to be interpreted as text. (Though I think that such would typically be 
> followed by a digit rather than a letter...)
> 
> I also have sort of the same thoughts about
> 
>       5) U+00B7 MIDDLE DOT
>       6) U+2027 HYPHENATION POINT
> 
> or rather I would want some way to distinguish between a
> middle dot after which a linebreak may occur and one after
> which it may not.
> 
> I guess I'm basically looking for a \maylinebreak command!
> 
> /bpj
> 
> 
> --------------------------------------------------
> Subscriptions, Archive, and List information, etc.:
> http://tug.org/mailman/listinfo/xetex

--
I'm a physicist: I have a basic working knowledge of the universe and 
everything it contains!
    - Sheldon Cooper (The Big Bang Theory)



