Generally, I recommend using the correct Unicode characters in the TeX source and then defining the behavior you want for them. In this case, this is fairly straightforward:

1) TeX inserts an empty discretionary after each occurrence of the \hyphenchar (a per-font property which is usually equal to `-), which takes care of your first point quite nicely.

2) The soft hyphen can be made active and defined to yield “\-” (the only drawback to this character is that it is not displayed very nicely in Terminal on Mac OS):

    % the character after ` and after \def is U+00AD SOFT HYPHEN
    \catcode`­=\active
    \def­{\-}

3) The Unicode hyphen "2010 can be made active and defined to yield “-” (ASCII hyphen), which is the right choice within TeX by construction:

    \catcode`‐=\active
    \def‐{-}

4) The non-breaking hyphen can also be made active and defined to yield “\hbox{-}” (the box prevents the discretionary after the ASCII hyphen from escaping; \nobreak does not help here):

    \catcode`‑=\active
    \def‑{\hbox{-}}

Where those characters are encountered does not matter much in my experience, but you can always include macros for disabling these activations, akin to

    % in order: U+00AD, U+2010, U+2011
    \catcode`­=12
    \catcode`‐=12
    \catcode`‑=12

Given these, you should be able to adapt the procedure to solve the case with the middle dots.

Regards,
Roland

On Oct 31, 2010, at 23:09, BPJ wrote:

> I'm trying to find out if and how Xe(La)TeX does
> or can be made to treat the following characters
> different from each other and/or in a 'smart' way:
>
> 1) U+002D HYPHEN-MINUS
> 2) U+00AD SOFT HYPHEN
> 3) U+2010 HYPHEN
> 4) U+2011 NON-BREAKING HYPHEN
>
> Specifically I'd like to get the correct behavior for
> Swedish so that a linebreak may occur after an ASCII hyphen
> but not after a Unicode non-breaking hyphen. While globally
> replacing every Unicode soft hyphen with \- is easy you
> cannot, unfortunately, globally replace every ASCII hyphen
> with some command which would do the right thing (whatever
> that command may be) as the ASCII hyphen may occur in
> command arguments which I've already inserted, and which are
> not to be interpreted as text. (Though I think that such would typically be
> followed by a digit rather than a letter...)
>
> I also have sort of the same thoughts about
>
> 5) U+00B7 MIDDLE DOT
> 6) U+2027 HYPHENATION POINT
>
> or rather I would want some way to distinguish between a
> middle dot after which a linebreak may occur and one after
> which it may not.
>
> I guess I'm basically looking for a \maylinebreak command!
>
> /bpj
>
>
> --------------------------------------------------
> Subscriptions, Archive, and List information, etc.:
> http://tug.org/mailman/listinfo/xetex

--
I'm a physicist: I have a basic working knowledge of the universe and everything it contains!
    - Sheldon Cooper (The Big Bang Theory)