On 23/2/16 15:29, Adam Twardoch (List) wrote:
Jonathan,
is there any method in XeTeX to explicitly emit "ActualText" or override the
automatic content generated by the new option?
Not currently. What you get is the Unicode text of each "word"
(consecutive run of non-space characters in a given font).
Or could you envision such a method? How would one need to approach it?
(I'm not saying you should try to implement it right away). :)
For a document that wants some other kind of "ActualText", there's going
to need to be pretty detailed markup in the source, I think. (E.g. each
word, or similar unit, will need to be tagged to provide the desired
ActualText that goes with it.) At that point, I wonder if turning off
\XeTeXgenerateactualtext and just doing it "manually" with macros that
generate \special{}s would be the most reasonable way forward.
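A minimal sketch of that "manual" route, under stated assumptions: the macro name \withactualtext is hypothetical, and the code assumes the xdvipdfmx driver accepts the pdf:literal direct special for raw content-stream operators. It wraps the typeset material in a PDF marked-content /Span carrying an /ActualText entry. The replacement text here must be plain ASCII with no unbalanced parentheses or backslashes; anything else would need proper PDF string escaping or a UTF-16BE hex string, which this sketch does not attempt.

```tex
% Plain XeTeX sketch, not a tested implementation.
% Leave the automatic feature off so the two mechanisms do not overlap.
\XeTeXgenerateactualtext=0

% Hypothetical helper: \withactualtext{<replacement text>}{<typeset material>}
% Assumption: xdvipdfmx passes "pdf:literal direct" through to the page stream.
\def\withactualtext#1#2{%
  \special{pdf:literal direct /Span << /ActualText (#1) >> BDC}% open marked content
  #2%
  \special{pdf:literal direct EMC}% close marked content
}

Mix \withactualtext{one half}{1/2} cup of flour with the butter.
\bye
```

The BDC/EMC pair has to open and close on the same page, so a wrapper like this is best kept to short, unbreakable fragments.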
I suppose it's possible you might want automatic ActualText for most of
the content, but custom overrides for certain fragments. At this point,
there's no support for that -- \XeTeXgenerateactualtext is a switch that
takes effect at \shipout time, so in effect it is "global" for all the
content on a page -- but perhaps we could make it scoped, so that you
could toggle it on/off at will within the text.
That probably wouldn't be hard to do; I'll give it a bit more thought.
JK
A.
Sent from my mobile phone.
On 23.02.2016, at 16:00, Jonathan Kew <jfkth...@gmail.com> wrote:
On 23/2/16 14:52, Adam Twardoch (List) wrote:
Jonathan,
this is splendid. Adding support for the PDF "ActualText" tagging layer
is a huge step.
I wonder — what happens in case of mathematical formulae?
At this point, nothing in particular. :)
I think it would be rather clever to embed the TeX notation or even, huh
huh, MathML into the ActualText layer for the math mode — per equation,
of course :) .
I think these are ideas that could usefully be explored/implemented at the
macro level, rather than being built in to the engine.
JK
Or use the "Unicode math linear format" as proposed by
Microsoft:
http://www.unicode.org/notes/tn28/UTN28-PlainTextMath-v3.pdf
A.
On 23.02.2016, at 15:43, Jonathan Kew <jfkth...@gmail.com> wrote:
The code for the \XeTeXgenerateactualtext feature (it's an integer
parameter; set it to 1 to get ActualText added to the PDF, for better
copy/paste and search in Acrobat) is now on sourceforge, in an
"actualtext" branch, for anyone who wants to try building and
experimenting with it.
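For anyone trying the branch, a minimal test file might look like the sketch below. The font name is an assumption (any installed system font should do), and the ligature words are only there to give copy/paste something to get wrong without ActualText. Since the parameter is read at \shipout time, the value in effect when the page is shipped out is the one that counts.

```tex
% Plain XeTeX sketch; requires a XeTeX build that includes the actualtext
% branch and a matching patched xdvipdfmx.
\XeTeXgenerateactualtext=1   % 1 = add ActualText for each word; 0 = off

% Assumed font: replace with any system font you have installed.
\font\body="Latin Modern Roman" at 10pt
\body

Words with ligatures: office, affine, fjord.
\bye
```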
Note that this requires a new version of xdvipdfmx, as it uses a new
DVI opcode. The patch for xdvipdfmx is attached here (based on the
current TeXLive svn source).
Akira, if you could check that the patch seems OK, that would be
great. I've not really looked at dvipdfm-x code in a long time. I
haven't pushed this to TL yet, as it's all rather experimental, but
I hope we can safely include it for TL'16.
JK
<xdvipdfmx-for-xetex-0_99995.patch>
--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
http://tug.org/mailman/listinfo/xetex