Wow! This is wonderful, Jonathan.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Feb 23, 2016 at 3:36 PM, Jonathan Kew <jfkth...@gmail.com> wrote:

> On 23/2/16 02:54, Andrew Cunningham wrote:
>
>> It would probably more than double; I was under the impression that
>> ActualText was a tag attribute, so extensive tagging would be needed,
>> and actual text added to the tags.
>>
>
> The ActualText tagging is highly compressible, so in practice the increase
> in overall PDF size is not all that great.
>
>
>> But the question is how to practically make use of ActualText if there
>> is a visible text layer.
>>
>> PDF/UA for instance leaves the question deliberately ambiguous.
>> ActualText is the way to make the content accessible, but developers
>> creating tools for PDF do not actually have to process the ActualText.
>>
>> So to index and search PDF files you need to build a discovery system
>> utilising tools that allow you to specify the use of ActualText in
>> preference to a visible text layer.
>>
>>
> Acrobat Reader uses it, if present, so that Copy/Paste from the PDF
> results in the correct Unicode text (more or less), and Find behaves as
> expected.
>
> Other PDF readers (such as Apple's Preview) may well ignore the ActualText
> tagging, in which case it doesn't help. I don't know whether tools like
> Evince or Okular handle it....
>
>
> I'm attaching two sample PDFs with a simple chunk of Hindi text (from the
> Unicode web site). The first, dev-old.pdf, is what XeTeX currently
> generates (using the "Annapurna SIL" OpenType font). In general, Copy/Paste
> and text search don't work very well -- a few characters may be OK, but
> others are junk.
>
> The second sample, dev-actualtext.pdf, was generated with an experimental
> new \XeTeXgenerateactualtext feature, which automatically "tags" each word
> with an ActualText representation.
>
> Some points to note:
>
> - The file size is 24662 bytes, while dev-old was 22875 bytes. Not too
> bad. Of course, a lot of that is the embedded font data; with longer
> documents that have lots of text but only a few fonts, the difference would
> presumably be somewhat greater.
>
> - Copy/Paste and Search work pretty well in Acrobat Reader. Not in
> Preview.app.
>
> - Highlighting of selected text (in Acrobat Reader) is somewhat broken,
> apparently due to the ActualText tagging (it looks better in dev-old). This
> may be fixable by tweaking exactly how the tagging is written into the PDF;
> I haven't investigated it further.
>
>
> No guarantees at this point as to whether/when this feature will actually
> be available. It was just a quick attempt to hack something up, to see how
> promising the results might be...
>
> JK
>
>
>
>
> --------------------------------------------------
> Subscriptions, Archive, and List information, etc.:
> http://tug.org/mailman/listinfo/xetex
>
>
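
For anyone curious what per-word tagging looks like at the PDF level, here is a rough sketch in Python. It is not taken from XeTeX or xdvipdfmx and may well differ from what \XeTeXgenerateactualtext actually writes. It assumes the usual PDF marked-content form (a /Span property list with an /ActualText entry, wrapped in BDC ... EMC) with the replacement text stored as a UTF-16BE hex string with a leading BOM. The helper names and the glyph-showing operator string are hypothetical placeholders. The last few lines just redo the size arithmetic from the two sample files and check that repeated tags compress well.

```python
import zlib


def actualtext_hex(word: str) -> str:
    """Encode a word as a PDF hex string: UTF-16BE preceded by the BOM FEFF."""
    return "<FEFF" + word.encode("utf-16-be").hex().upper() + ">"


def wrap_with_actualtext(word: str, glyph_ops: str) -> str:
    """Wrap glyph-showing operators in a marked-content span carrying ActualText.

    glyph_ops is whatever the PDF writer would emit to draw the word; here it
    is only a placeholder string (an assumption, not real XeTeX output).
    """
    return f"/Span << /ActualText {actualtext_hex(word)} >> BDC\n{glyph_ops}\nEMC"


# A two-letter Devanagari sample: U+0928 U+092E.
sample_word = "\u0928\u092e"
# Hypothetical glyph IDs shown with the Tj operator.
sample_ops = "<00120034> Tj"
tagged = wrap_with_actualtext(sample_word, sample_ops)

# Size figures quoted in the message above.
old_size = 22875  # dev-old.pdf
new_size = 24662  # dev-actualtext.pdf
increase = new_size - old_size          # 1787 bytes
percent = 100 * increase / old_size     # relative growth of the whole file

# Repeated tags are very regular, so a Flate-compressed stream stays small.
stream = "\n".join([tagged] * 200)
raw_len = len(stream.encode("ascii"))
compressed_len = len(zlib.compress(stream.encode("ascii")))

if __name__ == "__main__":
    print(tagged)
    print(f"increase: {increase} bytes ({percent:.1f}%)")
    print(f"raw {raw_len} bytes, compressed {compressed_len} bytes")
```

The repetition in this toy stream is far more extreme than in real text, so the compression figure it prints is only illustrative; the point is that the tag syntax itself adds little once the content stream is compressed.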