Re: [XeTeX] New feature REQUEST for xetex

2016-02-23 Thread Zdenek Wagner
Hi all, several years ago I typeset some texts with pdflatex and the devnag package (XeTeX did not exist at that time); the result is still here: http://icebearsoft.euweb.cz/dvngpdf/ The situation in Indic scripts is much more complex and cannot be solved by a ToUnicode map. Half-consonants can be mappe

Re: [XeTeX] New feature REQUEST for xetex

2016-02-23 Thread Mojca Miklavec
Just curious: how does Word or InDesign solve such problems (once a PDF gets generated)? Mojca -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex

Re: [XeTeX] New feature REQUEST for xetex

2016-02-23 Thread Jonathan Kew
On 23/2/16 02:54, Andrew Cunningham wrote: It would probably more than double, I was under the impression that ActualText was a tag attribute, so extensive tagging would be needed, and actual text added to the tags. The ActualText tagging is highly compressible, so in practice the increase in

Re: [XeTeX] New feature REQUEST for xetex

2016-02-23 Thread ShreeDevi Kumar
Wow! This is wonderful, Jonathan. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Feb 23, 2016 at 3:36 PM, Jonathan Kew wrote: > On 23/2/16 02:54, Andrew Cunningham wrote: > >> It would probably more than double,

Re: [XeTeX] New feature REQUEST for xetex

2016-02-23 Thread Zdenek Wagner
Jonathan, how do you put the ActualText into the PDF? Is it per syllable, or per word? We have commercial OCR software that can convert scanned PDFs to pages with selectable text. I have not examined it thoroughly but it seems to me that it analyzes the scanned image, splits it into subimages "per w

Re: [XeTeX] New feature REQUEST for xetex

2016-02-23 Thread Jonathan Kew
On 23/2/16 10:37, Zdenek Wagner wrote: Jonathan, how do you put the ActualText into the PDF? Is it per syllable, or per word? Per word. We have commercial OCR software that can convert scanned PDFs to pages with selectable text. I have not examined it thoroughly but it seems to me that it an

Re: [XeTeX] New feature REQUEST for xetex

2016-02-23 Thread Andrew Cunningham
They don't solve it. On Tuesday, 23 February 2016, Mojca Miklavec wrote: > Just curious: how does Word or InDesign solve such problems (once a > PDF gets generated)? > > Mojca

Re: [XeTeX] New feature REQUEST for xetex

2016-02-23 Thread ShreeDevi Kumar
I am attaching a sample PDF and its OCRed text using Tesseract OCR ( https://github.com/tesseract-ocr/tesseract). The resulting PDF allows for search as well as copy/paste of Devanagari Unicode text. The PDF is rendered using the original image, but the OCRed text is available as a text layer maki

[XeTeX] potential new feature: \XeTeXgenerateactualtext

2016-02-23 Thread Jonathan Kew
The code for the \XeTeXgenerateactualtext feature (it's an integer parameter; set it to 1 to get ActualText added to the PDF, for better copy/paste and search in Acrobat) is now on sourceforge, in an "actualtext" branch, for anyone who wants to try building and experimenting with it. Note tha
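As a minimal sketch of how the parameter described above might be switched on in a XeLaTeX document: the message only says it is an integer parameter that is set to 1. The \ifdefined guard, the default font, and the sample text are assumptions added for illustration, so that the file still compiles with a xetex binary that lacks the feature.

```latex
% Compile with xelatex. Assumes a XeTeX build that includes the
% experimental \XeTeXgenerateactualtext parameter; older builds simply
% skip the setting thanks to the \ifdefined guard (an added precaution,
% not part of the original announcement).
\documentclass{article}
\usepackage{fontspec} % default font; load a font covering your script as needed

\ifdefined\XeTeXgenerateactualtext
  \XeTeXgenerateactualtext=1 % 1 = add ActualText to the PDF, 0 = off
\fi

\begin{document}
% Ligatures such as "ffi" are a simple case where the glyphs drawn on
% the page differ from the underlying character sequence.
An efficient office workflow.
\end{document}
```

For Indic or other complex scripts, the same setting would be combined with a font that supports the script; copy/paste and search in the viewer are where any difference should show up.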

Re: [XeTeX] potential new feature: \XeTeXgenerateactualtext

2016-02-23 Thread Adam Twardoch (List)
Jonathan, this is splendid. Adding support for the PDF "ActualText" tagging layer is a huge step. I wonder — what happens in case of mathematical formulae? I think it would be rather clever to embed the TeX notation or even, huh huh, MathML into the ActualText layer for the math mode — per

Re: [XeTeX] potential new feature: \XeTeXgenerateactualtext

2016-02-23 Thread Jonathan Kew
On 23/2/16 14:52, Adam Twardoch (List) wrote: Jonathan, this is splendid. Adding support for the PDF "ActualText" tagging layer is a huge step. I wonder — what happens in case of mathematical formulae? At this point, nothing in particular. :) I think it would be rather clever to embed the

Re: [XeTeX] potential new feature: \XeTeXgenerateactualtext

2016-02-23 Thread Adam Twardoch (List)
Jonathan, is there any method in XeTeX to explicitly emit "ActualText" or override the automatic content generated by the new option? Or could you envision such a method? How would one need to approach it? (I'm not saying you should try to implement it right away). :) A.

Re: [XeTeX] potential new feature: \XeTeXgenerateactualtext

2016-02-23 Thread Akira Kakuto
Hi Jonathan, Akira, if you could check that the patch seems OK, that would be great. I've not really looked at dvipdfm-x code in a long time. I haven't pushed this to TL yet, as it's all rather experimental, but I hope we can safely include it for TL'16. Thanks very much. I think it is OK

Re: [XeTeX] potential new feature: \XeTeXgenerateactualtext

2016-02-23 Thread Jonathan Kew
On 23/2/16 15:29, Adam Twardoch (List) wrote: Jonathan, is there any method in XeTeX to explicitly emit "ActualText" or override the automatic content generated by the new option? Not currently. What you get is the Unicode text of each "word" (consecutive run of non-space characters in a giv

Re: [XeTeX] potential new feature: \XeTeXgenerateactualtext

2016-02-23 Thread Zdenek Wagner
Hi Akira, I have a similar problem in Linux. As Jonathan wrote, highlighting is quite weird but the result is OK. Zdeněk Wagner http://ttsm.icpf.cas.cz/team/wagner.shtml http://icebearsoft.euweb.cz 2016-02-23 17:46 GMT+01:00 Akira Kakuto : > Hi Jonathan, > > The code for the \XeTeXgenerateactua

Re: [XeTeX] potential new feature: \XeTeXgenerateactualtext

2016-02-23 Thread Jonathan Kew
On 23/2/16 16:50, Zdenek Wagner wrote: Hi Akira, I have a similar problem in Linux. As Jonathan wrote, highlighting is quite weird but the result is OK. Yes, that's what I meant about highlighting not working very well. I don't know yet whether there's something we can do when generating the

Re: [XeTeX] potential new feature: \XeTeXgenerateactualtext

2016-02-23 Thread Philip Taylor
Using Akira-san's "actest.pdf" as sample, Adobe Acrobat Pro 7.1 allows me to select only half of the text whereas Adobe Reader DC allows me to select it all; neither allows me to select individual kanji. ** Phil.

Re: [XeTeX] potential new feature: \XeTeXgenerateactualtext

2016-02-23 Thread Jonathan Kew
On 23/2/16 17:39, Philip Taylor wrote: Using Akira-san's "actest.pdf" as sample, Adobe Acrobat Pro 7.1 allows me to select only half of the text whereas Adobe Reader DC allows me to select it all; neither allows me to select individual kanji. Ah, right... as there are no spaces between the kan

Re: [XeTeX] potential new feature: \XeTeXgenerateactualtext

2016-02-23 Thread Philip Taylor
Jonathan Kew wrote: > In either case, copy&paste actually gives you the whole text, even > though AAPro only highlights half of it, I guess? Yes, six (consecutive) instances of 日本国憲法 ** Phil.

Re: [XeTeX] potential new feature: \XeTeXgenerateactualtext

2016-02-23 Thread Andrew Cunningham
Is it copying ActualText or the text layer? A better test would be one of the complex scripts. Andrew On Wednesday, 24 February 2016, Philip Taylor wrote: > > > Jonathan Kew wrote: > >> In either case, copy&paste actually gives you the whole text, even >> though AAPro only highlights half of it

Re: [XeTeX] New feature planned for xetex

2016-02-23 Thread Bobby de Vos
On 2016-02-19 03:31, Jonathan Kew wrote: > Note that the new features in xetex do not in any way enforce a > particular way of writing (for Urdu or anything else). The inter-word > spacing is primarily under the control of the font designer; > \XeTeXinterwordspaceshaping merely makes it possible fo
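A minimal sketch of enabling \XeTeXinterwordspaceshaping in a XeLaTeX document. The values in the comments reflect my understanding of the parameter rather than anything stated in this message, and the \ifdefined guard and sample text are assumptions added for illustration. With a font that has no contextual rules involving the space glyph (such as the default used here), the output should look the same with or without the setting.

```latex
% Compile with xelatex. Assumes a XeTeX build that provides
% \XeTeXinterwordspaceshaping; the guard lets older builds ignore it.
\documentclass{article}
\usepackage{fontspec} % default font; a font with cross-space rules is needed to see an effect

\ifdefined\XeTeXinterwordspaceshaping
  % Assumed meaning of the values (not stated in this thread):
  %   0 = off, 1 = space width measured in context,
  %   2 = neighbouring words also shaped across the space.
  \XeTeXinterwordspaceshaping=1
\fi

\begin{document}
Inter-word spacing remains under the control of the font.
\end{document}
```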