Hi all,
several years ago I typeset some texts with pdflatex and the devnag package
(XeTeX did not exist at that time); it is still here:
http://icebearsoft.euweb.cz/dvngpdf/
The situation in the Indic scripts is much more complex and cannot be
solved by a ToUnicode map. Half-consonants can be mappe
Just curious: how does Word or InDesign solve such problems (once a
PDF gets generated)?
Mojca
--
Subscriptions, Archive, and List information, etc.:
http://tug.org/mailman/listinfo/xetex
On 23/2/16 02:54, Andrew Cunningham wrote:
It would probably more than double. I was under the impression that
ActualText was a tag attribute, so extensive tagging would be needed,
and actual text added to the tags.
The ActualText tagging is highly compressible, so in practice the
increase in
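As a rough illustration of the compressibility point, here is a minimal Python sketch. It is not what XeTeX or xdvipdfmx actually emits: the `span` helper, the "(glyphs) Tj" placeholder and the sample word list are assumptions made up for the example. It wraps each word in a marked-content span carrying an ActualText hex string (UTF-16BE with a FEFF byte-order mark) and compares the raw and deflate-compressed sizes.

```python
import zlib

def span(word: str) -> str:
    # ActualText as a PDF hex string: UTF-16BE preceded by the FEFF byte-order mark.
    hexstr = word.encode("utf-16-be").hex().upper()
    # "(glyphs) Tj" is a placeholder for the real glyph-showing operators.
    return f"/Span <</ActualText <FEFF{hexstr}>>> BDC (glyphs) Tj EMC\n"

# Hypothetical sample text: three words repeated many times.
words = ["भजन", "कीर्तन", "आरती"] * 200
raw = "".join(span(w) for w in words).encode("ascii")
packed = zlib.compress(raw)  # deflate, the same family PDF uses for FlateDecode streams
print(len(raw), len(packed))
```

The sample is deliberately repetitive, so it overstates the saving for real text; the sketch only shows that the fixed "/Span ... BDC ... EMC" wrapper around every word is the kind of overhead deflate removes well.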
Wow! This is wonderful, Jonathan.
ShreeDevi
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
On Tue, Feb 23, 2016 at 3:36 PM, Jonathan Kew wrote:
> On 23/2/16 02:54, Andrew Cunningham wrote:
>
>> It would probably more than double,
Hi Jonathan,
how do you put the ActualText to PDF? Is it per syllable, or per word? We
have a commercial OCR software that can convert scanned PDF to pages with
selectable texts. I have not examined it thoroughly but it seems to me that
it analyzes the scanned image, splits it to subimages "per w
On 23/2/16 10:37, Zdenek Wagner wrote:
Hi Jonathan,
how do you put the ActualText to PDF? Is it per syllable, or per word?
Per word.
We have a commercial OCR software that can convert scanned PDF to pages
with selectable texts. I have not examined it thoroughly but it seems to
me that it an
They don't solve it.
On Tuesday, 23 February 2016, Mojca Miklavec
wrote:
> Just curious: how does Word or InDesign solve such problems (once a
> PDF gets generated)?
>
> Mojca
I am attaching a sample PDF and its OCRed text using Tesseract OCR (
https://github.com/tesseract-ocr/tesseract).
The resulting PDF allows search as well as copy and paste of Devanagari
Unicode text.
The PDF is rendered using the original image, but the OCRed text is
available as text layer maki
The code for the \XeTeXgenerateactualtext feature (it's an integer
parameter; set it to 1 to get ActualText added to the PDF, for better
copy/paste and search in Acrobat) is now on sourceforge, in an
"actualtext" branch, for anyone who wants to try building and
experimenting with it.
Note tha
Jonathan,
this is splendid. Adding support for the PDF "ActualText" tagging layer is a
huge step.
I wonder — what happens in case of mathematical formulae?
I think it would be rather clever to embed the TeX notation or even, huh huh,
MathML into the ActualText layer for the math mode — per
On 23/2/16 14:52, Adam Twardoch (List) wrote:
Jonathan,
this is splendid. Adding support for the PDF "ActualText" tagging layer
is a huge step.
I wonder — what happens in case of mathematical formulae?
At this point, nothing in particular. :)
I think it would be rather clever to embed the
Jonathan,
is there any method in XeTeX to explicitly emit "ActualText" or override the
automatic content generated by the new option?
Or could you envision such a method? How would one need to approach it?
(I'm not saying you should try implement it right away). :)
A.
Sent from my mobile p
Hi Jonathan,
Akira, if you could check that the patch seems OK, that would be great.
I've not really looked at dvipdfm-x code in a long time. I haven't
pushed it to TL yet, as it's all rather experimental, but I hope we
can safely include it for TL'16.
Thanks very much. I think it is OK
On 23/2/16 15:29, Adam Twardoch (List) wrote:
Jonathan,
is there any method in XeTeX to explicitly emit "ActualText" or override the
automatic content generated by the new option?
Not currently. What you get is the Unicode text of each "word"
(consecutive run of non-space characters in a giv
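The per-word scheme described here can be sketched in Python as follows. This is only an illustration under stated assumptions: `actualtext_hex` and `wrap_words` are hypothetical helper names, "(glyphs) Tj" stands in for whatever glyph-showing operators the driver really writes, and the thread does not say how XeTeX structures its output. The sketch splits a line into runs of non-space characters and gives each run its own ActualText span, encoded as a PDF hex string in UTF-16BE with a FEFF byte-order mark.

```python
import re

def actualtext_hex(word: str) -> str:
    """Encode a word as a PDF hex string: UTF-16BE with a leading FEFF mark."""
    return "<FEFF" + word.encode("utf-16-be").hex().upper() + ">"

def wrap_words(line: str) -> list[str]:
    """Wrap each run of non-space characters in its own marked-content span."""
    return [
        # "(glyphs) Tj" is a placeholder, not real glyph data.
        f"/Span <</ActualText {actualtext_hex(word)}>> BDC\n  (glyphs) Tj\nEMC"
        for word in re.findall(r"\S+", line)
    ]

for chunk in wrap_words("भजन कीर्तन आरती"):
    print(chunk)
```

Because the unit is a run of non-space characters, text written without spaces would end up as a single span under this sketch.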
Hi Akira,
I have a similar problem in Linux. As Jonathan wrote, highlighting is quite
weird but the result is OK.
Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml
http://icebearsoft.euweb.cz
2016-02-23 17:46 GMT+01:00 Akira Kakuto :
> Hi Jonathan,
>
> The code for the \XeTeXgenerateactua
On 23/2/16 16:50, Zdenek Wagner wrote:
Hi Akira,
I have a similar problem in Linux. As Jonathan wrote, highlighting is
quite weird but the result is OK.
Yes, that's what I meant about highlighting not working very well. I
don't know yet whether there's something we can do when generating the
Using Akira-san's "actest.pdf" as sample, Adobe Acrobat Pro 7.1 allows
me to select only half of the text whereas Adobe Reader DC allows me to
select it all; neither allows me to select individual kanji.
** Phil.
On 23/2/16 17:39, Philip Taylor wrote:
Using Akira-san's "actest.pdf" as sample, Adobe Acrobat Pro 7.1 allows
me to select only half of the text whereas Adobe Reader DC allows me to
select it all; neither allows me to select individual kanji.
Ah, right... as there are no spaces between the kan
Jonathan Kew wrote:
> In either case, copy&paste actually gives you the whole text, even
> though AAPro only highlights half of it, I guess?
Yes, six (consecutive) instances of 日本国憲法
** Phil.
Is it copying ActualText or the text layer?
A better test would be one of the complex scripts.
Andrew
On Wednesday, 24 February 2016, Philip Taylor wrote:
>
>
> Jonathan Kew wrote:
>
>> In either case, copy&paste actually gives you the whole text, even
>> though AAPro only highlights half of it
On 2016-02-19 03:31, Jonathan Kew wrote:
> Note that the new features in xetex do not in any way enforce a
> particular way of writing (for Urdu or anything else). The inter-word
> spacing is primarily under the control of the font designer;
> \XeTeXinterwordspaceshaping merely makes it possible fo