On 24/2/16 09:22, ShreeDevi Kumar wrote:
Testing dev-actualtext.pdf sent by JK
* Adobe Acrobat Reader XI on Windows 10
o Does not highlight text fully
o SEARCH finds words and word parts correctly but usually
highlights only beginning of the word containing the letter
o COPY paste to NOTEPAD++, OPENOFFICE WRITER works correctly,
o Save as TXT file does not work correctly - only saves ... in it,
not the actual unicode text which can be copied
So it looks like Acrobat makes use of the ActualText for Search and
Copy, but sadly its "Save as Text" doesn't support Unicode.
I'm pleasantly surprised to see the Gmail previewer also handles it.
The others (Foxit, Edge) sound like they're just working from the glyph
stream, which is basically doomed to failure.
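To illustrate why the glyph stream is a poor basis for Copy and Search, here is a toy sketch in Python. It is not how any of these viewers actually works; the helper to_visual_order is a hypothetical name, and it models only one reordering (the Devanagari vowel sign I, which is drawn to the left of its consonant). Even this single rule is enough to make a search for a logical-order substring fail.

```python
# Toy model only: real shaping engines do far more than this one swap.
I_MATRA = "\u093f"  # DEVANAGARI VOWEL SIGN I, drawn to the left of its consonant


def to_visual_order(text: str) -> str:
    """Hypothetical helper: move each vowel sign I in front of the preceding character."""
    chars = list(text)
    for i in range(1, len(chars)):
        if chars[i] == I_MATRA:
            chars[i - 1], chars[i] = chars[i], chars[i - 1]
    return "".join(chars)


# Logical (Unicode) order of the test word: ya, uu-sign, na, i-sign, ka, o-sign, dda
logical = "\u092f\u0942\u0928\u093f\u0915\u094b\u0921"
visual = to_visual_order(logical)

# The syllable "na + i-sign" is findable in the logical text,
# but not in text reconstructed from the drawing order.
needle = "\u0928\u093f"
found_in_logical = needle in logical
found_in_visual = needle in visual
```

A viewer that reads ActualText gets the logical string directly and never has to undo this kind of reordering.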
For a further data point, I tried Evince (Document Viewer) on Ubuntu
15.10, and found that Copy and Search work well; it looks like it is
using the ActualText correctly. This is thanks to the poppler library, I
believe. The (poppler-based) "pdftotext" tool was also able to extract
the Unicode text correctly from the PDF, although "pdftohtml" didn't do
so well.
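For anyone who wants to repeat the pdftotext check, a minimal sketch in Python follows. It assumes pdftotext (from poppler-utils) is on the PATH; the function names extract_text and contains_expected are mine, and the expected string is whatever text you know the PDF should contain.

```python
import shutil
import subprocess
import unicodedata


def extract_text(pdf_path: str) -> str:
    """Run pdftotext and return the extracted text (assumes pdftotext is installed)."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    # "-enc UTF-8" selects the output encoding; "-" sends the text to stdout.
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8")


def contains_expected(extracted: str, expected: str) -> bool:
    """Compare after NFC normalization so equivalent encodings still match."""
    norm = unicodedata.normalize
    return norm("NFC", expected) in norm("NFC", extracted)
```

Usage would be along the lines of contains_expected(extract_text("dev-actualtext.pdf"), expected), with expected set to the Unicode string the page is supposed to carry.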
One issue with Evince is that drag-selecting text to highlight it (as
for Copy/Paste) looks bad: the highlighting completely obscures the
selected text, although it will end up being copied correctly.
Interestingly, its highlighting of search results doesn't suffer from
this problem, and it even makes a fair attempt (not completely accurate)
at highlighting specific letters within a word, not just entire words.
JK
* Foxit Reader 7.3 on Windows 10
o Highlights text fully,
o smallest highlight unit is word,
o COPY paste to notepad++ as well as SEARCH does NOT work
correctly as Unicode text is not fully correct.
ूय
िनकोड क्या ह ? ै
o Save as TXT file does not work correctly - saves the unicode
text with same problems as in copy and paste
* Microsoft Edge Viewer on Windows 10
o Highlights text fully,
o COPY paste to notepad++ as well as SEARCH does NOT work
correctly as Unicode text is not fully correct.
य ूिनकोड क्या है?
* Previewing from within gmail in Chrome on Windows 10 -
o Highlights text fully,
o smallest highlight unit is word,
o COPY paste to NOTEPAD++, OPENOFFICE WRITER works correctly,
o (highlights only first letter of first word in
paragraph यू rather than full word यूनिकोड)
o there is NO SEARCH feature
o there is no save as TXT file feature
* Same as above while Previewing from within gmail in Internet
Explorer on Windows 10
ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
On Tue, Feb 23, 2016 at 11:30 PM, Jonathan Kew <jfkth...@gmail.com> wrote:
On 23/2/16 17:39, Philip Taylor wrote:
Using Akira-san's "actest.pdf" as sample, Adobe Acrobat Pro 7.1 allows
me to select only half of the text whereas Adobe Reader DC allows me to
select it all; neither allows me to select individual kanji.
Ah, right... as there are no spaces between the kanji, they'll end
up in the same text object. That's a shortcoming of how the current
implementation works, for scripts that don't use inter-word spaces.
In either case, copy&paste actually gives you the whole text, even
though AAPro only highlights half of it, I guess?
JK
--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
http://tug.org/mailman/listinfo/xetex