Jonathan, this is a really useful feature and I look forward to using it once it is released in TL2016.
Since how well the search and copy/paste features work could also be font dependent, I would like to test some more PDFs in Unicode Devanagari created by this new feature using other fonts. I usually use the Siddhanta and Sanskrit2003 fonts. I would appreciate it if you or other members who have this feature installed could provide a few more sample PDFs in Devanagari for testing. Thanks!

- sent from my phone. excuse the brevity.

On 24-Feb-2016 3:37 pm, "Jonathan Kew" <jfkth...@gmail.com> wrote:

> On 24/2/16 09:22, ShreeDevi Kumar wrote:
>
>> Testing dev-actualtext.pdf sent by JK
>>
>> * Adobe Acrobat Reader XI on Windows 10
>>   o Does not highlight text fully
>>   o SEARCH finds words and word parts correctly but usually
>>     highlights only beginning of the word containing the letter
>>   o COPY paste to NOTEPAD++, OPENOFFICE WRITER works correctly,
>>   o Save as TXT file does not work correctly - only saves ... in it,
>>     not the actual unicode text which can be copied
>
> So it looks like Acrobat makes use of the ActualText for Search and Copy,
> but sadly its "Save as Text" doesn't support Unicode.
>
> I'm pleasantly surprised to see the Gmail previewer also handles it.
>
> The others (Foxit, Edge) sound like they're just working from the glyph
> stream, which is basically doomed to failure.
>
> For a further data point, I tried Evince (Document Viewer) on Ubuntu
> 15.10, and found that Copy and Search work well; it looks like it is using
> the ActualText correctly. This is thanks to the poppler library, I believe.
> The (poppler-based) "pdftotext" tool was also able to extract the Unicode
> text correctly from the PDF, although "pdftohtml" didn't do so well.
>
> One issue with Evince is that drag-selecting text to highlight it (as for
> Copy/Paste) looks bad: the highlighting completely obscures the selected
> text, although it will end up being copied correctly. Interestingly, its
> highlighting of search results doesn't suffer from this problem, and it
> even makes a fair attempt (not completely accurate) at highlighting
> specific letters within a word, not just entire words.
>
> JK
>
>> * Foxit Reader 7.3 on Windows 10
>>   o Highlights text fully,
>>   o smallest highlight unit is word,
>>   o COPY paste to notepad++ as well as SEARCH does NOT work
>>     correctly as Unicode text is not fully correct.
>>
>>     ूय
>>
>>     िनकोड क्या ह ? ै
>>
>>   o Save as TXT file does not work correctly - saves the unicode
>>     text with same problems as in copy and paste
>>
>> * Microsoft Edge Viewer on Windows 10
>>   o Highlights text fully,
>>   o COPY paste to notepad++ as well as SEARCH does NOT work
>>     correctly as Unicode text is not fully correct.
>>
>>     य ूिनकोड क्या है?
>>
>> * Previewing from within gmail in Chrome on Windows 10 -
>>   o Highlights text fully,
>>   o smallest highlight unit is word,
>>   o COPY paste to NOTEPAD++, OPENOFFICE WRITER works correctly,
>>   o (highlights only first letter of first word in
>>     paragraph यू rather than full word यूनिकोड)
>>   o there is NO SEARCH feature
>>   o there is no save as TXT file feature
>> * Same as above while Previewing from within gmail in Internet
>>   Explorer on Windows 10
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Tue, Feb 23, 2016 at 11:30 PM, Jonathan Kew <jfkth...@gmail.com> wrote:
>>
>>> On 23/2/16 17:39, Philip Taylor wrote:
>>>
>>>> Using Akira-san's "actest.pdf" as sample, Adobe Acrobat Pro 7.1 allows
>>>> me to select only half of the text whereas Adobe Reader DC allows me to
>>>> select it all; neither allows me to select individual kanji.
>>>
>>> Ah, right... as there are no spaces between the kanji, they'll end
>>> up in the same text object. That's a shortcoming of how the current
>>> implementation works, for scripts that don't use inter-word spaces.
>>>
>>> In either case, copy&paste actually gives you the whole text, even
>>> though AAPro only highlights half of it, I guess?
>>>
>>> JK
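
For anyone testing further sample PDFs, the broken copy/paste results quoted above (Foxit, Edge) share one visible symptom: a dependent vowel sign ends up at the start of a "word", ahead of its consonant, which is what one would expect when text is rebuilt from glyph order. Below is a minimal Python sketch of a check for that symptom. The function names are my own, the "good" string is what I presume the intended text to be, and the helper assumes poppler's pdftotext is on the PATH. It is only a heuristic: an empty result does not prove the extracted text is correct.

```python
import subprocess
import unicodedata


def stranded_marks(text):
    """Return whitespace-separated tokens that begin with a combining mark.

    In correctly ordered Unicode text a dependent vowel sign follows its
    consonant, so a token starting with a mark (category Mn, Mc or Me)
    suggests the text was reconstructed from the glyph stream.
    """
    return [tok for tok in text.split()
            if unicodedata.category(tok[0]).startswith("M")]


def extract_text(pdf_path):
    """Extract text with poppler's pdftotext (assumed to be installed)."""
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        check=True, capture_output=True,
    )
    return result.stdout.decode("utf-8")


# Strings taken from the test report quoted above.
good = "यूनिकोड क्या है?"    # presumed intended text (assumption)
foxit = "िनकोड क्या ह ? ै"   # pasted from Foxit Reader 7.3
edge = "य ूिनकोड क्या है?"   # pasted from Microsoft Edge

for label, sample in (("good", good), ("foxit", foxit), ("edge", edge)):
    print(label, stranded_marks(sample))
```

To check a file rather than pasted text, something like stranded_marks(extract_text("dev-actualtext.pdf")) should work, given that pdftotext was reported to extract the Unicode text correctly from that PDF.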
-------------------------------------------------- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex