OCR is the last thing you want to use. Far too many errors. If you are on a Mac you can use AppleScript to open the PDF, Select All and Copy, and this will result in 100% of the text being available, with no OCR errors. Unfortunately, depending on your document, the output might not exactly match the input. This will be particularly true of pages containing multi-column text or tables of data. It's easy enough to test: open the PDF in question, Select All, Copy and then Paste into TextEdit. You'll be left with 3 possibilities:
1) You are extremely lucky: your PDFs are very basic and the text output is a 100% match. The LC solution is very easy.
2) 90% of the document is fine but a couple of tables don't match. Your PDFs are standardised and these tables (or multi-column sections) always appear in the same place. It will be possible to parse the data and use LC to correct the formatting, but development time will be considerably longer.
3) Your PDFs are random, with tables and multi-column sections all over the place, resulting in output that is anywhere between 1% and 10% accurate. Forget it; it will be almost impossible to reconstruct the jumble of text back into the original layout.

HTH

On Wed, Sep 17, 2014 at 4:25 AM, Jonathan Scott <so...@agate.plala.or.jp> wrote:

> Hi,
> I have some PDF files that I'd like to create a search stack for. I
> looked on the net and found a way to display them in LiveCode, but is there
> actually a way to read the OCR text that's in each file? If not, there
> should be.
> Thanks in advance.
> _______________________________________________
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
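
If you would rather put a number on the Select All / Copy / Paste test than judge it by eye, here is a rough sketch in Python (not LiveCode). It assumes you have typed or otherwise obtained a known-good version of one page to compare against. The function name `match_ratio` is made up for illustration, and the score is only a word-order similarity, not a true accuracy figure.

```python
from difflib import SequenceMatcher


def match_ratio(expected: str, pasted: str) -> float:
    """Return a 0.0-1.0 similarity score between two texts, compared word by word."""
    # split() with no arguments collapses runs of spaces, tabs and newlines,
    # so pure whitespace differences do not count as mismatches.
    expected_words = expected.split()
    pasted_words = pasted.split()
    # autojunk=False stops difflib from ignoring very frequent words
    # on long inputs.
    matcher = SequenceMatcher(None, expected_words, pasted_words, autojunk=False)
    return matcher.ratio()


if __name__ == "__main__":
    # Hypothetical sample: a two-column page whose columns were interleaved
    # by the copy, which is the kind of jumble described above.
    known_good = "left one left two right one right two"
    pasted = "left one right one left two right two"
    print(f"similarity: {match_ratio(known_good, pasted):.2f}")
```

A score of 1.0 corresponds to the first case. Anything lower means the pasted text has missing, extra or reordered words, and you still need to look at where the differences fall to tell the second case from the third.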