Re: [tesseract-ocr] tesseract output is of first page only

ilevy Fri, 09 Aug 2019 11:12:27 -0700

That worked, thank you very much Shree!

I could tell right away that it was working because it was writing to 
stdout:


Tesseract Open Source OCR Engine v4.1.0 with Leptonica
Page 1
Page 2
Page 3
Page 4
Page 5
Page 6
Page 7
Page 8
Page 9
Page 10
Page 11
Page 12
Page 13
Detected 14 diacritics
Page 14

and so on. And finally I had the txt with all of the text as expected.
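
In case it helps anyone else: the ^L mentioned in my original question is a form feed character, which tesseract appears to write after each page of the txt output. A minimal sketch for splitting the combined txt back into pages, assuming the default form-feed separator is in effect (the function name split_pages is just my own, not part of tesseract):

```python
def split_pages(text, separator="\f"):
    """Split tesseract's combined text output into one string per page.

    Assumes each page is terminated by a form feed (^L), so the final
    piece after the last separator is empty and gets dropped.
    """
    pages = text.split(separator)
    # Drop the empty remainder left after the last page's trailing separator.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages


# Example with a made-up two-page output string.
sample = "first page text\n\fsecond page text\n\f"
for number, page in enumerate(split_pages(sample), start=1):
    print(f"--- page {number} ---")
    print(page, end="")
```

If the page separator has been changed in the tesseract configuration, pass the new value as the second argument.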

It should be noted somewhere that, at least in certain contexts, 
multipage png files -- whatever "multipage" means in the case of these 
files -- will only have their first page processed.
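
For anyone who lands here later, this is roughly the workflow that worked, written as a small Python sketch. Shree did not name a conversion tool, so the use of Ghostscript (gs) with its tiff24nc device is my own assumption, and the file names are placeholders. The two helper functions only build the command lines; ocr_pdf actually runs them and needs both gs and tesseract installed.

```python
import subprocess


def pdf_to_tiff_cmd(pdf_path, tiff_path, dpi=300):
    """Build a Ghostscript command that renders a PDF to a single TIFF.

    With no %d in the output file name, Ghostscript's TIFF devices put
    every page into one multipage file.
    """
    return [
        "gs",
        "-dNOPAUSE",
        "-dBATCH",
        "-sDEVICE=tiff24nc",
        f"-r{dpi}",
        f"-sOutputFile={tiff_path}",
        pdf_path,
    ]


def ocr_cmd(tiff_path, output_base):
    """Build the tesseract command; tesseract adds ".txt" to output_base."""
    return ["tesseract", tiff_path, output_base]


def ocr_pdf(pdf_path, output_base):
    """Convert the PDF to a multipage TIFF, then OCR all of its pages."""
    tiff_path = output_base + ".tif"
    subprocess.run(pdf_to_tiff_cmd(pdf_path, tiff_path), check=True)
    subprocess.run(ocr_cmd(tiff_path, output_base), check=True)
    return output_base + ".txt"


if __name__ == "__main__":
    # Print the commands for placeholder file names without running them.
    print(" ".join(pdf_to_tiff_cmd("scan.pdf", "scan.tif")))
    print(" ".join(ocr_cmd("scan.tif", "scan")))
```

Any other tool that produces a multipage TIFF should work the same way; only the first command would change.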

On Friday, August 9, 2019 at 7:42:59 AM UTC-7, shree wrote:
>
> Try creating a multipage tiff from your pdf and try.
>
> On Fri, 9 Aug 2019, 11:11 ilevy, <textr...@gmail.com> wrote:
>
>> I'm trying tesseract for the first time with a png of a multipage 
>> document I saved out of a pdf (which itself was just an image).
>>
>> When I run tesseract, I get an output of the first page, but that's all. 
>> I notice that there's a control-L (^L) at the end of the text file.
>>
>> How do I get the entire file output to txt?
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesser...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/4067da33-b1d1-4bbe-9909-9b5552c49549%40googlegroups.com
>>
>

