That worked, thank you very much Shree! I could tell right away that it was working because it was writing to stdout:
Tesseract Open Source OCR Engine v4.1.0 with Leptonica
Page 1
Page 2
Page 3
Page 4
Page 5
Page 6
Page 7
Page 8
Page 9
Page 10
Page 11
Page 12
Page 13
Detected 14 diacritics
Page 14

and so on. And finally I had the txt file with all of the text, as expected.

It should be noted somewhere that, at least in certain contexts, multipage png files (whatever "multipage" means in the case of these files) will not be processed correctly: only the first page comes out.

On Friday, August 9, 2019 at 7:42:59 AM UTC-7, shree wrote:
> Try creating a multipage tiff from your pdf and try.
>
> On Fri, 9 Aug 2019, 11:11 ilevy, <textr...@gmail.com> wrote:
>
>> I'm trying tesseract for the first time with a png of a multipage
>> document I saved out of a pdf (which itself was just an image).
>>
>> When I run tesseract, I get an output of the first page, but that's all.
>> I notice that there's a control-L (^L) at the end of the text file.
>>
>> How do I get the entire file output to txt?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/dca9c4ef-a731-40b1-b914-6f4c225153f1%40googlegroups.com.
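For anyone who lands on this thread with the same problem, here is a rough sketch of the workflow Shree suggested: render the pdf to a single multipage tiff, then run tesseract on the tiff. This is not necessarily the exact set of commands used in this thread. It assumes Ghostscript is installed and callable as `gs` (on Windows the binary is usually named differently), that 300 DPI is a reasonable resolution, and that the ^L in the txt output is tesseract's form-feed page separator. The function names are made up for illustration.

```python
import subprocess
import sys


def gs_tiff_command(pdf_path, tiff_path, dpi=300):
    """Build a Ghostscript command that renders all PDF pages into one TIFF."""
    return [
        "gs",
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        "-sDEVICE=tiffgray",          # 8-bit grayscale TIFF output device
        f"-r{dpi}",                   # render resolution; 300 DPI is an assumption
        f"-sOutputFile={tiff_path}",  # no %d in the name, so all pages share one file
        str(pdf_path),
    ]


def tesseract_command(tiff_path, output_base):
    """Build the tesseract call; the text is written to <output_base>.txt."""
    return ["tesseract", str(tiff_path), str(output_base)]


def split_pages(text):
    """Split the text output on form feeds (the ^L seen in the txt file)."""
    return [page for page in text.split("\f") if page.strip()]


def ocr_pdf(pdf_path, output_base):
    """Convert the PDF to a multipage TIFF, OCR it, and return per-page text."""
    tiff_path = f"{output_base}.tif"
    subprocess.run(gs_tiff_command(pdf_path, tiff_path), check=True)
    subprocess.run(tesseract_command(tiff_path, output_base), check=True)
    with open(f"{output_base}.txt", encoding="utf-8") as handle:
        return split_pages(handle.read())


if __name__ == "__main__" and len(sys.argv) == 3:
    pages = ocr_pdf(sys.argv[1], sys.argv[2])
    print(f"OCR produced {len(pages)} page(s) of text")
```

Saved as, say, ocr_pdf.py (a hypothetical name), it would be run as `python ocr_pdf.py scan.pdf scan`, leaving scan.tif and scan.txt next to the pdf. The command builders are kept separate from the subprocess calls so the commands can be printed and checked before anything is actually run.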