OCRmyPDF is a command-line utility that takes image-only PDFs, performs OCR, and adds a text layer to the PDF so that it can be searched. It is written in Python and C++, and on Linux it is installed via the Python 'pip' installer.
I tried installing it under Cygwin64 but ran into a compiler error while building a dependency, pikepdf. This turned out to be fixable by a single CFLAGS change (from -std=c++14 to -std=gnu++14), which the maintainer of pikepdf (and OCRmyPDF) graciously fast-tracked.

The instructions for installing under Cygwin are:

1. Install the following Cygwin packages:

     python36 (or later)
     python3?-devel
     python3?-pip
     python3?-lxml
       (where 3? means match the version of python3 you installed)
     gcc-g++
     ghostscript
     libexempi3
     libexempi-devel
     libffi6
     libffi-devel
     pngquant
     qpdf
     libqpdf-devel
     tesseract-ocr
     tesseract-ocr-devel

2. In a terminal, run the following commands:

     pip3 install wheel
     pip3 install ocrmypdf

Note: You may get a warning that the version of pip that came with Cygwin is out of date. Updating is not required, but if you want to, you can update pip to the latest version with

     pip3 install --upgrade pip

But note that if you do this, the command name will now be just 'pip' instead of 'pip3'.

There is one optional dependency, "unpaper", that is currently not available under Cygwin. Without it, certain options such as --clean will produce an error message. However, the OCR-to-text-layer functionality is available. I'll take a look at building a Cygwin version of unpaper.

I've tried this in a clean, minimal Cygwin install but would like to get confirmation from a few other people before submitting this to the OCRmyPDF maintainer for inclusion in their install instructions.

Is there anyone with interest in OCRmyPDF willing to try these instructions and report back? Off-list is fine if that would be off-topic here.

Thanks

--
Jim Garrison
j...@acm.org
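
For anyone trying the steps above, a quick sanity check along these lines may help when reporting back. It is only a sketch and was not part of the tested instructions: the helper name missing_tools is hypothetical, and the executable names (gs for ghostscript, tesseract for tesseract-ocr) are assumptions about what the Cygwin packages put on PATH.

```shell
#!/bin/sh
# Print the name of each given command that is not found on PATH.
# (missing_tools is a hypothetical helper, not part of OCRmyPDF or Cygwin.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executables assumed to be present after steps 1 and 2.
missing=$(missing_tools python3 gs tesseract qpdf pngquant ocrmypdf)

if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All expected commands found."
fi
```

If everything is found, running ocrmypdf on a scanned PDF with an input and an output file name is the natural next test. unpaper is deliberately left out of the list because it is optional and not yet available under Cygwin.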