Re: [tesseract-ocr] Re: Post OCR Verification and Editing

2024-03-08 Thread Merlijn B.W. Wajer
Hi Mark, On 08/03/2024 20:24, Mark Pellegrino wrote: Thank you Merlijn, this is very helpful. I'm very interested in IA's process so I'll have a deep dive through those tools.  This confirms my suspicions that there's no way to use an off-the-shelf text editor with a glyphless font. I'll explo

Re: [tesseract-ocr] Re: Post OCR Verification and Editing

2024-03-08 Thread Mark Pellegrino
Thank you Merlijn, this is very helpful. I'm very interested in IA's process so I'll have a deep dive through those tools. This confirms my suspicions that there's no way to use an off-the-shelf text editor with a glyphless font. I'll explore these hOCR editor options. All the best, On Fri, Mar

Re: [tesseract-ocr] Re: Post OCR Verification and Editing

2024-03-08 Thread Mark Pellegrino
Thanks Zdenko, PyMuPDF is an intriguing option. I'll check it out further. On Fri, Mar 8, 2024 at 6:14 AM Zdenko Podobny wrote: > Hello, > > > I am not sure if OCRmyPDF (https://ocrmypdf.readthedocs.io/en/latest/) > allows redaction. > > If you would like to implement the text layer yourself with a cust

[tesseract-ocr] i got Failed to continue from: data/eng/eng_num_vert.lstm

2024-03-08 Thread thangaraj r
Warning: LSTMTrainer deserialized an LSTMRecognizer! Error, data/eng/eng_num_vert.lstm is an integer (fast) model, cannot continue training Failed to continue from: data/eng/eng_num_vert.lstm make: *** [Makefile:351: data/eng_num_vert/checkpoints/eng_num_vert_checkpoint] Error 1 I need to fine t

Re: [tesseract-ocr] Re: Post OCR Verification and Editing

2024-03-08 Thread Merlijn B.W. Wajer
Hi Mark, On 07/03/2024 20:53, Mark Pellegrino wrote: I found more info here: https://github.com/tesseract-ocr/tesseract/issues/1769#issuecomment-509490277 Glyphless appears to be an 'invisible font' and is the only one that Tesseract supports. It seems like the solution is to use Tesseract to generate hO
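
For anyone following the hOCR route discussed above, here is a minimal sketch of how the word-level data in an hOCR file could be read to flag words for manual review. It is not taken from the thread. The class name `HocrWordParser`, the function `low_confidence_words`, the threshold of 80 and the `SAMPLE` markup are all made up for illustration, and the sketch assumes the usual hOCR layout in which each word is a `span` of class `ocrx_word` whose `title` attribute carries `bbox` and `x_wconf`.

```python
import re
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect text, bounding box and confidence for each ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word currently being read, if any
        self._depth = 0       # spans nested inside the current word

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._current is not None:
            # A span inside a word (for example per-character info).
            self._depth += 1
            return
        attrs = dict(attrs)
        if "ocrx_word" not in (attrs.get("class") or "").split():
            return
        title = attrs.get("title") or ""
        bbox = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", title)
        conf = re.search(r"x_wconf (\d+(?:\.\d+)?)", title)
        self._current = {
            "text": "",
            "bbox": tuple(int(v) for v in bbox.groups()) if bbox else None,
            "conf": float(conf.group(1)) if conf else None,
        }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag != "span" or self._current is None:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        self.words.append(self._current)
        self._current = None


def low_confidence_words(hocr_text, threshold=80.0):
    """Return the words whose x_wconf is below the (assumed) threshold."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    return [w for w in parser.words
            if w["conf"] is not None and w["conf"] < threshold]


# Hypothetical hOCR fragment, written by hand for this example.
SAMPLE = """<span class='ocr_line' title='bbox 36 92 580 122'>
<span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 96'>Post</span>
<span class='ocrx_word' id='word_1_2' title='bbox 109 92 171 116; x_wconf 54'>0CR</span>
</span>"""

for word in low_confidence_words(SAMPLE):
    print(word["text"], word["bbox"], word["conf"])
```

The bounding boxes are kept so that a reviewer (or an hOCR editor) can locate each flagged word on the page image. The threshold is arbitrary and would need tuning per collection.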

Re: [tesseract-ocr] Re: Post OCR Verification and Editing

2024-03-08 Thread Zdenko Podobny
Hello, I am not sure if OCRmyPDF (https://ocrmypdf.readthedocs.io/en/latest/) allows redaction. If you would like to implement the text layer yourself with a custom font, have a look at PyMuPDF: - https://github.com/pymupdf/PyMuPDF/discussions/775 (Adding text layer to a scanned PDF) - https://