>>>>>> allows the use of the Aletheia editor [1] from the PRImA folks. I haven’t
>>>>>> done much correction of hand-written materials but Aletheia seems flexible
>>>>>> for a Windows environment and exports the PAGE format. You also can start
>>>>> with hOCR and/or roundtrip between ALTO, hOCR, PAGE, and other XML formats
>>>>> with the ocr-fileformat project [2], which include
ng.
>>>> Merlijn and the IA folks have great tools for combining hOCR and images to
>>>> make a lightweight PDF if that’s your end-goal [3].
>>>>
>>>> Best,
Hi Mark,
On 08/03/2024 20:24, Mark Pellegrino wrote:
Thank you Merlijn, this is very helpful. I'm very interested in IA's
process so I'll have a deep dive through those tools. This confirms my
suspicions that there's no way to use an off-the-shelf text editor with a
glyphless font. I'll explore these hOCR editor options. All the best,
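
For anyone following the hOCR route, here is a minimal sketch of what an hOCR editor has to read first: each recognized word and its bounding box. It assumes Tesseract-style hOCR, where every word is a span with class ocrx_word and a title attribute holding "bbox x0 y0 x1 y1", and it assumes no spans are nested inside a word span. The names HocrWords, extract_words and SAMPLE are made up for illustration and do not come from any of the tools mentioned in this thread.

```python
import re
from html.parser import HTMLParser

# hOCR stores word coordinates in the title attribute, e.g.
# title='bbox 10 10 90 40; x_wconf 96'
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect (text, bbox) pairs from ocrx_word spans (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an open word span.
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumption: word spans contain no nested spans.
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def extract_words(hocr_text):
    """Return a list of (word, (x0, y0, x1, y1)) tuples."""
    parser = HocrWords()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


# Tiny hand-written sample in the shape Tesseract emits (not real OCR output).
SAMPLE = """<div class='ocr_page' title='bbox 0 0 600 200'>
 <span class='ocr_line' title='bbox 10 10 300 40'>
  <span class='ocrx_word' title='bbox 10 10 90 40; x_wconf 96'>Hello</span>
  <span class='ocrx_word' title='bbox 100 10 180 40; x_wconf 91'>wrold</span>
 </span>
</div>"""

if __name__ == "__main__":
    for word, bbox in extract_words(SAMPLE):
        print(word, bbox)
```

Correcting a misrecognized word such as "wrold" then comes down to rewriting the span text while leaving its bbox untouched, so a PDF built from the corrected hOCR still places the text over the right part of the scan.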
On Fri, Mar
Thanks Zdenko, PyMuPDF is an intriguing option. I'll check it out further.
On Fri, Mar 8, 2024 at 6:14 AM Zdenko Podobny wrote:
> Hello,
>
> I am not sure if OCRmyPDF (https://ocrmypdf.readthedocs.io/en/latest/)
> allows redaction.
>
> If you would like to implement a text layer by yourself with cust
Hi Mark,
On 07/03/2024 20:53, Mark Pellegrino wrote:
I found more info here:
https://github.com/tesseract-ocr/tesseract/issues/1769#issuecomment-509490277
Glyphless appears to be an 'invisible font', and it is all that Tesseract
supports. It seems like the solution is to use Tesseract to generate
hO
Hello,
I am not sure if OCRmyPDF (https://ocrmypdf.readthedocs.io/en/latest/)
allows redaction.
If you would like to implement a text layer by yourself with a custom font,
have a look at PyMuPDF:
- https://github.com/pymupdf/PyMuPDF/discussions/775 (Adding text layer
to a scanned PDF)
- https://