Re: [tesseract-ocr] Combining output from multiple jobs into one hOCR file

2021-02-04 Thread Vidar
Thanks a million, both of these seem like excellent options! :D
On Thursday, February 4, 2021 at 8:36:27 PM UTC Merlijn Wajer wrote:
> Hi Vidar,
>
> On 04/02/2021 21:11, Vidar wrote:
> >
> > Hi,
> >
> > I'm running some processing on a Windows machine using the recent Mannheim
> > 5.0 alpha

Re: [tesseract-ocr] Combining output from multiple jobs into one hOCR file

2021-02-04 Thread Merlijn B.W. Wajer
Hi Vidar,
On 04/02/2021 21:11, Vidar wrote:
>
> Hi,
>
> I'm running some processing on a Windows machine using the recent Mannheim
> 5.0 alpha builds, outputting to hOCR. When I run it on a job with a few
> hundred pages, the CPU usage constantly hovers around 10% (1 thread), and
> memory/GPU

[tesseract-ocr] Combining output from multiple jobs into one hOCR file

2021-02-04 Thread Vidar
Hi, I'm running some processing on a Windows machine using the recent Mannheim 5.0 alpha builds, outputting to hOCR. When I run it on a job with a few hundred pages, the CPU usage constantly hovers around 10% (1 thread), and memory/GPU usage doesn't seem to change much. Now, while I could spl
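For readers landing on this thread: the subject line asks how to combine hOCR output from several separate jobs into one file. The sketch below is one possible way to do that in Python. It is not taken from the replies above, which are cut off in this listing. It assumes each job produced a complete hOCR document with a single `<body>` element, and it keeps the `<head>` of the first document. The function name `merge_hocr` is made up for illustration. Element ids such as `page_1` may repeat across jobs and are not renumbered here.

```python
import re

# Matches the contents of the <body> element of an hOCR (XHTML) document.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.DOTALL | re.IGNORECASE)


def merge_hocr(documents):
    """Merge several hOCR documents (given as strings) into one string.

    The head and wrapper markup of the first document are kept, and the
    body contents of all documents are concatenated in the order given.
    Hypothetical helper: ids are NOT renumbered, so they may collide.
    """
    if not documents:
        raise ValueError("need at least one hOCR document")

    bodies = []
    for doc in documents:
        match = BODY_RE.search(doc)
        if match is None:
            raise ValueError("document has no <body> element")
        bodies.append(match.group(1).strip())

    # Splice the combined bodies into the first document's <body>.
    first = documents[0]
    start, end = BODY_RE.search(first).span(1)
    return first[:start] + "\n" + "\n".join(bodies) + "\n" + first[end:]
```

To use it, read each job's `.hocr` file as text in page order, pass the list of strings to `merge_hocr`, and write the result to a new file. If a downstream tool relies on unique ids, those would need to be rewritten as a separate step.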