Just a nudge to see if there is any feedback on this question.  

Many thanks


Iain

On Saturday, November 16, 2024 at 11:51:04 AM UTC Iain Downs wrote:

> I'm writing a program to convert TIFF images of books to ePubs.  I have a 
> bunch (4000) of book images which I converted in the 2000s with FineReader.  
> I want to improve the results and am too cheap to buy an updated program.  
> Plus, it's fun.
>
> Tesseract looks like it gives equal or better results than my original 
> system.  However, the current incarnation does not support bold or italic, 
> which is important, though arguably not essential.
>
> The last I could find on this was from 2022 
> <https://github.com/sirfz/tesserocr/issues/292>.  A bit more informative 
> is this <https://github.com/tesseract-ocr/tesseract/issues/1074>.
>
> Basically, the latter says that the information for bold and italic (at 
> least) is available at some level in the code hierarchy, but would need 
> some work to expose (from theraysmith) - or at least this is how I 
> interpreted it.  There was some indication that this would be desirable, 
> but I'm not sure it's on your roadmap.
>
> If it is, do you know when?  If not, could it be added?  If no to that, is 
> it possible to run both Version 3 and Version 5 recognition?
>
> My concern with the latter is that it appears that version 3 paths are 
> explicitly commented out in V5 through a #define.  This #define seems to be 
> generated early in the compilation process by some Linuxy tools that are 
> well beyond my (limited) Linux experience.  How could I generate a library 
> / set of DLLs which would allow me to run both recognisers (one after the 
> other probably and then pick the 'best' result)?
>
> Hope this makes sense, and thanks in advance
>
> Iain
>

