Just a nudge to see if there is any feedback on this question. Many thanks
Iain

On Saturday, November 16, 2024 at 11:51:04 AM UTC Iain Downs wrote:

> I'm writing a program to convert TIFF images of books to ePubs. I have a
> bunch (4000) of book images which I converted in the 2000s with FineReader.
> I want to improve the results and am too cheap to buy an updated program.
> Plus, it's fun.
>
> Tesseract looks like it gives equal or better results than my original
> system. However, the current incarnation does not support bold or italic,
> which is important, though arguably not essential.
>
> The last I could find on this was from 2022
> <https://github.com/sirfz/tesserocr/issues/292>. A bit more informative
> is this <https://github.com/tesseract-ocr/tesseract/issues/1074>.
>
> Basically, the latter says that the information for bold and italic (at
> least) is available at some level in the code hierarchy, but would need
> some work to expose (from theraysmith) - or at least this is how I
> interpreted it. There was some indication that this would be desirable,
> but I'm not sure it's on your roadmap.
>
> If it is, do you know when? If not, could it be added? If no to that, is
> it possible to run both Version 3 and Version 5 recognition?
>
> My concern with the latter is that it appears that the version 3 paths are
> explicitly compiled out in V5 through a #define. This #define seems to be
> generated early in the compilation process by some Linuxy tools that are
> well beyond my (limited) Linux experience. How could I generate a library
> / set of DLLs which would allow me to run both recognisers (one after the
> other probably, and then pick the 'best' result)?
>
> Hope this makes sense, and thanks in advance
>
> Iain
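To make the "run both recognisers and pick the best" idea concrete, here is a minimal sketch of just the selection step. It assumes each engine's output has already been reduced to a list of (word, confidence) pairs, for example from one run with `--oem 0` (legacy engine, which needs legacy-capable traineddata and a build with the legacy engine compiled in) and one with `--oem 1` (LSTM). The function names and the "higher mean confidence wins" rule are my own assumptions for illustration, not anything Tesseract provides, and confidences from the two engines may not be directly comparable.

```python
from statistics import mean

# Each recogniser's output is modelled as a list of (word, confidence) pairs,
# with confidence on a 0-100 scale. How these lists are obtained from
# Tesseract is deliberately left out of this sketch.


def mean_confidence(words):
    """Return the average word confidence, or 0.0 for an empty result."""
    return mean(conf for _, conf in words) if words else 0.0


def pick_best(legacy_words, lstm_words):
    """Return (engine_name, words) for the result with the higher mean confidence.

    Ties go to the LSTM result. This rule is an assumption chosen for
    simplicity, not a recommendation from the Tesseract project.
    """
    if mean_confidence(legacy_words) > mean_confidence(lstm_words):
        return "legacy", legacy_words
    return "lstm", lstm_words
```

In practice the choice would probably be made per page or per line rather than per book, and the bold/italic attributes from the legacy pass could be kept even when the LSTM text is the one chosen.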