Re: [tesseract-ocr] Re: Trouble reading text "in between lines"

2019-06-26 Thread Lorenzo Bolzani
I was referring to the image sample you posted where there are three columns. Regarding the new diagrams, I do not know what information you need or whether all the diagrams have the same layout. Anyway, I would first cut individual boxes from the bottom-right table, or at least the three columns. I would

Re: [tesseract-ocr] Re: Trouble reading text "in between lines"

2019-06-26 Thread 'Hu gePanic' via tesseract-ocr
Maybe I don't understand your idea of cutting. Here is a sample of a "similar" drawing (I can't upload the original drawings). http://zalaco.com/wp-content/uploads/2013/10/zalaco-drawing-sample-1024x721.jpg How wou

Re: [tesseract-ocr] Re: Trouble reading text "in between lines"

2019-06-26 Thread Lorenzo Bolzani
Cut the image in half with GIMP and see whether that is the case. Each image will be smaller, so if you discard the empty white borders it could even be faster. I do this in my application with no problems. I do not understand why you need overlap. Maybe you cannot cut the image in the way I would expect
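For illustration only, the cut-and-trim idea above could be sketched as follows. This is a minimal sketch, not code from the thread: it assumes a grayscale image held as a list of pixel rows (0 = black, 255 = white), and the names `trim_white_borders`, `split_vertically` and the `white` threshold are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white (assumed layout)


def trim_white_borders(img: Image, white: int = 250) -> Image:
    """Drop edge rows and columns that contain only near-white pixels."""
    rows = [i for i, row in enumerate(img) if any(p < white for p in row)]
    if not rows:
        return []  # nothing but background
    cols = [j for j in range(len(img[0])) if any(row[j] < white for row in img)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in img[top:bottom + 1]]


def split_vertically(img: Image) -> Tuple[Image, Image]:
    """Cut the image into a left and a right half at the middle column."""
    mid = len(img[0]) // 2
    return [row[:mid] for row in img], [row[mid:] for row in img]
```

Each trimmed half would then be passed to the OCR step separately. Whether a cut through the middle column is safe depends on the layout, since it must not slice through text.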

Re: [tesseract-ocr] Re: Trouble reading text "in between lines"

2019-06-26 Thread 'Hu gePanic' via tesseract-ocr
Hi, sure, I can cut the image and process all the pieces. The complete image contains many blocks like the one in the example. I would have to process many slices with some overlap. I assume this would cost a lot of time... On Wednesday, 26 June 2019 at 15:22:32 UTC+2, Lorenzo Blz wrote: >

Re: [tesseract-ocr] Re: Trouble reading text "in between lines"

2019-06-26 Thread Lorenzo Bolzani
Can you cut the image vertically in a simple way? Lorenzo On Wed, 26 Jun 2019 at 11:08, 'Hu gePanic' via tesseract-ocr <tesseract-ocr@googlegroups.com> wrote: > I have "sort of" solved the problem. > > I run tesseract 2 times. > After the first run I delete all the text already

[tesseract-ocr] Re: Trouble reading text "in between lines"

2019-06-26 Thread 'Hu gePanic' via tesseract-ocr
I have "sort of" solved the problem. I run Tesseract twice. After the first run I delete all the text already found by overwriting every position where text was recognized. On the second run Tesseract then finds only the "in between lines" text.
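The two-run approach described above might look roughly like the sketch below. It is not the poster's actual code. The OCR engine is passed in as a hypothetical callable `ocr` that returns `(text, left, top, width, height)` tuples; in practice it could wrap something like pytesseract's `image_to_data`, but that wiring is an assumption and is not shown. The image is again assumed to be a list of grayscale pixel rows.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white (assumed layout)
Box = Tuple[str, int, int, int, int]  # text, left, top, width, height
OcrFn = Callable[[Image], List[Box]]  # hypothetical OCR wrapper


def mask_boxes(img: Image, boxes: List[Box], fill: int = 255) -> Image:
    """Return a copy of img with every box area overwritten by the fill value."""
    out = [row[:] for row in img]
    height = len(out)
    width = len(out[0]) if out else 0
    for _text, left, top, w, h in boxes:
        # Clamp each box to the image so out-of-range boxes are ignored.
        for y in range(max(top, 0), min(top + h, height)):
            for x in range(max(left, 0), min(left + w, width)):
                out[y][x] = fill
    return out


def two_pass_ocr(img: Image, ocr: OcrFn) -> List[Box]:
    """Run OCR, blank out what was found, then run OCR again on the rest."""
    first = ocr(img)
    second = ocr(mask_boxes(img, first))
    return first + second
```

The second pass only sees whatever the first pass missed, which is the effect described in the post. The cost is roughly two full OCR runs per image.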