Re: [tesseract-ocr] Extender letter recognized as underline for arabic text

2023-12-04 Thread Sifdin Nahhas
The extender character is not in ara.punc. On Monday, November 20, 2023 at 3:44:52 PM UTC+1 elvi...@gmail.com wrote: > Can you try to remove it from the list of punctuations? > > To do that, you need to extract the components of the traineddata file, > edit the ara.punc file, and then rec

[tesseract-ocr] Re: jTessBoxEditor

2023-12-04 Thread Simon
I just noticed that the second picture I attached should have been the following one. In it you can see the .box file information. [image: GoogleGroupsQuestion2.png] Simon wrote on Sunday, December 3, 2023 at 10:38:51 UTC+1: > Hello everybody, > > is anyone familiar with jTessBoxEditor? > I am currently g

[tesseract-ocr] Newbie question: Bad results on a Korean case

2023-12-04 Thread 'Nick S.' via tesseract-ocr
[image: KoreanOCRExample.PNG] Hi all, as a Tesseract/OCR newbie, I am currently working on deepening my understanding of the Tesseract foundations and OCR basics. In doing so, I came across the following strange results: When scanning some Korean Wikipedia pages (related to mathematics), Tesser