I have scanned documents in which a few pages were scanned with the wrong orientation
(90, 180 or 270 degrees).
I have tried the --psm 0 flag in tesseract to get the orientation, OpenCV Hough lines and OpenCV
bounding boxes, but none of them work.
Could anyone please suggest a method to detect the orientation correctly
and rotate the pages to make
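The thread does not include an answer to this question. One commonly used heuristic, swapped in here and not proposed in the thread, compares the horizontal and vertical projection profiles of a binarized page: horizontal text lines make the row sums alternate strongly between text and gaps, so a page whose column sums vary more is probably turned by 90 or 270 degrees. The sketch assumes the page is already binarized into a list of rows (1 = ink); the function names are hypothetical. It cannot tell 0 from 180 (or 90 from 270); that needs an extra step, for example comparing OCR confidence on both candidates. Separately, --psm 0 relies on osd.traineddata, so it is worth checking that this file is installed.

```python
from statistics import pvariance


def looks_sideways(binary):
    """Return True if the text lines of a binarized page appear to run vertically.

    `binary` is a list of equal-length rows; 1 marks an ink pixel, 0 background.
    Horizontal text gives strongly alternating row sums (line / gap), so the row
    profile normally varies more than the column profile.
    """
    row_profile = [sum(row) for row in binary]
    col_profile = [sum(col) for col in zip(*binary)]
    return pvariance(col_profile) > pvariance(row_profile)


def rotate_90_clockwise(binary):
    """Rotate a list-of-rows image by 90 degrees clockwise."""
    return [list(row) for row in zip(*binary[::-1])]


# Synthetic page: two "text lines" (full ink rows) separated by blank rows.
page = [[1] * 8, [0] * 8, [1] * 8, [0] * 8]
print(looks_sideways(page))                       # False
print(looks_sideways(rotate_90_clockwise(page)))  # True
```

On real scans the profiles are noisier, so a margin on the variance ratio is probably needed before deciding to rotate.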
I have made a wiki page on using user_patterns with the API. Please see
https://github.com/tesseract-ocr/tesseract/wiki/APIExample-user_patterns
You can try the same approach for user_words.
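As a hedged sketch of preparing a patterns file: Tesseract's user-patterns syntax, as documented in its dictionary sources, uses \d for a digit, \A for an upper-case letter, \a for a lower-case letter, \c for any letter, \n for alphanumerics, \p for punctuation and \* for repetition. The helpers below (hypothetical names) turn sample strings into such lines, covering only digits, letter case and literal characters. The resulting file would then be passed via the user_patterns_file variable or --user-patterns on the command line; whether the patterns actually influence LSTM recognition is exactly what this thread is about.

```python
def template_to_user_pattern(sample):
    r"""Turn a sample string into one Tesseract user-patterns line.

    Digits become \d, upper-case letters \A, lower-case letters \a; a literal
    backslash is escaped as \\; every other character is kept as-is.
    """
    parts = []
    for ch in sample:
        if ch.isdigit():
            parts.append(r"\d")
        elif ch.isupper():
            parts.append(r"\A")
        elif ch.islower():
            parts.append(r"\a")
        elif ch == "\\":
            parts.append("\\\\")
        else:
            parts.append(ch)
    return "".join(parts)


def build_patterns_file(samples):
    """File contents: one pattern per line, duplicates dropped, order kept."""
    patterns = dict.fromkeys(template_to_user_pattern(s) for s in samples)
    return "\n".join(patterns) + "\n"


print(template_to_user_pattern("AB-1234"))  # \A\A-\d\d\d\d
```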
On Thu, Jul 4, 2019 at 4:40 PM Jochen Naumann wrote:
> user_words_file does not work either; the file is not loaded
Can anyone explain the Joined and BROKEN symbols in the
autogenerated xxx.unicharset files?
When training starts, the Joined symbol appears in the
output log for a short time and then disappears.
But after training has finished, it (Joined) sometimes appears even in the
recognized outp
Also, the results change depending on the recognition mode. Everything
above was for PSM_SINGLE_CHAR mode. libtesseract-4.dll has a bug in this
mode; at least, it produces some debug output that should not appear.
After I switched to PSM_SINGLE_LINE, the returned coordinates are much better.
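For reference, the numeric --psm values correspond to the tesseract::PageSegMode enum (PSM_SINGLE_LINE is 7, PSM_SINGLE_CHAR is 10); in the C++ API the mode is set with SetPageSegMode(). The helper below is a hypothetical convenience for building a command line from a named mode, assuming Tesseract 4.x numbering.

```python
# Mirrors tesseract::PageSegMode (Tesseract 4.x) with the PSM_ prefix dropped.
PSM = {
    "OSD_ONLY": 0, "AUTO_OSD": 1, "AUTO_ONLY": 2, "AUTO": 3,
    "SINGLE_COLUMN": 4, "SINGLE_BLOCK_VERT_TEXT": 5, "SINGLE_BLOCK": 6,
    "SINGLE_LINE": 7, "SINGLE_WORD": 8, "CIRCLE_WORD": 9, "SINGLE_CHAR": 10,
    "SPARSE_TEXT": 11, "SPARSE_TEXT_OSD": 12, "RAW_LINE": 13,
}


def tesseract_command(image, outbase, mode="SINGLE_LINE", lang="eng"):
    """Build an argument list for the tesseract CLI with a named page segmentation mode."""
    return ["tesseract", image, outbase, "-l", lang, "--psm", str(PSM[mode])]


print(tesseract_command("line.png", "out"))
```

The list can be passed to subprocess.run() as-is, which avoids shell quoting issues with file names.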
Thanks, but as far as I can see, the problem has been open since 2017 and there is no clear
solution yet.
Now I have tried to get the recognition result via the iterator API, and it is really
strange.
All the characters are listed, and the "duplicates" share the
same coordinates as the correct ones, but h
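Since the "duplicates" share the bounding box of the correct character, one hypothetical post-processing workaround (not a fix for the underlying issue) is to keep only the highest-confidence symbol per box. The sketch assumes the symbols were already collected at RIL_SYMBOL level from the result iterator (GetUTF8Text, BoundingBox, Confidence) into (text, box, conf) records; the Symbol type and function name are illustrative. It also assumes the correct character has the higher confidence, which may not hold.

```python
from typing import NamedTuple, Tuple


class Symbol(NamedTuple):
    text: str
    box: Tuple[int, int, int, int]  # (left, top, right, bottom)
    conf: float


def drop_duplicate_boxes(symbols):
    """Keep one symbol per bounding box: the one with the highest confidence.

    Output order follows the first appearance of each box.
    """
    best = {}
    for sym in symbols:
        kept = best.get(sym.box)
        if kept is None or sym.conf > kept.conf:
            best[sym.box] = sym  # replacing a value keeps the key's original position
    return list(best.values())
```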
Hi,
I'm building *tesseract* for Android and have a question about a source snippet.
The file *src/ccutil/fileio.cpp* contains a method *DeleteMatchingFiles()*.
Searching the source, I only find this method referenced in *src/training/pango_font_info.cpp*,
in *HardInitFontConfig()*.
Is there a way to execute a bin, as *
This is an open issue - see
https://github.com/tesseract-ocr/tesseract/issues/1060
and other related issues.
On Thu, Jul 4, 2019 at 5:33 PM Abstract wrote:
> Some more information on my trained data:
> real data:12345678903542331100244117021234567
> recognized: 1234567890354233141110024411702
Some more information on my trained data:
real data: 12345678903542331100244117021234567
recognized: 12345678903542331411100244117021234567
(As you can see, instead of 11 it reported several characters, 14111; in this case it
does not like the character "4".)
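To locate such errors systematically, the ground truth and the OCR output can be aligned with Python's difflib; this is a generic diffing tool, not part of Tesseract, and the function name is hypothetical. For the pair above, the alignment reports a single inserted run, which is consistent with 11 being read as 14111.

```python
from difflib import SequenceMatcher


def ocr_differences(truth, recognized):
    """List the non-matching stretches between ground truth and OCR output."""
    sm = SequenceMatcher(None, truth, recognized, autojunk=False)
    return [
        (tag, truth[i1:i2], recognized[j1:j2])
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]


print(ocr_differences("12345678903542331100244117021234567",
                      "12345678903542331411100244117021234567"))
# [('insert', '', '141')]
```

Collecting these tuples over many samples shows which characters are inserted or substituted most often.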
Another real/recognized pair:
2345678905423342392200712
Thank you, I have opened an issue there ...
On Wednesday, July 3, 2019 at 6:27:06 PM UTC+3, shree wrote:
>
> Bugs are to be reported on GitHub under Issues. If it is specific to Windows
> and you use prebuilt binaries, please report it in the repo the binaries come from.
>
> On Wed, 3 Jul 2019, 20:26 _ Flaviu wrote:
user_words_file does not work either; the file is not loaded (I checked with a
file monitor).
On Wed, 3 July 2019 at 20:31, Zdenko Podobny wrote:
> If the command line works for you, the easiest way is to follow the tesseract
> executable code[1]:
> IMO you need to use the variable user_words_file; AFA
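The [1] link is not preserved here. As a sketch under assumptions: the words file is read when the dictionaries are loaded during initialization, so user_words_file presumably has to be set before or during Init(), for example through a config file passed as the last command-line argument (tesseract image outbase myconfig) or listed in the configs argument of TessBaseAPI::Init(). The helper name, the config name myconfig and the path below are hypothetical.

```python
def tesseract_config(variables):
    """Render Tesseract config-file contents: one `name value` pair per line."""
    lines = []
    for name, value in variables.items():
        if any(ch.isspace() for ch in name):
            raise ValueError(f"variable name contains whitespace: {name!r}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical path; write this text to a file such as "myconfig".
print(tesseract_config({"user_words_file": "/path/to/eng.user-words"}), end="")
```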
See related discussion at
https://github.com/tesseract-ocr/tesseract/issues/2532
On Monday, July 1, 2019 at 3:51:15 PM UTC+5:30, Jochen Naumann wrote:
>
> Thanks, this seems to be what I need. But how do I set
> lstm_choice_mode with the API?
>
> On Monday, July 1, 2019 11:55:02 UTC+2, wro