[tesseract-ocr] How do I train vertical Japanese?

2020-12-27 Thread Kamui 7
Are there any tutorials on this? I can't find any documentation regarding this. Tesstrain doesn't take jpn_vert as a language so I do not know what to do.

[tesseract-ocr] Numerous different bugs while training jpn

2021-01-07 Thread Kamui 7
I have a script to train Tesseract, and I ran it on Arch Linux, Debian, and even a Docker container; they all produce the same errors. I checked to make sure the script is correct as well. Bug 1: This happens when tesstrain runs text2image. The max pages parameter does not work at all. It en

Re: [tesseract-ocr] Numerous different bugs while training jpn

2021-01-07 Thread Kamui 7
Thursday, January 7, 2021 at 11:01:55 AM UTC-6 shree wrote: > Old versions of tesstrain.sh used to limit training to 3 pages. Looks like > you may have an old version in the path somewhere. > > On Thu, Jan 7, 2021 at 10:17 PM Kamui 7 wrote: > >> I have a script to train tesseract

Re: [tesseract-ocr] Numerous different bugs while training jpn

2021-01-07 Thread Kamui 7
> seems to have samples of both languages. > > On Thu, Jan 7, 2021, 22:40 Kamui 7 wrote: > >> I did a find command in the root directory and searched for the tesstrain >> script. It could only find the script that I pulled from the latest >> tesseract git repo. My tra

Re: [tesseract-ocr] Numerous different bugs while training jpn

2021-01-07 Thread Kamui 7
pages. You need to use a larger text if you want more pages. > > Also check that your fonts support both English and Japanese as the text > seems to have samples of both languages. > > On Thu, Jan 7, 2021, 22:40 Kamui 7 wrote: > >> I did a find command in the root directory

Re: [tesseract-ocr] Numerous different bugs while training jpn

2021-01-09 Thread Kamui 7
could be if the characters in training text are not in the > unicharset. > > On Fri, Jan 8, 2021, 00:46 Kamui 7 wrote: > >> Looks like that fixed bug #1. Now it is able to successfully create 400 >> pages. Do you have any ideas as to why the other 2 errors are occurring?

Re: [tesseract-ocr] Numerous different bugs while training jpn

2021-01-12 Thread Kamui 7
>> own unicharset file. >> On Friday, January 8, 2021 at 12:58:27 AM UTC-6 shree wrote: >> >>> Are any of these vertical fonts? >>> >>> Encoding errors could be if the characters in training text are not in >>> the unicharset. >>
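
One way to test shree's suggestion that encoding errors come from characters missing from the unicharset is to compare the two directly. The following is a minimal sketch, not something posted in the thread. It assumes the unicharset text has an entry count on its first line, that each later line begins with the unichar followed by a space, and that the token NULL stands for the space character. The function names `parse_unicharset` and `missing_characters` and the sample data are hypothetical.

```python
def parse_unicharset(text):
    """Collect the unichars listed in unicharset-style text.

    Assumed layout: line 1 holds the entry count; every later line
    starts with the unichar, then a space, then its properties.
    """
    entries = set()
    for line in text.splitlines()[1:]:
        if not line.strip():
            continue
        unichar = line.split(" ", 1)[0]
        # Assumption: the literal token NULL represents the space character.
        entries.add(" " if unichar == "NULL" else unichar)
    return entries


def missing_characters(training_text, unichars):
    """Return the sorted non-whitespace characters absent from unichars."""
    return sorted(
        {ch for ch in training_text if not ch.isspace() and ch not in unichars}
    )


# Made-up sample data, only to show the call pattern.
sample_unicharset = "3\nNULL 0 Common 0\n日 1 Han 1\n本 1 Han 2\n"
known = parse_unicharset(sample_unicharset)
print(missing_characters("日本語 text", known))  # ['e', 't', 'x', '語']
```

This check only looks at single code points, so it would not catch entries that span several characters. It is meant as a quick first filter on the training text.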

Re: [tesseract-ocr] Numerous different bugs while training jpn

2021-01-13 Thread Kamui 7
training text, because those are the samples > that will be used for training. > > Why do you want to use a different unicharset? > > > On Tue, Jan 12, 2021, 23:47 Kamui 7 wrote: > >> >> >> Great! The PR that you submitted fixed issue #3. All that's left