Re: [tesseract-ocr] Tesseract training for New font/language

Ali Abedian Sat, 01 Apr 2023 07:57:03 -0700

Is it best to train a new language? 
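
For context, here is a minimal sketch of how a from-scratch ("new language") run might be prepared for tesstrain. It assumes the tesstrain convention of a data/MODEL_NAME-ground-truth directory holding each line image next to a matching .gt.txt transcription, and the MODEL_NAME, MAX_ITERATIONS and START_MODEL make variables from the tesstrain README. The model name "aurebesh" and both helper functions are hypothetical, chosen only for illustration. Leaving START_MODEL unset is what makes the run train from scratch rather than fine-tune an existing model.

```python
from pathlib import Path
import shutil


def build_ground_truth(pairs, model_name, data_dir="data"):
    """Lay out (image_path, transcription) pairs the way tesstrain expects.

    Hypothetical helper: copies each image into
    <data_dir>/<model_name>-ground-truth/ and writes the transcription
    next to it as <image stem>.gt.txt.
    """
    gt_dir = Path(data_dir) / f"{model_name}-ground-truth"
    gt_dir.mkdir(parents=True, exist_ok=True)
    for image_path, text in pairs:
        image_path = Path(image_path)
        shutil.copy(image_path, gt_dir / image_path.name)
        # One line of text per line image, without stray whitespace.
        gt_file = gt_dir / f"{image_path.stem}.gt.txt"
        gt_file.write_text(text.strip() + "\n", encoding="utf-8")
    return gt_dir


def training_command(model_name, max_iterations, start_model=None):
    """Build the make invocation as an argument list (does not run it).

    With start_model=None the run starts from scratch (new language);
    passing e.g. "eng" would request fine-tuning instead.
    """
    cmd = [
        "make",
        "training",
        f"MODEL_NAME={model_name}",
        f"MAX_ITERATIONS={max_iterations}",
    ]
    if start_model:
        cmd.append(f"START_MODEL={start_model}")
    return cmd
```

The list returned by training_command("aurebesh", 10000) could be passed to subprocess.run from inside a tesstrain checkout. Whether 10,000 iterations is enough for roughly 100,000 line images is not something this sketch can answer.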

On Saturday, April 1, 2023 at 7:54:30 a.m. UTC-7 shree wrote:


> Aurebesh seems to be different symbols mapped to the English alphabet 
> rather than a new font for English, hence training would need to be for a 
> new language rather than just fine-tuning.
>
> On Sat, Apr 1, 2023, 10:47 Ali Abedian <[email protected]> wrote:
>
>> Hello,
>>
>> Thank you for providing the references, but I'm still a bit confused. I
>> have trained Tesseract using the same method as described in
>> https://github.com/tesseract-ocr/tesstrain/blob/main/ocrd-testset.zip,
>> with 100,000 sentences and a maximum of 10,000 iterations. However, it still
>> cannot recognize a 6-letter word that I input from a TIF file using the
>> same font and settings. I have tried fewer iterations, such as 1,000,
>> as well as more, such as 20,000 and 100,000, but still with no
>> results. Additionally, the BCER (Character Error Rate) doesn't seem to
>> change significantly with more iterations, remaining at 3.56%. I'm
>> unsure of what I'm doing wrong or what I should do next, but any help would
>> be appreciated.
>>
>> Thank you.
>> On Saturday, April 1, 2023 at 12:05:36 a.m. UTC-7 zdenop wrote:
>>
>>> Please have a look at https://github.com/tesseract-ocr/tesstrain
>>> (especially 
>>> https://github.com/tesseract-ocr/tesstrain/blob/main/ocrd-testset.zip)
>>>
>>>
>>> Zdenko
>>>
>>>
>>> On Fri, 31 Mar 2023 at 7:03, Ali Abedian <[email protected]> wrote:
>>>
>>>> Hey everyone! I'm currently working on a personal project where I'm
>>>> training a new font for the English language using Tesseract. The font is
>>>> called Aurebesh and it's from the Star Wars universe. Basically, each
>>>> letter in Aurebesh corresponds to a letter in English. I've collected
>>>> close to 100,000 images and their corresponding translations, but I'm not
>>>> sure how many iterations I should run for a file of this size. I've tried
>>>> training with only 100 images, but it didn't work out. Can anyone advise
>>>> me on how many iterations I should run and whether it's even possible to
>>>> train a new font like this?
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/1b20c2e0-76b2-41a0-bc9f-e1a16b9c67a2n%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/1b20c2e0-76b2-41a0-bc9f-e1a16b9c67a2n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>

