Would it be best to train Aurebesh as a new language, then?

On Saturday, April 1, 2023 at 7:54:30 a.m. UTC-7 shree wrote:
> Aurebesh seems to be a set of different symbols mapped to the English
> alphabet rather than a new font for English, so training would need to be
> for a new language rather than just fine-tuning.
>
> On Sat, Apr 1, 2023, 10:47 Ali Abedian <[email protected]> wrote:
>
>> Hello,
>>
>> Thank you for providing the references, but I'm still a bit confused. I
>> have trained Tesseract using the same method as described in
>> https://github.com/tesseract-ocr/tesstrain/blob/main/ocrd-testset.zip,
>> with 100,000 sentences and a maximum of 10,000 iterations. However, it
>> still cannot recognize a 6-letter word that I input from a TIF file using
>> the same font and settings. I have tried fewer iterations, such as 1,000,
>> as well as more, such as 20,000 and 100,000, but with no improvement.
>> Additionally, the BCER (character error rate) doesn't change
>> significantly with more iterations, remaining at 3.56%. I'm unsure what
>> I'm doing wrong or what I should do next; any help would be appreciated.
>>
>> Thank you.
>>
>> On Saturday, April 1, 2023 at 12:05:36 a.m. UTC-7 zdenop wrote:
>>
>>> Please have a look at https://github.com/tesseract-ocr/tesstrain
>>> (especially
>>> https://github.com/tesseract-ocr/tesstrain/blob/main/ocrd-testset.zip)
>>>
>>> Zdenko
>>>
>>> On Fri, 31 Mar 2023 at 7:03, Ali Abedian <[email protected]> wrote:
>>>
>>>> Hey everyone! I'm currently working on a personal project where I'm
>>>> training a new font for the English language using Tesseract. The font
>>>> is called Aurebesh and it's from the Star Wars universe. Basically, each
>>>> letter in Aurebesh corresponds to a letter in English. I've collected
>>>> close to 100,000 images and their corresponding transcriptions, but I'm
>>>> not sure how many iterations I should run for a dataset of this size.
>>>> I've tried training with only 100 images, but it didn't work out.
>>>> Can anyone advise me on how many iterations I should run and whether
>>>> it's even possible to train a new font like this?
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/1b20c2e0-76b2-41a0-bc9f-e1a16b9c67a2n%40googlegroups.com
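On the question of how many iterations to run for roughly 100,000 lines, it may help to convert iteration counts into passes over the data (epochs). The sketch below assumes that lstmtraining consumes about one text-line image per training iteration; that is our understanding of how the counter works, not something stated in this thread. Under that assumption, a 10,000-iteration run over 100,000 lines covers only about a tenth of the data once. The helpers `epochs_seen` and `iterations_for` are hypothetical, not part of Tesseract or tesstrain.

```python
"""Rough epoch arithmetic for an LSTM training run.

Assumption (hypothetical, not from Tesseract docs): each training
iteration processes one text-line image, so one epoch (a full pass over
the ground truth) takes about as many iterations as there are lines.
"""


def epochs_seen(max_iterations: int, num_lines: int) -> float:
    """Return how many full passes over the data a run makes."""
    if num_lines <= 0:
        raise ValueError("num_lines must be positive")
    return max_iterations / num_lines


def iterations_for(epochs: float, num_lines: int) -> int:
    """Return the iteration budget needed for the given number of epochs."""
    if epochs <= 0 or num_lines <= 0:
        raise ValueError("epochs and num_lines must be positive")
    return round(epochs * num_lines)


if __name__ == "__main__":
    lines = 100_000  # dataset size mentioned in the thread
    # Iteration counts the poster reported trying.
    for its in (1_000, 10_000, 20_000, 100_000):
        print(f"{its:>7} iterations -> {epochs_seen(its, lines):.2f} epochs")
```

For example, `iterations_for(10, 100_000)` gives the budget for ten passes over the same data; the choice of ten here is purely illustrative, not a recommendation from the thread.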

