Please share a sample of the image you're trying to recognize.

On Saturday, April 1, 2023 at 10:56:58 UTC-4, ali8a...@gmail.com wrote:
> Is it best to train a new language?
>
> On Saturday, April 1, 2023 at 7:54:30 a.m. UTC-7 shree wrote:
>
>> Aurebesh seems to be different symbols mapped to the English alphabet
>> rather than a new font for English, hence training would need to be for a
>> new language rather than just fine-tuning.
>>
>> On Sat, Apr 1, 2023, 10:47 Ali Abedian <ali8a...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> Thank you for providing the references, but I'm still a bit confused. I
>>> have trained Tesseract using the same method as described in
>>> https://github.com/tesseract-ocr/tesstrain/blob/main/ocrd-testset.zip,
>>> with 100,000 sentences and a maximum of 10,000 iterations. However, it
>>> still cannot recognize a 6-letter word that I input from a TIF file using
>>> the same font and settings. I have tried fewer iterations, such as 1,000,
>>> as well as more iterations, such as 20,000 and 100,000, but with no
>>> improvement. Additionally, the BCER (Character Error Rate) doesn't seem to
>>> change significantly with more iterations, remaining at 3.56%. I'm
>>> unsure of what I'm doing wrong or what I should do next, but any help would
>>> be appreciated.
>>>
>>> Thank you.
>>>
>>> On Saturday, April 1, 2023 at 12:05:36 a.m. UTC-7 zdenop wrote:
>>>
>>>> Please have a look at https://github.com/tesseract-ocr/tesstrain
>>>> (especially
>>>> https://github.com/tesseract-ocr/tesstrain/blob/main/ocrd-testset.zip)
>>>>
>>>> Zdenko
>>>>
>>>> On Fri, Mar 31, 2023 at 7:03 Ali Abedian <ali8a...@gmail.com> wrote:
>>>>
>>>>> Hey everyone! I'm currently working on a personal project where I'm
>>>>> training a new font for the English language using Tesseract. The font
>>>>> is called Aurebesh and it's from the Star Wars universe. Basically, each
>>>>> letter in Aurebesh corresponds to a letter in English. I've collected
>>>>> close to 100,000 images and their corresponding English transcriptions,
>>>>> but I'm not sure how many iterations I should run for a dataset of this
>>>>> size. I've tried training with only 100 images, but it didn't work out.
>>>>> Can anyone advise me on how many iterations I should run and whether
>>>>> it's even possible to train a new font like this?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/56e4beb3-644b-4be6-8c21-84e9856ec013n%40googlegroups.com.
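For anyone following along: one way to check whether a trained model gets a specific test word right, independently of the BCER figure printed during training, is to compute the character error rate yourself on that word. The sketch below is the textbook definition (Levenshtein edit distance divided by the ground-truth length) written in Python. It is not Tesseract's own code, and lstmtraining may normalize its BCER differently, so only compare the numbers roughly. The function names `levenshtein` and `cer` and the sample word are hypothetical.

```python
"""Minimal character error rate (CER) sketch: Levenshtein edit distance
between the OCR output and the ground truth, divided by the ground-truth
length. Illustrative only; not Tesseract's internal BCER computation."""


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the edit distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            )
        prev = curr
    return prev[len(b)]


def cer(ground_truth: str, ocr_output: str) -> float:
    """Character error rate in percent, relative to the ground-truth length."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    return 100.0 * levenshtein(ground_truth, ocr_output) / len(ground_truth)


if __name__ == "__main__":
    # Hypothetical 6-letter test word with one misrecognized character
    print(f"{cer('REBELS', 'REBEIS'):.2f}%")  # prints 16.67%
```

With this definition, a single wrong character in a six-letter word already gives a CER of about 16.67%, so a low average CER over many training lines does not by itself mean every short word will come out right.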