Aurebesh appears to be a set of different symbols mapped to the letters of the English alphabet rather than a new font for English, so the training would need to be for a new language rather than just fine-tuning an existing model.
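Since the error rate reported in the quoted message below barely moves between runs, it may help to measure the character error rate yourself on a few held-out lines. The following is only a minimal sketch, assuming the usual definition (edit distance divided by the number of reference characters). The function names and the sample pairs are made up for illustration and are not part of Tesseract or tesstrain.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(pairs):
    """pairs: iterable of (reference_text, ocr_output).

    Returns total edits divided by total reference characters.
    """
    edits = 0
    chars = 0
    for ref, hyp in pairs:
        edits += levenshtein(ref, hyp)
        chars += len(ref)
    return edits / chars if chars else 0.0


if __name__ == "__main__":
    # Hypothetical (ground truth, OCR output) pairs, not real results.
    samples = [("rebels", "rebels"), ("empire", "empjre"), ("falcon", "falco")]
    print(f"CER: {character_error_rate(samples):.2%}")
```

If the ground truth is written in Latin letters, the OCR output has to be compared in the same character set, which is the reason the choice between "new font" and "new language" matters here.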
On Sat, Apr 1, 2023, 10:47 Ali Abedian <ali8abed...@gmail.com> wrote:

> Hello,
>
> Thank you for providing the references, but I'm still a bit confused. I
> have trained tesseract using the same method as described in
> https://github.com/tesseract-ocr/tesstrain/blob/main/ocrd-testset.zip,
> with 100,000 sentences and a maximum of 10,000 iterations. However, it still
> cannot recognize a 6-letter word that I input from a TIF file using the
> same font and settings. I have tried using fewer iterations, such as 1,000,
> as well as more iterations, such as 20,000 and 100,000, but still no
> results. Additionally, the BCER (Character Error Rate) doesn't seem to
> change significantly with more iterations, remaining at 3.56%. I'm
> unsure of what I'm doing wrong or what I should do next, but any help would
> be appreciated.
>
> Thank you.
> On Saturday, April 1, 2023 at 12:05:36 a.m. UTC-7 zdenop wrote:
>
>> Please have a look at https://github.com/tesseract-ocr/tesstrain
>> (especially
>> https://github.com/tesseract-ocr/tesstrain/blob/main/ocrd-testset.zip)
>>
>>
>> Zdenko
>>
>>
>> On Fri, Mar 31, 2023 at 7:03, Ali Abedian <ali8a...@gmail.com> wrote:
>>
>>> Hey everyone! I'm currently working on a personal project where I'm
>>> training a new font for the English language using Tesseract. The font is
>>> called Aurebesh and it's from the Star Wars universe. Basically, each
>>> letter in Aurebesh corresponds to a letter in English. I've collected close
>>> to 100,000 images and their corresponding translations, but I'm not sure
>>> how many iterations I should run for a file of this size. I've tried
>>> training with only 100 images, but it didn't work out. Can anyone advise me
>>> on how many iterations I should run and whether it's even possible to train
>>> a new font like this?
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-oc...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/1b20c2e0-76b2-41a0-bc9f-e1a16b9c67a2n%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/1b20c2e0-76b2-41a0-bc9f-e1a16b9c67a2n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .