Yes, you can. This video is very good: https://www.youtube.com/watch?v=SvhoBT-PnME&lc=UgyKAwYjRNAb0P45CYp4AaABAg
You should use the most recent ara.traineddata file from the Tesseract "best" repository as your basis: https://github.com/tesseract-ocr/tessdata_best

I found that training it further actually made the results worse. The existing ARA file will probably already do what you need.

On Fri, 18 Apr 2025 at 09:12, Ishak DÖLEK <ishakdole...@gmail.com> wrote:
> Hello,
>
> I am writing to inquire about the possibility of training a Tesseract
> model using my custom dataset. This dataset consists of Arabic image lines
> paired with corresponding Latin-based text lines.
>
> Specifically, I have the following questions:
>
> Is it possible to train Tesseract with a dataset where the images contain
> right-to-left (RTL) Arabic script and the corresponding text lines are
> left-to-right (LTR) Latin-based text? I am sharing the attached example.
>
> If training with such a dataset is possible, are there any specific
> documents or tutorials available that outline the process? Any guidance on
> how to structure the training data and the training commands would be
> greatly appreciated.
>
> Thank you for your time and assistance. I look forward to your guidance on
> this matter.
>
> make LANG_TYPE=RTL MODEL_NAME=ara GROUND_TRUTH_DIR=data/ara-ground-truth
> PSM=13 TESSDATA=/tessdata EPOCHS=20 training
>
> Sincerely,
> Ishak Dölek
>
> --
> Dr. İshak Dölek
> Mina AR-GE, Kurucu Ortak
> ishakdole...@gmail.com <atakanh...@gmail.com>
> is...@osmanlica.com
> ishakdo...@subu.edu.tr <atakan.k...@istanbul.edu.tr>
> https://ishakdolek.github.io
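Since further training made my results worse, it is worth measuring before and after rather than trusting the training log. Below is a rough sketch of a character error rate (CER) check you could run on a held-out set of lines, once with the stock ara.traineddata and once with your fine-tuned model. The function names and the sample strings are my own illustration, not part of Tesseract or tesstrain, and the Latin sample strings are only an assumption about what your ground truth looks like.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(pairs):
    """CER over (ground_truth, ocr_output) pairs: total edits / total truth chars."""
    total_edits = sum(edit_distance(gt, out) for gt, out in pairs)
    total_chars = sum(len(gt) for gt, _ in pairs)
    return total_edits / total_chars if total_chars else 0.0


if __name__ == "__main__":
    # Hypothetical held-out lines: (ground truth, model output).
    baseline = [("kitab", "kitab"), ("kalem", "kalam")]
    finetuned = [("kitab", "kitap"), ("kalem", "kalam")]
    print("baseline CER: ", character_error_rate(baseline))
    print("fine-tuned CER:", character_error_rate(finetuned))
```

If the fine-tuned CER on lines the model never saw during training is not clearly lower than the baseline, I would stay with the stock model.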