Yes, you can. This video is very good:
https://www.youtube.com/watch?v=SvhoBT-PnME&lc=UgyKAwYjRNAb0P45CYp4AaABAg

You should use the most recent ara.traineddata file from the Tesseract
"best" repository (tessdata_best) as your starting model:
https://github.com/tesseract-ocr/tessdata_best
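A minimal sketch of starting from that file with tesstrain, assuming tesstrain is cloned into ./tesstrain and your line images with their *.gt.txt transcriptions sit in tesstrain/data/ara_custom-ground-truth. The model name ara_custom, the directory layout, and the "main" branch name in the download URL are illustrative assumptions, not details from this thread. START_MODEL=ara tells tesstrain to fine-tune from ara.traineddata in TESSDATA instead of training from scratch. The other variables mirror the command in the quoted message.

```shell
#!/usr/bin/env bash
# Sketch only: fine-tune from tessdata_best's ara.traineddata with tesstrain.
# Assumed (hypothetical) layout: ./tesstrain is a tesstrain checkout and
# ground truth lives in tesstrain/data/ara_custom-ground-truth.
set -euo pipefail

# Directory that will hold the starting model (assumed location).
TESSDATA_DIR="${TESSDATA_DIR:-$PWD/tessdata_best}"

fetch_best_ara() {
  mkdir -p "$TESSDATA_DIR"
  # Assumes the tessdata_best default branch is "main".
  curl -fL -o "$TESSDATA_DIR/ara.traineddata" \
    https://github.com/tesseract-ocr/tessdata_best/raw/main/ara.traineddata
}

finetune_ara() {
  # START_MODEL=ara: continue training from ara.traineddata found in TESSDATA.
  # GROUND_TRUTH_DIR defaults to data/$(MODEL_NAME)-ground-truth inside tesstrain.
  make -C tesstrain training \
    MODEL_NAME=ara_custom \
    START_MODEL=ara \
    TESSDATA="$TESSDATA_DIR" \
    LANG_TYPE=RTL \
    PSM=13 \
    EPOCHS=20
}
```

Usage: run `fetch_best_ara`, then `finetune_ara`. The functions are defined separately so you can fetch the model once and re-run training with different settings.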

I found that training it further actually made the results worse. The
existing ara.traineddata file will probably already do what you need.
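Before training anything, you could check what the stock model produces on one of your line images. This is a minimal sketch, assuming tessdata_best's ara.traineddata is in ./tessdata_best and using a hypothetical image name. Keep in mind that the stock model outputs Arabic script, not a Latin transliteration.

```shell
#!/usr/bin/env bash
# Sketch only: run the stock tessdata_best Arabic model on a single text line.
# The tessdata location and the image name below are assumptions.

ocr_line_stock() {
  local image="$1"   # e.g. a hypothetical sample-line.png
  # --psm 13 treats the image as a single raw text line;
  # "stdout" prints the recognized text instead of writing a file.
  tesseract "$image" stdout -l ara --psm 13 \
    --tessdata-dir "${TESSDATA_DIR:-$PWD/tessdata_best}"
}
```

Usage: `ocr_line_stock sample-line.png`. Comparing this output with your ground-truth lines shows whether fine-tuning is worth the effort.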


On Fri, 18 Apr 2025 at 09:12, Ishak DÖLEK <ishakdole...@gmail.com> wrote:

> Hello,
>
> I am writing to inquire about the possibility of training a Tesseract
> model using my custom dataset. This dataset consists of Arabic image lines
> paired with corresponding Latin-based text lines.
>
> Specifically, I have the following questions:
>
> Is it possible to train Tesseract with a dataset where the images contain
> right-to-left (RTL) Arabic script and the corresponding text lines are
> left-to-right (LTR) Latin-based text? I am sharing the attached example.
>
> If training with such a dataset is possible, are there any specific
> documents or tutorials available that outline the process? Any guidance on
> how to structure the training data and the training commands would be
> greatly appreciated.
>
> Thank you for your time and assistance. I look forward to your guidance on
> this matter.
>
>
>
> make LANG_TYPE=RTL MODEL_NAME=ara GROUND_TRUTH_DIR=data/ara-ground-truth
> PSM=13 TESSDATA=/tessdata EPOCHS=20 training
>
>
> Sincerely,
> Ishak Dölek
>
> --
> Dr. İshak Dölek
> Mina AR-GE, Kurucu Ortak
> ishakdole...@gmail.com <atakanh...@gmail.com>
> is...@osmanlica.com
> ishakdo...@subu.edu.tr <atakan.k...@istanbul.edu.tr>
> https://ishakdolek.github.io
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion visit
> https://groups.google.com/d/msgid/tesseract-ocr/CAA%3DdkubGBEpdCOHP0RBKXjgc3zSz%3DExhS-2PmhOWv2LFiXeH_w%40mail.gmail.com
>
