I do not think you need to do any training. Try the Fraktur language data (https://github.com/tesseract-ocr/tessdata_best/blob/main/script/Fraktur.traineddata) or frk.traineddata. E.g.:

    tesseract "Screen Shot 2021-09-29 at 9.35.27 AM.png" - -l script/Fraktur
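If you would rather call this from a script, the same command can be wrapped roughly as below. This is only a sketch: the helper names `build_tesseract_command` and `ocr_image` are made up for illustration, and it assumes the `tesseract` executable is on your PATH and that `script/Fraktur.traineddata` (or `frk.traineddata`) is already in the tessdata directory in use.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image, lang="script/Fraktur", tessdata_dir=None):
    """Return the argument list for an OCR run that prints text to stdout.

    Hypothetical helper; it only mirrors the command line shown above.
    """
    # "-" as the output base tells tesseract to write the text to stdout.
    cmd = ["tesseract", str(image), "-", "-l", lang]
    if tessdata_dir is not None:
        # Assumption: for lang="script/Fraktur" this directory must contain
        # script/Fraktur.traineddata.
        cmd += ["--tessdata-dir", str(tessdata_dir)]
    return cmd


def ocr_image(image, lang="script/Fraktur", tessdata_dir=None):
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image, lang, tessdata_dir),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout


if __name__ == "__main__":
    image = Path("Screen Shot 2021-09-29 at 9.35.27 AM.png")
    # Only attempt OCR when both the image and the executable are available.
    if image.exists() and shutil.which("tesseract"):
        print(ocr_image(image))
```

To try frk.traineddata instead, pass `lang="frk"`.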
As far as I remember (though I cannot find the link ;-) ), our German friends did quite complex training for this type of text/font.

Zdenko

On Wed, 29 Sep 2021 at 11:28, Mozhi <mozhgan.baya...@gmail.com> wrote:

> Hi,
> I would like to fine-tune/train tesseract on scanned documents similar to,
> for example, the FUNSD data set here: https://guillaumejaume.github.io/FUNSD/
> So far, what I have found is the git repo tesstrain:
> https://github.com/tesseract-ocr/tesstrain .
> I looked at the examples for this repo on the internet; they mention
> that each training sample should be only one line of text, like the photo
> below:
>
> [image: Screen Shot 2021-09-29 at 9.35.27 AM.png]
>
> But I would like to provide data like the forms in the FUNSD data set, whose
> JSON files contain boxes and their text. How do I do end-to-end training for
> tesseract, including the detection phase and line finding to find the boxes
> around text?
>
> Thanks in advance!
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/be4b6cb4-afe1-49d6-ac76-72ec7e198573n%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/be4b6cb4-afe1-49d6-ac76-72ec7e198573n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .

--
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8z%3DiOSo%2BpVbpo9OB-DvkJwvRcbo2x-Kv9zxV5Rge56fWA%40mail.gmail.com.