I do not think you need to do training.
Try the Fraktur traineddata
(https://github.com/tesseract-ocr/tessdata_best/blob/main/script/Fraktur.traineddata)
or frk.traineddata, e.g.:
tesseract "Screen Shot 2021-09-29 at 9.35.27 AM.png" - -l script/Fraktur
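
If you do not have script/Fraktur.traineddata installed yet, here is a minimal shell sketch that does the same thing. It assumes tesseract (4.x or later) and curl are on PATH. It downloads the model once into a local tessdata directory, keeping the script/ subfolder so that "-l script/Fraktur" resolves, and then runs the command above with --tessdata-dir. The directory default and the function names fetch_fraktur/ocr_fraktur are only illustrative. The URL is the "raw" form of the link above, so curl fetches the file itself rather than the GitHub page.

```shell
# Sketch only: OCR an image with the script/Fraktur model from tessdata_best.
# Assumes tesseract (4.x or later) and curl are on PATH.

# Local model directory; this default location is just an example.
FRAKTUR_TESSDATA=${FRAKTUR_TESSDATA:-"$HOME/tessdata_best"}
FRAKTUR_URL="https://github.com/tesseract-ocr/tessdata_best/raw/main/script/Fraktur.traineddata"

# Download script/Fraktur.traineddata once, mirroring the script/ subfolder
# so that "-l script/Fraktur" resolves inside $FRAKTUR_TESSDATA.
fetch_fraktur() {
  model="$FRAKTUR_TESSDATA/script/Fraktur.traineddata"
  [ -f "$model" ] && return 0
  mkdir -p "$FRAKTUR_TESSDATA/script" || return 1
  # Download to a temporary name so a failed transfer leaves no partial model.
  if curl -fsSL -o "$model.part" "$FRAKTUR_URL"; then
    mv "$model.part" "$model"
  else
    rm -f "$model.part"
    return 1
  fi
}

# OCR one image; the output base "-" prints the recognized text to stdout.
ocr_fraktur() {
  fetch_fraktur || return 1
  tesseract "$1" - --tessdata-dir "$FRAKTUR_TESSDATA" -l script/Fraktur
}
```

After sourcing it, run: ocr_fraktur "Screen Shot 2021-09-29 at 9.35.27 AM.png". If frk.traineddata sits directly in the tessdata directory, use -l frk instead of -l script/Fraktur.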

As far as I remember (but I cannot find the link ;-)), our German friends
did quite complex training for this type of text/font.


Zdenko


On Wed, 29 Sep 2021 at 11:28, Mozhi <mozhgan.baya...@gmail.com> wrote:

> Hi,
> I would like to fine-tune/train Tesseract for scanned documents similar to,
> for example, the FUNSD dataset here: https://guillaumejaume.github.io/FUNSD/
> So far, what I have found is the git repo tesstrain:
> https://github.com/tesseract-ocr/tesstrain
> I looked at examples for this repo on the internet, and they mention that
> training samples should contain only one line of text, like the photo
> below:
>
>
> [image: Screen Shot 2021-09-29 at 9.35.27 AM.png]
>
> But I would like to provide data like the forms in the FUNSD dataset, with
> JSON files containing the boxes and their text. How can I do end-to-end
> training for Tesseract, including the detection phase and the line finding
> that locates the boxes around the text?
>
> Thanks in advance!
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/be4b6cb4-afe1-49d6-ac76-72ec7e198573n%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/be4b6cb4-afe1-49d6-ac76-72ec7e198573n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
