Hi, I would like to fine-tune/train Tesseract on scanned documents similar to, for example, the FUNSD dataset: https://guillaumejaume.github.io/FUNSD/ . So far, what I have found is the tesstrain repo: https://github.com/tesseract-ocr/tesstrain . The examples I found online for this repo say that each training sample should be a single line of text, as in the image below:
[image: Screen Shot 2021-09-29 at 9.35.27 AM.png] However, I would like to provide data like the forms in the FUNSD dataset, whose JSON files contain the boxes and their text. How can I do end-to-end training for Tesseract, including the detection phase and the line finding that locates the boxes around text? Thanks in advance!
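For concreteness, here is a rough sketch of how form annotations might be turned into the single-line samples that the examples expect. It only covers the recognition side, not detection or line finding. It assumes each FUNSD JSON file has a top-level "form" list whose entries carry "box" as [x0, y0, x1, y1] and "text", and that tesstrain accepts a PNG line image paired with a same-named .gt.txt transcription. The function names funsd_entries and export_line_samples are hypothetical, and Pillow is assumed for cropping.

```python
import json
from pathlib import Path


def funsd_entries(annotation):
    """Yield (box, text) for each annotated entity that has text and a box.

    Assumed layout: {"form": [{"box": [x0, y0, x1, y1], "text": "..."}, ...]}
    """
    for entity in annotation.get("form", []):
        text = entity.get("text", "").strip()
        box = entity.get("box")
        if text and box and len(box) == 4:
            yield tuple(box), text


def export_line_samples(image_path, json_path, out_dir):
    """Crop every annotated box and write <name>.png plus <name>.gt.txt."""
    # Pillow is imported lazily so the parser above works without it.
    from PIL import Image

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    annotation = json.loads(Path(json_path).read_text(encoding="utf-8"))
    stem = Path(image_path).stem

    with Image.open(image_path) as page:
        for i, (box, text) in enumerate(funsd_entries(annotation)):
            name = f"{stem}_{i:04d}"
            # crop() takes (left, upper, right, lower) in pixels.
            page.crop(box).save(out_dir / f"{name}.png")
            # Assumed tesstrain convention: one transcription line per image.
            (out_dir / f"{name}.gt.txt").write_text(text + "\n", encoding="utf-8")
```

An entity box that spans several text lines would not fit the one-line assumption and would probably need to be split first, for example using per-word boxes if the annotations provide them.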