[tesseract-ocr] Training tesseract for image with complex layout

Mozhi Wed, 29 Sep 2021 02:28:49 -0700

Hi, 
I would like to fine-tune/train Tesseract on scanned documents similar to 
those in the FUNSD dataset: https://guillaumejaume.github.io/FUNSD/
So far, what I have found is the tesstrain Git repo: 
https://github.com/tesseract-ocr/tesstrain
I looked at examples for this repo on the internet, and they say that each 
training sample should be a single line of text, as in the screenshot 
below:



[image: Screen Shot 2021-09-29 at 9.35.27 AM.png]

However, I would like to train on full-page data like the forms in the FUNSD 
dataset, whose JSON files contain the boxes and their text. How can I do 
end-to-end training for Tesseract, including the detection phase and the 
line finding that locates the boxes around the text?
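
To make the data format concrete, here is a minimal sketch of how 
FUNSD-style annotations could be cut into the single-line image/text pairs 
that tesstrain expects (an image plus a .gt.txt file with the same base 
name). It assumes each annotation has a "form" list whose entries carry a 
"box" of [left, top, right, bottom] and a "text" string, and that each entry 
covers a single text line, which may not hold for every FUNSD entity. The 
function names and the output file naming are hypothetical, and Pillow is 
assumed for the cropping. It only prepares recognition data and does not 
address the detection or line-finding question.

```python
import json
from pathlib import Path


def funsd_to_line_samples(annotation, stem):
    """Return (file_stem, box, text) for each usable entry in a FUNSD-style dict.

    Assumes annotation["form"] is a list of entries with "box" as
    [left, top, right, bottom] and "text" as a string.
    """
    samples = []
    for index, entity in enumerate(annotation.get("form", [])):
        text = entity.get("text", "").strip()
        box = entity.get("box")
        # Skip entries without text or with a missing/degenerate box.
        if not text or not box or len(box) != 4:
            continue
        left, top, right, bottom = box
        if right <= left or bottom <= top:
            continue
        samples.append((f"{stem}_{index:04d}", (left, top, right, bottom), text))
    return samples


def write_ground_truth(image_path, annotation_path, out_dir):
    """Crop each annotated box and write image/.gt.txt pairs into out_dir."""
    from PIL import Image  # Pillow, imported here so the parser above has no dependency

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    annotation = json.loads(Path(annotation_path).read_text(encoding="utf-8"))
    with Image.open(image_path) as page:
        for name, box, text in funsd_to_line_samples(annotation, Path(image_path).stem):
            page.crop(box).save(out_dir / f"{name}.png")
            (out_dir / f"{name}.gt.txt").write_text(text + "\n", encoding="utf-8")
```

The resulting directory could then be used as the ground-truth folder for a 
tesstrain run, but that would train only the text recognition on these 
crops, not the page layout analysis.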

Thanks in advance!
