Thank you for your responses. Regarding my question: according to the official documentation at Tesseract Documentation, the generated .box files have the *same coordinates* for every character because they use line-level boxes instead of character-level boxes.

I also have two concerns:

1) I'm working on license plate recognition and have 80K noisy car plate images. In most of the .box files generated by lstmbox, the text does not match the ground truth. Manually editing all of these box files would be very time-consuming. Do you have any suggestions for reducing that effort?
2) Do I need to manually check all 80K box files to ensure the accuracy of my training data?
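To make the line-level box idea concrete, here is a minimal sketch of what I understand such a file to contain: one entry per character, all sharing the bounding box of the whole line. The function name `line_level_box`, the plate text, and the image size are hypothetical, and the tab entry used as an end-of-line marker is my assumption about the format, not something confirmed in this thread. If this is right, the box files could be generated by a script from the ground truth text instead of being edited by hand.

```python
def line_level_box(text: str, width: int, height: int, page: int = 0) -> str:
    """Build box entries in which every character shares the line's box.

    Each row is "<char> <left> <bottom> <right> <top> <page>".
    """
    rows = [f"{ch} 0 0 {width} {height} {page}" for ch in text]
    # Assumption: the end of the line is marked by a tab "character"
    # placed just outside the line box.
    rows.append(f"\t {width} {height} {width + 1} {height + 1} {page}")
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    # Hypothetical plate text and image size, for illustration only.
    print(line_level_box("12A3456", 200, 60), end="")
```

In practice the width and height would come from each image, and the text from the matching ground truth file.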
On Wednesday, November 1, 2023 at 9:21:36 PM UTC+9 desal...@gmail.com wrote:
> "Please note that box files generated using makebox config file are OK
> for training legacy models but not for LSTM training." Makebox is the
> tool included inside tesseract to generate box files. It looks like that
> was used for the legacy model. For the current model, text2image is the way
> to do it.
>
> On Wednesday, November 1, 2023 at 3:02:28 PM UTC+3 Des Bw wrote:
>
>> I don't know what you are trying to do. I am not familiar with this
>> method of box generation. But, I think the command you are running is
>> supposed to generate them with the same coordinates. Look at the example
>> here: https://tesseract-ocr.github.io/tessdoc/tess4/Make-Box-Files.html
>>
>> On Wednesday, November 1, 2023 at 2:57:46 PM UTC+3 elvi...@gmail.com
>> wrote:
>>
>>> On 1 Nov 2023 at 11:51:27 AM, TRAN TRONG KHANH[학생](대학원 컴퓨터공학과) <
>>> khanht...@khu.ac.kr> wrote:
>>>
>>>>
>>> Are you trying to generate box files from the images (tif files)?