I'm preparing text images (JPG) for conversion to text files (TXT) with Tesseract OCR. I understand that it is important to resize the images so that capital letters are about 30-32 pixels high. See "Optimal image resolution (dpi/ppi) for Tesseract 4.0.0 and eng.traineddata?" <https://groups.google.com/g/tesseract-ocr/c/Wdh_JJwnw94/m/24JHDYQbBQAJ?pli=1>
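To make the goal concrete, here is a minimal sketch of the arithmetic I have in mind once the capital letter height is known. The function names and the choice of 31 px (the midpoint of the 30-32 px range) are my own assumptions, not anything prescribed by Tesseract.

```python
# Assumed target: midpoint of the 30-32 px capital letter height range.
TARGET_CAP_HEIGHT_PX = 31


def resize_ratio(measured_cap_height_px, target_px=TARGET_CAP_HEIGHT_PX):
    """Return the factor by which image width and height should be multiplied."""
    if measured_cap_height_px <= 0:
        raise ValueError("measured cap height must be positive")
    return target_px / measured_cap_height_px


def scaled_size(width, height, ratio):
    """Return the new (width, height) in whole pixels after scaling."""
    return max(1, round(width * ratio)), max(1, round(height * ratio))


if __name__ == "__main__":
    # Example: capitals measured at 62 px in a 2480 x 3508 scan.
    ratio = resize_ratio(62)
    print(ratio, scaled_size(2480, 3508, ratio))
```

The resulting size could then be passed to whatever resizing routine the batch script uses.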
I am using Fiji/ImageJ to measure capital letter height in pixels. Following https://imagej.nih.gov/ij/docs/pdfs/ImageJ.pdf:

- Open the image file
- Enlarge the text (zoom in)
- Draw a vertical line alongside the vertical stroke of a number or straight-edged letter
- Select Analyze>Set Scale (see image below)

[image: fiji first.png]

How should I count the pixels? Do I count the "half pixels", where the pixel block is a half-tone? In other words, should my total include these half-tones when estimating the true height?

Does anyone have a better procedure than this? My aim is to come up with a resizing ratio that I can apply to a large collection of text images using a Python script, as another step in preparing docs for Tesseract.

Any suggestions would be appreciated.
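To illustrate the half-tone question, here is a rough sketch of the two counting options applied to a single column of grayscale values (0 = black ink, 255 = white paper) taken through one vertical stroke. It assumes that an anti-aliased edge pixel's grey level roughly reflects how much of that pixel the stroke covers, which I have not verified for my scans; the function names are hypothetical.

```python
def stroke_height_fractional(column, background=255, ink=0):
    """Estimate stroke height by summing each pixel's fractional darkness.

    A mid-grey edge pixel therefore counts as roughly half a pixel.
    """
    span = background - ink
    if span == 0:
        raise ValueError("background and ink levels must differ")
    total = 0.0
    for value in column:
        coverage = (background - value) / span
        total += min(1.0, max(0.0, coverage))  # clamp to the range 0..1
    return total


def stroke_height_thresholded(column, threshold=128):
    """Estimate stroke height by counting pixels darker than the threshold."""
    return sum(1 for value in column if value < threshold)


if __name__ == "__main__":
    # Hypothetical column: paper, a grey edge, three ink pixels, a grey edge, paper.
    column = [255, 255, 128, 0, 0, 0, 127, 255]
    print(stroke_height_fractional(column))
    print(stroke_height_thresholded(column))
```

The column must pass through a single letter only, otherwise strokes from other lines of text would be added to the total.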