I get the best results with PBM images, i.e. pure black and white. That way there are no half-tones left to count. (I don't know if this helps…)
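To make the idea concrete, here is a minimal sketch in plain Python of what thresholding to black and white does to the pixel count. Everything in it is an assumption for illustration: the function names (`binarize`, `ink_height`, `resize_ratio`) are hypothetical, the threshold of 128 is arbitrary, and the target of 31 pixels is just the middle of the 30-32 range mentioned in the quoted message. The crop is a toy list of grayscale values, not a real scan.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel (0-255) to 1 (ink) or 0 (paper).

    The threshold of 128 is an assumed value; a real scan may need another.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def ink_height(bw):
    """Count rows from the first to the last row that contains any ink."""
    rows = [y for y, row in enumerate(bw) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


def resize_ratio(cap_height_px, target_px=31):
    """Scale factor that brings a measured capital height to the target.

    target_px=31 is an assumption taken from the 30-32 pixel range.
    """
    return target_px / cap_height_px


# Toy crop of one vertical stroke: 255 = paper, 0 = ink,
# 200 and 120 are half-tone pixels at the top and bottom edges.
crop = [
    [255, 255, 255],
    [255, 200, 255],
    [255,   0, 255],
    [255,   0, 255],
    [255, 120, 255],
    [255, 255, 255],
]

bw = binarize(crop)
height = ink_height(bw)       # each half-tone is now either ink or paper
ratio = resize_ratio(height)  # factor to apply when resizing the image
print(height, round(ratio, 2))
```

After thresholding, each half-tone pixel has been decided one way or the other, so the height is a whole number of rows and the ratio follows directly from it. Moving the threshold changes which edge pixels count as ink, so the same value should be used for the whole collection.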
On Monday, 13 March 2023 at 23:17:23 UTC+1, da...@mranderson.co.nz wrote:

> I'm preparing text images (JPG) for Tesseract OCR conversion to text files
> (TXT). I note that it is important to resize my image docs so that capital
> letters are about 30-32 pixels in height. See "Optimal image resolution
> (dpi/ppi) for Tesseract 4.0.0 and eng.traineddata?"
> <https://groups.google.com/g/tesseract-ocr/c/Wdh_JJwnw94/m/24JHDYQbBQAJ?pli=1>
>
> I am using Fiji/ImageJ to count capital letter height in pixels. From
> https://imagej.nih.gov/ij/docs/pdfs/ImageJ.pdf
>
> - Open image file
> - Enlarge text (zoom in)
> - Draw parallel vertical line beside vertical of number or straight
>   edge letter
> - Select Analyze>Set Scale (see image below)
>
> [image: fiji first.png]
>
> How to count pixels? Do I count the 'half pixels', where the pixel 'block'
> is a half-tone? In other words, for my total count, do I estimate the true
> height by including these half-tones?
>
> Does anyone have a better procedure than this?
>
> My aim is to come up with a resizing ratio that I can apply to a large
> collection of text files using a Python script. This being another step
> along the way to preparing docs for Tesseract.
>
> Any suggestions would be appreciated.