I get the best results with PBM images, i.e. pure black and white. That way
there are no half-tones at all. (Not sure whether this helps.)
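
A minimal sketch of that thresholding step, assuming the page has already been
decoded into a 2-D list of 0-255 gray values (decoding a JPG needs a library
such as Pillow, not shown here). The helper names to_bilevel and to_plain_pbm
are hypothetical, and 128 is just an arbitrary mid-gray threshold:

```python
def to_bilevel(gray, threshold=128):
    """Map each 0-255 gray value to 1 (black ink) or 0 (white paper).

    PBM uses 1 for black, so anything darker than the threshold becomes 1.
    There are no half-tones left afterwards: every pixel is either in or out.
    """
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def to_plain_pbm(bits):
    """Serialize a bilevel image as plain (ASCII, "P1") PBM text."""
    height = len(bits)
    width = len(bits[0]) if height else 0
    out = ["P1", f"{width} {height}"]
    for row in bits:
        digits = "".join(str(b) for b in row)
        # The plain PBM format limits lines to 70 characters, so split long rows.
        out.extend(digits[i:i + 70] for i in range(0, len(digits), 70))
    return "\n".join(out) + "\n"


page = [[0, 200], [127, 128]]  # tiny hypothetical gray image
print(to_plain_pbm(to_bilevel(page)), end="")
```

If you use Pillow instead, note that Image.convert("1") applies Floyd-Steinberg
dithering by default, which scatters dots along letter edges; a plain threshold
like the one above avoids that.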



On Monday, 13 March 2023 at 23:17:23 UTC+1, da...@mranderson.co.nz
wrote:

> I'm preparing text images (JPG) for Tesseract OCR conversion to text files 
> (TXT). I note that it is important to resize my image docs so that capital 
> letters are about 30-32 pixels high. See "Optimal image resolution 
> (dpi/ppi) for Tesseract 4.0.0 and eng.traineddata?" 
> <https://groups.google.com/g/tesseract-ocr/c/Wdh_JJwnw94/m/24JHDYQbBQAJ?pli=1>
>
> I am using Fiji/ImageJ to measure capital letter height in pixels, following 
> https://imagej.nih.gov/ij/docs/pdfs/ImageJ.pdf :
>
>    - Open the image file
>    - Enlarge the text (zoom in)
>    - Draw a vertical line parallel to the vertical stroke of a number or a
>    straight-edged letter
>    - Select Analyze>Set Scale (see image below)
>
> [image: fiji first.png]
>
> How should I count the pixels? Do I count the 'half pixels', where the 
> pixel 'block' is a half-tone? In other words, should my total include these 
> half-tones when estimating the true height?
>
> Does anyone have a better procedure than this?
>
> My aim is to come up with a resizing ratio that I can apply to a large 
> collection of text images using a Python script. This is another step 
> toward preparing docs for Tesseract.
>
> Any suggestions would be appreciated.
>
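
One way to handle the half-tone question in a script, sketched under the
assumption that you can extract a single column of 0-255 gray values running
down the vertical stem of a capital letter (e.g. "I" or "H"), cropped to just
above and below the letter. All function names are hypothetical. The
coverage-based count treats a 50% gray edge pixel as half a pixel; the
thresholded count is the all-or-nothing alternative (what a PBM conversion
would give you):

```python
def ink_coverage_height(column, white=255, black=0):
    """Estimate stroke height from a 1-D vertical sample of gray values.

    Each pixel adds its ink coverage: 0.0 for pure white, 1.0 for pure black,
    so a half-tone edge pixel contributes a fraction of a pixel.
    """
    span = white - black
    return sum(min(max((white - v) / span, 0.0), 1.0) for v in column)


def thresholded_height(column, threshold=128):
    """Count only pixels darker than the threshold (all-or-nothing)."""
    return sum(1 for v in column if v < threshold)


def resize_ratio(measured_height, target_height=30.0):
    """Scale factor that brings the measured cap height to the target height."""
    if measured_height <= 0:
        raise ValueError("measured height must be positive")
    return target_height / measured_height


def scaled_size(width, height, ratio):
    """New (width, height) in whole pixels after applying the ratio."""
    return round(width * ratio), round(height * ratio)


# Hypothetical column sampled down the stem of a capital "I":
column = [255, 200, 0, 0, 0, 0, 128, 255]
cap = ink_coverage_height(column)   # about 4.71: the two gray edge pixels add ~0.71
ratio = resize_ratio(cap, target_height=30.0)
print(thresholded_height(column), round(cap, 2), scaled_size(1000, 1500, ratio))
```

Averaging the estimate over several letters (or taking the median) should be
more robust than a single measurement before applying one ratio to a whole
batch of images.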

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/2434d564-f2b5-40df-b180-8465bc9c5c42n%40googlegroups.com.