Thank you for your input. I appreciate that the PBM file type has its uses, but my source material is JPG, and there are a lot of files!
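A minimal sketch of how the batch resizing step could look, assuming one capital-letter height has already been measured (for example in Fiji/ImageJ) and that all scans share roughly the same resolution. The function names, the 31 px target (the midpoint of the 30-32 px range quoted below), and the use of the third-party Pillow library are assumptions for illustration, not something settled in this thread.

```python
from pathlib import Path

# Assumed target: midpoint of the 30-32 px capital-letter height range.
TARGET_CAP_HEIGHT_PX = 31


def scale_factor(measured_cap_height_px, target_px=TARGET_CAP_HEIGHT_PX):
    """Return the ratio that maps the measured capital height onto the target."""
    if measured_cap_height_px <= 0:
        raise ValueError("measured capital height must be positive")
    return target_px / measured_cap_height_px


def scaled_size(width, height, factor):
    """Return the new (width, height) in whole pixels, never below 1 px."""
    return max(1, round(width * factor)), max(1, round(height * factor))


def resize_folder(src_dir, dst_dir, measured_cap_height_px):
    """Resize every *.jpg in src_dir by one shared factor and save to dst_dir."""
    # Pillow is a third-party package; it is imported here so the helper
    # functions above remain usable without it.
    from PIL import Image

    factor = scale_factor(measured_cap_height_px)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.jpg")):
        with Image.open(path) as img:
            resized = img.resize(scaled_size(*img.size, factor), Image.LANCZOS)
            resized.save(dst / path.name)
```

For example, a measured capital height of 62 px gives `scale_factor(62) == 0.5`, so a 2480 x 3508 scan would be reduced to 1240 x 1754. A hypothetical call would be `resize_folder("scans", "scans_resized", 62)`. If the scans differ in resolution, a single shared factor will not fit all of them and the height would need to be measured per batch.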
On Wed, Mar 15, 2023 at 10:41 AM 'Isidore Paris' via tesseract-ocr <tesseract-ocr@googlegroups.com> wrote:

> I get the best results with PBM images, i.e. b&w. Done that way, there
> would be no half-tones… (Don't know if this could help…)
>
> On Monday, March 13, 2023 at 23:17:23 UTC+1, da...@mranderson.co.nz wrote:
>
>> I'm preparing text images (JPG) for Tesseract OCR conversion to text
>> files (TXT). I note that it is important to resize my image docs so that
>> capital letters are about 30-32 pixels in height. See "Optimal image
>> resolution (dpi/ppi) for Tesseract 4.0.0 and eng.traineddata?"
>> <https://groups.google.com/g/tesseract-ocr/c/Wdh_JJwnw94/m/24JHDYQbBQAJ?pli=1>
>>
>> I am using Fiji/ImageJ to count capital letter height in pixels. From
>> https://imagej.nih.gov/ij/docs/pdfs/ImageJ.pdf
>>
>> - Open image file
>> - Enlarge text (zoom in)
>> - Draw a parallel vertical line beside the vertical of a number or
>>   straight-edged letter
>> - Select Analyze>Set Scale (see image below)
>>
>> [image: fiji first.png]
>>
>> How to count pixels? Do I count the 'half pixels', where the pixel
>> 'block' is a half-tone? In other words, for my total count, do I estimate
>> the true height by including these half-tones?
>>
>> Does anyone have a better procedure than this?
>>
>> My aim is to come up with a resizing ratio that I can apply to a large
>> collection of text files using a Python script. This being another step
>> along the way to preparing docs for Tesseract.
>>
>> Any suggestions would be appreciated.
>>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "tesseract-ocr" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/tesseract-ocr/bZh3j_i8MYU/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/2434d564-f2b5-40df-b180-8465bc9c5c42n%40googlegroups.com.