Thank you for your input. I appreciate that the PBM file type has its uses,
but my source material is JPG, and there are a lot of files!

On Wed, Mar 15, 2023 at 10:41 AM 'Isidore Paris' via tesseract-ocr <
tesseract-ocr@googlegroups.com> wrote:

> I get the best results with PBM images, i.e. b&w. That way, there
> would be no half-tones… (I don't know if this helps…)
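
For a folder of JPGs, the b&w idea could be tried in bulk with something like the sketch below. It is only a rough illustration and assumes Pillow is installed; the names `pbm_path` and `convert_folder`, the `*.jpg` pattern and the threshold of 128 are all made up here and would need adjusting for real scans.

```python
from pathlib import Path


def pbm_path(jpg_path, out_dir):
    """Map e.g. scans/page1.jpg to out_dir/page1.pbm."""
    return Path(out_dir) / (Path(jpg_path).stem + ".pbm")


def convert_folder(src_dir, out_dir, threshold=128):
    """Write a 1-bit PBM copy of every *.jpg in src_dir into out_dir."""
    # Pillow is imported here so that pbm_path() above has no dependency.
    from PIL import Image

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for jpg in sorted(Path(src_dir).glob("*.jpg")):
        with Image.open(jpg) as im:
            gray = im.convert("L")  # 8-bit greyscale
            # Hard threshold: every pixel becomes pure black or pure white,
            # so no half-tones are left before the 1-bit conversion.
            bw = gray.point(lambda p: 255 if p >= threshold else 0).convert("1")
            bw.save(pbm_path(jpg, out_dir))


print(pbm_path("scans/page1.jpg", "bw"))
```

A call such as `convert_folder("scans", "bw")` would then process the whole folder; the folder names are placeholders.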
>
>
>
> On Monday, 13 March 2023 at 23:17:23 UTC+1, da...@mranderson.co.nz
> wrote:
>
>> I'm preparing text images (JPG) for Tesseract OCR conversion to text
>> files (TXT). I note that it is important to resize my image docs so that
>> capital letters are about 30-32 pixels in height. See "Optimal image
>> resolution (dpi/ppi) for Tesseract 4.0.0 and eng.traineddata?"
>> <https://groups.google.com/g/tesseract-ocr/c/Wdh_JJwnw94/m/24JHDYQbBQAJ?pli=1>
>>
>> I am using Fiji/ImageJ to measure capital letter height in pixels. From
>> https://imagej.nih.gov/ij/docs/pdfs/ImageJ.pdf
>>
>>    - Open image file
>>    - Enlarge text (zoom in)
>>    - Draw a vertical line alongside the vertical stroke of a digit or a
>>    straight-edged letter
>>    - Select Analyze>Set Scale (see image below)
>>
>> [image: fiji first.png]
>>
>> How should I count pixels? Do I count the 'half pixels', where the pixel
>> 'block' is a half-tone? In other words, should my total count estimate
>> the true height by including these half-tones?
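
On the half-tone question, one way to make the count repeatable is to pick a grey-level threshold and count every row that contains a pixel darker than it. The sketch below is only an illustration: `glyph_height`, the default of 128 and the tiny `crop` array are invented here and are not part of ImageJ or Tesseract.

```python
def glyph_height(gray_rows, threshold=128):
    """Return the height in pixels of the dark glyph in a greyscale crop.

    gray_rows is a list of rows, each a list of 0-255 values (0 = black).
    A row counts as part of the glyph if any pixel in it is darker than
    `threshold`, so the threshold decides whether half-tone rows count.
    """
    inked = [i for i, row in enumerate(gray_rows)
             if any(p < threshold for p in row)]
    if not inked:
        return 0
    return inked[-1] - inked[0] + 1


# Hypothetical crop of a vertical stroke: white, half-tone and black rows.
crop = [
    [255, 255, 255],
    [255, 170, 255],  # light half-tone edge
    [255,   0, 255],
    [255,   0, 255],
    [255,  90, 255],  # dark half-tone edge
    [255, 255, 255],
]

strict = glyph_height(crop, threshold=64)   # near-black rows only
middle = glyph_height(crop, threshold=128)  # also the darker half-tone row
loose = glyph_height(crop, threshold=200)   # both half-tone rows
print(strict, middle, loose)
```

In this toy crop the three thresholds differ by one row each. Against a target of 30-32 pixels that is a small difference, so using the same convention for every measurement probably matters more than which convention is chosen.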
>>
>> Does anyone have a better procedure than this?
>>
>> My aim is to come up with a resizing ratio that I can apply to a large
>> collection of text image files using a Python script. This is another
>> step along the way to preparing docs for Tesseract.
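
A minimal sketch of that step, assuming Pillow is installed and that one measured capital-letter height is representative of the whole batch. The target of 31 px is simply the midpoint of the 30-32 range mentioned above; the function names, the `*.jpg` pattern and the choice of the Lanczos filter are assumptions made for illustration.

```python
from pathlib import Path


def scale_factor(measured_cap_height, target_cap_height=31):
    """Ratio that brings the measured capital height to the target height."""
    if measured_cap_height <= 0:
        raise ValueError("measured height must be positive")
    return target_cap_height / measured_cap_height


def scaled_size(size, factor):
    """Apply the ratio to a (width, height) pair, rounding to whole pixels."""
    width, height = size
    return (max(1, round(width * factor)), max(1, round(height * factor)))


def resize_folder(src_dir, dst_dir, factor):
    """Save a resized copy of every *.jpg in src_dir into dst_dir."""
    # Pillow is imported here so the arithmetic helpers above stay dependency-free.
    from PIL import Image

    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for jpg in sorted(Path(src_dir).glob("*.jpg")):
        with Image.open(jpg) as im:
            resized = im.resize(scaled_size(im.size, factor), Image.LANCZOS)
            resized.save(Path(dst_dir) / jpg.name)


# Example: capitals measured at 20 px in a 1000 x 1400 scan.
factor = scale_factor(20)
print(factor, scaled_size((1000, 1400), factor))
```

With the ratio in hand, a call such as `resize_folder("scans", "resized", factor)` would process the collection; both folder names are placeholders.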
>>
>> Any suggestions would be appreciated.
>>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "tesseract-ocr" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/tesseract-ocr/bZh3j_i8MYU/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/2434d564-f2b5-40df-b180-8465bc9c5c42n%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/2434d564-f2b5-40df-b180-8465bc9c5c42n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
