Hi Jozef,
Thanks for the great advice. After playing around, I found that the advice
in this post
<http://www.imagemagick.org/discourse-server/viewtopic.php?f=1&t=18707#p72284>
works for me:
convert image.tiff -write MPR:source -morphology close rectangle:3x4 \
    -clip-mask MPR:source -morphology erode:8 square +clip-mask \
    image-close.tif
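The first step of that command is a closing, which is a dilation followed by an erosion with the same kernel. Below is a minimal C sketch of a binary closing with a rectangular kernel, in the spirit of "-morphology close rectangle:3x4". It is not ImageMagick's implementation. It assumes a 1-bit image with 1 as foreground, puts the kernel origin at (1,1) (an arbitrary choice; ImageMagick's default origin may differ), and uses its own border convention. All names (Rect, dilate, erode, close_rect) are hypothetical.

```c
#include <assert.h>

/* Toy image size, chosen only for this illustration. */
#define W 10
#define H 10

/* Rectangular kernel kw x kh with its origin at (ox, oy).
 * Hypothetical type; ImageMagick's own kernel origin may differ. */
typedef struct {
    int kw, kh;
    int ox, oy;
} Rect;

/* Binary dilation: (x,y) becomes foreground if some kernel offset maps a
 * foreground input pixel onto it. Pixels outside the image count as 0. */
static void dilate(unsigned char in[H][W], unsigned char out[H][W], Rect k)
{
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            unsigned char v = 0;
            for (int j = 0; j < k.kh && !v; j++)
                for (int i = 0; i < k.kw && !v; i++) {
                    int sx = x - (i - k.ox), sy = y - (j - k.oy);
                    if (sx >= 0 && sx < W && sy >= 0 && sy < H && in[sy][sx])
                        v = 1;
                }
            out[y][x] = v;
        }
}

/* Binary erosion: (x,y) stays foreground only if every pixel under the
 * kernel is foreground. Pixels outside the image count as 1. */
static void erode(unsigned char in[H][W], unsigned char out[H][W], Rect k)
{
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            unsigned char v = 1;
            for (int j = 0; j < k.kh && v; j++)
                for (int i = 0; i < k.kw && v; i++) {
                    int sx = x + (i - k.ox), sy = y + (j - k.oy);
                    if (sx >= 0 && sx < W && sy >= 0 && sy < H && !in[sy][sx])
                        v = 0;
                }
            out[y][x] = v;
        }
}

/* Closing: dilate, then erode with the same kernel. Fills gaps and holes
 * narrower than the kernel while roughly preserving the outer outline. */
static void close_rect(unsigned char in[H][W], unsigned char out[H][W], Rect k)
{
    unsigned char tmp[H][W];
    assert(k.ox >= 0 && k.ox < k.kw && k.oy >= 0 && k.oy < k.kh);
    dilate(in, tmp, k);
    erode(tmp, out, k);
}
```

With a kernel 3 pixels wide and 4 tall, a one-pixel vertical gap inside a 6x4 block is filled, and the erosion brings the block back to its original outline.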
It looks like I need to pipe images through ImageMagick, but I can't decide
when it is really necessary. Perhaps I can run Tesseract twice: the first
time to determine the confidence level
<https://groups.google.com/forum/#%21msg/tesseract-ocr/w8DytLnMMH8/2tu1PUnp3LAJ>,
and then, if needed, clean up the image and recognize it again.
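One way to decide when the cleanup pass is worth running is sketched below in C. It assumes per-word confidences on a 0 to 100 scale, as Tesseract reports them, passed as an array terminated by a negative value (this mirrors my understanding of the array Tesseract's AllWordConfidences returns, but treat that layout as an assumption). The thresholds and the function names mean_confidence, needs_cleanup and pick_pass are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of a word-confidence list terminated by a negative value
 * (assumed layout: values 0..100, then -1). Returns -1 if empty. */
int mean_confidence(const int *conf)
{
    long sum = 0;
    int n = 0;
    if (conf == NULL)
        return -1;
    for (; conf[n] >= 0; n++)
        sum += conf[n];
    return n ? (int)(sum / n) : -1;
}

/* Returns 1 if the image should be cleaned up and recognised again:
 * either the mean is below the threshold, or too large a fraction of
 * individual words is, even though the mean looks fine. */
int needs_cleanup(const int *conf, int threshold, double max_low_fraction)
{
    int mean = mean_confidence(conf);
    int n = 0, low = 0;
    assert(threshold >= 0 && threshold <= 100);
    if (mean < 0)
        return 1;   /* nothing recognised: a cleanup pass may help */
    for (; conf[n] >= 0; n++)
        if (conf[n] < threshold)
            low++;
    return mean < threshold || (double)low / n > max_low_fraction;
}

/* After the second run, keep whichever pass has the higher mean.
 * Returns 2 if the cleaned-up pass wins, 1 otherwise. */
int pick_pass(const int *first, const int *second)
{
    return mean_confidence(second) > mean_confidence(first) ? 2 : 1;
}
```

The second trigger (fraction of low-confidence words) is a design choice: it catches pages where a few garbled words are hidden by an otherwise high mean.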
Many thanks again for pointing me in the right direction!
On 30.05.2013 18:50, jm wrote:
> Regarding open and close operators:
>
> First, look at
> http://tpgit.github.io/Leptonica/morph_8c_source.html
> pixDilate
> pixErode
> and for a real example see
> http://www.imagemagick.org/Usage/morphology/#erode
>
> I think this code snippet says it all (opening is an erosion followed by a dilation):
>
> PIX *
> pixOpen(PIX  *pixd,
>         PIX  *pixs,
>         SEL  *sel)
> {
> PIX  *pixt;
>
>     PROCNAME("pixOpen");
>
>     if ((pixd = processMorphArgs2(pixd, pixs, sel)) == NULL)
>         return (PIX *)ERROR_PTR("pixd not returned", procName, pixd);
>
>     if ((pixt = pixErode(NULL, pixs, sel)) == NULL)
>         return (PIX *)ERROR_PTR("pixt not made", procName, pixd);
>     pixDilate(pixd, pixt, sel);
>     pixDestroy(&pixt);
>
>     return pixd;
> }
> Cheers,
> Jozef
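To make the erode-then-dilate structure of pixOpen concrete without depending on Leptonica, here is a self-contained C sketch on a toy 1-bit image. Img, img_open, morph and the other names are hypothetical stand-ins for PIX, pixOpen, pixErode and pixDilate. The structuring element is assumed to be a centered square brick, and pixels outside the image are treated as foreground for erosion and as background for dilation.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy 1-bit image (hypothetical stand-in for Leptonica's PIX). */
typedef struct {
    int w, h;
    unsigned char *px;   /* row-major, 1 = foreground */
} Img;

static Img *img_create(int w, int h)
{
    Img *im = malloc(sizeof *im);
    if (im == NULL)
        return NULL;
    im->px = calloc((size_t)w * h, 1);
    if (im->px == NULL) { free(im); return NULL; }
    im->w = w;
    im->h = h;
    return im;
}

static void img_destroy(Img **pim)
{
    if (pim && *pim) { free((*pim)->px); free(*pim); *pim = NULL; }
}

static void img_set(Img *im, int x, int y) { im->px[y * im->w + x] = 1; }

static int get(const Img *im, int x, int y, int outside)
{
    if (x < 0 || y < 0 || x >= im->w || y >= im->h)
        return outside;
    return im->px[y * im->w + x];
}

/* (2r+1)x(2r+1) brick centered on each pixel.
 * erode = 1: keep a pixel only if the whole brick is foreground.
 * erode = 0: set a pixel if any pixel under the brick is foreground. */
static void morph(const Img *src, Img *dst, int r, int erode)
{
    for (int y = 0; y < src->h; y++)
        for (int x = 0; x < src->w; x++) {
            int v = erode;
            for (int dy = -r; dy <= r; dy++)
                for (int dx = -r; dx <= r; dx++) {
                    int p = get(src, x + dx, y + dy, erode);
                    v = erode ? (v && p) : (v || p);
                }
            dst->px[y * dst->w + x] = (unsigned char)v;
        }
}

/* Opening, structured like pixOpen: erode into a temporary, dilate the
 * temporary into the result, then free the temporary. */
static Img *img_open(const Img *src, int r)
{
    assert(r >= 0);
    Img *tmp = img_create(src->w, src->h);
    Img *dst = img_create(src->w, src->h);
    if (tmp == NULL || dst == NULL) {
        img_destroy(&tmp);
        img_destroy(&dst);
        return NULL;
    }
    morph(src, tmp, r, 1);   /* erosion */
    morph(tmp, dst, r, 0);   /* dilation */
    img_destroy(&tmp);
    return dst;
}
```

With a 3x3 brick (r = 1), an isolated one-pixel speck disappears while a 5x5 block comes back unchanged, which is the noise-removal effect discussed in this thread.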
>
>
>
> On Thursday, May 30, 2013 4:57:05 PM UTC+2, Dmitry Katsubo wrote:
>
> Many thanks for the information, Johannes!
>
> I have played with *textord_max_noise_size*, and it turned out that
> the noise in my particular case is not removed even when I set
> *textord_max_noise_size=45*. Above that value, almost all other
> characters were treated as noise.
>
> However, *textord_heavy_nr=1* worked well for me. This setting
> seems to work on its own and does not depend on the values of the
> other settings mentioned.
>
> On 30.05.2013 9:11, Johannes Richter wrote:
>> The parameter I meant is "*textord_max_noise_size*"; it
>> defines the maximum size of noise in pixels. You could also try
>> the one you found in the list, "*textord_heavy_nr*".
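As a rough illustration of what a "maximum size of noise in pixels" threshold means, the C sketch below removes 8-connected blobs whose bounding box is at most max_size pixels in both directions. It is a toy under that assumption and does not reproduce Tesseract's actual textord noise logic; remove_small_blobs is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

/* Removes 8-connected foreground blobs whose bounding box is no larger
 * than max_size pixels in both width and height. img is w*h, row-major,
 * 1 = ink. Returns the number of blobs removed, or -1 if allocation fails. */
int remove_small_blobs(unsigned char *img, int w, int h, int max_size)
{
    assert(img != NULL && w > 0 && h > 0);
    int n = w * h, removed = 0;
    int *stack = malloc(sizeof(int) * (size_t)n);   /* flood-fill stack */
    int *blob = malloc(sizeof(int) * (size_t)n);    /* pixels of current blob */
    unsigned char *seen = calloc((size_t)n, 1);
    if (!stack || !blob || !seen) { free(stack); free(blob); free(seen); return -1; }

    for (int start = 0; start < n; start++) {
        if (!img[start] || seen[start])
            continue;
        int top = 0, count = 0;
        int minx = w, maxx = -1, miny = h, maxy = -1;
        stack[top++] = start;
        seen[start] = 1;
        while (top > 0) {
            int p = stack[--top], x = p % w, y = p / w;
            blob[count++] = p;
            if (x < minx) minx = x;
            if (x > maxx) maxx = x;
            if (y < miny) miny = y;
            if (y > maxy) maxy = y;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h)
                        continue;
                    int q = ny * w + nx;
                    if (img[q] && !seen[q]) {   /* each pixel pushed once */
                        seen[q] = 1;
                        stack[top++] = q;
                    }
                }
        }
        /* Blob counts as noise if its bounding box fits in max_size. */
        if (maxx - minx + 1 <= max_size && maxy - miny + 1 <= max_size) {
            for (int i = 0; i < count; i++)
                img[blob[i]] = 0;
            removed++;
        }
    }
    free(stack);
    free(blob);
    free(seen);
    return removed;
}
```

The same trade-off Dmitry describes shows up here: raise max_size far enough and real characters start to fall under the threshold as well.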
>>
>> "Opening and Closing Operators" are morphological operators. I
>> searched Wikipedia for a nice example, but the English version is
>> only a stub.
>> In your case, the opening operation is the way to go. Many image
>> processing frameworks include morphological operations. If your
>> software does not provide an opening operator, look for *erosion*
>> and *dilation* (opening is just an erosion followed by a dilation).
>>
>> I made a quick example in GIMP.
>> The picture "before.png" shows my object (the circle) with some
>> noise I want to remove. I applied the erosion operation to this
>> picture with a suitable filter mask. The result is in the picture
>> "after erosion.png". The circle has changed in size (and shape).
>> As the last step, I applied the dilation operation in GIMP. The
>> resulting image "after dilation.png" shows only the circle.
>>
>> Depending on your objects and noise, you need to choose a suitable
>> filter mask for these operations. The operation will also change the
>> shape of your characters slightly.
>
--
With best regards,
Dmitry
--
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en