Hi Jozef,

Thanks for the great advice. After playing around, I found that the
advice in this post
<http://www.imagemagick.org/discourse-server/viewtopic.php?f=1&t=18707#p72284>
works for me:

    convert image.tiff -write MPR:source -morphology close rectangle:3x4 \
        -clip-mask MPR:source -morphology erode:8 square +clip-mask \
        image-close.tif
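
To make clear what the close and open steps do to the pixels, here is a minimal pure-Python sketch. It is not how ImageMagick or Leptonica implement it: the function names (`dilate`, `erode`, `close_op`, `open_op`) are made up for illustration, the image is assumed to be a list of 0/1 rows, and the kernel is fixed to a 3x3 square. Closing (dilate, then erode) fills small holes; opening (erode, then dilate) removes small specks.

```python
def _morph(img, combine):
    """Apply `combine` (min or max) over each pixel's 3x3 neighbourhood.

    Neighbours that fall outside the image are simply ignored.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if 0 <= y + dy < h and 0 <= x + dx < w
            ]
            row.append(combine(window))
        out.append(row)
    return out


def dilate(img):
    # A pixel becomes 1 if any pixel in its neighbourhood is 1.
    return _morph(img, max)


def erode(img):
    # A pixel stays 1 only if every pixel in its neighbourhood is 1.
    return _morph(img, min)


def open_op(img):
    # Opening: erosion followed by dilation (removes small specks).
    return dilate(erode(img))


def close_op(img):
    # Closing: dilation followed by erosion (fills small holes).
    return erode(dilate(img))
```

On a 7x7 test image, `close_op` fills a one-pixel hole inside a 3x3 block, and `open_op` removes an isolated one-pixel speck while leaving the block itself intact.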

It looks like I need to pipe images through ImageMagick, but I can't tell
in advance when that is really necessary. Perhaps I can run Tesseract
twice: the first time to determine the confidence level
<https://groups.google.com/forum/#%21msg/tesseract-ocr/w8DytLnMMH8/2tu1PUnp3LAJ>
and then, if needed, clean up the image and recognize it again.
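
The two-pass idea could be sketched roughly as follows. This is only an outline under assumptions: `recognize` and `cleanup` are hypothetical callables standing in for a Tesseract run that returns the text plus a mean confidence (0 to 100) and for the ImageMagick step, and the threshold of 70 is an arbitrary placeholder, not a recommended value.

```python
from typing import Callable, Tuple


def ocr_with_fallback(
    image_path: str,
    recognize: Callable[[str], Tuple[str, float]],
    cleanup: Callable[[str], str],
    min_confidence: float = 70.0,  # placeholder threshold, tune per corpus
) -> Tuple[str, float]:
    """Run OCR once; clean up and re-run only if confidence is too low."""
    text, conf = recognize(image_path)
    if conf >= min_confidence:
        # First pass is good enough, skip the expensive cleanup.
        return text, conf

    # Hypothetical cleanup step: returns the path of the cleaned image.
    cleaned_path = cleanup(image_path)
    text2, conf2 = recognize(cleaned_path)

    # Keep whichever pass reports the higher confidence.
    return (text2, conf2) if conf2 > conf else (text, conf)
```

Passing the two steps in as callables keeps the decision logic separate from how Tesseract and ImageMagick are actually invoked, so it can be tried out with stubs first.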

Many thanks again for pointing me in the right direction!

On 30.05.2013 18:50, jm wrote:
> Regarding open and close operators:
>  
> First, look at
> http://tpgit.github.io/Leptonica/morph_8c_source.html
>   pixDilate
>   pixErode
> and for a real example see
> http://www.imagemagick.org/Usage/morphology/#erode
>  
> I think that this code snippet says it all (open is an erode followed by a dilate)
>  
>     PIX *
>     pixOpen(PIX  *pixd,
>             PIX  *pixs,
>             SEL  *sel)
>     {
>     PIX  *pixt;
>
>         PROCNAME("pixOpen");
>
>         if ((pixd = processMorphArgs2(pixd, pixs, sel)) == NULL)
>             return (PIX *)ERROR_PTR("pixd not returned", procName, pixd);
>
>         if ((pixt = pixErode(NULL, pixs, sel)) == NULL)
>             return (PIX *)ERROR_PTR("pixt not made", procName, pixd);
>         pixDilate(pixd, pixt, sel);
>         pixDestroy(&pixt);
>
>         return pixd;
>     }
>
> Cheers,
> Jozef
>  
>  
>
> On Thursday, May 30, 2013 4:57:05 PM UTC+2, Dmitry Katsubo wrote:
>
>     Many thanks for the information, Johannes!
>
>     I have played with *textord_max_noise_size*, and it turned out that
>     the noise in my particular case is not removed even when I set
>     *textord_max_noise_size=45*. Above that value, almost all other
>     characters were treated as noise.
>
>     However, *textord_heavy_nr=1* worked well for me. It looks like
>     this setting works on its own and does not depend on the values
>     of the other settings mentioned.
>
>     On 30.05.2013 9:11, Johannes Richter wrote:
>>     The parameter I meant is "*textord_max_noise_size*"; it
>>     defines the maximum size of noise in pixels. You could also try
>>     the one you found in the list, "*textord_heavy_nr*".
>>
>>     "Opening and Closing Operators" are morphological operators. I
>>     searched Wikipedia for a nice example, but the English version is
>>     only a stub.
>>     In your case the opening operation is the way to go. Many image
>>     processing frameworks include morphological operations. If your
>>     software does not provide an opening operator, look for *erosion*
>>     and *dilation* (opening is just an erosion followed by a dilation).
>>
>>     I made a quick example in GIMP.
>>     The picture "before.png" shows my object (the circle) with some
>>     noise I want to remove. I executed the erosion operation on this
>>     picture with a proper filter mask. The result is in the picture
>>     "after erosion.png". The circle has changed in size (and shape).
>>     As the last step I executed the dilation operation in GIMP. The
>>     resulting image "after dilation.png" shows only the circle.
>>
>>     Depending on your objects and noise, you need to choose a proper
>>     filter mask for these operations. The opening will change the
>>     shape of your characters slightly.
>


-- 
With best regards,
Dmitry

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
