On Apr 19, 2009, at 5:16 AM, Debayan Banerjee wrote:

> I take the liberty of top posting since I copied the mail's contents
> from the archives, and bottom posting would require messing with the
> text below too much. In reply to this particular line:
> " It takes the old "matra removal" approach, and he's
> facing the same problems I did (notice in his first example that গ is
> segmented into 2 parts, and শু is not)."
>
> Kindly see
> http://picasaweb.google.com/debayanin/TesseractIndicOCR#5325782929614608690
>
> Below is the original conversation.
>
> On 7/2/08, Golam Mortuza Hossain <[EMAIL PROTECTED]> wrote:
>> On Wed, Jul 2, 2008 at 9:32 AM, Sayamindu Dasgupta <[EMAIL PROTECTED]>
>>
>>> This guy seems to be making some interesting progress on a Bangla
>>> OCR - or more precisely, enabling Bangla in Tesseract.
>>> http://debayanin.googlepages.com/hackingtesseract
>
> Cool. I had some interaction with the tesseract/ocropus folks, and it
> sounded like a good base. It's nice that someone's actually doing
> something with it. It takes the old "matra removal" approach, and he's
> facing the same problems I did (notice in his first example that গ is
> segmented into 2 parts, and শু is not). On the other hand, having
> something that works even partly is a good start.
>
>> Yes, it definitely looks interesting.
>>
>>> Looks like he needs some more training data - can we provide him
>>> with some?
>>
>> If I remember correctly, there was a sample file for testing
>> completeness of Bengali fonts. Since it has all letters and conjuncts
>> typed in, the file might be useful for training Tesseract as well.
>>
>> Deepayan should be able to give some input here. He has working
>> experience with R and may have some training sample as well.
>
> Well, we have a bunch of unicode documents. For some of them, I have
> print versions too, and can scan them if needed. A simpler approach
> would be to render them using different fonts and take screenshots.
>
> Apparently he also needs some box-files, whatever they are, which need
> to be produced using tesseract. I haven't installed tesseract yet, and
> will try, but let me know if anyone else manages.
>
> -Deepayan
>
>
>

Dear all,

  I have been working on OCR for my university. I took most of the
ideas from bocra.sourceforge.net.

It is written in C++ using the GraphicsMagick library. Any suggestions
from you about matching the letters of the alphabet would be welcome.


Here is my progress so far:
http://picasaweb.google.com/salahuddin66/OCR#


regards
salahuddin

salahuddin66.blogspot.com


>
> -- 
> Be Intelligent, Use GNU/Linux
>
> http://debayanin.googlepages.com/
> http://debayan.wordpress.com
> http://lug.nitdgp.ac.in
>
> ------------------------------------------------------------------------------
> Stay on top of everything new and different, both inside and
> around Java (TM) technology - register by April 22, and save
> $200 on the JavaOne (SM) conference, June 2-5, 2009, San Francisco.
> 300 plus technical and hands-on sessions. Register today.
> Use priority code J9JMT32. http://p.sf.net/sfu/p
> _______________________________________________
> Bengalinux-core mailing list
> Bengalinux-core@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bengalinux-core

