Re: Improving accuracy on Tesseract 3.0 (also Issue 265)

Zdenko Podobný Sun, 01 Aug 2010 09:44:21 -0700

On 28.07.2010 17:02, Jimmy O'Regan wrote:
>
>>>> I grepped the code and it seems to be looking for something called
>>>> LANG.user-words, but that didn't seem to do anything -- I got the same
>>>> garbled text when I ran Tesseract 3 the second time.
>>>>         
>> Turns out T3 doesn't even access $LANG.user-words. I suspect it's looking
>> for it in the traineddata file...
>>
>>     
> Hmm... probably... which is quite a stupid thing to do, really, but I
> presume nobody in Google actually uses this, so it's probably quite
> neglected.
>
> I'm toying with the idea of adding support for an actual *user* list -
> i.e., that tesseract would check $HOME/.tesseract/lang.user-words -
> because assuming a single user system that the user has full control over is 
> still a braindamaged assumption.
>   
Just an idea: maybe this should be handled by an environment variable. If I
set:
export TESSDATA_PREFIX=~/.
tesseract will try to load ALL of its files from "$HOME/.tessdata".


The problem is that if tesseract does not find all the files it needs (e.g.
eng.traineddata) under $TESSDATA_PREFIX, it stops; it does not fall back to the
"standard" installation directories such as /usr/share/tessdata or
/usr/local/share/tessdata.

I tried "export TESSDATA_PREFIX=~/.:/usr/local/share/tessdata", but it did not
work: tesseract treated the whole value as a single prefix and tried to open
"/home/zdeno/.:/usr/local/share/tessdatatessdata/eng.traineddata", which is not
a valid path.
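
Until tesseract itself supports a search path, a small wrapper could emulate
one. The following is only a sketch: pick_tessdata_prefix is a hypothetical
helper, not part of tesseract, and it assumes (based on the paths observed
above) that tesseract appends "tessdata/" directly to $TESSDATA_PREFIX, so
every candidate prefix has to end in "/" or ".".

```shell
# Hypothetical helper, not part of tesseract: print the first prefix under
# which tessdata/<lang>.traineddata exists. Assumes tesseract builds the path
# as "${TESSDATA_PREFIX}tessdata/<lang>.traineddata", as observed above.
pick_tessdata_prefix() {
    lang=$1
    shift
    for candidate in "$@"; do
        if [ -f "${candidate}tessdata/${lang}.traineddata" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1   # no candidate holds the requested language file
}

# Prefer the per-user directory, then fall back to the system-wide ones.
if found=$(pick_tessdata_prefix eng "$HOME/." /usr/local/share/ /usr/share/); then
    export TESSDATA_PREFIX="$found"
fi
```

This only selects one directory per run, so it does not let a per-user
lang.user-words be combined with a system-wide eng.traineddata; that would
still need support inside tesseract.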


Zd.
