[tesseract-ocr] Re: Can't encode transcription error with Sinhala language

2018-01-18 Thread Sumedhe Dissanayake
I am using the latest version (from the github). On Sunday, January 14, 2018 at 12:31:17 PM UTC+5:30, Sumedhe Dissanayake wrote:

Re: [tesseract-ocr] Re: Can't encode transcription error with Sinhala language

2018-01-18 Thread Sumedhe Dissanayake
I am using the latest version (from the github). On Thursday, January 18, 2018 at 12:14:12 PM UTC+5:30, shree wrote: > > What vers

Re: [tesseract-ocr] Re: Can't encode transcription error with Sinhala language

2018-01-18 Thread Sumedhe Dissanayake
I am using the latest version of Tesseract (from GitHub). On Thursday, January 18, 2018 at 12:14:12 PM UTC+5:30, shree wrot

Re: [tesseract-ocr] Re: Can't encode transcription error with Sinhala language

2018-01-18 Thread ShreeDevi Kumar
>I am using the latest version (from the github). Have you cloned the master branch of the tesseract-ocr repository and built it? Which commit number? If you are using https://github.com/tesseract-ocr/tesseract/releases/tag/4.00.00alpha , that will not work - that is from Nov 8, 2016. ShreeDevi

Re: [tesseract-ocr] Re: Can't encode transcription error with Sinhala language

2018-01-18 Thread ShreeDevi Kumar
The tags have NOT been updated, hence version showing 4.00.00alpha is meaningless, since there have been hundreds of commits to the code after that tag. Please build using latest commit from master branch, or use the ppa by Alex

Re: [tesseract-ocr] Re: Can't encode transcription error with Sinhala language

2018-01-18 Thread ShreeDevi Kumar
Also see https://github.com/tesseract-ocr/tesseract/search?q=Can%27t+encode+transcription+error&type=Issues -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tes

[tesseract-ocr] Criminal record JPGs: Improving image quality

2018-01-18 Thread brad . solomon . 1124
Hello--I am attempting to pull full text from a few hundred JPGs that contain information on death row executions hosted by the Texas Department of Criminal Justice (TDCJ). Here's one example: http://www.tdcj.state.tx.us/death_row/dr_info/ruizroland.jpg; another: http://www.tdcj.state.tx.us/de

[tesseract-ocr] How to extract character by character using tesseract and pass it to other engine for detection.

2018-01-18 Thread Hardik Sutaria
How do I extract one character at a time and pass it to another engine, say a CNN, for OCR detection? Any help would be appreciated. Thanks in advance. Avinash Tiwari

[tesseract-ocr] The file 'z:\dev\interne\cs\tesseract-ocr-svn\dotnet\tessnet2.cpp' does not exist.

2018-01-18 Thread Prie Priehanto
Dear all, please help me. I have a problem running OCR (_0CR). The error is: "The file 'z:\dev\interne\cs\tesseract-ocr-svn\dotnet\tessnet2.cpp' does not exist."

[tesseract-ocr] Tesseract 4.00 for Android

2018-01-18 Thread Shariful Islam Foysal
How can we use Tesseract 4.00 LSTM in an Android application? I am trying to use Tesseract for the Bengali language, where the latest version seems to be good enough, but tess-two does not have the 4.00 version. Thanks in advance!

[tesseract-ocr] Re: Criminal record JPGs: Improving image quality

2018-01-18 Thread brad . solomon . 1124
Update: I provided a more detailed walkthrough of my process thus far here: https://stackoverflow.com/questions/48327567/fixing-text-grainy-ness-with-opencv On Thursday, January 18, 2018 at 7:49:22 AM UTC-5, brad.sol...@gmail.com wrote: > > Hello--I am attempting to pull full text from a few hun

[tesseract-ocr] Re: Variables having no effect on C# Tesseract.net 4.0.0.6 wrapper

2018-01-18 Thread James Q
I think there are 9 DLL files that come with that package beginning "pvt.cppan.demo...". I experimented with placing them in various locations along the execution path until the app worked. On my project they are now in '.\lib\x64' and in '..\bin\Debug'. On Wednesday, January 10, 2018 at 1:07:28 PM U

[tesseract-ocr] Re: Criminal record JPGs: Improving image quality

2018-01-18 Thread James Q
In my experience Tesseract gives poor results with lines within the text. You can test this by manually whiting out the lines in a paint editor and retrying Tesseract with the new image. If the results are improved then you will likely need to do this programmatically. This is not straightforward

[tesseract-ocr] Re: How to extract character by character using tesseract and pass it to other engine for detection.

2018-01-18 Thread James Q
I haven't done this myself, but I believe you should be able to generate a box file from the source image and use this to crop character subimages from that source image. Tesseract won't always get the boxes right though. On Thursday, January 18, 2018 at 12:49:22 PM UTC, Hardik Sutaria wrote: >
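A minimal sketch of the box-file approach suggested above, assuming the box file uses the usual Tesseract layout of one "symbol left bottom right top page" entry per line, with the origin at the bottom-left corner of the image (for example as written by the makebox config). The names CharBox and parse_box_file are hypothetical helpers, not part of Tesseract. The function only converts each box into a top-left-origin rectangle; the cropping itself is left to whatever image library is in use.

```python
from typing import List, NamedTuple, Tuple


class CharBox(NamedTuple):
    char: str
    # (left, upper, right, lower) with the origin at the top-left corner,
    # which is the rectangle convention most image libraries expect.
    crop: Tuple[int, int, int, int]
    page: int


def parse_box_file(text: str, image_height: int) -> List[CharBox]:
    """Parse box-file text into crop rectangles (hypothetical helper).

    Assumes each line reads: <symbol> <left> <bottom> <right> <top> <page>,
    with y measured upward from the bottom of the image.
    """
    boxes = []
    for line in text.splitlines():
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        try:
            left, bottom, right, top, page = (int(p) for p in parts[1:])
        except ValueError:
            continue  # skip lines whose coordinates are not integers
        # Flip the y axis: the box's top edge becomes the crop's upper edge.
        crop = (left, image_height - top, right, image_height - bottom)
        boxes.append(CharBox(parts[0], crop, page))
    return boxes
```

With Pillow installed (an assumption), each rectangle can be passed straight to Image.crop, e.g. Image.open(path).crop(box.crop), and the resulting subimage handed to the other engine. As noted above, the boxes will not always be right, so the crops are worth spot-checking.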

Re: [tesseract-ocr] Re: Can't encode transcription error with Sinhala language

2018-01-18 Thread ShreeDevi Kumar
Take a look at the lines that are getting the error and check that all characters are in the unicharset generated by training. The size of lstm-unicharset is different than the one generated by the training text, note the message shown at beginning of training. Check github issues, one of the mos
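A rough sketch of the check described above: list the pieces of a training-text line that cannot be matched against the unicharset. It assumes the unicharset is a text file whose first line is the entry count and whose remaining lines each start with the unichar, with the space character stored as NULL. The helper names load_unichars and find_unencodable are hypothetical, and the greedy longest-match loop is only an approximation of how Tesseract encodes a transcription, so treat its output as a hint rather than a verdict.

```python
from typing import List, Set


def load_unichars(unicharset_text: str) -> Set[str]:
    """Collect the unichar strings from unicharset file contents."""
    units = set()
    # Skip the first line, which is assumed to hold the number of entries.
    for line in unicharset_text.splitlines()[1:]:
        if not line.strip():
            continue
        token = line.split(" ", 1)[0]
        # Assumption: the space character is written as NULL in the file.
        units.add(" " if token == "NULL" else token)
    return units


def find_unencodable(line: str, units: Set[str]) -> List[str]:
    """Return the characters of `line` that no unicharset entry covers.

    Uses greedy longest-match so that multi-codepoint entries
    (common in Indic scripts) are tried before single codepoints.
    """
    max_len = max((len(u) for u in units), default=1)
    missing = []
    i = 0
    while i < len(line):
        for size in range(min(max_len, len(line) - i), 0, -1):
            if line[i:i + size] in units:
                i += size
                break
        else:
            # No entry starts here: record the character and move on.
            missing.append(line[i])
            i += 1
    return missing
```

Running find_unencodable over each line that triggers the error, with the unicharset produced by training loaded through load_unichars, should point at the characters worth inspecting first.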

Re: [tesseract-ocr] Re: Can't encode transcription error with Sinhala language

2018-01-18 Thread ShreeDevi Kumar
I also noticed that you are using just one font for training, and also using the same font for evaluation. While probably unrelated to the errors you are getting, LSTM training from scratch requires a large number of fonts and training text. You should try fine-tune training to modify current best