Re: [tesseract-ocr] Trying to build Tesseract-ocr for Linux arm64 Ubuntu 16.04 on Nvidia Jetson TX2 development board

2019-01-25 Thread Zdenko Podobny
It seems like you built leptonica with the jbg (jbig?) library, but this library is not used when linking with tesseract... Please provide all the steps you used to build leptonica and tesseract. Make sure you use the latest code (leptonica and tesseract) - there are several fixes included... Zdenko pi 25. 1. 201

[tesseract-ocr] Trying to build Tesseract-ocr for Linux arm64 Ubuntu 16.04 on Nvidia Jetson TX2 development board

2019-01-25 Thread Kyle Clinton
Hello, I have been trying to build the 4.0.0 rc2 and the release code bases from source for Linux arm64. I have been trying to use them in conjunction with the javacpp project, which has shell scripts for building both the Leptonica and Tesseract projects. I received a bit of help when I submit

Re: [tesseract-ocr] Box file layout for training tesseract4

2019-01-25 Thread Timothy Snyder
I have successfully trained Tesseract 4.0 using boxes that cover an entire line. I was similarly confused by the mismatch between the docs and that example. I haven't tested training with character-bounding boxes, but I can confirm that textline boxes work fine. On Fri, Jan 25, 2019 at 5:56 AM Jul
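A minimal sketch of what "boxes that cover an entire line" could look like when written out. It assumes the layout described on the Making-Box-Files---4.0 wiki page as I read it: one row per character in the form `symbol left bottom right top page`, every character of a line sharing the line's bounding box, and a row starting with a tab character marking the end of the line. The function name `line_to_box_rows` and the coordinates are hypothetical; check the output against a box file produced by your own Tesseract build before training on it.

```python
def line_to_box_rows(text, left, bottom, right, top, page=0):
    """Build box-file rows for one text line (hypothetical helper).

    Assumption: every character reuses the bounding box of the whole
    text line, and a row whose symbol is a tab marks the end of the line.
    """
    rows = [f"{ch} {left} {bottom} {right} {top} {page}" for ch in text]
    # End-of-line marker row (assumed format: tab as the symbol).
    rows.append(f"\t {left} {bottom} {right} {top} {page}")
    return rows


if __name__ == "__main__":
    for row in line_to_box_rows("Hi", 10, 20, 110, 50):
        print(row)
```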

Re: [tesseract-ocr] Evaluating Tesseract with new domain-specific documents

2019-01-25 Thread Shree Devi Kumar
Also see https://github.com/impactcentre/ocrevalUAtion https://github.com/Shreeshrii/ocr-evaluation-tools https://github.com/tesseract-ocr/test/tree/master/unlvtests On Fri, Jan 25, 2019 at 5:17 PM Lorenzo Bolzani wrote: > This is an option if you want to consider missing/extra chars too: >

Re: [tesseract-ocr] Evaluating Tesseract with new domain-specific documents

2019-01-25 Thread Lorenzo Bolzani
This is an option if you want to consider missing/extra chars too: https://en.wikipedia.org/wiki/Levenshtein_distance You should be able to find implementations for most languages. Bye Lorenzo On Fri, 25 Jan 2019 at 11:56, Matthew Hodgskiss < matthew.hodgsk...@gmail.com> wrote
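As a rough illustration of the Levenshtein suggestion, the sketch below computes the edit distance between a ground-truth string and OCR output and divides it by the ground-truth length to get a character error rate. The function names and the normalisation by reference length are assumptions made for this example, not something prescribed in the thread; the evaluation tools linked elsewhere in this discussion may count errors differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the reference length (assumed metric)."""
    if not reference:
        return 0.0 if not hypothesis else float("inf")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    print(character_error_rate("Tesseract", "Tesseraot"))
```

Running the same function over the baseline output and over the output produced with domain-specific settings gives two numbers that can be compared directly.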

[tesseract-ocr] Re: use multi threads in tesseract

2019-01-25 Thread cosmin . n . moisii
I got the best results by using it in combination with GNU's parallel:

    export OMP_THREAD_LIMIT=1
    ls -U | parallel 'tesseract -l ./{} ./../extraction/{}'
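The same idea (limit each tesseract process to one OpenMP thread and run several processes side by side) can be sketched in Python for setups where GNU parallel is not available. This is only an illustration: the helper names `build_job` and `ocr_directory`, the `eng` language default, the worker count, and the directory names are all assumptions, and the command follows the usual `tesseract imagename outputbase -l lang` form rather than the exact command quoted in the message.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_job(image, out_dir, lang="eng"):
    """Return (command, environment) for one image (hypothetical helper)."""
    image = Path(image)
    output_base = Path(out_dir) / image.stem
    cmd = ["tesseract", str(image), str(output_base), "-l", lang]
    # One OpenMP thread per process; parallelism comes from the pool below.
    env = dict(os.environ, OMP_THREAD_LIMIT="1")
    return cmd, env


def ocr_directory(in_dir, out_dir, workers=4, lang="eng"):
    """Run tesseract on every file in in_dir and return the exit codes."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    images = sorted(p for p in Path(in_dir).iterdir() if p.is_file())

    def run(image):
        cmd, env = build_job(image, out_dir, lang)
        return subprocess.run(cmd, env=env, check=False).returncode

    # Threads are enough here: each one just waits on a child process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, images))
```

A call such as `ocr_directory("scans", "extraction", workers=4)` would then process the files in `scans` four at a time; a worker count close to the number of CPU cores is a reasonable starting point.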

[tesseract-ocr] Box file layout for training tesseract4

2019-01-25 Thread Jul ius
Hi, I'm interested in training tesseract 4 with real data. As the documentation seems very poor and only covers training with font files, I have a general question. The page https://github.com/tesseract-ocr/tesseract/wiki/Making-Box-Files---4.0 says that the boxes need to cover the whole line

[tesseract-ocr] Evaluating Tesseract with new domain-specific documents

2019-01-25 Thread Matthew Hodgskiss
Hi, I am interested in evaluating the performance of Tesseract against some domain-specific test. I would like to establish a baseline using vanilla settings and then compare it with some domain-specific user-words and user-patterns as documented here