date:20190131

[tesseract-ocr] Question about "Failed loading language"

2019-01-31 Thread nampyo hong

[image: tesseract.PNG] When I was running tesseract 3.0.4, there was no problem. I tried to install tesseract 4.0.0 on Ubuntu 16.04 by building it from source, but ran into an issue. I referenced https://bingrao.github.io/blog/post/2017/07/16/Install-Tesseract-4.0-in-ubuntun-16.04.html this

[tesseract-ocr] Re: convert a .tiff file to text file

2019-01-31 Thread George Varghese

Does not work in Tesseract 4. On Wednesday, January 30, 2019 at 11:34:42 AM UTC-8, George Varghese wrote: > > I am using tesseract v4 to convert .tiff file to text, only the first > page. The script - run from command line on Windows 2012 takes almost 8 > seconds to convert only the first pa

[tesseract-ocr] Tesseract for invoices

2019-01-31 Thread Shailesh Barve

Hey all, I have a requirement to process invoices and extract a few data elements from them (e.g. invoice number, date, customer name, total amount). Incoming invoices come in different formats, and the positions of the data elements vary. E.g. the invoice number may be on the right or on the left, etc. How would

Re: [tesseract-ocr] Re: convert a .tiff file to text file

2019-01-31 Thread Zdenko Podobny

https://groups.google.com/forum/#!topic/tesseract-ocr/e3lqpY0pMpw https://groups.google.com/forum/#!topic/tesseract-ocr/UidqCx6OE0Q https://github.com/OpenGreekAndLatin/greek-dev/wiki/uzn-format https://github.com/jsoma/tesseract-uzn ... PS: I hope it works with tesseract 4 too ;-) I did not teste
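The uzn approach linked above can be sketched as follows. This is a minimal illustration, assuming the zone file layout described on the linked uzn-format page (one zone per line: left, top, width, height, label) and assuming the tesseract executable picks up a `.uzn` file that shares the image's base name when run with `--psm 4`. The helper names and the sample coordinates are hypothetical, and, as the message says, this was not confirmed against tesseract 4.

```python
from pathlib import Path


def format_uzn(zones):
    """Render zones as uzn text: one 'left top width height label' line per zone.

    `zones` is an iterable of (left, top, width, height, label) tuples in
    pixel coordinates. This layout is an assumption based on the linked
    uzn-format wiki page.
    """
    return "".join(
        f"{left} {top} {width} {height} {label}\n"
        for left, top, width, height, label in zones
    )


def write_uzn(image_path, zones):
    """Write the zone file next to the image, e.g. page1.tif -> page1.uzn."""
    uzn_path = Path(image_path).with_suffix(".uzn")
    uzn_path.write_text(format_uzn(zones), encoding="utf-8")
    return uzn_path


# Hypothetical usage: one zone covering the top of the first page.
# write_uzn("page1.tif", [(0, 0, 2480, 900, "first30lines")])
# Then, on the command line (assumed invocation):
#   tesseract page1.tif out --psm 4
```

Restricting recognition to a known region this way avoids writing a custom program against the API, which is the trade-off discussed in this thread.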

[tesseract-ocr] Re: convert a .tiff file to text file

2019-01-31 Thread George Varghese

I am using tesseract v4.0.0.20181030 and leptonica 1.76.0. In short, I am using the command line to convert a .tiff file to a .txt file, with no loop or custom solution. Yes, the first 30 lines have the same location, and I am specifying to OCR only my first page. You mentioned the usage of uzn f

Re: [tesseract-ocr] convert a .tiff file to text file

2019-01-31 Thread Zdenko Podobny

It is not clear to me what you want to achieve. To me it looks like a case for a custom solution using the tesseract API (C, C++, Python, maybe others). If you can use only the tesseract executable and your "30 lines" have the same location (or you know their location in advance), you can have

Re: [tesseract-ocr] Re: Tesseract not giving the desired output

2019-01-31 Thread Zdenko Podobny

See inline comments. On Wed, 30 Jan 2019 at 15:17, Lorenzo Bolzani wrote: > > I suppose this means that the image is always binarized, is this correct? > Yes > > Is there any way to avoid it? > Why? IMO OCR engines run on binarized images, see e.g. https://www.abbyy.com/en-eu/ocr-sdk/key-

Re: [tesseract-ocr] Should i use lstm training or TIFF/BOX file training?

2019-01-31 Thread Kristóf Horváth

Yes, and as far as I know that requires different training than LSTM, because in its current state tesseract doesn't support that. On Thursday, January 31, 2019 at 15:16:18 UTC+1, Timothy Snyder wrote: > > When you refer to TIFF/BOX file training, do you mean manually creating > your o

Re: [tesseract-ocr] pytesseract: errors with recognized digits

2019-01-31 Thread Lorenzo Bolzani

Check the API: https://pypi.org/project/pytesseract/ There is an example under "Support for OpenCV image/NumPy array objects". You may also try different languages (I had different results just on numbers). On Thu, 31 Jan 2019 at 15:18, Aaron Spell <8383...@gmail.com> wrote: >

Re: [tesseract-ocr] pytesseract: errors with recognized digits

2019-01-31 Thread Aaron Spell

Lorenzo Blz, thanks for your reply. PSM 13 results are better than PSM 6. Cropping the white border did not help. I will try to train tesseract. *How can I send a byte array to Tesseract, to avoid saving the picture to the hard disk and opening it again?* On Wednesday, January 30, 2019 at 17:25:26 UTC+3, user

Re: [tesseract-ocr] Should i use lstm training or TIFF/BOX file training?

2019-01-31 Thread Timothy Snyder

When you refer to TIFF/BOX file training, do you mean manually creating your own boxfiles from your own set of images? Note that by default, lstmtraining does generate TIFF/BOX files from the fonts that you specify it to train on. With a little bit of wrangling, you can actually configure lstmtrai

[tesseract-ocr] How should i open langdata files? (for example: desired characters, eng.numbers, eng.unicharambigs)

2019-01-31 Thread Kristóf Horváth

What is the recommended format for opening and editing these kinds of files?

Re: [tesseract-ocr] Training for a specific wordlist and font

2019-01-31 Thread Lorenzo Bolzani

You can have a look at ocrd-train https://github.com/OCR-D/ocrd-train You just have to prepare cropped tiff and txt files with the same name containing a single line of text. At the same time, if you already set up everything for the font based training, I'd give it a try (time permitting): you
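Before starting a training run on image/text pairs like these, it can help to check that every line image has a matching transcription. The sketch below works on a plain list of file names. The function name is hypothetical, and the default extensions (`.tif` for images, `.gt.txt` for transcriptions) are assumptions about the ocrd-train layout that should be checked against its README; the message above only says the two files share a name.

```python
def find_unpaired(filenames, image_ext=".tif", text_ext=".gt.txt"):
    """Return (images_without_text, texts_without_image) as sorted base names.

    A pair is an image and a transcription that share the same base name,
    e.g. line_0001.tif and line_0001.gt.txt. The extensions are assumptions
    and can be overridden.
    """
    images = {n[: -len(image_ext)] for n in filenames if n.endswith(image_ext)}
    texts = {n[: -len(text_ext)] for n in filenames if n.endswith(text_ext)}
    return sorted(images - texts), sorted(texts - images)


# Hypothetical usage with a directory listing:
# import os
# missing_text, missing_image = find_unpaired(os.listdir("data/ground-truth"))
```

Running this check first catches naming mistakes that would otherwise surface only partway through training.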

[tesseract-ocr] Should i use lstm training or TIFF/BOX file training?

2019-01-31 Thread Kristóf Horváth

I'm planning on training tesseract to recognise sensitive information (3 letters followed by numbers; the point is to find the 3 letters so that in post-processing we can lock the document because it contains sensitive information). While sensitive information is high priority, accuracy is key too and som
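The post-processing step described here could be sketched with a regular expression applied to the OCR output. The pattern below is an assumption, since the message does not give the exact format: it takes "3 letters followed by numbers" to mean exactly three ASCII letters immediately followed by one or more digits, as a standalone token. The names are hypothetical.

```python
import re

# Assumed format: exactly three letters, then at least one digit,
# bounded on both sides so that "ABCD123" does not match.
SENSITIVE_CODE = re.compile(r"\b[A-Za-z]{3}\d+\b")


def find_sensitive_codes(text):
    """Return every token in the OCR text that looks like a sensitive code."""
    return SENSITIVE_CODE.findall(text)


def is_sensitive(text):
    """True if the OCR text contains at least one sensitive-looking code."""
    return SENSITIVE_CODE.search(text) is not None


# Hypothetical usage on recognized text:
# if is_sensitive(ocr_text):
#     lock_document()  # placeholder for the locking step
```

Because OCR errors can corrupt a letter or digit, a strict pattern like this may miss codes; whether to loosen it depends on how costly a missed document is compared with a false alarm.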

Re: [tesseract-ocr] Evaluating Tesseract with new domain-specific documents

2019-01-31 Thread Matthew Hodgskiss

Thanks very much for the advice. The ocr-evaluation tools look particularly useful. On Friday, 25 January 2019 12:04:13 UTC, shree wrote: > > also see > > https://github.com/impactcentre/ocrevalUAtion > > https://github.com/Shreeshrii/ocr-evaluation-tools > > https://github.com/tesseract-ocr/test/

Re: [tesseract-ocr] Training for a specific wordlist and font

2019-01-31 Thread Daniel Ferenc

Is there a guide somewhere on how to set up training like this? How to pair the images and text, etc.? And thank you for the insight, it really is helpful. On Thursday, January 31, 2019 at 11:18:35 AM UTC+1, Lorenzo Blz wrote: > > Yes, generating text is faster and easier. > > But the real extracted

Re: [tesseract-ocr] Training for a specific wordlist and font

2019-01-31 Thread Lorenzo Bolzani

Yes, generating text is faster and easier. But the real extracted and cleaned text you are going to eventually recognize is going to be different from this, more or less depending on a lot of factors: - how similar your training font actually is - how good your cleanup will be (test this in advanc

Re: [tesseract-ocr] Re: How to do lstm training using box/tiff files?

2019-01-31 Thread Kristóf Horváth

Well, you just repeated yourself and did not provide any new information. Like I said, I'm using the latest, so what am I doing wrong? Also, I'm not working in Ubuntu but in cygwin (not the same). On Thursday, January 31, 2019 at 10:57:45 UTC+1, 易鑫 wrote: > > @Kristóf Horváth > Oh i see, bu

Re: [tesseract-ocr] Re: How to do lstm training using box/tiff files?

2019-01-31 Thread 易鑫

@Shree Devi Kumar: Thanks for your reply. lstm training using box/tiff files is NOT supported. Use tesstrain.sh with a UTF8 training_text and fonts. Maybe you are right. But I think using training_text will also generate tiff/box files in the /tmp folder, so I think using box/tiff files and training_

Re: [tesseract-ocr] Re: How to do lstm training using box/tiff files?

2019-01-31 Thread 易鑫

@Kristóf Horváth Oh I see, but I don't know what you mean by this: you can use the master branch,latest code. I compiled the latest version on my cygwin setup, so I don't know what you are referring to. Sorry, I did not say it clearly. I meant: use the master branch. I have successfully trained lstm model in

[tesseract-ocr] How to write .unicharambigs file?

2019-01-31 Thread 易鑫

Hello everyone, I have trained a new lstm model in my project, but the result is not as good as I expected. I notice that some characters are often misrecognized in my results. I learned that adding some rules to .unicharambigs can reduce the mistakes? I extracted eng.traineddata and got the

Re: [tesseract-ocr] Re: How to do lstm training using box/tiff files?

2019-01-31 Thread Shree Devi Kumar

lstm training using box/tiff files is NOT supported. Use tesstrain.sh with a UTF8 training_text and fonts. On Thu, Jan 31, 2019 at 3:04 PM Kristóf Horváth wrote: > Oh i see, but i dont know what you mean by this: you can use the master > branch,latest code. I compiled the latest version on my c

Re: [tesseract-ocr] Re: How to do lstm training using box/tiff files?

2019-01-31 Thread Kristóf Horváth

Oh I see, but I don't know what you mean by this: you can use the master branch,latest code. I compiled the latest version on my cygwin setup, so I don't know what you are referring to. On Thursday, January 31, 2019 at 10:27:17 UTC+1, 易鑫 wrote: > > Thanks for your reply. I have alrea

[tesseract-ocr] Re: How to do lstm training using box/tiff files?

2019-01-31 Thread Kristóf Horváth

EDIT: Environment - Tesseract Version: 4.0.0 - Platform: Win10 64 (cygwin) Current Behavior: Confusing af (pls fix the wiki; as soon as I can make my demo work I will have to document it, so I'm going to send it so you guys will be able to have a wiki) Expected Behavior: run as intended copied f

Re: [tesseract-ocr] Re: How to do lstm training using box/tiff files?

2019-01-31 Thread 易鑫

Thanks for your reply. I have already managed to do lstm training on Ubuntu successfully, but the result is not as good as I expected, and I did not use my tiff/box files, so I want to add more samples. That's why I asked how to do lstm training using box/tiff files. As you mentioned: " > tesstrain.sh --f

[tesseract-ocr] Training tesseract tesstrain.sh exits with a warning

2019-01-31 Thread Kristóf Horváth

Currently I am trying to make sense of tesseract training, and after days of digging I finally managed to gain access to the tesstrain.sh and lstmtraining commands in my cygwin. I was so happy, because the wiki is no help in setting up training for tesseract, but as soon as I wanted to start doi

[tesseract-ocr] Re: How to do lstm training using box/tiff files?

2019-01-31 Thread Kristóf Horváth

> > I feel you. I'm currently trying to understand lstm training, but the wiki is >> weak as hell, so I'm doing trial and error blindly. So far I managed to set up >> tesseract training on cygwin, so I have access to the tesstrain and lstmtraining >> commands. Achieving this should be your first step, then i sug

27 matches
