Re: [tesseract-ocr] error when make training

2019-06-03 Thread Zdenko Podobny
I am not familiar with Mac, but AFAIK there is no problem with compiling tesseract on Mac. The error means that the linker is not able to find and link text2image against the pango-1.0 library... Try adding the path to the pango library to LDFLAGS. Zdenko On Mon, 3 Jun 2019 at 18:36, Jingjing Lin wrote: > The above er

Re: [tesseract-ocr] How to link "eng.trainedata" to "tesseract" in Window using "vcpkg" ?

2019-06-03 Thread Zdenko Podobny
There is no such thing as linking eng.traineddata to the tesseract library. IMHO the message is clear: the data is not in the place it should be => you did not specify where this data is located. Zdenko On Mon, 3 Jun 2019 at 19:42, narendra kintali wrote: > I have installed Tesseract using Vcpkg in
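A minimal sketch of the point above, assuming a tesseract 4.x command-line install: the traineddata file is located at run time, either through the --tessdata-dir option or through the TESSDATA_PREFIX environment variable. The helper names and any directory passed in are hypothetical, not taken from the thread.

```python
import os


def tesseract_invocation(image_path, tessdata_dir, lang="eng"):
    """Return (command, env) for running tesseract with an explicit tessdata directory.

    tessdata_dir is a hypothetical path; it must contain <lang>.traineddata.
    """
    # Option 1: name the directory on the command line.
    command = ["tesseract", image_path, "stdout",
               "--tessdata-dir", tessdata_dir, "-l", lang]
    # Option 2: set TESSDATA_PREFIX for the child process only
    # (a copy of the environment, os.environ itself is not modified).
    env = dict(os.environ, TESSDATA_PREFIX=tessdata_dir)
    return command, env


def has_traineddata(tessdata_dir, lang="eng"):
    """Check that the expected traineddata file is really present."""
    return os.path.isfile(os.path.join(tessdata_dir, lang + ".traineddata"))
```

The returned pair can be handed to subprocess.run(command, env=env). Either option alone is normally enough; checking has_traineddata first gives a clearer error than the engine's own message.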

Re: [tesseract-ocr] Error when I try to run the command ./configure

2019-06-03 Thread Zdenko Podobny
You need to start here: https://github.com/tesseract-ocr/tesseract/wiki/Compiling-%E2%80%93-GitInstallation Zdenko On Mon, 3 Jun 2019 at 19:42, Ricardo Junior wrote: > Hello everybody, > I am trying to train the tesseract following this tutorial: > https://github.com/tesseract-ocr/tesseract/wik

Re: [tesseract-ocr] failing tests while compiling in Ubuntu 16.04

2019-06-04 Thread Zdenko Podobny
make check and make test are for developers, not for users. Zdenko On Tue, 4 Jun 2019 at 16:56, xaraxura xaraxura wrote: > Hi > > I am compiling tesseract 4.0.0 on Ubuntu 16.04. > > Unfortunately some tests are failing. I tried changing tessdata, adding > tessdata location, export TESSDATA_PREFI

Re: [tesseract-ocr] failing tests while compiling in Ubuntu 16.04

2019-06-04 Thread Zdenko Podobny
Yes (or try and see ;-) ) Zdenko On Tue, 4 Jun 2019 at 17:04, xaraxura xaraxura wrote: > Okay so you are telling that release work fine despite test fails? > > On Tuesday, June 4, 2019 at 7:00:21 PM UTC+4, zdenop wrote: >> >> make check >> make test >> >> are for developers and not for users. >>

Re: [tesseract-ocr] Output is junk characters

2019-06-06 Thread Zdenko Podobny
The image is hardly readable by a human. Why do you expect (any) software would read it? Which of the techniques described on the wiki have you already tried? Zdenko On Thu, 6 Jun 2019 at 15:29, Balaji Gurunathan wrote: > I'm trying to to OCR for the attached image and all I could is junk > characters as the output.

Re: [tesseract-ocr] Changes in Tesseract 4.0 to 4.1 causing loss in precision

2019-06-10 Thread Zdenko Podobny
Can you provide a test case for your problem? Zdenko On Mon, 10 Jun 2019 at 19:00, Beck Olson wrote: > Greetings! > I just upgrade a system that I was using to parse spraces text out of > images from tesseract 4.0 to 4.1. I was surprised to find a significant > loss in accuracy. One of the

Re: [tesseract-ocr] Changes in Tesseract 4.0 to 4.1 causing loss in precision

2019-06-10 Thread Zdenko Podobny
didn't see any obvious changes myself nor was I able to > find any release notes that might identify major changes between the > version. > > > > > On Mon, Jun 10, 2019 at 10:03 AM Zdenko Podobny wrote: > >> Can you provide testing case for your problem?

Re: [tesseract-ocr] Tesseract source directory: ./configure

2019-06-12 Thread Zdenko Podobny
Which documentation about building tesseract from source did you read that is not clear??? Zdenko On Wed, 12 Jun 2019 at 20:05, Sivan Langer wrote: > where is it exactly in the mac os and also in ubuntu I find the > documentation unclear on this issue and can't find where to compile the > training p

Re: [tesseract-ocr] Can you help me get text from the similar following images, with tesseract?

2019-06-17 Thread Zdenko Podobny
Good luck: http://www.robindavid.fr/opencv-tutorial/cracking-basic-captchas-with-opencv.html or https://github.com/ksankaran6/captcha/blob/master/sankaran_k_FinalProject.pdf Zdenko On Thu, 13 Jun 2019 at 14:18, Nabi K.A.Z. wrote: > Can you help me get text from the similar following images, with

Re: [tesseract-ocr] Increasing RAM while using tesseract

2019-06-20 Thread Zdenko Podobny
IMO you have to call End() to correctly close the tesseract instance. Zdenko On Thu, 20 Jun 2019 at 13:02, _ Flaviu wrote: > I am using tesseract 4 on a VC++ (MFC) app, to read text from images (A4 > sizes). I noticed that while I using this app on several PCs (Win10 64 bit, > *GB RAM), the RAM occu

Re: [tesseract-ocr] Increasing RAM while using tesseract

2019-06-21 Thread Zdenko Podobny
Yes - otherwise you will see a memory leak message when your app finishes... Zdenko On Fri, 21 Jun 2019 at 10:28, _ Flaviu wrote: > As far I understand, > > pTess->End(); > is called on destructor, am I right ? > > On Thursday, June 20, 2019 at 2:42:49 PM UTC+3, zdenop wrote: >> >> IMO you have t

Re: [tesseract-ocr] Wrong full page box in hocr output

2019-06-21 Thread Zdenko Podobny
Try the latest code: there was a fix regarding bbox after the tesseract 4.0 release. Zdenko On Thu, 20 Jun 2019 at 14:16, Arno Loo wrote: > Hello, > > I have a recurrent problem and I don't understand why it is happening : > for many of the documents I have, Tesseract's hocr output is giving > incohe

Re: [tesseract-ocr] Increasing RAM while using tesseract

2019-06-23 Thread Zdenko Podobny
Hello, if you are really interested in help / involving others in testing, you need to make it as simple as possible, e.g. by providing the simplest possible example. I started to look at this one night, but had to stop because I do not have much time. So here is what I was able to do: *1. get the test data ;-)*

Re: [tesseract-ocr] Re: Increasing RAM while using tesseract

2019-06-24 Thread Zdenko Podobny
AFAIR: there was always a problem with the windows tesseract debug build (MSVC). The release version works fine, but debug failed. So it would be great if somebody experienced with windows debugging had a look at this. Zdenko On Sun, 23 Jun 2019 at 20:29, _ Flaviu wrote: > Ok, good idea, I am working ri

Re: [tesseract-ocr] issue #1393: Android NDK: LOCAL_MODULE definition in jni/Android.mk must not contain space

2019-06-24 Thread Zdenko Podobny
ogle.com/forum/#!topic/android-ndk/EJTJ17EhLOY> as an > not proper tutorial to pass. > > @*JB*Δ <http://jbigdata.fr/jbigdata/android.html> > > > On Sat, 25 May 2019 at 08:49, Zdenko Podobny wrote: > >> Hi all, >> >> this issue[1] is open for quit

Re: [tesseract-ocr] Re: Increasing RAM while using tesseract

2019-06-24 Thread Zdenko Podobny
IMHO you should call Clear in the loop together with SetImage: https://github.com/tesseract-ocr/tesseract/blob/master/src/api/baseapi.h#L675 Zdenko On Mon, 24 Jun 2019 at 10:25, _ Flaviu wrote: > I come in just in time: I have developed a little test application, which > have debug and release, source cod

Re: [tesseract-ocr] Re: Increasing RAM while using tesseract

2019-06-24 Thread Zdenko Podobny
And pixDestroy should be in the loop too. You just create new objects and never destroy them, so why do you wonder that you run out of memory? Zdenko On Mon, 24 Jun 2019 at 10:35, Zdenko Podobny wrote: > IMHO you should call Clear in loop with SetImage : > https://github.com/tessera
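The advice in this thread (initialise once, release per-image data inside the loop, call End() when finished) can be sketched as follows. This is only an illustration in Python, assuming an object with the SetImageFile, GetUTF8Text and Clear methods that tesserocr.PyTessBaseAPI exposes; the function name is hypothetical and the original poster's code is C++/MFC.

```python
def ocr_many(api, image_paths):
    """OCR several images with one already-initialised engine.

    `api` is assumed to expose SetImageFile, GetUTF8Text and Clear, as
    tesserocr.PyTessBaseAPI does. The model is loaded once, outside this loop.
    """
    results = {}
    for path in image_paths:
        api.SetImageFile(path)             # hand the next image to the engine
        results[path] = api.GetUTF8Text()  # run recognition and copy the text out
        api.Clear()                        # free per-image data before the next iteration
    return results
```

With tesserocr installed, the engine would typically be created as "with tesserocr.PyTessBaseAPI() as api:", which calls End() on exit. In the C++ API the Pix given to SetImage is owned by the caller, so pixDestroy belongs inside the loop as well, as the reply says.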

Re: [tesseract-ocr] Re: Increasing RAM while using tesseract

2019-06-24 Thread Zdenko Podobny
you wonder that you run out of memory? >> >> >> Zdenko >> >> >> On Mon, 24 Jun 2019 at 10:35, Zdenko Podobny wrote: >>> >>> IMHO you should call Clear in loop with SetImage : >>> https://github.com/tesseract-ocr/tesseract/blob/master/src/api/baseapi.h#

Re: [tesseract-ocr] issue #1393: Android NDK: LOCAL_MODULE definition in jni/Android.mk must not contain space

2019-06-26 Thread Zdenko Podobny
the Androïd build of tesseract (June src git clone) and test on >>> device. >>> I use the post [tesseract-ocr] Re: Recognition of "5" instead of "S" as a >>> testcase, see attached screenshot. >>> >>> Tesseract Androïd build is not a trivial w

Re: [tesseract-ocr] Re: Invalid resolution 0, using 70dpi instead

2019-06-30 Thread Zdenko Podobny
Is there any reason not to use the tesseract --dpi option? Zdenko On Sat, 29 Jun 2019 at 14:56, Mox Betex wrote: > > I tried to solve this problem using exiftool to write Units:PixelsPerInch in > png file because it was Units:Undefined, but I had no luck. > > -- > You received this message becaus
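A minimal sketch of using the option mentioned above, assuming a tesseract 4.x binary on the PATH (older releases may not accept --dpi). The helper name and the default of 300 are illustrative choices, not values from the thread.

```python
def tesseract_with_dpi(image_path, dpi=300, output_base="stdout"):
    """Build a tesseract command that states the input resolution explicitly."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    # --dpi overrides missing or zero resolution metadata in the image file.
    return ["tesseract", image_path, output_base, "--dpi", str(dpi)]
```

The list can be passed to subprocess.run; setting the resolution this way avoids having to rewrite the image metadata first.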

Re: [tesseract-ocr] Re: ScrollView problem

2019-06-30 Thread Zdenko Podobny
How to run the Segmenter Debug Mode is described in the wiki. Which version of tesseract did you use? Did you compile ScrollView.jar yourself? What does it mean that you run tesseract on ubuntu server? Is it headless

Re: [tesseract-ocr] Re: ScrollView problem

2019-06-30 Thread Zdenko Podobny
tesseract starts a server and opens a (Java) window with debugging info. So IMO you cannot run it on a truly headless system, because there is no way to show graphical results. But you can try tools such as mobaxterm on windows that can start grap

Re: [tesseract-ocr] Re: Invalid resolution 0, using 70dpi instead

2019-06-30 Thread Zdenko Podobny
If you try something, please describe exactly what you did and provide other useful information, e.g. the tesseract version. Zdenko On Sun, 30 Jun 2019 at 21:15, Mox Betex wrote: > I tried to use it, but it says dpi parameter not valid. > Give me example how to use it. > > Besides, what to do when doi

Re: [tesseract-ocr] Generating traindata

2019-07-02 Thread Zdenko Podobny
Do you use the latest code? Or which version did you use? Zdenko On Tue, 2 Jul 2019 at 16:44, Purushotham Rao Eravalli wrote: > I am getting the below error while trying to run the command > *tesstrain.sh --fonts_dir ../../fonts --fontlist 'ABCDEFGHIJKLM' --lang > eng --linedata_only --langdata_dir

Re: [tesseract-ocr] Trying to use "-c tessedit_page_number=1" at the end of command to process only page two of a multipage tiff

2019-07-02 Thread Zdenko Podobny
I guess you have a tiff with jpeg compression... You need to use the latest tesseract code and leptonica >1.77 Zdenko On Tue, 2 Jul 2019 at 21:22, Laurent Sabourin wrote: > I am us

Re: [tesseract-ocr] Trying to use "-c tessedit_page_number=1" at the end of command to process only page two of a multipage tiff

2019-07-02 Thread Zdenko Podobny
I see the same behaviour on windows. Can you please create an issue? Zdenko On Tue, 2 Jul 2019 at 22:14, Laurent Sabourin wrote: > I am attaching a sample tiff that reproduce the issue, this one is a G3 > compression with the same issue... > >
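For reference, a small sketch of how the parameter from the thread subject is normally passed. tessedit_page_number is zero-based, so page two of the TIFF corresponds to the value 1; the helper name is hypothetical. The thread itself reports that this did not behave as expected for some TIFF files, so treat this as the intended usage rather than a workaround.

```python
def single_page_command(tiff_path, page_number, output_base="stdout"):
    """Build a command that OCRs one page of a multipage TIFF.

    page_number is 1-based (as a reader counts pages);
    tessedit_page_number is 0-based.
    """
    if page_number < 1:
        raise ValueError("page_number starts at 1")
    return ["tesseract", tiff_path, output_base,
            "-c", "tessedit_page_number={}".format(page_number - 1)]
```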

Re: [tesseract-ocr] Re: setting user-words in api?

2019-07-03 Thread Zdenko Podobny
If the command line works for you, then the easiest way is to follow the tesseract executable code[1]: IMO you need to use the variable user_words_file; AFAIR user_words_suffix specifies only the file extension... Then it should work[2], i.e. tesseract will load the user words (the effect on recognition is another topic). [1]

Re: [tesseract-ocr] How to detect page orientation correctly

2019-07-05 Thread Zdenko Podobny
I am sorry, but it is not clear what you did... Tesseract does not provide OpenCV Hough lines or OpenCV bounding boxes... Post a test image, the tesseract command you used, your result, the expected result, etc... Zdenko On Fri, 5 Jul 2019 at 7:40, hrishikesh kaulwar wrote: > I have a scanned documents in whi

Re: [tesseract-ocr] src snippet question

2019-07-05 Thread Zdenko Podobny
That means it is called by the training tools only. Try text2image --list_available_fonts Zdenko On Thu, 4 Jul 2019 at 14:42, JB Data31 wrote: > Hi, > > Building *tesseract* for Android, I have a question about a src snippet. > In the file *src/ccutil/fileio.cpp* there's a method > *DeleteMatchingFile

Re: [tesseract-ocr] How to detect page orientation correctly

2019-07-05 Thread Zdenko Podobny
Once again: provide an image for which tesseract does not give correct output, so other people can test it. Zdenko On Fri, 5 Jul 2019 at 11:23, hrishikesh kaulwar wrote: > I have tried 3 methods. > 1. Tesseract IMG output --psm 0 command on terminal which gives > orientation info but it's not always

Re: [tesseract-ocr] Choice Iterator only shows one choice for each character

2019-07-05 Thread Zdenko Podobny
IMO the link should be https://github.com/tesseract-ocr/tesseract/issues/2536 Zdenko On Thu, 4 Jul 2019 at 11:39, shree wrote: > See related discussion at > https://github.com/tesseract-ocr/tesseract/issues/2532 > > On Monday, July 1, 2019 at 3:51:15 PM UTC+5:30, Jochen Naumann wrote: >> >> Thanks,

Re: [tesseract-ocr] segment image before send it to tesseract

2019-07-05 Thread Zdenko Podobny
See SetRectangle. There is an example on the wiki [1] of how to use it. [1] https://github.com/tesseract-ocr/tesseract/wiki/APIExample#example-of-iterator-over-the-classifier-choices-for-a-single-symbol Zdenko On Fri, 5 Jul 2019 at 19:18, Jingjing Lin wrote: > If we do the segmentation beforehand, can w
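As a rough illustration of the SetRectangle approach, here is a sketch that recognises each pre-segmented region of one image in turn. It assumes an object with SetRectangle(left, top, width, height) and GetUTF8Text, as tesserocr.PyTessBaseAPI provides, with the image already set; the function name and the box format are hypothetical.

```python
def ocr_regions(api, boxes):
    """Recognise text inside each (left, top, width, height) box of the current image.

    `api` is assumed to expose SetRectangle and GetUTF8Text like
    tesserocr.PyTessBaseAPI, with an image already set on it.
    """
    texts = []
    for left, top, width, height in boxes:
        api.SetRectangle(left, top, width, height)  # restrict recognition to this region
        texts.append(api.GetUTF8Text().strip())
    return texts
```

Because the image is set only once, each region costs one recognition pass and nothing is reloaded between boxes.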

Re: [tesseract-ocr] Installation failure while installing tesseract 4.1.0 in ubuntu 18.04

2019-07-06 Thread Zdenko Podobny
Use the latest 4.1.0 code. On Sun, 7 Jul 2019, 7:11, Chanda Nikhil kumar wrote: > The following is the error I am facing: > > libtool: link: g++ -g -O2 -std=c++17 -o .libs/tesseract > tesseract-tesseractmain.o -fopenmp ./.libs/libtesseract.so > -L/usr/local/lib /usr/local/lib/liblept.so -lrt

[tesseract-ocr] Tesseract 4.1.0 released

2019-07-07 Thread Zdenko Podobny
Hello all, I am proud to announce tesseract OCR engine version 4.1.0 - a bug fix release with new renderers (API extension): Alto, LSTMBox, WordStrBox. See the online release notes [1]. Source code can be downloaded from GitHub [2]. [1] https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNote

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-08 Thread Zdenko Podobny
We do not maintain vcpkg. We officially support autotools, cmake (clang, msvc, g++), cppan (deprecated) and sw builds. Or the other way around - there are people who use these tools and contribute the necessary changes. Zdenko On Mon, 8 Jul 2019 at 12:05

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-08 Thread Zdenko Podobny
se and the build was successful (1568 Warning(s), 0 Error(s)) AFAIK vcpkg uses cmake, ninja and msvc for building. Zdenko On Mon, 8 Jul 2019 at 18:13, Zdenko Podobny wrote: > We do not maintain vcpkg. > We officially support autotools, cmake (clang, msvc, g++),cppan > <htt

Re: [tesseract-ocr] How to get multiple results (i.e. alternative words), each with its own confidence?

2019-07-12 Thread Zdenko Podobny
https://github.com/tesseract-ocr/tesseract/wiki/APIExample#example-of-iterator-over-the-classifier-choices-for-a-single-symbol Zdenko On Fri, 12 Jul 2019 at 11:21, Eli Marmor wrote: > I'm newbie (and it's also my first post to this group), so please excuse > me if it's a silly question... > Tes

Re: [tesseract-ocr] tesseract 4.1.0-rc4

2019-07-13 Thread Zdenko Podobny
Tesseract does not support leptonica. Leptonica supports tesseract. On Sat, 13 Jul 2019, 11:36, Chanda Nikhil kumar < nikhilchanda2210t...@gmail.com> wrote: > hey does tesseract 4.1.0-rc 4 support or compatible with leptonica 1.74.1 ? > > -- > You received this message because you are subscribe

Re: [tesseract-ocr] tesseract 4.1.0-rc4

2019-07-13 Thread Zdenko Podobny
I am not sure what you mean, but tesseract requires leptonica >= 1.74 but >1.76

Re: [tesseract-ocr] segmentation fault when go app in run in docker container with tesseract installation

2019-07-14 Thread Zdenko Podobny
This is absolutely not sufficient information. Also, it seems you are using tesseract 3, which is quite outdated and no longer supported. Zdenko On Sun, 14 Jul 2019 at 9:39, Chanda Nikhil kumar wrote: > Hey team, > > I am facing segmentation fault (core dumped) error when trying to run ap

Re: [tesseract-ocr] tesseract does not detect text with different font size.

2019-07-15 Thread Zdenko Podobny
tesseract 513380440_sdpage_5_element5.tiff - --psm 12 TITLE 0.4 B-TO-B REC ASSY EMBSTP PKG Zdenko On Tue, 16 Jul 2019 at 8:04, Abhishek Khandelwal wrote: > After pre-processing I am getting the attached binary image. Tesseract is > able to detect the title but unable to detect the "TITLE" wor

Re: [tesseract-ocr] Where can I obtain the degree of confidence that every word tesseract recognized?

2019-07-15 Thread Zdenko Podobny
1. Use hocr output. 2. Use the tesseract API. Zdenko On Tue, 16 Jul 2019 at 8:04, Zihao Guo wrote: > I'm dealing with important messages that would better be left as blank > than mistakenly translated. I want to set up a threshold to let tesseract > output "?" if the degree of confidence for a g
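A sketch of the first option, assuming hOCR produced by tesseract (for example with "tesseract image out hocr"), where each word is a span of class ocrx_word whose title attribute carries an x_wconf value. The regular expression is a simplification that relies on that layout, and the threshold of 60 is an arbitrary illustrative value.

```python
import html
import re

# Matches one word span; captures its x_wconf value and its inner markup.
WORD_RE = re.compile(
    r"<span class=['\"]ocrx_word['\"][^>]*title=['\"][^'\"]*x_wconf (\d+)[^'\"]*['\"][^>]*>(.*?)</span>",
    re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")


def words_with_threshold(hocr, threshold=60, placeholder="?"):
    """Return the recognised words, replacing low-confidence ones with a placeholder."""
    words = []
    for conf, inner in WORD_RE.findall(hocr):
        # Drop inline markup such as <strong> and decode entities like &amp;.
        text = html.unescape(TAG_RE.sub("", inner)).strip()
        words.append(text if int(conf) >= threshold else placeholder)
    return words
```

For anything beyond a quick check, a real HTML parser is safer than a regular expression; the API route (iterating over results and reading per-word confidence) avoids parsing altogether.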

Re: [tesseract-ocr] Language detection

2019-07-16 Thread Zdenko Podobny
No. Tesseract is an OCR library, not a language detection library. Zdenko On Tue, 16 Jul 2019 at 14:12, Purushotham Rao Eravalli wrote: > Hi, > Is there a way where we can detect that the text is english or else of any > other language using the detection box given by tesseract.? Can someone > please help

Re: [tesseract-ocr] Tesseract 4.0 LSTM comparing with other OCR engines

2019-07-24 Thread Zdenko Podobny
Read the forum. Don't be lazy - you are doing it for money - we are not. All these topics were discussed. Zdenko On Wed, 24 Jul 2019 at 14:08, prasad nemmikanti wrote: > Hi Lorenzo, > > Thanks for your response. > Attached 3 sample fragments. > I have also analyzed few failure cases and main reasons a

Re: [tesseract-ocr] GPU for Tesseract

2019-07-24 Thread Zdenko Podobny
Well, this is not really true: - in the era of tesseract 3.x AMD sent some patches for OpenCL support[1]. They are still present, but not maintained (search the issue tracker for known problems) - AFAIR they affect only tif opening and some tif preprocessing without effect on OCR pro

Re: [tesseract-ocr] Re: Support for New Reiwa Era Character ㋿ in Japanese

2019-07-24 Thread Zdenko Podobny
Really??? https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters Zdenko On Thu, 25 Jul 2019 at 8:03, Prateek Mehta wrote: > So everywhere I can see examples and process to train it on new fonts, but > what about the new characters? The character

Re: [tesseract-ocr] Use Tesseract dll with c project

2019-07-25 Thread Zdenko Podobny
I would suggest starting by reading the doc/wiki. Zdenko On Thu, 25 Jul 2019 at 8:36, Pooja Kamra wrote: > Hi, > I want to use libtesseract.dll in C project. In tesseract source file > there is a header file capi.h. > How can i use these functions in c exe project. > Please suggest. > > -- > You receiv

Re: [tesseract-ocr] Use Tesseract dll with c project

2019-07-25 Thread Zdenko Podobny
Are you joking? Did you really read it??? https://github.com/tesseract-ocr/tesseract/wiki/APIExample#example-using-the-c-api-in-a-c-program Zdenko On Fri, 26 Jul 2019 at 7:11, Pooja Kamra wrote: > @ElGato and &Rele, sorry if my question bothered you both. But the link > which has been sen

Re: [tesseract-ocr] OCR directory pages with different indentation levels accurately

2019-07-26 Thread Zdenko Podobny
Can you please share/create an image with dummy text that reflects the typical image you would like to OCR? IMO this could encourage more people to find a solution to your problem. Zdenko On Thu, 25 Jul 2019 at 7:44, Dilcia Mercedes wrote: > Hi everyone, > > I am currently an intern for a news outl

Re: [tesseract-ocr] OCR directory pages with different indentation levels accurately

2019-07-26 Thread Zdenko Podobny
image. > > On Fri, Jul 26, 2019 at 5:02 AM Zdenko Podobny wrote: > >> Can you please share/create image with dummy text that would reflect your >> common image you would like to OCR? >> IMO this could encourage more people to find solution your problem. >> >>

Re: [tesseract-ocr] DISABLED_LEGACY_CODE questions

2019-07-27 Thread Zdenko Podobny
ad 3. it is a bug - please file an issue in the issue tracker with the exact error messages and detailed steps to reproduce it (how you configure and build tesseract). ad 2. Leptonica has its own functions to determine page orientation (search for pixOrientDetect and makeOrientDecision in leptonica program directory for e

Re: [tesseract-ocr] Specific localization and doing OCR

2019-07-29 Thread Zdenko Podobny
Another way is to use a uzn file (search the forum/internet). Zdenko On Mon, 29 Jul 2019 at 9:01, Shree Devi Kumar wrote: > https://github.com/tesseract-ocr/tesseract/wiki/APIExample > > If you want to restrict recognition to a sub-rectangle of the image - call > *SetRectangle(left, > top, width, hei

Re: [tesseract-ocr] Specific localization and doing OCR

2019-07-29 Thread Zdenko Podobny
I don't think so. Tesseract is an OCR library and it will try to recognize letters everywhere (even from dots ;-) - that is why preprocessing is important). You should look for a text detection algorithm (search for OpenCV or computer vision). Zdenko On Tue, 30 Jul 2019 at 4:18, min khant wrote: > Nope.

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-01 Thread Zdenko Podobny
Thanks. The attached patch should fix it (it does not solve the unittest part; @Shree: are you able to fix the unittest?). Can you test it? Zdenko On Thu, 1 Aug 2019 at 13:03, René Hansen wrote: > Good point, I see *fileio.h* referenced here: > > unittest/fileio_test.cc > unittest/ligature_table_test.cc > uni

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-01 Thread Zdenko Podobny
U clearerr > U fclose > U ferror > U fopen > U fputs > U fread > U fseek > U ftell > U glob > U globfre

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-04 Thread Zdenko Podobny
I am sorry, I found the problem - moving fileio.* was already staged, so it did not become part of the patch... Now it is part of master, so you can cherry-pick it for 4.1 if needed. Zdenko On Thu, 1 Aug 2019 at 19:14, Zdenko Podobny wrote: > try to run build in new directory. There should not

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-04 Thread Zdenko Podobny
5a50b93ce as something like 4.1.1? > > That way I can target an official release and get rid of my own fork. > > > /René > > > > On Mon, 5 Aug 2019 at 08:15, Zdenko Podobny wrote: > >> I am sorry I found the problem - moving fileio.* was already staged, so >

Re: [tesseract-ocr] how we can colaborate with tesseract?

2019-08-05 Thread Zdenko Podobny
Hello, development is done on GitHub[1]. Just fork the desired repo, make changes and send a pull request. [1] https://github.com/tesseract-ocr Zdenko On Mon, 5 Aug 2019 at 18:35, Cristobal Jesus Muñoz Solano wrote: > My team are interested in colaborate with tesseract proyect, how we can? > > -- >

Re: [tesseract-ocr] tesseract output is of first page only

2019-08-08 Thread Zdenko Podobny
Provide exact information about what you did. Make sure you use the latest tesseract and leptonica. Zdenko On Fri, 9 Aug 2019 at 7:41, ilevy wrote: > I'm trying tesseract for the first time with a png of a multipage document > I saved out of a pdf (which itself was just an image). > > When I run tesse

Re: [tesseract-ocr] OCR on multiple image files

2019-08-18 Thread Zdenko Podobny
On Tue, 13 Aug 2019 at 8:03, irtmem intellect wrote: > > Hi > I want to OCR multiple image files using tesseract. > > I an making a text file with all the file names (savedlist.txt), and then > tesseract savedlist.txt output.txt . > In case if one of the files is having multiple pages, then only fi

Re: [tesseract-ocr] Extracting text from specific region

2019-08-29 Thread Zdenko Podobny
Where did you try to look? Zdenko On Thu, 29 Aug 2019 at 19:24, iFieldSmart Technologies wrote: > I am getting the coordinates of a bounding box from an image file using a > different piece of code. Now I want to use tesseract to extract text only > from that specific region and not the whole fi

Re: [tesseract-ocr] Differing output / How do I find out which parameters are being used on a given run?

2019-09-05 Thread Zdenko Podobny
1. --print-parameters is not designed to create a config file. 2. There are init and non-init variables, there could also be variables in the language data, etc... Zdenko On Thu, 5 Sep 2019 at 10:56, Jonathan Zwart wrote: > *My stackoverflow question refers > (https://stackoverflow.com/questions/577

Re: [tesseract-ocr] ERROR: Program text2image failed. Abort.

2019-09-06 Thread Zdenko Podobny
Do you understand the error message ("/usr/local/bin/text2image: error while loading shared libraries: libtesseract.so.5: cannot open shared object file: No such file or directory")? IMO it is clear. Zdenko On Fri, 6 Sep 2019 at 17:07, Jundong Qiao wrote: > Hi all, > > I am generating training files

Re: [tesseract-ocr] Parsing config to image_to-pdf_hocr for pytesseract

2019-09-17 Thread Zdenko Podobny
Hi, this should work (in terms of how to use configs in pytesseract):

    import os
    import pytesseract
    from PIL import Image

    # configuration
    pytesseract.pytesseract.tesseract_cmd = r"f:\win64_llvm\bin\tesseract.exe"
    os.environ["TESSDATA_PREFIX"] = r"f:\Project-Personal\tessdata_best\tessdata"
    custom
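The original snippet is cut off at the config line, so the following is only a sketch of how a config string for pytesseract could be assembled, not the author's code. pytesseract passes the string through to the tesseract command line; the helper name is hypothetical and preserve_interword_spaces is used purely as an example variable.

```python
def build_config(psm=None, oem=None, variables=None):
    """Assemble a tesseract option string suitable for pytesseract's config argument."""
    parts = []
    if oem is not None:
        parts += ["--oem", str(oem)]
    if psm is not None:
        parts += ["--psm", str(psm)]
    # Each variable becomes a separate "-c name=value" pair.
    for name, value in (variables or {}).items():
        parts += ["-c", "{}={}".format(name, value)]
    return " ".join(parts)
```

With pytesseract installed, the result would be used as pytesseract.image_to_pdf_or_hocr(image, extension="hocr", config=build_config(psm=6)). Values containing spaces are not handled by this sketch.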

Re: [tesseract-ocr] Is there any way to load model(tesseract custom model) at ones instead of loading every time?

2019-09-18 Thread Zdenko Podobny
You have to use the tesseract API, not the executable, for this. Zdenko On Wed, 18 Sep 2019 at 9:45, jitendra dubey wrote: > We have pass every time model name while predicating the text from images. > So I want to know any way to load tesseract custom generated model in > "in-Memory" and predict fast

Re: [tesseract-ocr] problems with upper-case character

2019-09-18 Thread Zdenko Podobny
IMO the only solution is to send a longer text for OCR (e.g. a paragraph). Zdenko On Wed, 18 Sep 2019 at 17:19, 'Sandra M.' via tesseract-ocr < tesseract-ocr@googlegroups.com> wrote: > I'm using Tesseract with Python. I have an image with 1-6 words in it and > need to read the text. Sometimes the charact

Re: [tesseract-ocr] Could anyone help me about pytessract?

2019-09-18 Thread Zdenko Podobny
What have you tried from what is already published here in the forum and on the wiki? Zdenko On Thu, 19 Sep 2019 at 5:52, luffy monky wrote: > Hi ALL > I try to use any sample code from google. > But it's show no thing in my code > Could I trouble you for any advice?? > Here is my sample code >

Re: [tesseract-ocr] Re: problems with upper-case character

2019-09-19 Thread Zdenko Podobny
Please provide more information (version info, how you do OCR - it seems you use some coding). I just tried the tesseract (tesseract 5.0.0-alpha-416-g408d6) command line with tessdata_best and it works for me: tesseract unnamed.png - Warning: Invalid resolution 0 dpi. Using 70 instead. Estimating reso

Re: [tesseract-ocr] Re: problems with upper-case character

2019-09-19 Thread Zdenko Podobny
Your tesseract version is old. The current version is 4.1 (or the dev version is 5.0). For 4.x and above you can use different tessdata: best, fast, or the one that includes the 3.x model. Zdenko On Thu, 19 Sep 2019 at 11:55, 'Sandra M.' via tesseract-ocr < tesseract-ocr@googlegroups.com> wrote: > I use Tesseract 3.02 lepton

Re: [tesseract-ocr] text2image: No such file or directory

2019-09-19 Thread Zdenko Podobny
Does /usr/local/bin/text2image exist? Did you install text2image / the training tools? Zdenko On Thu, 19 Sep 2019 at 13:59, Ajinkya Khalwadekar wrote: > I am following https://github.com/tesseract-ocr/tesseract/issues/1453 for > tesseract 4.0 learning. > I am using macOS mojave. > > All was good u

Re: [tesseract-ocr] text2image issue

2019-09-19 Thread Zdenko Podobny
You already sent this to the forum and I already replied. Did you read it? Zdenko On Thu, 19 Sep 2019 at 15:04, Ajinkya Khalwadekar wrote: > I am following https://github.com/tesseract-ocr/tesseract/issues/1453 for > tesseract 4.0 learning. > I am using macOS mojave. > > All was good until i tried '

Re: [tesseract-ocr] Re: problems with upper-case character

2019-09-19 Thread Zdenko Podobny
Please provide an image for testing. Zdenko On Thu, 19 Sep 2019 at 18:06, 'Sandra M.' via tesseract-ocr < tesseract-ocr@googlegroups.com> wrote: > But therefore I get empty strings now, because it occurs a symbol that > tesseract does not know. I had this problem before as well, but could fix > it f

Re: [tesseract-ocr] OCR results are different on different OS (Linux and Windows)

2019-09-19 Thread Zdenko Podobny
Do you really think that somebody can reproduce the problem based on the information you provided? Zdenko On Thu, 19 Sep 2019 at 18:10, Karan Singh wrote: > For the same image, I am using the tesseract to get the text output. But > apparently the output is bad on linux version (RHEL) than windows (Windo

Re: [tesseract-ocr] Re: Compile Tesseract with vcpkg to get dynamic libraries

2019-09-19 Thread Zdenko Podobny
I did not try it, but if you have installed leptonica, you can install tesseract from source; just adjust the relevant part of the cmake configuration. AFAIK vcpkg uses cmake and ninja, so this tutorial (last part) can help you: http://www.sk-spell.sk.cx/building-tesseract-and-leptonica-with-cmake-

Re: [tesseract-ocr] text2image: No such file or directory

2019-09-22 Thread Zdenko Podobny
What about trying to read the wiki? Zdenko On Sun, 22 Sep 2019 at 16:59, Ajinkya Khalwadekar wrote: > Hi shree, > > Can you please suggest me any articles which i can follow for clean > install for this? > > > > On Sunday, September 22, 2019 at 6:51:10 AM UTC+5:30, shree wrote: >> >> make training >> S

Re: [tesseract-ocr] Handwritten traineddata.

2019-09-22 Thread Zdenko Podobny
Did you search, e.g., the user forum before asking? Zdenko On Sun, 22 Sep 2019 at 19:11, Ajinkya Khalwadekar wrote: > Hi, > > Do we have traineddata for hand writing? > Or how can i create one? > For tesseract 4.0. > > > > > -- > You received this message because you are subscribed to the Google Group

Re: [tesseract-ocr] How to install tesseract-ocr on a node.js server?

2019-09-22 Thread Zdenko Podobny
This is out of scope for this forum. BTW: the right answer is: like any other command line tool. Based on your last posts: you try to skip the learning phase, e.g. understanding what you are doing, understanding the tools you are using, etc. This is the wrong approach. Zdenko On Sun, 22 Sep 2019 at 19:28, Clint William Theron wrote

Re: [tesseract-ocr] Tesseract Recognition using psm13 for charatcers like "t", "i", "j"

2019-09-30 Thread Zdenko Podobny
Can you provide test images? I do not think there is any need to retrain tesseract for a common font like Arial. Zdenko On Mon, 30 Sep 2019 at 12:29, Purushotham Rao Eravalli wrote: > Hi, > > I retrained tesseract with Calibiri, arial. While testing on the cropped > text images I am facing is

Re: [tesseract-ocr] Re: Tesseract Recognition using psm13 for charatcers like "t", "i", "j"

2019-09-30 Thread Zdenko Podobny
>tesseract front2-201-6.jpg - Warning: Invalid resolution 0 dpi. Using 70 instead. Estimating resolution as 148 Aimanam, Pulikkuttissery. >tesseract front2-476-4.jpg - Warning: Invalid resolution 0 dpi. Using 70 instead. Estimating resolution as 170 S/O: ltvari Lal, Village patti kuki. >tessera

Re: [tesseract-ocr] Need Help Learning Howto Train Tesseract OCR on Fraktur Fonts - MAC - VietOCR v5.5.2 and Tesseract 4.1.0

2019-10-01 Thread Zdenko Podobny
Why do you think training will help you? What other options have you tried? Zdenko On Wed, 2 Oct 2019 at 7:26, Akos Simon wrote: > Fraktur Fonts OCR recognition with Tesseract OCR is what I am looking > for, I installed VietOCR v5.5.2 and Tesseract 4.1.0 on my mac, and now > I am trying to f

Re: [tesseract-ocr] Tesseract 4.0.0 doesn't recognize 1 number

2019-10-02 Thread Zdenko Podobny
First of all: use the latest stable version (4.1.0) or the master repository code. 4.0 is quite old and there have been many bugfixes (e.g. for whitelist). What model (traineddata) do you use? It seems you use tessdata_fast. Try to use tessdata_best or tessdata (where you can use also legacy engine, which

Re: [tesseract-ocr] Need Help Learning Howto Train Tesseract OCR on Fraktur Fonts - MAC - VietOCR v5.5.2 and Tesseract 4.1.0

2019-10-02 Thread Zdenko Podobny
If you are a novice, then the most stupid way to start (and waste time) is with training. Spend some time on research - maybe you will find that tesseract is already trained for Fraktur. Did you try to use deu_frak.traineddata[1]? If you still get bad results, please read the wiki [2], or post an example image. Th

Re: [tesseract-ocr] Re: tesseract ignores single/short characters -> any ideas?

2019-10-05 Thread Zdenko Podobny
tesseract 2_input_cropped.png - --psm 6 --oem 0 6. 7. 8. 9. 10. Zdenko On Sat, 5 Oct 2019 at 10:04, test0r man wrote: > --Push-- > > does anyone have an idea? > > thanks for help! > > > On Sunday, 8 September 2019 12:23:28 UTC+2, test0r man wrote: >> >> hi, >> i use this command: >> >> tes

Re: [tesseract-ocr] Re: tesseract ignores single/short characters -> any ideas?

2019-10-05 Thread Zdenko Podobny
The first image has several problems: 1. the baseline is not straight 2. different font sizes 3. table-like structure 4. amount/digit fields. Problems 1-3 could be solved with custom layout analysis, e.g. splitting the image into individual parts and sending them to tesseract via the API or a uzn file. There was ana
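The uzn approach mentioned above can be sketched as follows. This is a minimal illustration, assuming the commonly described UZN layout of one zone per line ("left top width height label") in a file that shares the image's base name and has a .uzn extension. As far as I know, tesseract only picks the file up in page segmentation modes without automatic column detection (for example --psm 4), so treat that as an assumption to verify. The zone coordinates and the helper names format_uzn and write_uzn are hypothetical.

```python
from pathlib import Path


def format_uzn(zones):
    """Return UZN text: one 'left top width height label' line per zone."""
    lines = []
    for left, top, width, height, label in zones:
        if width <= 0 or height <= 0:
            raise ValueError(f"zone {label!r} has non-positive size")
        lines.append(f"{left} {top} {width} {height} {label}")
    return "\n".join(lines) + "\n"


def write_uzn(image_path, zones):
    """Write the zones next to the image, using the same base name plus .uzn."""
    uzn_path = Path(image_path).with_suffix(".uzn")
    uzn_path.write_text(format_uzn(zones), encoding="ascii")
    return uzn_path


# Hypothetical zones in pixels, one per table row that should be read separately.
ZONES = [
    (10, 10, 400, 40, "Text"),
    (10, 60, 400, 40, "Text"),
]

if __name__ == "__main__":
    print(format_uzn(ZONES), end="")
```

Calling write_uzn("2_input_cropped.png", ZONES) would create 2_input_cropped.uzn beside the image; the zones themselves still have to come from your own layout analysis.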

Re: [tesseract-ocr] Re: tesseract ignores single/short characters -> any ideas?

2019-10-05 Thread Zdenko Podobny
"end" is a typo ;-) it should read "eng" :-) On Sat, 5 Oct 2019 at 21:31, test0r man wrote: > Hi Zdenko, > > very good job! i've tried so many image manipulation, but this was the > wrong way for the problems 1-3. the idea with the uzn file is great and i > think the perfect solution. Thanks :-)

Re: [tesseract-ocr] German lang support

2019-10-07 Thread Zdenko Podobny
There is no tesseract v5.0. Maybe you mean tesseract 5.0.0-alpha? What do you mean by "installed"? How? Which OS? Zdenko On Mon, 7 Oct 2019 at 21:48, Leopold Hamminger wrote: > Just installed tesseract v5.0. List of avail languages: eng, osd. How do > I add deu (German)? Thank you > > -- > Y

Re: [tesseract-ocr] German lang support

2019-10-08 Thread Zdenko Podobny
Where did you download tesseract from? Zdenko On Tue, 8 Oct 2019 at 11:39, Leopold Hamminger wrote: > Thank you, Zdenko > > I downloaded tesseract and installed it on my PC running Win 10. tesseract > --version returns: v5.0-0-alpha.20190708. --list-langs returns: eng osd > Am I running a wrong

Re: [tesseract-ocr] Mute "Empty page!!" print when using libtesseract?

2019-10-08 Thread Zdenko Podobny
Set the parameter debug_file to /dev/null (or to some filename, where you can check the warnings from tesseract). Zdenko On Tue, 8 Oct 2019 at 12:57, MPursche wrote: > Hello! > > I have a use-case for Tesseract where I'm not always 100% sure that there > is text in the image I try to do OCR on. When it runs
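A rough sketch of that advice for the command-line tool, assuming the usual "-c name=value" way of setting a parameter. With libtesseract the same variable would be set through the API's SetVariable call before recognition. The helper name quiet_tesseract_cmd and the file names are made up for illustration.

```python
import os


def quiet_tesseract_cmd(image_path, log_path=None):
    """Build a tesseract command that sends its diagnostics to a file.

    With no log_path the messages go to the null device, which is
    /dev/null on Unix-like systems.
    """
    target = log_path if log_path is not None else os.devnull
    # "-" as the output base asks tesseract to write the text to stdout.
    return ["tesseract", str(image_path), "-", "-c", f"debug_file={target}"]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works
    # even where tesseract is not installed.
    print(" ".join(quiet_tesseract_cmd("page.png")))
    print(" ".join(quiet_tesseract_cmd("page.png", "tesseract.log")))
```

The returned list can be passed to subprocess.run as-is. Pointing debug_file at a real file rather than the null device keeps the warnings available for later inspection.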

Re: [tesseract-ocr] Preprocessing Tools

2019-10-08 Thread Zdenko Podobny
Can you share the code for border removal? Zdenko On Thu, 3 Oct 2019 at 17:10, Suresh Anand wrote: > We do use openCV to do the pre process steps > 1.Re size > 2. Binarisation > 3. noise removal > 4. de skew > 5. border removal > > It works like a charm > > On Thu, Oct 3, 2019 at 2:06 PM Jennil Thi

Re: [tesseract-ocr] Mute "Empty page!!" print when using libtesseract?

2019-10-08 Thread Zdenko Podobny
Unfortunately you are wrong on this: 1. He is using a real API binding (if not the best, at least the most active C# tesseract solution) 2. The tesseract library prints output to stderr and stdout. Check the source. Zdenko On Tue, 8 Oct 2019 at 14:52, Lorenzo Bolzani wrote: > Hi, > I s

Re: [tesseract-ocr] Tesseract Strangely Thinks Text is Upside Down - ACCURACY

2019-10-12 Thread Zdenko Podobny
Do not use psm 12. The default psm seems to work. Zdenko On Thu, 10 Oct 2019 at 12:47, Umut Barış Korkut wrote: > Hey, > > Tesseract sometimes thinks all the text in the page is upside down. > For example the text "MOM" is recognized as "WOW" by the tesseract. > Similarly "GENERAL NOTES" is recogni

Re: [tesseract-ocr] Tesseract performing bad on Debian, but perfectly on Windows. Help!

2019-10-14 Thread Zdenko Podobny
You are comparing two different tesseract versions: a very old one and the newest one. Use the same version and you will get the same result. Zdenko On Mon, 14 Oct 2019 at 12:39, Wassim Boubaker wrote: > Hi everyone. > I'm working with pytesseract (python+flask) and deploying my work to a > server running
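One way to check this is to run "tesseract --version" on both machines and compare the first line of the output. The sketch below only parses such a line; it assumes the first line looks like "tesseract 4.1.0" or "tesseract v5.0.0-alpha.20190708", and the helper names are hypothetical.

```python
import re


def parse_version(first_line):
    """Extract (major, minor, patch) from the first line of `tesseract --version`."""
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)(?:\.(\d+))?", first_line)
    if match is None:
        raise ValueError(f"no tesseract version found in {first_line!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def same_release(line_a, line_b):
    """True when both version lines name the same major.minor.patch release."""
    return parse_version(line_a) == parse_version(line_b)


if __name__ == "__main__":
    # Example strings only; paste the real output from each machine here.
    print(parse_version("tesseract 4.1.0"))
    print(same_release("tesseract 3.04.01", "tesseract 4.1.0"))
```

Even with matching versions, results can still differ if the two machines use different traineddata files, so it is worth comparing those as well.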

Re: [tesseract-ocr] Bold italics detection in tesseract 4

2019-10-16 Thread Zdenko Podobny
I do not have version 3.5 available, but a simple test of an image with tesseract v5.0.0-alpha-479-g247c shows "some detection" of bold, but in the wrong places. So I would not suggest using a >=4.x version for this task. Zdenko On Tue, 15 Oct 2019 at 16:52, Ravi Nemala wrote: > > I read tesseract 4.0 doe

Re: [tesseract-ocr] Problem using "--oem 0" in Tesseract 5.4.0

2024-06-07 Thread Zdenko Podobny
Please show minimal respect and first google for a solution. Zdenko On Fri, 7 Jun 2024 at 18:23, Fred Andrews wrote: > I captured a screenshot of a VirtualBox guest boot crash and Tesseract > didn't seem to do very well OCRing that text, so I wanted to try the older > engine, which the help say

Re: [tesseract-ocr] Error when trying to build Tesseract DLL from Scratch on Arch Linux via Cmake

2024-06-20 Thread Zdenko Podobny
I am lost in your post... e.g. a DLL is created on Windows, not on Linux. Are you trying to cross-compile Tesseract? How did you install Leptonica? Which version? What does it mean that "webp is installed on my system" - only the runtime? Can you please provide each step and its output so we can replic

Re: [tesseract-ocr] Error when trying to build Tesseract DLL from Scratch on Arch Linux via Cmake

2024-06-21 Thread Zdenko Podobny
Cross-compiling is tricky: you need to know what you are doing and how to solve problems. A better solution is to use https://github.com/UB-Mannheim/tesseract/wiki AFAIK `cmake ..` will configure the package for the current system (i.e. it does not cross-compile). Zdenko On Thu, 20 Jun 2024 at 22:32, Danny wrote

Re: [tesseract-ocr] Re: Text extraction failure after preprocessing.

2024-06-28 Thread Zdenko Podobny
First of all, using jpg as a format for image processing and OCR is not very smart. Next: it does not seem like a very standard font... maybe you will need to train tesseract for it. To me, it looks like a heavily preprocessed 7-segment font... so I tried this: tesseract 14.png - --psm 7 --oem 0 -

Re: [tesseract-ocr] Re: Text extraction failure after preprocessing.

2024-06-28 Thread Zdenko Podobny
As far as I remember, the traineddata is from https://github.com/arturaugusto/display_ocr/blob/master/letsgodigital/letsgodigital.traineddata Also, check https://github.com/Shreeshrii/tessdata_ssd for Seven Segment Display recognition. Zdenko On Fri, 28 Jun 2024 at 17:07, 'uday kaipa' via tesseract-

Re: [tesseract-ocr] Re: Tesseract training ground truth: I'm confused about the box files

2024-07-14 Thread Zdenko Podobny
Ehm: 1. Training of the tesseract v3 (legacy) engine is based on characters. 2. The training script for the tesseract LSTM engine (tesseract >=v4) is based on lines (groups of words). Box files reflect that. And yes - box files are important. Zdenko On Fri, 12 Jul 2024 at 14:14, Mateusz Matela wrote: > A
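To make the character-versus-line distinction concrete, here is a small sketch that reads box-file lines of the form "symbol left bottom right top page". It assumes, as far as I understand the LSTM-style box format, that every symbol carries the bounding box of its whole text line and that a row whose symbol is a tab marks the end of that line. The sample data and the helper names are invented for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line):
    """Parse one 'symbol left bottom right top page' row of a box file."""
    # Split from the right so that a symbol which is itself a space or a tab survives.
    parts = line.rstrip("\n").rsplit(" ", 5)
    if len(parts) != 6:
        raise ValueError(f"malformed box line: {line!r}")
    symbol, *numbers = parts
    return Box(symbol, *(int(n) for n in numbers))


def split_text_lines(boxes):
    """Group symbols into text lines, treating a tab symbol as end of line."""
    lines, current = [], []
    for box in boxes:
        if box.symbol == "\t":
            lines.append("".join(current))
            current = []
        else:
            current.append(box.symbol)
    if current:
        lines.append("".join(current))
    return lines


# Invented LSTM-style sample: both symbols of a line share that line's box.
SAMPLE = (
    "H 10 20 200 60 0\n"
    "i 10 20 200 60 0\n"
    "\t 200 20 201 60 0\n"
    "O 10 80 200 120 0\n"
    "k 10 80 200 120 0\n"
    "\t 200 80 201 120 0\n"
)

if __name__ == "__main__":
    boxes = [parse_box_line(row) for row in SAMPLE.splitlines()]
    print(split_text_lines(boxes))
```

In a legacy (v3) box file each row would instead carry the tight bounding box of a single character, with no tab rows; parse_box_line reads both layouts, and only the grouping step is specific to the line-based format.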
