Re: [tesseract-ocr] Tesseract couldn't load any languages!

2018-05-04 Thread Zdenko Podobny
The error message is clear, isn't it? Zdenko On Fri, 4 May 2018 at 20:38, Dattatraya Tembare wrote: > Exception in thread "main" java.lang.Error: Invalid memory access > at com.sun.jna.Native.invokePointer(Native Method) > at com.sun.jna.Function.invokePointer(Function.java:490) > at com.sun.jna.Funct

[tesseract-ocr] Announcement: Tesseract tessdata downloader from GitHub repositories 1.0

2018-05-11 Thread Zdenko Podobny
Hello all, if you are interested in downloading traineddata for only some languages from the repositories (or a different tagged version), have a look at tessdata_downloader[1]. I just released version 1.0 [2]. I created this script in Python, but I was also able to create a Windows 64-bit "frozen" app so
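For a one-off download, a single traineddata file can also be fetched directly. The Python sketch below is not tessdata_downloader; it is a minimal illustration that assumes GitHub's https://github.com/<owner>/<repo>/raw/<ref>/<path> download URLs and the tesseract-ocr repository names tessdata, tessdata_best and tessdata_fast. The helper names are hypothetical, and the ref (a tag such as 4.0.0 or a branch name) must actually exist in the chosen repository.

```python
import urllib.request
from pathlib import Path

# Traineddata repositories under github.com/tesseract-ocr (assumed names).
REPOS = {"tessdata", "tessdata_best", "tessdata_fast"}


def traineddata_url(lang: str, ref: str, repo: str = "tessdata") -> str:
    """Build a raw-download URL for <lang>.traineddata at git tag/branch `ref`.

    `lang` may contain a subdirectory, e.g. "script/Latin" (assumed layout).
    """
    if repo not in REPOS:
        raise ValueError(f"unknown repository: {repo}")
    return f"https://github.com/tesseract-ocr/{repo}/raw/{ref}/{lang}.traineddata"


def download_traineddata(lang: str, ref: str, dest_dir: str, repo: str = "tessdata") -> Path:
    """Download one traineddata file into dest_dir, keeping any subdirectory (hypothetical helper)."""
    target = Path(dest_dir) / f"{lang}.traineddata"
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(traineddata_url(lang, ref, repo)) as response:
        target.write_bytes(response.read())
    return target
```

For example, download_traineddata("deu", "4.0.0", "tessdata") would save tessdata/deu.traineddata, provided that tag exists. Proxy handling and authentication are deliberately left out of this sketch.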

Re: [tesseract-ocr] a way to extract the location of each components in image

2018-05-20 Thread Zdenko Podobny
Did you read the wiki before posting? E.g. https://github.com/tesseract-ocr/tesseract/wiki/APIExample#getcomponentimages-example Zdenko On Sun, 20 May 2018 at 8:00, nick wrote: > hi > > is there a way to extract the location of each components (lines) in the > image ? > > for example : in the attach

Re: [tesseract-ocr] Where to find tessdata folder?

2018-05-31 Thread Zdenko Podobny
Did you follow the installation instructions for that package? Did you try an internet search before posting on the forum? Did you try to search for help in the tesserocr project??? I just put it into Google and got: https://pypi.org/project/tesserocr/ https://github.com/sirfz/tesserocr https://oded.blog/2017

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-02 Thread Zdenko Podobny
Done in https://github.com/tesseract-ocr/tesseract/commit/bc5dfc4b953babcc865f68a55c3bf415f4280b1a Zdenko On Thu, 31 May 2018 at 22:39, shree wrote: > This has been an issue for long. Thanks for finding the problem. > > Please submit a PR on github. > > On Friday, June 1, 2018 at 1:55:25 AM UTC+5

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-02 Thread Zdenko Podobny
Please check whether this is OK now. If it is, I am willing to make a 3.05.02 release ;-) Zdenko On Sat, 2 Jun 2018 at 10:16, Zdenko Podobny wrote: > done in > https://github.com/tesseract-ocr/tesseract/commit/bc5dfc4b953babcc865f68a55c3bf415f4280b1a > Zdenko > > > On Thu, 31 May 2018 at 22

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-03 Thread Zdenko Podobny
dMemBoxes(target_page, skip_blanks, &box_data[0], boxes, texts > , > box_texts, pages); > } > > > > On Saturday, June 2, 2018 at 2:22:16 AM UTC-6, zdenop wrote: >> >> Please check if this is ok now. If yes, I am willing to make 3.05.02 >>

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-04 Thread Zdenko Podobny
Stefan, Paul suggested also modifying LoadDataFromFile (ccutil/genericvector.h). Is that modification not needed? Zdenko On Mon, 4 Jun 2018 at 17:32, 'Stefan Weil' via tesseract-ocr <tesseract-ocr@googlegroups.com> wrote: > As far as I see 4.0.0 is good. I have sent a pull request which backpor

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-05 Thread Zdenko Podobny
Please make a PR for the master (4.0) branch and I will cherry-pick it for 3.05... Zdenko On Tue, 5 Jun 2018 at 4:38, Paul Kitchen wrote: > ZDenko, > > I checked out the latest tesseract code and updated to branch 3.05. I see > that the int64_t area bug is already fixed (thanks!). I also see that the > b

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-05 Thread Zdenko Podobny
You need to fork the official repository; then you have all the permissions you need. When you have made your changes, you can send a pull request with them to the official repository. Zdenko On Tue, 5 Jun 2018 at 15:06, Paul Kitchen wrote: > ZDenko, > > Unfortunately I don't seem to have write permission

Re: [tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-06-05 Thread Zdenko Podobny
Yes, it is OK, but you do not have to create a separate issue for a PR (a PR is an issue too). Zdenko On Tue, 5 Jun 2018 at 16:52, Paul Kitchen wrote: > ZDenko, > > I'm new to this so hopefully I did everything correctly. Here is the issue > I created: > > https://github.com/tesseract-ocr/tesseract/issu

Re: [tesseract-ocr] Tesseract is generating error prompt ''Not enough data at scanline" while extracting a tiff file

2018-06-25 Thread Zdenko Podobny
Can you post an image? It seems like a leptonica/tiff problem. Zdenko On Tue, 26 Jun 2018 at 7:21, James Worldprogram wrote: > *Problem*: - I am using Tesseract: *tesseract-ocr-setup-3.05.01.exe* as a > command line argument in Windows OS program with the argument *-l eng*; > it is working absol

Re: [tesseract-ocr] wron Characters in LibreOffice Writer with German spezial Characters

2018-06-29 Thread Zdenko Podobny
This is not a tesseract problem: https://ask.libreoffice.org/en/question/97993/why-doesnt-lo-writer-open-and-save-text-documents-encoded-in-utf-8-without-bom-any-plans-to-fix-this-soon/ Tesseract output is UTF-8 encoded. Zdenko On Fri, 29 Jun 2018 at 19:37, Martin Jenniges wrote: > Hello, > > whe

Re: [tesseract-ocr] Not Able to get Text

2018-07-06 Thread Zdenko Podobny
Please read the wiki page on improving tesseract results. Zdenko On Fri, 6 Jul 2018 at 10:52, the sender wrote: > Hi, > > I was using tesseract from long time and its working fine, we got some > new images but these images are not been parsed by tesseract > > I removed extra noise, changed to greyscale, chang

Re: [tesseract-ocr] Not Able to get Text

2018-07-06 Thread Zdenko Podobny
The images you provided are noisy. tesseract is not designed to work with such images (e.g. to break CAPTCHAs). Zdenko On Fri, 6 Jul 2018 at 11:14, Pranay Saxena wrote: > Hi > > I read and done all the changes to increase the quality for better result > .. > > I tried with google api also and google

Re: [tesseract-ocr] Not Able to get Text

2018-07-06 Thread Zdenko Podobny
t still not worked .. > > And can u suggest any other way to get it done. > > > Regards > Pranay > > On Fri, Jul 6, 2018, 14:48 Zdenko Podobny wrote: > >> images you provided are noisy. tesseract is not designed to work with >> such images (e.g. to break cap

Re: [tesseract-ocr] how to static link tesseract dependencies?

2018-07-09 Thread Zdenko Podobny
I think I did it in the past: I wanted to have only the leptonica and tesseract libraries. Try building leptonica with static linking and then force the tesseract build to use your leptonica build/installation... Zdenko On Mon, 9 Jul 2018 at 11:40, blues wrote: > Hi All, > > I followed the instruction fo

Re: [tesseract-ocr] unrecognized argument "unrecognised argument linedata_only"

2018-07-21 Thread Zdenko Podobny
Your command is wrong: you forgot to put a space there (before the backslash after --linedata_only). On Sat, 21 Jul 2018 at 18:12, the sender wrote: > My command is > > > usr/share/tesseract-ocr/./tesstrain.sh \ > > --fonts_dir /usr/share/fonts \ > > --lang ben \ > > --linedata_only\ > > --noextract_font_properties \ > > --langdata_dir /home/jennil/Des

Re: [tesseract-ocr] not able to run autogen.sh building tesseract-master 4.0.0

2018-07-25 Thread Zdenko Podobny
Did you try googling "undefined macro: m4_esyscmd_s"? One of the first answers is "Most likely you will have to upgrade your autoconf". Zdenko On Wed, 25 Jul 2018 at 17:08, Yogesh Sanchihar wrote: > Guys, > > I am trying to build tesseract 4.0.0 from master branch > > I am facing followi

Re: [tesseract-ocr] Re: not able to run autogen.sh building tesseract-master 4.0.0

2018-07-26 Thread Zdenko Podobny
Tesseract requires a recent compiler. You will need to upgrade the whole system because of it... Zdenko On Thu, 26 Jul 2018 at 12:26, Yogesh Sanchihar wrote: > I think, you are right... I upgraded to autoconf 2.69. And process moved a > bit. > > But still when I am executing ./configure --enable-

Re: [tesseract-ocr] Can't symlink into tessdata anymore?

2018-07-26 Thread Zdenko Podobny
A symlink is a filesystem feature, and tesseract uses standard C++ functions for reading/writing files, so there is no reason for a symlink bug in tesseract. But it seems that you are doing something non-standard, because ita.special-words is not a file that tesseract would open if you just sp
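The point above is that link resolution happens in the operating system, not in the application: a program that opens a path with ordinary file calls reads whatever the link points to. The sketch below uses Python rather than tesseract's C++, but its plain open() goes through the same OS path resolution. It creates a file, links to it, reads through the link and reports where the link resolves; the file names are hypothetical. Checking os.path.realpath like this is a quick way to confirm which file is actually being read.

```python
import os
import tempfile
from pathlib import Path


def read_through_link(directory: Path):
    """Create a file plus a symlink to it, then read the file via the link.

    Returns (text read through the link, resolved path of the link, whether it is a link).
    File names are hypothetical; os.symlink may need extra privileges on Windows.
    """
    target = directory / "real.traineddata"
    link = directory / "eng.traineddata"
    target.write_text("model bytes")
    os.symlink(target, link)
    with open(link) as f:  # an ordinary open(): the OS resolves the link
        text = f.read()
    return text, os.path.realpath(link), os.path.islink(link)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        text, resolved, is_link = read_through_link(Path(tmp))
        print(text, is_link)  # model bytes True
        print(resolved)       # the real path of real.traineddata
```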

Re: [tesseract-ocr] Can't symlink into tessdata anymore?

2018-07-27 Thread Zdenko Podobny
If I understand it correctly, that confirms that there is no problem/bug related to symlinks, just an outdated ita.traineddata. Right? Zdenko On Fri, 27 Jul 2018 at 9:27, Shree Devi Kumar wrote: > @zdenko podobny > > Please see https://github.com/tesseract-ocr/tessdata/issues/18 > ita.special-words

Re: [tesseract-ocr] Re: combine_tessdata. Failed to read /usr/share/tesseract-ocr/tessdata/foo.traineddata

2018-07-29 Thread Zdenko Podobny
"undefined symbol" indicate broken installation Zdenko ne 29. 7. 2018 o 1:41 napísal(a): > Updated the previous error was permission. I solved it no I have this > error > > > combine_tessdata: symbol lookup error: combine_tessdata: undefined symbol: > _Z7tprintfPKcz > > > > On Sunday, July 29,

Re: [tesseract-ocr] Issue installing Tesseract-Langpack-OSD dependency on REHL

2018-09-14 Thread Zdenko Podobny
Did you download the tesseract-langpack-osd-4.00~git30-4.1.noarch rpm? Zdenko On Fri, 14 Sep 2018 at 21:01, Jacob Rosenzweig wrote: > The REHL server I'm installing Tesseract is not connected to the internet, > so I'm installing from .rpm packages. I moved the Tesseract-4.0.0 rpm over > to server

Re: [tesseract-ocr] Issue installing Tesseract-Langpack-OSD dependency on REHL

2018-09-14 Thread Zdenko Podobny
How did you install it, and what was the error message when you tried to install it (tesseract-langpack-osd)? Zdenko On Fri, 14 Sep 2018 at 21:06, Jacob Rosenzweig wrote: > I did. That's the dependency that fails to install. > > On Friday, September 14, 2018 at 12:05:22 PM UTC-7, zdenop wrote: >>

Re: [tesseract-ocr] Issue installing Tesseract-Langpack-OSD dependency on REHL

2018-09-14 Thread Zdenko Podobny
And the error message was? Zdenko On Fri, 14 Sep 2018 at 22:28, Jacob Rosenzweig wrote: > sudo rpm -i tesseract-lang-pack-osd-4.00~git30-4.1.noarch.rpm > > I tried yum install as well but it kept trying to download dependencies > off the internet. > > On Friday, September 14, 2018 at 12:32:59 PM

Re: [tesseract-ocr] Issue installing Tesseract-Langpack-OSD dependency on REHL

2018-09-14 Thread Zdenko Podobny
Did you try to install all the packages at the same time (e.g. with one command)? Zdenko On Fri, 14 Sep 2018 at 22:51, Jacob Rosenzweig wrote: > Oh sorry, it was just what I posted in the OP. > > error: Failed dependencies: > tesseract is needed by tesseract-langpack-osd-4.00~ > git30-4.1.n

Re: [tesseract-ocr] Can Tesseract auto rotate images?

2018-09-16 Thread Zdenko Podobny
Tesseract is an OCR tool. Why should it produce image data? On Sun, 16 Sep 2018 at 11:45, Ido Nava wrote: > Hello, > I processed with tessercat 4 files: > 1. Original tif file (0 deg). > 2. 90 deg rotate of the org file. > 3. 180 deg rotate of the org file. > 4. 270 deg rotate of the org file. > >

Re: [tesseract-ocr] Can Tesseract auto rotate images?

2018-09-16 Thread Zdenko Podobny
I am sorry, I just wrote a quick reply on my phone without checking. The reply is: Tesseract is an OCR tool. Why should it produce image data/modify the input? Zdenko On Sun, 16 Sep 2018 at 11:47, Zdenko Podobny wrote: > Tesseract is an OCR tool. Why should it produce image data? > > On Sun, 16 Sep
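What tesseract can do is report the orientation it detects, leaving the actual rotation to an image tool. Assuming the plain-text output of orientation and script detection (tesseract's --psm 0 mode, which needs osd.traineddata) consists of "Key: value" lines such as "Rotate: 90", a minimal Python parser might look like this. The function name and the sample values are illustrative, not taken from the thread.

```python
def parse_osd(text: str) -> dict:
    """Parse the "Key: value" lines of tesseract OSD output into a dict.

    Values that look like integers or decimals are converted; others stay strings.
    """
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # ignore lines that are not "key: value"
        value = value.strip()
        for convert in (int, float):
            try:
                value = convert(value)
                break
            except ValueError:
                pass
        result[key.strip()] = value
    return result


# Illustrative sample in the assumed --psm 0 output format (values made up).
SAMPLE = """Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 21.34
Script: Latin
Script confidence: 4.28
"""

osd = parse_osd(SAMPLE)
print(osd["Rotate"], osd["Script"])  # 90 Latin
```

The rotation itself still has to be done by an image tool such as Leptonica or Pillow; confirm the direction convention of the Rotate value on a known test image before relying on it.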

Re: [tesseract-ocr] How to overlay hocr output on original scanned pdf.

2018-09-17 Thread Zdenko Podobny
Something like this? tesseract scannedFile.png scanned -l eng hocr pdf (tesseract appends the .hocr and .pdf extensions to the output base name). Zdenko On Mon, 17 Sep 2018 at 14:12, monica kumari wrote: > for OCRing a scanned pdf, > first it is converted to image format then OCRed and gives a temperory > file of pdf/text format and overlays on original scanned pd

Re: [tesseract-ocr] Checking HasNext on Tesseract API to avoid getting error when there is no item in the iterator.

2018-09-21 Thread Zdenko Podobny
No, there is no such function. There is IsAtBeginningOf (TessPageIteratorIsAtBeginningOf) and IsAtFinalElement (TessPageIteratorIsAtFinalElement

Re: [tesseract-ocr] Tesseract to detect numbers with opencv (c++) and cmake on a raspberry pi

2018-09-22 Thread Zdenko Podobny
Have a look at the wiki page APIExample. And tesseract/tesseractmain.cpp is just an example of how to use the tesseract library... Tesseract can also be built with

Re: [tesseract-ocr] Tesseract to detect numbers with opencv (c++) and cmake on a raspberry pi

2018-09-22 Thread Zdenko Podobny
How did you install tesseract? Zdenko On Sat, 22 Sep 2018 at 15:23, Adam Richards wrote: > I have managed to run the basic API example from the command line, however > I am now trying to incorporate it into my project. > I have added these lines to my CMakeLists.txt file: > > find_package( Tesse

Re: [tesseract-ocr] Tesseract to detect numbers with opencv (c++) and cmake on a raspberry pi

2018-09-23 Thread Zdenko Podobny
Can you check whether the TesseractConfig.cmake macros are included in these packages? Zdenko On Sun, 23 Sep 2018 at 2:06, Adam Richards wrote: > Pretty sure I installed with just these commands: > > sudo apt install tesseract-ocr > sudo apt install libtesseract-dev > > When I check the tesseract -v I ge

Re: [tesseract-ocr] Tesseract to detect numbers with opencv (c++) and cmake on a raspberry pi

2018-09-23 Thread Zdenko Podobny
This means that your tesseract packages were built with autotools and do not include CMake support. So you need to set Tesseract_INCLUDE_DIRS and Tesseract_LIBRARIES manually... Zdenko On Mon, 24 Sep 2018 at 0:24, Adam Richards wrote: > I did a search on the Pi and it couldn't find A

Re: [tesseract-ocr] Re: Final step of install Tesseract 4.0 on MacOS High Serria___make training___"Need to reconfigure project, so there are no errors"

2018-09-24 Thread Zdenko Podobny
Did you install autoconf-archive? Zdenko On Tue, 25 Sep 2018 at 0:02, Kurniawan Kurniawan wrote: > I still get this error even I have brew upgrade and brew update > > brew install tesseract --HEAD > *Error: An exception occured within a child process:* > * RuntimeError: /usr/local/opt/autocon

Re: [tesseract-ocr] Text2image doens't create font list

2018-09-25 Thread Zdenko Podobny
I guess you have another tesseract installation on your system. Please uninstall the old version/other tesseract before installing the new version... Zdenko On Tue, 25 Sep 2018 at 15:03, Khosrobeigy.zohreh wrote: > Today, I installed new version of tesseract. Iused this line: > > https://bing

Re: [tesseract-ocr] Tesseract to detect numbers with opencv (c++) and cmake on a raspberry pi

2018-09-26 Thread Zdenko Podobny
Just google for help. There are plenty of examples, e.g. https://stackoverflow.com/questions/24570916/add-external-libraries-to-cmakelist-txt-c Zdenko On Wed, 26 Sep 2018 at 14:38, Adam Richards wrote: > Hi Zdenko, > > Sorry, I'm just still unsure how exactly to get this working? What should > I be

[tesseract-ocr] Re: Heads up: release of tesseract 4.0

2018-09-30 Thread Zdenko Podobny
RC 1[1] is ready. Please test, test, test, especially if you are wrapping tesseract and creating/providing packages. Report problems ASAP in the issue tracker, so we can fix them before the final release. [1] https://github.com/tesseract-ocr/tesseract/tree/4.0.0-rc1 Zdenko On Sat, 22 Sep 2018 at 17:06, Zdenko

Re: [tesseract-ocr] Error Segmentation fault (core dumped)

2018-10-03 Thread Zdenko Podobny
Maybe you forgot to read the FAQ or google ;-) https://github.com/tesseract-ocr/tesseract/wiki/FAQ-Old#actual_tessdata_num_entries_-tessdata_num_entrieserrorassert-failedin-file-ccutiltessdatamanagercpp-line-55_ Zdenko On Wed, 3 Oct 2018 at 15:50, AjeetM wrote: > Hi, > I am running the command: >

Re: [tesseract-ocr] Disable degradeimage function

2018-10-05 Thread Zdenko Podobny
Why change the source file? Running text2image --help | grep degrade shows: --degrade_image Degrade rendered image with speckle noise, dilation/erosion and rotation (type:bool default:true) Zdenko On Fri, 5 Oct 2018 at 18:16, anonynamja wrote: > I wish to disable the degradeimage function when generating tra

[tesseract-ocr] Re: Heads up: release of tesseract 4.0

2018-10-07 Thread Zdenko Podobny
can fix it before the final release. [1] https://github.com/tesseract-ocr/tesseract/tree/4.0.0-rc2 Zdenko On Sun, 30 Sep 2018 at 19:50, Zdenko Podobny wrote: > RC 1[1] ready. > Please test, test, test. Especially if you are wrappi

Re: [tesseract-ocr] Convert image to text shows arrow instead of empty string

2018-10-08 Thread Zdenko Podobny
Please provide your input image so somebody can test it ;-) Zdenko On Mon, 8 Oct 2018 at 16:25, AutobotRyszard wrote: > After update to 4.0 version, blank(empty) img is converted to arrow > instead of empty string. > > Has anyone heard something about such problem and is able to help me:)? > >

Re: [tesseract-ocr] Tesseract 4.0 confidence

2018-10-11 Thread Zdenko Podobny
Try the hOCR output ;-) Or see the wiki: https://github.com/tesseract-ocr/tesseract/wiki/APIExample#result-iterator-example It is for words, but you can modify it for character level (at least for the 3.0x version). Zdenko On Thu, 11 Oct 2018 at 18:20, Soumik Ranjan Dasgupta wrote: > As far as I know, tesseract does
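In tesseract's hOCR output each recognized word is an element with class ocrx_word whose title attribute carries the bounding box and the word confidence as x_wconf (for example title='bbox 36 92 96 116; x_wconf 93'). A minimal sketch that collects those values with Python's standard html.parser, assuming that layout; the class name HocrWordConfidences is hypothetical:

```python
from html.parser import HTMLParser


class HocrWordConfidences(HTMLParser):
    """Collect (word, x_wconf) pairs from the ocrx_word spans of an hOCR page."""

    def __init__(self):
        super().__init__()
        self.words = []        # list of (text, confidence or None)
        self._in_word = False  # currently inside an ocrx_word span?
        self._depth = 0        # spans nested inside the current word span
        self._text = []
        self._conf = None

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._in_word:
            self._depth += 1
            return
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            self._in_word, self._depth, self._text, self._conf = True, 0, [], None
            # title looks like "bbox 36 92 96 116; x_wconf 93"
            for prop in (attrs.get("title") or "").split(";"):
                parts = prop.split()
                if len(parts) == 2 and parts[0] == "x_wconf":
                    self._conf = float(parts[1])

    def handle_endtag(self, tag):
        if tag != "span" or not self._in_word:
            return
        if self._depth:
            self._depth -= 1
        else:
            self.words.append(("".join(self._text).strip(), self._conf))
            self._in_word = False

    def handle_data(self, data):
        if self._in_word:
            self._text.append(data)
```

Feeding it the contents of the .hocr file written by tesseract's hocr output leaves one (word, confidence) pair per word in parser.words; a word span without x_wconf gets None.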

Re: [tesseract-ocr] pixRead problem

2018-10-12 Thread Zdenko Podobny
Did you read the error message? Did you bother to check leptonica's functionality + documentation (in the leptonica source code ;-) )? Zdenko On Fri, 12 Oct 2018 at 11:39, the sender wrote: > Hi. I am a tesseract beginner who stuck into initial API example: > > Pix* pImage = pixRead("C:\\Flaviu\\imagine.png"); > p

Re: [tesseract-ocr] Undefined Reference errors when building Tesseract OCR

2018-10-12 Thread Zdenko Podobny
It seems you have not learned the tools you are using ;-) Otherwise you would know that you forgot to link your output (tesseract?) with the missing library (leptonica). Zdenko On Wed, 3 Oct 2018 at 23:52, Mich Po wrote: > I added these lines to CMakeLists.txt in Tesseract, it builds to 99% then

Re: [tesseract-ocr] Empty page!!

2018-10-12 Thread Zdenko Podobny
You got it because you forgot to read the manual/documentation for the tool you are trying to use :-). You can start with tesseract --help, --help-extra etc. Do you understand the command you run? Do you understand the default (just basic) parameters used in this command? Zdenko On Fri, 12 Oct 2018 at 13:33, the sender wrote: >

Re: [tesseract-ocr] Failed to get Text extraction

2018-10-12 Thread Zdenko Podobny
https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality Did you read it? Zdenko On Fri, 5 Oct 2018 at 8:09, the sender wrote: > Environment > >- > >Tesseract Version: 3.04.01 >- > >Platform: Linux >- > >Test url: https://samsung-nudge.s3.eu-central-1.amazonaws.com/20.jpeg >

Re: [tesseract-ocr] Using Tesseract with openCV Mat

2018-10-12 Thread Zdenko Podobny
If you are really interested in help, then please post the complete code of your test case (+ how you compiled it), including (a link to) the image you are trying to process. Zdenko On Thu, 11 Oct 2018 at 12:39, Adam Richards wrote: > Hi I am having issues with sending a Mat image the I have opened through > Op

Re: [tesseract-ocr] Issue installing Tesseract-Langpack-OSD dependency on REHL

2018-10-12 Thread Zdenko Podobny
And that is not a problem of the tesseract project :-( You should contact the packager directly. Zdenko On Fri, 12 Oct 2018 at 21:16, Yuefei wrote: > I am having the exact same problem. tesseract-langpack-osd-4.00~git30-4.1 > and tesseract-4.00~git3083-1.1 formed a dependency loop and can't be > installe

[tesseract-ocr] Re: Heads up: release of tesseract 4.0

2018-10-14 Thread Zdenko Podobny
issue tracker, so we can fix it before the final release. [1] https://github.com/tesseract-ocr/tesseract/releases/tag/4.0.0-rc3 Zdenko On Sun, 7 Oct 2018 at 21:18, Zdenko Podobny wrote: > RC 2 is ready[1]. > Pleas

Re: [tesseract-ocr] Re: Heads up: release of tesseract 4.0

2018-10-14 Thread Zdenko Podobny
It will depend on the number of (significant) commits and findings ;-) E.g. just yesterday we got fixes for Mac, and it is still not clear whether a build from scratch will work on Mac... Just a short statistic on the number of commits: 4.0.0-beta.3..4.0.0-beta.4: 259 commits; 4.0.0-beta.4..4.0.0-rc1

Re: [tesseract-ocr] Re: Heads up: release of tesseract 4.0

2018-10-15 Thread Zdenko Podobny
any way tesseract could be installed using pip for Ubuntu 16.04 > systems and above? > > On Sun, Oct 14, 2018 at 11:46 PM Zdenko Podobny wrote: > >> it will depends based on number of (significant) commits and findings ;-) >> E.g. just yesterday we got fixes for Mac and it is

Re: [tesseract-ocr] Re: Heads up: release of tesseract 4.0

2018-10-15 Thread Zdenko Podobny
ms? > > On Mon, Oct 15, 2018, 2:00 PM Zdenko Podobny wrote: > >> Are familiar with tools you try to use? >> pip is for distribution python modules and tesseract is c++ project, that >> are distributed with other tools (depending on linux distribution) - on >> Ubunt

Re: [tesseract-ocr] Convert image to text shows arrow instead of empty string

2018-10-15 Thread Zdenko Podobny
It is the page separator (form feed). See https://en.wikipedia.org/wiki/Page_break#Form_feed Zdenko On Mon, 15 Oct 2018 at 13:15, Soumik Ranjan Dasgupta wrote: > I don't see any arrows opening it with gedit, just a symbol. > I tried opening the file with python and reading the contents. Past
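The character in question is U+000C (form feed, "\f"), which tesseract's plain-text output writes at the end of each page by default, so even a blank page produces one. A minimal Python sketch for post-processing such output, assuming it has already been read into a string; both helper names are hypothetical:

```python
FORM_FEED = "\f"  # U+000C, the default page separator in tesseract's text output


def split_pages(text: str) -> list:
    """Split OCR text into pages, dropping the empty piece after the final separator."""
    pages = text.split(FORM_FEED)
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages


def is_blank(page: str) -> bool:
    """True if a page holds only whitespace (str.strip also removes form feeds)."""
    return page.strip() == ""


print(split_pages("Hello\n\fWorld\n\f"))  # ['Hello\n', 'World\n']
print(is_blank(split_pages("\f")[0]))    # True: a blank page is still one page
```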

Re: [tesseract-ocr] Why do I get such poor results from Tesseract for simple single character recognizing?

2018-10-15 Thread Zdenko Podobny
1. If you have a quality problem, it is good to experiment with the tesseract executable instead of the API ;-) 2. It is known that passing text with no margin around it (in your case just one letter) is not the best idea; please try adding a small white border, e.g. 10 px. 3. Please set the DPI for the image after SetImage. See attachment f
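Point 2 amounts to padding the image with background pixels before recognition. In practice a library call does this (for example Leptonica's pixAddBorder or Pillow's ImageOps.expand), and the resolution is then declared to tesseract (in the C++ API, via TessBaseAPI::SetSourceResolution). The dependency-free Python sketch below only illustrates the padding itself, assuming an 8-bit grayscale image stored as a list of rows where 255 is white; the function name is hypothetical.

```python
def add_white_border(rows, width=10, white=255):
    """Return a copy of a grayscale image (list of equal-length rows of 0..255 values)
    padded with `width` white pixels on every side."""
    if width < 0:
        raise ValueError("width must be non-negative")
    new_width = (len(rows[0]) if rows else 0) + 2 * width
    padded = [[white] * new_width for _ in range(width)]              # top border
    for row in rows:
        padded.append([white] * width + list(row) + [white] * width)  # left + row + right
    padded.extend([white] * new_width for _ in range(width))          # bottom border
    return padded


# A 1x1 black "letter" padded by 2 pixels becomes a 5x5 image with the letter centred.
letter = [[0]]
print(add_white_border(letter, width=2)[2])  # [255, 255, 0, 255, 255]
```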

Re: [tesseract-ocr] Making custom traineddata

2018-10-15 Thread Zdenko Podobny
Robert is pointing you in the right direction. Did you read the log you posted here? "Tesseract Open Source OCR Engine v3.04.01 with Leptonica" You are mixing tesseract versions, so problems are no surprise. Zdenko On Tue, 16 Oct 2018 at 8:26, Vinod Gattani wrote: > Hi, > Typo: " Why the version is no

Re: [tesseract-ocr] Making custom traineddata

2018-10-16 Thread Zdenko Podobny
there any other way of installing v4.0. Please let me know what I > am doing wrong. > > Regards, > Vinod > > On Tue, Oct 16, 2018 at 12:15 PM Zdenko Podobny wrote: > >> Robert is pointing you to right direction. Did you read the log you post >> here? >> "

Re: [tesseract-ocr] pixRead problem

2018-10-16 Thread Zdenko Podobny
Really? Where did you look??? What is the output of leptonica's "./configure --help"??? What is printed on screen when you run leptonica's configure? Zdenko On Tue, 16 Oct 2018 at 9:03, the sender wrote: > Hi zdenop. I have read here: > > > https://groups.google.com/forum/#!searchin/tesseract-ocr/Error$20in$20pi

[tesseract-ocr] Re: Heads up: release of tesseract 4.0

2018-10-16 Thread Zdenko Podobny
Is there anybody here who builds and uses tesseract on Android? I would like to solve: https://github.com/tesseract-ocr/tesseract/issues/1393 Zdenko On Sun, 14 Oct 2018 at 19:48, Zdenko Podobny wrote: > RC 3 is ready[1]. > > Please test, test, test. > Especially if you are building te

Re: [tesseract-ocr] pixRead problem

2018-10-16 Thread Zdenko Podobny
The easiest way for you would be to compile tesseract on Windows with cppan. The instructions are on the wiki... On Tue, 16 Oct 2018 at 10:14, the sender wrote: > Thank you a lot for your prompt answer ! I really appreciate that ! > > I have run in cmd line: tesseract --help-extra, I don't spot any graphic > librar

Re: [tesseract-ocr] pixRead problem

2018-10-16 Thread Zdenko Podobny
You will do everything, including complaining, except reading and following the instructions. Right? ;-) https://github.com/tesseract-ocr/tesseract/wiki/Compiling#windows Zdenko On Tue, 16 Oct 2018 at 10:52, the sender wrote: > It is a endless story :) > > I have downloded from here cppan, and I have tried to gene

Re: [tesseract-ocr] pixRead problem

2018-10-16 Thread Zdenko Podobny
I do not use vcpkg. I suggest you use cppan (you need to install it and put it on your PATH). For me it is stupidly easy, and it takes approx. 15 minutes on my computer and internet connection: git clone https://github.com/tesseract-ocr/tesseract.git cd tesseract mkdir build64 cd build64 cppan .. cmake .. -G "Vi

Re: [tesseract-ocr] Installation of Tesseract and some of its dependencies from source on CentOS

2018-10-16 Thread Zdenko Podobny
1. Why are you building a debug tesseract? 2. Why are you mixing build tools (cmake for leptonica and autotools for the rest)? An issue was reported regarding this mix in the case of leptonica->tesseract... 3. jpeg, png and tiff are common libraries heavily used by desktop systems. Replacing system

Re: [tesseract-ocr] Server performance is 3x as slow versus local machine

2018-10-18 Thread Zdenko Podobny
Why? What is the tesseract issue? That tesseract does not have the same speed on different hardware??? That is expected. David started the discussion in the right place: the forum. Please use the tesseract issue tracker only for issues that can be fixed on the tesseract side. We cannot fix the user side. Zdenko On Thu, 18 Oct 201

Re: [tesseract-ocr] patch for #426

2018-10-20 Thread Zdenko Podobny
Thanks. This is exactly how an issue should be handled. If some configuration is not working, it should be replaced or fixed. Zdenko On Sat, 20 Oct 2018 at 10:16, Marco Atzeri wrote: > The attached patch, tested on 4.0.0 RC3 for Cygwin, > should solve the > > https://github.com/tesseract-ocr/tes

Re: [tesseract-ocr] Re: Heads up: release of tesseract 4.0

2018-10-21 Thread Zdenko Podobny
> >> Is here anybody who build&use tesseract on Android? >> I would like to solve: >> https://github.com/tesseract-ocr/tesseract/issues/1393 >> >> Zdenko >> >> >> On Sun, 14 Oct 2018 at 19:48, Zdenko Podobny wrote: >> >>> RC

Re: [tesseract-ocr] This application has requested the Runtime to terminate it in an unusual way

2018-10-23 Thread Zdenko Podobny
First of all: use the latest version of any software when something does not work as expected. Zdenko On Tue, 23 Oct 2018 at 11:04, bruce wrote: > Environment > >- Tesseract Version: tesseract-ocr-w64-setup-v4.0.0-beta.1.20180608 >- Platform: Windows 7 professional Service Pack 1 editi

Re: [tesseract-ocr] Train Tesseract 4.0 for Urdu Nastaleeq fonts

2018-10-24 Thread Zdenko Podobny
Did you read https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00? Zdenko On Wed, 24 Oct 2018 at 8:59, the sender wrote: > I am trying to train Tesseract for Urdu Nastaleeq fonts. I used 10 Text > files of total 1 MB and gave them to the jTesseract editor to create box > files and then c

[tesseract-ocr] Re: Heads up: release of tesseract 4.0

2018-10-24 Thread Zdenko Podobny
act/releases/tag/4.0.0-rc4>* Zdenko On Sun, 14 Oct 2018 at 19:48, Zdenko Podobny wrote: > RC 3 is ready[1]. > > Please test, test, test. > Especially if you are building tesseract on other platform than linux or > windows (cppan+cmake). > > If you have any patch for the

Re: [tesseract-ocr] Train Tesseract 4.0 for Urdu Nastaleeq fonts

2018-10-24 Thread Zdenko Podobny
What did not work for you? Zdenko On Wed, 24 Oct 2018 at 9:04, the sender wrote: > Yes I did. But its not working out for me. > > On Wednesday, October 24, 2018 at 12:31:09 PM UTC+5:30, zdenop wrote: >> >> Did you read >> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00? >> >> Zdenk

Re: [tesseract-ocr] Train Tesseract 4.0 for Urdu Nastaleeq fonts

2018-10-24 Thread Zdenko Podobny
But when I give nastaleeq input file, It gives Error: unichar > بکی in normproto file is not in unichar set and generate garbage > output. > > On Wed, Oct 24, 2018 at 12:42 PM Zdenko Podobny wrote: > >> What did not work for you? >> >> Zdenko >> >> >&

[tesseract-ocr] Tesseract 4.0.0 released

2018-10-29 Thread Zdenko Podobny
Hello all, I am proud to announce that the tesseract OCR engine version 4.0.0 (LSTM-based) was released today. See the online release notes [1]. Source code can be downloaded from GitHub [2]. Known issues and regressions are documented on the wiki page Planning [3]. [1] https://github.com/tesseract-ocr/tesserac

Re: [tesseract-ocr] Re: Heads up: release of tesseract 4.0

2018-10-29 Thread Zdenko Podobny
dy. >> Please test, test, test. Especially if you are wrapping tesseract and >> creating/providing packages. >> Report problems ASAP in issue tracker, so we can fix it until finale >> release. >> >> [1] https://github.com/tesseract-ocr/tesseract/tree/4.0.0-rc1

Re: [tesseract-ocr] Re: Heads up: release of tesseract 4.0

2018-10-30 Thread Zdenko Podobny
roject that using Tesseract in VC++ ... I will be >>> grateful for any hint to lead me for solving my task. >>> >>> Regards, >>> Flaviu. >>> >>> >>> On Sunday, September 30, 2018 at 8:50:59 PM UTC+3, zdenop wrote: >>>>

Re: [tesseract-ocr] run text2image failed ,text2image not support chinese name fonts?

2018-11-06 Thread Zdenko Podobny
Hello, please see the bug report and suggested solution: https://github.com/tesseract-ocr/tesseract/issues/1252 I guess the problem is in pango, but we would like to test it. Are you able to create a simple test case for this issue (provide a small chi_sim.txt and share the font if possible)? Zdenko On Tue

Re: [tesseract-ocr] Convert image to text shows arrow instead of empty string

2018-11-06 Thread Zdenko Podobny
It is up to you. The default setting is that each page ends with a page separator. An empty page is also a page ;-) Zdenko On Tue, 6 Nov 2018 at 14:10, AutobotRyszard wrote: > Actually it doesn't matter if it is a separator or arrow or other symbol. > The problem is that there is any symbol. Output should be

Re: [tesseract-ocr] Convert image to text shows arrow instead of empty string

2018-11-06 Thread Zdenko Podobny
4.0 is a new major version with a lot of changes compared to 3.0x, so incompatibility is fine and expected. The 4.0 code has been around for 2 years. The release process (from beta3 to final) took considerable time, and I asked several times on both forums (user and developer) for testing. We should not change default behavior w

[tesseract-ocr] Announcement: Tesseract tessdata downloader from GitHub repositories 1.2

2018-11-07 Thread Zdenko Podobny
Hello all, if you are interested in downloading traineddata for only some languages from the repositories (or a different tagged version), have a look at tessdata_downloader[1]. A new version is available, with support for proxy authentication and the possibility to download files from the Script subdirectory.

Re: [tesseract-ocr] run text2image failed ,text2image not support chinese name fonts?

2018-11-08 Thread Zdenko Podobny
What is the output of the command "chcp" (on the command line)? Zdenko On Wed, 7 Nov 2018 at 2:55, bruce wrote: > hi,zdenop ,thank you for your reply. > my environment is: > windows 7 professional 64bit > tesseract version: > https://digi.bib.uni-ma

Re: [tesseract-ocr] run text2image failed ,text2image not support chinese name fonts?

2018-11-09 Thread Zdenko Podobny
I want to know the original output of chcp ;-) I think there are (at least) 2 issues: 1. a console encoding problem (Windows only; on Linux it is correct) 2. a font-related issue (at the moment I am not sure whether it is the font itself, pango or text2image). Regarding 1.: When I run: text2image.exe

Re: [tesseract-ocr] Having re-installation issues

2018-11-09 Thread Zdenko Podobny
Did you uninstall tesseract before compiling from source? BTW: what exactly did you do when you "uninstalled and reinstalled everything"? Zdenko On Fri, 9 Nov 2018 at 7:10, lance arnold wrote: > Hi everyone, saw on github this is where I should go for this. Hope > someone can help! > > tl;dr: I th

Re: [tesseract-ocr] Images with text in white color

2018-11-12 Thread Zdenko Podobny
Can you please provide images for testing? Zdenko On Mon, 12 Nov 2018 at 12:38, raghunath rs wrote: > Hi, > > I recently experienced that Tesseract 4 is not identifying images with > text in white and background colored > > Is there any specific preprocessing? > > Thanks, > Raghu > > -- > You r

Re: [tesseract-ocr] -c textord_min_linesize 3.25 in tesseract 4 gives error message

2018-11-12 Thread Zdenko Podobny
What kind of error message do you get? Please share your image for testing too. Zdenko On Sun, 11 Nov 2018 at 15:39, Martin Jenniges wrote: > Hello, > > > I have found the follow Tip for tesseract; but when I give this parameter > with -c textord_min_linesize 3.25 in tesseract 4, I receive a err

Re: [tesseract-ocr] Reducing output image quality to make PDF smaller

2018-11-13 Thread Zdenko Podobny
Tesseract's approach is not to re-compress or change the image type of the input image when creating a PDF, so you need to use other tools to create a smaller PDF. Zdenko On Tue, 13 Nov 2018 at 7:49 wrote: > I've not used Tesseract in many years until today. I'm very impressed > with what I see now. > > I ne
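
One such tool is Ghostscript. The following Python sketch only builds and runs a `gs` command line using Ghostscript's `pdfwrite` device and its `-dPDFSETTINGS` presets; it assumes Ghostscript is installed as `gs` (on Windows the executable is usually `gswin64c`), the preset choice is a guess to tune, and `ghostscript_compress_cmd` is a hypothetical helper name. Check afterwards that the invisible text layer Tesseract produced is still searchable.

```python
import subprocess


def ghostscript_compress_cmd(src, dst, preset="/ebook"):
    """Build a Ghostscript command that rewrites src into a smaller dst PDF.

    /screen, /ebook, /printer and /prepress are Ghostscript presets;
    lower-quality presets downsample embedded images more aggressively.
    """
    return [
        "gs", "-sDEVICE=pdfwrite", "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={preset}", "-dNOPAUSE", "-dBATCH", "-dQUIET",
        f"-sOutputFile={dst}", src,
    ]


def compress_pdf(src, dst, preset="/ebook"):
    """Run Ghostscript; raises CalledProcessError if gs reports failure."""
    subprocess.run(ghostscript_compress_cmd(src, dst, preset), check=True)
```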

Re: [tesseract-ocr] trying to increase accuracy of tesseract for this kind of image...

2018-11-14 Thread Zdenko Podobny
Did you read https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality ? Zdenko On Wed, 14 Nov 2018 at 11:09, gökay öze wrote: > currently it doenst get anything right > here is the input: > > > % “UMDUWW YMK Jia; > Www 0 A > > [DİBİ “HİHİ? . . > ığğğ) . ,: >
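
Binarization is one of the steps that page describes. Tesseract binarizes internally using the Otsu algorithm, but doing it yourself lets you inspect and adjust the result. Below is a minimal pure-Python Otsu threshold on a grayscale image given as rows of 0-255 integers; `otsu_threshold` and `binarize` are hypothetical helper names, and in practice a library such as Leptonica, OpenCV or scikit-image would do this.

```python
def otsu_threshold(gray):
    """Return the Otsu threshold t; pixels <= t form the dark class."""
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no pixels in the dark class yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the dark class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance (up to a constant factor); maximise it.
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(gray):
    """Map pixels to 0 (ink) or 255 (paper) using the Otsu threshold."""
    t = otsu_threshold(gray)
    return [[0 if p <= t else 255 for p in row] for row in gray]
```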

Re: [tesseract-ocr] curious why tesseract does not extract the top two lines of text in this attached receipt image

2018-11-14 Thread Zdenko Podobny
We are curious too :-) What version of tesseract did you use? What version of traineddata? Did you read and try the suggestions at https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality ? Zdenko On Wed, 14 Nov 2018 at 21:04, krishna wrote: > I don't see the first two lines scanned at all

Re: [tesseract-ocr] What upper-level header files are really needed when using Tesseract API and what can be left out?

2018-11-24 Thread Zdenko Podobny
If you install tesseract, all needed headers will be installed to the specified/system location. Referring to headers directly in ccstruct etc. is simply bad design of your application. Zdenko On Sat, 24 Nov 2018 at 14:18, 'Yuliana Zigangirova' via tesseract-ocr < tesseract-ocr@googlegroups.com> wrote:

Re: [tesseract-ocr] curious why tesseract does not extract the top two lines of text in this attached receipt image

2018-11-25 Thread Zdenko Podobny
ImproveQuality suggests much more than adding a border ;-) Actually it suggests scanning border removal (cropping; I did it manually, just to demonstrate that it helps ;-) ), noise removal, binarization... It also suggests several tools. So all the answers to your questions are in the wiki page you refer
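
The manual cropping mentioned above can be approximated automatically. This minimal pure-Python sketch trims outer rows and columns that are almost entirely dark, as scanner borders often are; the image is assumed to be rows of 0-255 grayscale integers, and the `crop_dark_border` name and both thresholds are illustrative assumptions, not values from the wiki.

```python
def crop_dark_border(gray, dark=64, max_dark_fraction=0.9):
    """Trim outer rows/columns that are almost entirely dark (scanner border).

    A row or column counts as border when more than max_dark_fraction of
    its pixels are darker than `dark`.
    """
    def is_border(pixels):
        dark_count = sum(1 for p in pixels if p < dark)
        return dark_count > max_dark_fraction * len(pixels)

    top, bottom = 0, len(gray)
    while top < bottom and is_border(gray[top]):
        top += 1
    while bottom > top and is_border(gray[bottom - 1]):
        bottom -= 1
    rows = gray[top:bottom]
    if not rows:
        return []  # the whole image looked like border
    left, right = 0, len(rows[0])
    while left < right and is_border([r[left] for r in rows]):
        left += 1
    while right > left and is_border([r[right - 1] for r in rows]):
        right -= 1
    return [r[left:right] for r in rows]
```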

Re: [tesseract-ocr] What upper-level header files are really needed when using Tesseract API and what can be left out?

2018-11-25 Thread Zdenko Podobny
Your explanation is quite strange to me: you want to build an application that uses tesseract in an environment where you cannot build/install tesseract??? Anyway, it's your problem: you can find the installed headers here: https://github.com/tesseract-ocr/tesseract/blob/267b79982d64e48d11eaa99ee2618106662a9

Re: [tesseract-ocr] lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

2018-11-27 Thread Zdenko Podobny
Shree, the issue tracker is not for custom training, simply because there are not enough people and it cannot be reproduced... Did you read: "I have been runnig about 130G data which are 4000 files"? Unless you are able to reproduce the problem with very small data, IMO nobody would be willi

Re: [tesseract-ocr] lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

2018-11-27 Thread Zdenko Podobny
An issue without a test case (for reproducing the problem) is useless and demotivating. Zdenko On Tue, 27 Nov 2018 at 13:23, Shree Devi Kumar wrote: > In my opinion, the assert still needs to be documented as an issue, with > LSTM training. > > On Tue, 27 Nov 2018, 05:03 Zdenko Podob

Re: [tesseract-ocr] lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

2018-11-29 Thread Zdenko Podobny
ining, it will be >> good to document them in the wiki, so that people do not waste time and >> effort in training if they don't have the minimum hardware requirements. >> >> On Tue, 27 Nov 2018, 08:49 Zdenko Podobny > >>> Yes, you can ;-) >>> If y

Re: [tesseract-ocr] [/usr/local/bin/language-specific.sh: line 1125: FONTS: unbound variable] Error help me!!

2018-12-05 Thread Zdenko Podobny
Do you use the scripts from the master repository? There were some updates after the 4.0 release... Zdenko On Wed, 5 Dec 2018 at 8:19, SEUNGGWANSHIN wrote: > hello guys > > i'm training tesseract-lstm with > https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 > i have some problem using

Re: [tesseract-ocr] when I use the shapeclustering command, it does not respond: no error, no information

2018-12-29 Thread Zdenko Podobny
What does it mean that you followed steps on the internet? How did you use the shapeclustering command? On Sat, 29 Dec 2018, 9:16 wrote: > Today,I use tesseract-ocr for training,but I follow the steps from the > internet,but when I use shapeclustering command ,it is not response,no > error ,no any infor

Re: [tesseract-ocr] Re: Shapeclustering Not Responding

2018-12-29 Thread Zdenko Podobny
Please provide real information and data, not a "meta" description of your process. Zdenko On Sat, 29 Dec 2018 at 9:37 wrote: > I also encounter this problem,I tried tesseract 3.5 and tesseract 4.0, > the result is same. > > On Tuesday, July 17, 2018 at 5:16:26 PM UTC+8, xyq...@gmail.com wrote: >> >> Hi all, >> >>

Re: [tesseract-ocr] o recognized as 0 on simple image (no captcha style text)

2018-12-31 Thread Zdenko Podobny
I got this result with tesseract 4.0.0 leptonica-1.76.0 (Dec 14 2018, 15:34:47) [MSC v.1916 LIB Release x64] libgif 5.1.4 : libjpeg 9b : libpng 1.6.35 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0 Found AVX Found SSE >tesseract.exe cropped.png - Warning: Invalid resolution

Re: [tesseract-ocr] Getting alternative options for OCR results

2019-01-02 Thread Zdenko Podobny
Did you try having a look at https://github.com/tesseract-ocr/tesseract/wiki/APIExample#example-of-iterator-over-the-classifier-choices-for-a-single-symbol ? PS: I am not sure how it works with 4.00, but in the 3.0x era it provided alternative options for symbols... Zdenko On Wed, 2 Jan 2019 at 7:56, Dani

Re: [tesseract-ocr] Getting alternative options for OCR results

2019-01-02 Thread Zdenko Podobny
It is not available from the binary. Maybe you can try to use the C-API from Python: https://github.com/tesseract-ocr/tesseract/wiki/APIExample#c-api-in-python Zdenko On Wed, 2 Jan 2019 at 17:35, Daniel Rembiszewski wrote: > I'm using the pytesseract wrapper, which I believe wraps over the CLI. > > So I

Re: [tesseract-ocr] Configure error(configure: error: Leptonica 1.74 or higher is required. Try to install libleptonica-dev package.) in Amazon Linux

2019-01-03 Thread Zdenko Podobny
On Thu, 3 Jan 2019 at 15:38 wrote: > During configure, error occurs. > > checking for LEPTONICA... no > > configure: error: Leptonica 1.74 or higher is required. Try to install > libleptonica-dev package. > > But, I have installed leptonica-1.77.0. > What does it mean? > > > ・Environment > >
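
The `checking for LEPTONICA... no` line suggests that configure looks Leptonica up through pkg-config (module `lept`) rather than by scanning the filesystem, so a source-built 1.77.0 can stay invisible if its `lept.pc` was installed under a prefix such as `/usr/local/lib/pkgconfig` that pkg-config does not search. That is an assumption about this environment, not a confirmed diagnosis. The Python sketch below checks what pkg-config reports; all helper names are hypothetical, and `version_tuple` assumes purely numeric dotted versions.

```python
import shutil
import subprocess


def version_tuple(version):
    """Turn "1.77.0" into (1, 77, 0) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def leptonica_version():
    """Return the Leptonica version pkg-config reports, or None if not found."""
    if shutil.which("pkg-config") is None:
        return None
    result = subprocess.run(["pkg-config", "--modversion", "lept"],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def meets_minimum(found, required="1.74"):
    """True when pkg-config found a version at least as new as `required`."""
    return found is not None and version_tuple(found) >= version_tuple(required)
```

If `leptonica_version()` returns `None` even though Leptonica is installed, one thing to try is re-running configure with `PKG_CONFIG_PATH=/usr/local/lib/pkgconfig` set in the environment.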
