Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-12-08 Thread NY C
Also, I think CUBE was removed in Tesseract 4.x. I found it very strange that there is no suitable OEM value in tess-two 9.0.0. Could somebody help me here? Am I missing anything to make tessdata_fast work in tess-two? On Saturday, December 7, 2019 at 5:37:59 PM UTC+8, NY C wrote: > > I changed the OEM to this as

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-12-08 Thread NY C
Also, I think CUBE was removed in Tesseract 4.x. I found it strange to have this CUBE OEM value in tess-two 9.0.0. Could somebody help me here? Am I missing anything to make tessdata_fast work in tess-two? On Saturday, December 7, 2019 at 5:37:59 PM UTC+8, NY C wrote: > > I changed the OEM to this as you said: > ba

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-12-07 Thread NY C
I changed the OEM to this, as you said: baseApi.init(pathfiles, language, TessBaseAPI.OEM_CUBE_ONLY); but it still crashes. I tried all the parameters I could find (OEM_TESSERACT_ONLY = 0, OEM_CUBE_ONLY = 1, OEM_TESSERACT_CUBE_COMBINED = 2, OEM_DEFAULT = 3); they all crash on the same line. >>

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-12-07 Thread Shree Devi Kumar
tessdata supports both the legacy engine and the LSTM engine; tessdata_fast and tessdata_best support only the LSTM engine. To use tessdata_fast, use OEM engine code 1. On the command line it is --oem 1. Please look up the corresponding syntax. On Sat, Dec 7, 2019, 14:06 NY C wrote: > Hi, I am using tess-two
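The OEM codes discussed here are small integers. As a sketch, the C++ below mirrors the four values of tesseract::OcrEngineMode in Tesseract 4.x (OEM_TESSERACT_ONLY = 0, OEM_LSTM_ONLY = 1, OEM_TESSERACT_LSTM_COMBINED = 2, OEM_DEFAULT = 3) and encodes the rule from the reply above: tessdata_fast and tessdata_best need the LSTM engine, i.e. --oem 1. The enum name Oem and the helper OemSupportedBy are hypothetical, not part of Tesseract or tess-two, and the helper is deliberately conservative. Note that among the tess-two constants quoted earlier in this thread, the value 1 is named OEM_CUBE_ONLY.

```cpp
#include <string>

// Mirrors the numeric values of tesseract::OcrEngineMode in Tesseract 4.x.
enum class Oem : int {
  kTesseractOnly = 0,          // legacy engine only
  kLstmOnly = 1,               // LSTM engine only ("--oem 1")
  kTesseractLstmCombined = 2,  // legacy and LSTM together
  kDefault = 3,                // default, based on what is available
};

// Hypothetical helper: is this engine mode a safe choice for the given
// traineddata collection? Conservative by design: for the LSTM-only
// collections it accepts only OEM 1, which is what the thread recommends.
bool OemSupportedBy(const std::string& collection, Oem mode) {
  if (collection == "tessdata") {
    return true;  // per the reply above: legacy and LSTM models
  }
  if (collection == "tessdata_fast" || collection == "tessdata_best") {
    return mode == Oem::kLstmOnly;  // LSTM models only, no legacy engine data
  }
  return false;  // unknown collection: make no claim
}
```

With this rule, a tessdata_fast setup paired with OEM 0 or 2 is rejected up front rather than failing later inside the engine.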

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-12-07 Thread Shree Devi Kumar
ocrEngineMode On Sat, Dec 7, 2019, 14:35 Shree Devi Kumar wrote: > tessdata supports both legacy engine and lstm engine. Tessdata_fast and > tessdata_best only support lstm engine. > > To use tessdata_fast, use oem engine code 1. > > On command line it is --oem 1. Please look up the correspondi

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-12-07 Thread NY C
Hi, I am using tess-two for OCR. The version I use is: https://github.com/alexcohn/tess-two Code: TessBaseAPI baseApi = new TessBaseAPI(); baseApi.setDebug(true); baseApi.init(pathfiles, language); baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "01234567

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-14 Thread Alex Cohn
On Wednesday, August 14, 2019 at 11:01:44 AM UTC+3, JB Data31 wrote: > > Build done. > >> ... >> [arm64-v8a] StaticLibrary : libpngt_static.a >> [arm64-v8a] Executable : tesseract >> > > Is it the *static command-line executable tesseract* that the wiki describes? > >> $ file tess-two-git-3/tess-two/obj/lo

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-14 Thread JB Data31
Build done. > ... > [arm64-v8a] StaticLibrary : libpngt_static.a > [arm64-v8a] Executable : tesseract > Is it the *static command-line executable tesseract* that the wiki describes? > $ file tess-two-git-3/tess-two/obj/local/arm64-v8a/tesseract > tess-two-git-3/tess-two/obj/local/arm64-v8a/tesseract: EL

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-13 Thread Alex Cohn
Oh, I now understand the problem. You need git clone --recurse-submodules. To add the missing submodules after cloning, run git submodule init and then git submodule update. BR, Alex On Tuesday, August 13, 2019 at 1:31:20 PM UTC+3, JB Data31 wrote: > > *$ git clone -b "4.1" --single-branch > https://gith

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-13 Thread JB Data31
*$ git clone -b "4.1" --single-branch https://github.com/alexcohn/tess-two.git tess-two-git-2* Cloning into 'tess-two-git-2'... remote: Enumerating objects: 34, done. remote: Counting objects: 100% (34/34), done. remote: Compressing objects: 100% (34/34), done. remote: Total 11423 (delta 1), reuse

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-08 Thread Alex Cohn
I believe that there is no true need to change anything. To run unittest (and even training) on Android, it's enough to choose __ANDROID_API__=28 (or higher). Methinks that this is a reasonable restriction. The production version of the library can still be built with __ANDROID_API__=16 and ex

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-07 Thread René Hansen
Agreed. Maybe the real solution, after all, is to drop the use of glob and go for a portable solution? This is how I got around it initially. Not the best code, though: https://github.com/tesseract-ocr/tesseract/compare/4.1.0...rhardih:4.1.0-rhardih On Wed, 7 Aug 2019 at 10:56, 'Stefan Weil' v
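For context on the messages above: glob(3) became available in Android's bionic libc only at API level 28, which appears to be why the unit tests build once __ANDROID_API__ is 28 or higher. René's linked compare is the workaround he actually used; what follows is a separate, minimal sketch of one portable replacement, assuming a POSIX system with opendir/readdir (MSVC builds would need something like FindFirstFile/FindNextFile instead). WildcardMatch and ListMatching are hypothetical names, and the matcher handles only * and ?, not character classes or escapes.

```cpp
#include <dirent.h>

#include <algorithm>
#include <string>
#include <vector>

// Minimal wildcard matcher: '*' matches any run of characters, '?' matches
// exactly one. Iterative, backtracking to the most recent '*' on mismatch.
bool WildcardMatch(const char* pattern, const char* text) {
  const char* star = nullptr;    // position of the last '*' seen in pattern
  const char* resume = nullptr;  // text position to retry from after that '*'
  while (*text != '\0') {
    if (*pattern == '*') {
      star = pattern++;
      resume = text;
    } else if (*pattern == '?' || *pattern == *text) {
      ++pattern;
      ++text;
    } else if (star != nullptr) {
      pattern = star + 1;  // let the last '*' absorb one more character
      text = ++resume;
    } else {
      return false;
    }
  }
  while (*pattern == '*') ++pattern;  // trailing '*'s match the empty string
  return *pattern == '\0';
}

// Returns the sorted names of entries in `dir` whose names match `pattern`.
// A missing or unreadable directory simply yields no matches.
std::vector<std::string> ListMatching(const std::string& dir,
                                      const std::string& pattern) {
  std::vector<std::string> names;
  DIR* d = opendir(dir.c_str());
  if (d == nullptr) return names;
  while (struct dirent* entry = readdir(d)) {
    const std::string name = entry->d_name;
    if (name == "." || name == "..") continue;
    // Like glob(3), hide dot-files unless the pattern itself starts with '.'.
    if (name[0] == '.' && pattern[0] != '.') continue;
    if (WildcardMatch(pattern.c_str(), name.c_str())) names.push_back(name);
  }
  closedir(d);
  std::sort(names.begin(), names.end());  // glob(3) also sorts by default
  return names;
}
```

Unlike glob(3), ListMatching matches only within a single directory and returns bare file names rather than paths; callers that need full paths would prepend dir themselves.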

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-07 Thread 'Stefan Weil' via tesseract-ocr
It's a pity that I did not see this discussion earlier. I understand that old Android versions now build fine. On the other hand, the AppVeyor CI build for Windows is now broken, and the unit tests still don't build. That's not a good result. :-( I therefore suggest going back to my commit

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-05 Thread René Hansen
Alright, I'll keep the fork around till then. Thanks. /René On Mon, 5 Aug 2019 at 08:56, Zdenko Podobny wrote: > I would like to create/release 4.1.1 (just cherry-pick fixes from > master/5.0.0), but it requires time... Maybe end of August, just to see > what happens in master repository. >

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-05 Thread Alex Cohn
On Monday, August 5, 2019 at 8:21:32 AM UTC+3, JB Data31 wrote: > > https://github.com/tesseract-ocr/tesseract/wiki/Compiling#android > > *$ date* >> Mon Aug 5 04:58:08 UTC 2019 >> *$ git clone https://github.com/alexcohn/tess-two.git >> tess-two-git*

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-04 Thread Zdenko Podobny
I would like to create/release 4.1.1 (just cherry-pick fixes from master/5.0.0), but it requires time... Maybe at the end of August, just to see what happens in the master repository. Zdenko On Mon, 5 Aug 2019 at 8:35, René Hansen wrote: > Awesome! Thanks Zdenko. > > Would it be possible to tag c5a50b93ce

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-04 Thread René Hansen
Awesome! Thanks Zdenko. Would it be possible to tag c5a50b93ce as something like 4.1.1? That way I can target an official release and get rid of my own fork. /René On Mon, 5 Aug 2019 at 08:15, Zdenko Podobny wrote: > I am sorry I found the problem - moving fileio.* was already staged, so i

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-04 Thread Zdenko Podobny
Sorry, I found the problem: moving fileio.* was already staged, so it did not become part of the patch... Now it is part of master, so you can cherry-pick it for 4.1 if needed. Zdenko On Thu, 1 Aug 2019 at 19:14, Zdenko Podobny wrote: > try to run build in new directory. There should not be any

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-04 Thread JB Data31
https://github.com/tesseract-ocr/tesseract/wiki/Compiling#android *$ date* > Mon Aug 5 04:58:08 UTC 2019 > *$ git clone https://github.com/alexcohn/tess-two.git > tess-two-git* > Cloning into 'tess-two-git'... > ... > Resolving deltas: 100% (7359/7359),

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-02 Thread Alex Cohn
On Friday, August 2, 2019 at 8:52:02 AM UTC+3, JB Data31 wrote: > > *$ git clone https://github.com/alexcohn/tess-two.git >> tess-two-git* >> Cloning into 'tess-two-git'... >> ... >> *$ ndk-build -C tess-two-git/tess-two tesseract APP_ABI=arm64-v8a >> A

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-01 Thread JB Data31
> > *$ git clone https://github.com/alexcohn/tess-two.git > tess-two-git* > Cloning into 'tess-two-git'... > ... > *$ ndk-build -C tess-two-git/tess-two tesseract APP_ABI=arm64-v8a > APP_PLATFORM=android-24* > Android NDK: WARNING: APP_PLATFORM android-24

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-01 Thread Zdenko Podobny
Try running the build in a new directory. There should not be any ccutil/fileio.cpp.o; the file was moved to the training part. Zdenko On Thu, 1 Aug 2019 at 19:05, René Hansen wrote: > Thanks Alex. > > Cool Zdenko, > > I can't find any reference to the unittest sub-directory in the main > CMakeLists.txt, so

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-01 Thread René Hansen
Thanks Alex. Cool Zdenko, I can't find any reference to the unittest sub-directory in the main CMakeLists.txt, so it seems to be included only in the autotools build. I guess that is not a problem, then. I've tested your patch; I'm building tag 4.1.0-rhardih-00

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-01 Thread Zdenko Podobny
Thanks. The attached patch should fix it (it does not solve the unittest part; @Shree: are you able to fix unittest?). Can you test it? Zdenko On Thu, 1 Aug 2019 at 13:03, René Hansen wrote: > Good point, I see *fileio.h* referenced here: > > unittest/fileio_test.cc > unittest/ligature_table_test.cc > uni

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-01 Thread Alex Cohn
It's nice that there are different ways to achieve (almost) the same things with as little hassle as possible. BTW, I also added a reference to your B.A.D. in https://github.com/tesseract-ocr/tesseract/wiki/4.0-Docker-Containers. Sincerely, Alex

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-01 Thread René Hansen
Good point, I see *fileio.h* referenced here: unittest/fileio_test.cc unittest/ligature_table_test.cc unittest/include_gunit.h unittest/pango_font_info_test.cc src/training/boxchar.cpp src/training/text2image.cpp src/training/pango_font_info.cpp src/training/lang_model_helpers.cpp src/training/uni

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-08-01 Thread René Hansen
I can completely understand the reasons and the need for the way tess-two does things. If I were working with Android Studio and Java/Kotlin, I would probably never have spent time on this. Last time I used tess-two it worked flawlessly. I am coming at this from the perspective of Qt projects, however.

[tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-31 Thread Sergei Sokolov
Is there a Docker container with version 4.1.0 available on Docker Hub? On Sunday, 7 July 2019 10:34:37 UTC-7, zdenop wrote: > > Hello all, > > I am proud to announce that tesseract OCR engine version 4.1.0 - the bug > fix release with new renderers (API extension) Alto, LSTMBox, WordStrBox. > Se

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-31 Thread Alex Cohn
On Wednesday, July 31, 2019 at 1:43:24 PM UTC+3, René Hansen wrote: > > Thanks Alex, I'll go and have a look. One would imagine that -D > BUILD_TRAINING_TOOLS=OFF should be enough. > Disabling the training build is not enough. You must explicitly exclude *fileio.cpp*, too, because it's not a part

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-31 Thread Alex Cohn
Hi René, thanks for your detailed post. Let me try to explain why I prefer to use the NDK 'directly'. When we need some libraries (like *libtess*) as parts of our apps, we need the library integrated into the app. Often, the library comes with a JNI layer and can be accessed (indirectly) from the

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-31 Thread René Hansen
Thanks Alex, I'll go and have a look. One would imagine that -D BUILD_TRAINING_TOOLS=OFF should be enough. I know Docker is not everyone's cup of tea, but in my case, I've just become so used to avoiding installing anything on my host system if possible... One thing that got me down this pat

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-31 Thread Alex Cohn
I don't build training, and I excluded fileio, following the path of Robyer ( https://github.com/adaptech-cz/Tesseract4Android/commit/7852e08fa51ae1461883e5cf1dc858d531bb21c2

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-30 Thread Shree Devi Kumar
Please make a PR in the tesseract repo regarding the changes you needed for Android 6.0. I am sure there is a way to build without training tools on Android. With autotools it is a separate step. Please update the wiki with a link to your repo as an alternative way to build on Android. On Tue, 30

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-30 Thread René Hansen
A bit late to the party here, but I've just pushed changes that update build configs for tesseract 4 in https://github.com/rhardih/bad. It now supports building 4.0.0 *and* 4.1.0. I've tested both versions on x86, armv7-a and arm64-v8a. All seems to be working just fine. I'm using the default bui

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-28 Thread Alex Cohn
It's there, in https://github.com/tesseract-ocr/tesseract/wiki/Compiling#android On Sun, 28 Jul 2019, 17:11 Shree Devi Kumar, wrote: > Thanks. Please add the info to Tesseract wiki page also. > > On Sun, 28 Jul 2019, 18:42 Alex Cohn, wrote: > >> Hi everybody, >> >> I am proud to announce Androi

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-28 Thread Shree Devi Kumar
Thanks. Please add the info to Tesseract wiki page also. On Sun, 28 Jul 2019, 18:42 Alex Cohn, wrote: > Hi everybody, > > I am proud to announce Android support for the new 4.1.0 version of > tesseract OCR engine. This repo [1] includes both 3.05 and 4.1 branches, > and lets you painlessly build

[tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-28 Thread Alex Cohn
Hi everybody, I am proud to announce Android support for the new 4.1.0 version of the tesseract OCR engine. This repo [1] includes both 3.05 and 4.1 branches, and lets you painlessly build a static command-line binary. In addition, it builds the Java binding, so *libtess* and *liblept* can be used

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-28 Thread Shree Devi Kumar
It is not a bug but is intentional. For details please see discussion at https://github.com/tesseract-ocr/tesseract/issues/648#issuecomment-271870748 On Sat, Jul 27, 2019 at 4:14 PM Abdou wrote: > > Hello everyone I tried to use OCRD-train with tesseract 4.1 but I did not > succeed. I noticed t

[tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-27 Thread Abdou
Hello everyone, I tried to use OCRD-train with tesseract 4.1 but did not succeed. I noticed that with an RTL language, wordstrbox reversed the text and wrote it as if it were an LTR language. Is this a bug, or do I have to change some configuration for it to work properly? Thank you. On Sunday, 7 July 2019

[tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-12 Thread Joseph DiFrancisco
When will this release be available in Homebrew? 4.0.0 is still the current formula https://formulae.brew.sh/formula/tesseract On Sunday, July 7, 2019 at 10:34:37 AM UTC-7, zdenop wrote: > > Hello all, > > I am proud to announce that tesseract OCR engine version 4.1.0 - the bug > fix release wi

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-12 Thread Abstract
I mean I tried to compile with the vcpkg tool (it's the only one that produces short file names in native Windows style; the others add long prefixes with no way to prevent it), as it's a nightmare to build all the required libraries manually. As I wrote, there are no packages for version 4.1.0, so I can only use 4.0.0

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-12 Thread Shree Devi Kumar
See https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc Lstmbox and wordstrbox create box files for training. Alto creates XML output. Hocr creates HTML output. On Fri, 12 Jul 2019, 13:39 ElGato ElMago, wrote: > Hello, > > How do you use Alto, LSTMBox, and WordStrBox? A
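As a rough illustration of the reply above: in Tesseract 4.1 these outputs are selected with config names on the command line, for example tesseract page.png page alto. The C++ sketch below maps each config name to the file extension its renderer is assumed to write (alto to .xml, lstmbox and wordstrbox to .box, hocr to .hocr); these extensions are an assumption based on a reading of the 4.1 renderer sources, and OutputExtension and OutputPath are hypothetical helpers, not Tesseract API.

```cpp
#include <map>
#include <string>

// Assumed output-file extension per Tesseract 4.1 config name, as passed on
// the command line, e.g. "tesseract page.png page alto".
std::string OutputExtension(const std::string& config) {
  static const std::map<std::string, std::string> kExtensions = {
      {"txt", "txt"},
      {"hocr", "hocr"},       // hOCR, an HTML-based format
      {"alto", "xml"},        // ALTO XML
      {"tsv", "tsv"},
      {"pdf", "pdf"},
      {"lstmbox", "box"},     // box file for LSTM training
      {"wordstrbox", "box"},  // WordStr-style box file for training
  };
  const auto it = kExtensions.find(config);
  return it == kExtensions.end() ? "" : it->second;
}

// Builds "outputbase.ext" for a run with `config`, or "" for configs that
// are not in the table above.
std::string OutputPath(const std::string& outputbase,
                       const std::string& config) {
  const std::string ext = OutputExtension(config);
  return ext.empty() ? "" : outputbase + "." + ext;
}
```

Under this assumption, lstmbox and wordstrbox share the .box extension, so running both with the same output base would target the same file name; using a separate output base for each run avoids that.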

[tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-12 Thread ElGato ElMago
Hello, How do you use Alto, LSTMBox, and WordStrBox? Are they options for training, or do you use them as command-line options for tesseract? ElMagoElGato On Monday, July 8, 2019 at 2:34:37 AM UTC+9, zdenop wrote: > > Hello all, > > I am proud to announce that tesseract OCR engine version 4.1.0 - the bug > fix releas

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-08 Thread Zdenko Podobny
I do not know what you mean by "cannot compile due to C++11 incorrect changes". I just tried: > mkdir build.msvc && cd build.msvc > "c:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvars64.bat" > set PKG_CONFIG_PATH=F:/win64/lib/pkgconfig/ > set INSTALL_DIR=F

Re: [tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-08 Thread Zdenko Podobny
We do not maintain vcpkg. We officially support autotools, CMake (clang, MSVC, g++), cppan (deprecated) and sw builds. Or the other way around: there are people who use these tools and contribute the necessary changes. Zdenko On Mon, 8 Jul 2019 at 12:05

[tesseract-ocr] Re: Tesseract 4.1.0 released

2019-07-08 Thread Abstract
Hi! But what about a vcpkg update for this version? vcpkg is still at 4.0.0, while the --head version cannot compile due to C++11 incorrect changes