[tesseract-ocr] Re: Tesseract 4.0.0 released

2018-10-30 Thread DMGG
Great news, thank you all.

On Monday, October 29, 2018 at 7:02:30 AM UTC-4, zdenop wrote:
>
> Hello all,
>
> I am proud to announce that version 4.0.0 of the tesseract OCR engine
> (LSTM-based) was released today.
> See the online release notes [1].
> The source code can be downloaded from GitHub [2].
> Known issues and regressions are documented on the Planning wiki page [3].
>
> [1] https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes#tesseract-release-notes-oct-29-2018---v400
> [2] https://github.com/tesseract-ocr/tesseract/releases/tag/4.0.0
> [3] https://github.com/tesseract-ocr/tesseract/wiki/Planning#400
>
> Zdenko
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/b7d2e991-3c4c-4076-9a62-c8306db2dc86%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [tesseract-ocr] Re: Heads up: release of tesseract 4.0

2018-10-30 Thread flaviumarc
I have now compiled the tesseract library with cppan.

I have also found a test app with this source code:

/*
dependencies:
pvt.cppan.demo.google.tesseract.libtesseract: master
pvt.cppan.demo.danbloomberg.leptonica: 1
*/

#include <iostream>
#include <memory>

#include <allheaders.h> // leptonica main header for image io
#include <baseapi.h> // tesseract main header

int main(int argc, char *argv[])
{
    // an image path is required as the first argument
    if (argc < 2)
        return 1;

    tesseract::TessBaseAPI tess;

    // Init returns 0 on success; ./tessdata must contain eng.traineddata
    if (tess.Init("./tessdata", "eng"))
    {
        std::cout << "OCRTesseract: Could not initialize tesseract." << std::endl;
        return 1;
    }

    // setup
    tess.SetPageSegMode(tesseract::PageSegMode::PSM_AUTO);
    tess.SetVariable("save_best_choices", "T");

    // read image
    auto pixs = pixRead(argv[1]);
    if (!pixs)
    {
        std::cout << "Cannot open input file: " << argv[1] << std::endl;
        return 1;
    }

    // recognize
    tess.SetImage(pixs);
    tess.Recognize(0);

    // get result; unique_ptr<char[]> calls delete[] on the returned char* string
    std::cout << std::unique_ptr<char[]>(tess.GetUTF8Text()).get() << std::endl;

    // cleanup
    tess.Clear();
    pixDestroy(&pixs);

    return 0;
}

When I try to run it, I get:

Error opening data file ./tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
OCRTesseract: Could not initialize tesseract.

Compiling tesseract and its dependencies is apparently not enough. What else
should I set up in order to run this code in a VC++ project?



On Monday, October 29, 2018 at 7:50:05 PM UTC+2, zdenop wrote:
>
> I already gave you step-by-step instructions (the same as on the wiki ;-)
> but just the commands you need to type).
> You replied that it does not work for you, without any explanation of what
> does not work. With this state of mind I do not know how to help you.
>
>
> Zdenko
>
>
> On Mon, 29 Oct 2018 at 14:17, wrote:
>
>> Hi Zdenko. Could you send me an example (a link or something similar)
>> that shows how to compile Tesseract 4 in order to use it in a
>> VC++ project? I have a task to do that, my time is running out, and I
>> haven't found a functional, working sample of how to do it ... I guess
>> you already have a project that uses Tesseract in VC++ ... I will be
>> grateful for any hint that helps me solve my task.
>>
>> Regards,
>> Flaviu.
>>
>>
>> On Sunday, September 30, 2018 at 8:50:59 PM UTC+3, zdenop wrote:
>>>
>>> RC 1 [1] is ready.
>>> Please test, test, test. Especially if you are wrapping tesseract and
>>> creating/providing packages.
>>> Report problems ASAP in the issue tracker, so we can fix them before the
>>> final release.
>>>
>>> [1] https://github.com/tesseract-ocr/tesseract/tree/4.0.0-rc1
>>>
>>>
>>> Zdenko
>>>
>>>
>>> On Sat, 22 Sep 2018 at 17:06, Zdenko Podobny wrote:
>>>
>>>> Hello,
>>>>
>>>> I would like to thank all who shared their thoughts about releasing a new
>>>> version of tesseract [1]. I took my time and decided we should make the
>>>> release in the middle of October 2018 (14-21...).
>>>>
>>>> This means that no new features will be applied to the current code.
>>>> There is no time for testing. Anyway, please feel free to send your
>>>> patch/PR - it will be included after the 4.0 release.
>>>>
>>>> There are several ways people can contribute to this process:
>>>>
>>>> - *Developers*: go through the open issues and try to fix them. Please
>>>>   leave a comment when you start to deal with an issue, so we can use
>>>>   our capacity efficiently.
>>>> - *Packagers*: please test whether the building and packaging process
>>>>   works fine. If something is broken, try to fix and submit it fast.
>>>>   Please tell the forum or me directly where users can find your
>>>>   "product", so we can put information about supported systems into the
>>>>   release notes.
>>>> - *"Wrappers"*: if you are producing a wrapper for tesseract, please
>>>>   tell the forum or me directly if you support tesseract 4: I would
>>>>   like to promote your work.
>>>> - *"No code" developers*:
>>>>    - Check open issues: test with the latest code whether a report is
>>>>      still valid, prepare a test case if one is missing, report
>>>>      duplicates, suggest labels etc.
>>>>    - Improve documentation, release notes, man pages etc...
>>>>    - Native English speakers: check documentation, release notes etc.
>>>>
>>>> Thanks to all who helped us get to this point. I really appreciate all
>>>> ways of support.
>>>>
>>>> [1] https://github.com/tesseract-ocr/tesseract/issues/1423
>>>>
>>>> Zdenko

Re: [tesseract-ocr] Re: Heads up: release of tesseract 4.0

2018-10-30 Thread Zdenko Podobny
First learn to write forum e-mails! Stop hijacking email threads.
Your questions/problems have nothing to do with the content of the original posting.

Zdenko



Re: [tesseract-ocr] Re: Heads up: release of tesseract 4.0

2018-10-30 Thread flaviumarc
Ok, sorry. I will update the original post.

On Tuesday, October 30, 2018 at 11:04:08 AM UTC+2, zdenop wrote:
>
> First learn to write forum e-mails! Stop hijacking email threads.
> Your questions/problems have nothing to do with the content of the original posting.
>
> Zdenko
>

Re: [tesseract-ocr] pixRead problem

2018-10-30 Thread flaviumarc
Thank you, zdenop. I have solved it after all.

Flaviu.

On Wednesday, October 17, 2018 at 12:38:26 PM UTC+3, flavi...@gmail.com 
wrote:
>
> Yes, could be simple, but perhaps you have something installed which I 
> have not ... I guess ...
>

Re: [tesseract-ocr] pixRead problem

2018-10-30 Thread flaviumarc
Thank you, Zdenop, for your support. I have solved it.

On Tuesday, October 16, 2018 at 7:30:13 PM UTC+3, zdenop wrote:
>
> I do not use vcpkg. I suggest you use cppan (you need to install it
> and put it on your path). For me it is stupidly easy and takes about
> 15 minutes (on my computer and internet connection):
>
> git clone https://github.com/tesseract-ocr/tesseract.git
> cd tesseract
> mkdir build64
> cd build64
> cppan ..
> cmake .. -G "Visual Studio 15 2017 Win64"
> cmake --build . --config Release
>
> and it is done (the output is in tesseract\build64\bin\Release).
>
> Zdenko
>
>
> On Tue, 16 Oct 2018 at 14:32, wrote:
>
>> Your posts are valuable to me; this is the first time I have tried to use
>> tesseract.
>>
>> Regarding compiling leptonica and tesseract, it's an endless story :)
>> I took vcpkg from https://github.com/Microsoft/vcpkg and
>> generated the exe from the .bat file.
>> Then I tried this command in the console: *vcpkg install
>> tesseract:x86-windows-static*
>> It installed some libraries (
>> zlib[core]:x86-windows-static, libpng[core]:x86-windows-static,
>> libjpeg-turbo[core]:x86-windows-static, liblzma[core]:x86-windows-static,
>> tiff[core]:x86-windows-static, giflib[core]:x86-windows-static,
>> leptonica[core]:x86-windows-static, icu[core]:x86-windows-static),
>> and then I got:
>>
>> Error: Building package tesseract:x86-windows-static failed with: 
>> BUILD_FAILED
>> Please ensure you're using the latest portfiles with `.\vcpkg update`, 
>> then
>> submit an issue at https://github.com/Microsoft/vcpkg/issues including:
>>   Package: tesseract:x86-windows-static
>>   Vcpkg version: 0.0.113-nohash
>>
>> Additionally, attach any relevant sections from the log files above.
>>
>> and when I tried the initial code, I get the same errors:
>>
>> Error in pixReadMemTiff: function not present
>> Error in pixReadMem: tiff: no pix returned
>> Error in pixaGenerateFontFromString: pix not made
>> Error in bmfCreate: font pixa not made
>> Error in pixReadStreamPng: function not present
>> Error in pixReadStream: png: no pix returned
>> Error in pixRead: pix not read
>> pImage pointer value: 
>>
>> On Tuesday, October 16, 2018 at 12:29:33 PM UTC+3, zdenop wrote:
>>>
>>> You will do everything, including complaining, except read and follow
>>> the instructions. Right? ;-)
>>>
>>> https://github.com/tesseract-ocr/tesseract/wiki/Compiling#windows
>>>
>>> Zdenko
>>>
>>>
>>> On Tue, 16 Oct 2018 at 10:52, wrote:
>>>
>>>> It is an endless story :)
>>>>
>>>> I have downloaded cppan from here, and I have tried to generate a .sln
>>>> file with CMake ... but I get the following errors:
>>>>
>>>> CMake Error at CMakeLists.txt:130 (find_package):
>>>> By not providing "FindCPPAN.cmake" in CMAKE_MODULE_PATH this project has
>>>> asked CMake to find a package configuration file provided by "CPPAN", but
>>>> CMake did not find one.
>>>>
>>>> Could not find a package configuration file provided by "CPPAN" with any of
>>>> the following names:
>>>>
>>>> CPPANConfig.cmake
>>>> cppan-config.cmake
>>>>
>>>> Add the installation prefix of "CPPAN" to CMAKE_PREFIX_PATH or set
>>>> "CPPAN_DIR" to a directory containing one of the above files. If "CPPAN"
>>>> provides a separate development package or SDK, be sure it has been
>>>> installed.
>>>>
>>>> Strange ... is there any method to compile leptonica and tesseract
>>>> successfully?
>>>>
>>>> On Tuesday, October 16, 2018 at 11:22:15 AM UTC+3, zdenop wrote:
>>>>>
>>>>> The easiest way for you would be to compile tesseract on Windows with
>>>>> cppan. Instructions are on the wiki...
>>>>>
>>>>> On Tue, 16 Oct 2018, 10:14, wrote:
>>>>>
>>>>>> Thank you a lot for your prompt answer! I really appreciate that!
>>>>>>
>>>>>> I have run tesseract --help-extra on the command line, and I don't
>>>>>> spot any graphic library option.
>>>>>>
>>>>>> I have to tell you that I am using Windows 10, and I have compiled
>>>>>> leptonica with VS2017, taken from here:
>>>>>> https://github.com/danbloomberg/leptonica
>>>>>>
>>>>>> I have generated a .sln file with CMake, but I get some warnings there:
>>>>>>
>>>>>> Could NOT find GIF (missing: GIF_LIBRARY GIF_INCLUDE_DIR)
>>>>>>
>>>>>> Could NOT find JPEG (missing: JPEG_LIBRARY JPEG_INCLUDE_DIR)
>>>>>>
>>>>>> Could NOT find ZLIB (missing: ZLIB_LIBRARY ZLIB_INCLUDE_DIR)
>>>>>>
>>>>>> Could NOT find PNG (missing: PNG_LIBRARY PNG_PNG_INCLUDE_DIR)
>>>>>>
>>>>>> Could NOT find TIFF (missing: TIFF_LIBRARY TIFF_INCLUDE_DIR)
>>>>>>
>>>>>> Could NOT find ZLIB (missing: ZLIB_LIBRARY ZLIB_INCLUDE_DIR)
>>>>>>
>>>>>> Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)
>>>>>>
>>>>>>
>>>>>> which tells me that I am missing some libraries, but I really don't
>>>>>> know how to get them ...
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>>
>>>>>> Flaviu.
>>>>>>
>>>>>>
>>>>>> On Tuesday, October 16, 2018 at 10:10:14 AM UTC+3, zdenop wrote:
>>>>>>>
>>>>>>> Really? Where did you look???
>>>>>>> What is output of leptonica

[tesseract-ocr] tesstrain.sh with hundreds of fonts

2018-10-30 Thread benda . krisztian
I would like to train tesseract with hundreds of my own fonts. The font files
are named with numbers, like "22.ttf". To create the traineddata I
use the tesstrain.sh script like this:

tesseract-ocr/tesseract/src/training/tesstrain.sh \
 --fonts_dir processed_fonts \
 --lang eng \
 --langdata_dir tesseract-ocr/langdata \
 --tessdata_dir tesseract-ocr/tesseract/tessdata \
 --training_text tesseract-ocr/langdata/eng/eng.training_text \
 --output_dir output \
 --linedata_only \
 --fontlist "1.ttf" "2.ttf" "3.ttf" "4.ttf" "7.ttf" "8.ttf" "10.ttf" \
 "11.ttf" "12.ttf" "13.ttf" "15.ttf" "16.ttf" "17.ttf" "18.ttf" "19.ttf" \
 "20.ttf" "22.ttf" "23.ttf" "25.ttf" "26.ttf" "31.ttf" "32.ttf" "33.ttf" \
 "36.ttf" "38.ttf" "39.ttf" "40.ttf" "41.ttf" "42.ttf" "44.ttf" "52.ttf" \
 "53.ttf" "54.ttf" "55.ttf" "56.ttf" "58.ttf" "59.ttf" "60.ttf" "61.ttf" \
 "62.ttf" "64.ttf" "65.ttf" "67.ttf" "68.ttf" "69.ttf" "70.ttf" "71.ttf" \
 "72.ttf" "73.ttf" "75.ttf" "76.ttf" "79.ttf" "80.ttf" "81.ttf" "82.ttf" \
 "83.ttf" "89.ttf" "90.ttf" "91.ttf" "94.ttf" "95.ttf" "96.ttf" "97.ttf" \
 "98.ttf" "100.ttf"

I had not updated my cloned repo for a while, but I did so a few days ago.
Since then this command no longer works as I expect. It processes the first 8
fonts and then stops without any error. It does not create
traineddata, just some tif and box files in a temp directory.

What am I doing wrong? Did the recent updates (last 3 months) introduce some
constraint that limits the creation of training data?




Re: [tesseract-ocr] tesstrain.sh with hundreds of fonts

2018-10-30 Thread Shree Devi Kumar
Please check the log file in the tmp directory. There might be some
font-related errors there. A pango-related change to font processing was
made recently. Please check the change log.




Re: [tesseract-ocr] How to improve the quality of Training From Scratch

2018-10-30 Thread Shree Devi Kumar
Please read the wiki page regarding training 4.0 and the presentation files
in docs by Ray Smith.

On Tue, 30 Oct 2018, 02:32 bruce,  wrote:

> Thank you for your reply, Shree.
> I have looked at the training_text and the list of fonts.
> I will try again.
> Before I start my next training from scratch, I would like to ask the
> following questions.
>
> 1. Does a training_text containing more characters give better training
> results? Is there an upper limit?
>
> 2. Does using more fonts give better training results?
>
> 3. I find that the official text contains not only Chinese characters, but
> also English characters and numbers.
> If I use a command like this: tesseract.exe test.png c:\dir\test -l eng+chi_sim
> is it better for me to train with a training_text of pure Chinese
> characters?
>
>
> On Tuesday, October 30, 2018 at 2:43:05 AM UTC+8, shree wrote:
>>
>> https://github.com/tesseract-ocr/langdata_lstm/tree/master/chi_sim
>>
>> On Mon, 29 Oct 2018, 14:41 Shree Devi Kumar,  wrote:
>>
>>> Please look at the langdata_lstm repo, specifically the chi_sim folder.
>>> It has the training_text as well as list of fonts used for LSTM training.
>>>
>>> On Mon, 29 Oct 2018, 05:40 bruce,  wrote:
>>>
>>>> Recently I have been using tesseract to train my chi_sim language. I want
>>>> to train a chi_sim.traineddata that is better than the official one.
>>>> I have generated training data with 82915 characters and trained it with
>>>> 7 common fonts.
>>>> After 4434207 iterations the train rate is lower than 0.016%, but the
>>>> recognition results are much worse than with the official traineddata.
>>>>
>>>> So I'm confused...
>>>>
>>>> How can I improve the quality of training?
>>>> Do I need more training data for more training fonts? What is the right
>>>> amount?
>>>> I would like to know which training data and which range of fonts were
>>>> used for the official traineddata.


[tesseract-ocr] How do I train tesseract 4 for the font Comic Sans MS?

2018-10-30 Thread 'rely LIVE' via tesseract-ocr
Hello,

I want to train the default eng.traineddata for the font "Comic Sans MS".
Is it possible at all?
Which files do I need and where do I get them? I already installed 
tesseract 4 on Ubuntu 18.04 and can do simple OCR.
What are the necessary commands to do training?

I know from the basic tutorial that I have to use tesstrain.sh and 
lstmtraining, but it is too complicated for me to follow.

Thanks in advance and kind regards,
Volker



Re: [tesseract-ocr] How do I train tesseract 4 for the font Comic Sans MS?

2018-10-30 Thread Shree Devi Kumar
See
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact

Use the Comic Sans MS font instead of Impact to fine-tune.
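
As a rough sketch of what that wiki section boils down to, the Python snippet below only assembles and prints the command lines; it does not run anything. The flags are the ones I recall from the wiki's Impact example, so please check them against the wiki before use. All directory names, the `comic` output prefix, the helper name `finetune_commands`, and the iteration count of 400 are assumptions for illustration. It also assumes that the tools are on the PATH (the wiki runs them from the source tree as `training/...`), that `tessdata_best` holds the float eng.traineddata (the integer "fast" models cannot be fine-tuned), and that the work directory already exists.

```python
import shlex


def finetune_commands(font="Comic Sans MS", lang="eng",
                      fonts_dir="/usr/share/fonts",
                      langdata_dir="langdata",
                      tessdata_dir="tessdata_best",
                      work_dir="comic_finetune",
                      max_iterations=400):
    """Return four argv lists for one fine-tuning run (all paths are assumed)."""
    train_dir = f"{work_dir}/train"
    base_model = f"{tessdata_dir}/{lang}.traineddata"
    extracted_lstm = f"{work_dir}/{lang}.lstm"
    model_prefix = f"{work_dir}/comic"
    return [
        # 1. Render the training text in the chosen font and write .lstmf files.
        ["tesstrain.sh", "--fonts_dir", fonts_dir, "--lang", lang,
         "--linedata_only", "--noextract_font_properties",
         "--langdata_dir", langdata_dir, "--tessdata_dir", tessdata_dir,
         "--fontlist", font, "--output_dir", train_dir],
        # 2. Extract the LSTM network from the existing traineddata.
        ["combine_tessdata", "-e", base_model, extracted_lstm],
        # 3. Continue training that network on the new font's line data.
        ["lstmtraining", "--model_output", model_prefix,
         "--continue_from", extracted_lstm,
         "--traineddata", base_model,
         "--train_listfile", f"{train_dir}/{lang}.training_files.txt",
         "--max_iterations", str(max_iterations)],
        # 4. Turn the last checkpoint into a usable .traineddata file.
        ["lstmtraining", "--stop_training",
         "--continue_from", f"{model_prefix}_checkpoint",
         "--traineddata", base_model,
         "--model_output", f"{work_dir}/{lang}_comic.traineddata"],
    ]


if __name__ == "__main__":
    # Print shell-quoted commands so they can be reviewed before running by hand.
    for argv in finetune_commands():
        print(" ".join(shlex.quote(arg) for arg in argv))
```

The font name has to match what fontconfig reports for the installed font, and the resulting traineddata file would be selected with `-l` after copying it into the tessdata directory.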

On Tue, 30 Oct 2018, 12:32 'rely LIVE' via tesseract-ocr, <
tesseract-ocr@googlegroups.com> wrote:

> Hello,
>
> I want to train the default eng.traineddata for the font "Comic Sans MS".
> Is it possible at all?
> Which files do I need and where do I get them? I already installed
> tesseract 4 on Ubuntu 18.04 and can do simple OCR.
> What are the necessary commands to do training?
>
> I know from the basic tutorial that I have to use tesstrain.sh and
> lstmtraining, but it is too complicated for me to follow.
>
> Thanks in advance and kind regards,
> Volker
>


[tesseract-ocr] New rus/eng traineddata for tes4

2018-10-30 Thread vngorunov via tesseract-ocr
Hi all! We are building a Kofax-like system named soica, and we use 
Tesseract 4. It works well now, but there are still problems with Russian OCR 
and with the combined rus/eng languages. Could you say whether new 
traineddata will be released soon?
