Re: Updated: leptonica-1.74.1-1
Hi Marco,

it would be great to also have cross packages of Leptonica 1.74.x for MinGW (mingw64-i686-*, mingw64-x86_64-*). They are needed, for example, to build Tesseract for Windows. As far as I know, all dependencies needed for a MinGW Leptonica are already available.

I'm not familiar with creating packages for Cygwin, but I would be willing to help as far as I can.

Kind regards
Stefan

--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: [BUG] Package mingw64-x86_64-icu is broken
On 06.06.2018 at 21:13, Stefan Weil wrote:
> Both mingw64-x86_64-icu-57.1-1 and mingw64-x86_64-icu-57.1-2 are broken:
>
> This code always fails:
>
> icu::Normalizer2::getInstance(nullptr, "nfkc", UNORM2_COMPOSE, error_code);
>
> The problem was detected when comparing Tesseract for Windows
> executables: while the 32 bit version worked fine, the 64 bit version
> failed. The failure could be localized, and the mingw64-x86_64-icu
> package was identified to be causing it.
>
> https://github.com/tesseract-ocr/tesseract/issues/1625#issuecomment-395161152
> contains details and also a short test code which can be used to
> reproduce the problem.
>
> The 32 bit package mingw64-i686-icu-57.1-2 works fine.
>
> Kind regards
> Stefan Weil

Ping. How can I help to get this issue fixed?
Re: [BUG] Package mingw64-x86_64-icu is broken
On 19.06.2018 at 11:53, JonY wrote:
> On 06/18/2018 12:12 PM, Stefan Weil wrote:
>> On 06.06.2018 at 21:13, Stefan Weil wrote:
>>> Both mingw64-x86_64-icu-57.1-1 and mingw64-x86_64-icu-57.1-2 are broken:
>>>
>>> This code always fails:
>>>
>>> icu::Normalizer2::getInstance(nullptr, "nfkc", UNORM2_COMPOSE, error_code);
>>>
>>> The problem was detected when comparing Tesseract for Windows
>>> executables: while the 32 bit version worked fine, the 64 bit version
>>> failed. The failure could be localized, and the mingw64-x86_64-icu
>>> package was identified to be causing it.
>>>
>>> https://github.com/tesseract-ocr/tesseract/issues/1625#issuecomment-395161152
>>> contains details and also a short test code which can be used to
>>> reproduce the problem.
>>>
>>> The 32 bit package mingw64-i686-icu-57.1-2 works fine.
>>>
>>> Kind regards
>>> Stefan Weil
>>
>> Ping. How can I help to get this issue fixed?
>
> I noticed cygport is using llvm version of binutils, not sure if that
> broke things.
>
> Was the issue there if you built it yourself with gcc/binutils?

No, a local build with x86_64-w64-mingw32-gcc works fine. I tested it with 57.1-2, and also with recent versions of ICU.

A good indicator of a broken installation is a small icudata57.dll:

$ ls -l /usr/*/sys-root/mingw/bin/icudata57.dll
-rwxr-xr-x 1 25680896 Nov 10 2016 /usr/i686-w64-mingw32/sys-root/mingw/bin/icudata57.dll
-rwxr-xr-x 115872 Nov 10 2016 /usr/x86_64-w64-mingw32/sys-root/mingw/bin/icudata57.dll

The good 32-bit file is much larger than the broken 64-bit file.

Stefan
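The size check described in this reply can be automated. The following is a minimal C++17 sketch, not taken from the thread: the function names and the 1 MiB cut-off are hypothetical assumptions, chosen only because the two sizes reported by `ls` (25680896 vs. 115872 bytes) are far apart on either side of it.

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical cut-off (an assumption, not an ICU constant): the working 32-bit
// icudata57.dll is 25680896 bytes, the broken 64-bit one 115872 bytes, so 1 MiB
// separates the two with a wide margin.
constexpr std::uintmax_t kMinIcuDataSize = 1024 * 1024;

// Pure size check, kept separate so it can be exercised without real DLLs.
bool size_looks_broken(std::uintmax_t size, std::uintmax_t min_size = kMinIcuDataSize) {
  return size < min_size;
}

// True if `dll` exists and is suspiciously small. A missing or unreadable file
// yields false, because there is no size to judge.
bool icudata_looks_broken(const fs::path& dll, std::uintmax_t min_size = kMinIcuDataSize) {
  std::error_code ec;
  const std::uintmax_t size = fs::file_size(dll, ec);  // non-throwing overload
  if (ec) return false;
  return size_looks_broken(size, min_size);
}
```

Built with Cygwin's g++, the function could be pointed at the two paths from the `ls` listing, e.g. `/usr/x86_64-w64-mingw32/sys-root/mingw/bin/icudata57.dll`. It is only a heuristic: a large file is not proof that the package works.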
Re: UTF-8 character encoding
On 20.06.2018 at 20:09, Lee wrote:
> I'm looking at
> https://cygwin.com/packaging-hint-files.html#pvr.hint
> and it starts off with
> Use UTF-8 character encoding.
>
> How do I do that and how do I check that I actually did use UTF-8
> character encoding _without_ using file?
>
> for whatever it's worth:
> $ file unicode.html
> unicode.html: HTML document, UTF-8 Unicode text
>
> $ file test.c
> test.c: C source, ASCII text
>
> I used vi to create both files & I'd like to understand why file says
> one is ascii & the other is utf-8
>
> Thanks,
> Lee

ASCII is a subset of UTF-8, so that's fine: every ASCII file is also a valid UTF-8 file. The file command reports ASCII as long as your text does not contain any non-ASCII characters. If you add some (for example ÄÖÜ), it should report UTF-8.

Regards,
Stefan
Re: [ANNOUNCEMENT] icu 62.1-1
On 22.06.2018 at 19:02, Ken Brown wrote:
> The following packages have been uploaded to the Cygwin distribution:
>
> * libicu62-62.1-1
> * libicu-devel-62.1-1
> * icu-doc-62.1-1
>
> ICU is a mature, widely used set of C/C++ and Java libraries providing
> Unicode and Globalization support for software applications. ICU is
> widely portable and gives applications the same results on all
> platforms and between C/C++ and Java software.
>
> This is an update to the latest upstream release.
>
> Ken Brown
> Cygwin's ICU maintainer

Hi Ken,

thank you for your work. It would be great to get a similar update for the mingw64-i686-icu and mingw64-x86_64-icu packages, too. Those packages are still based on ICU 57, so they are rather out of date. See https://cygwin.com/ml/cygwin/2018-06/msg00222.html for more reasons.

Regards
Stefan Weil
Re: [ANNOUNCEMENT] Test: tesseract-ocr-4.0.0-0.4
On 08.08.2018 at 19:27, Marco Atzeri wrote:
> Version 4.0.0-0.4 of packages
>
> libtesseract-ocr_4 (API bump)
> tesseract-ocr
> tesseract-ocr-devel
> tesseract-training-util
>
> and version 4.00-0.4 of relative language data
>
> tesseract-ocr-languages (source only)
> tesseract-ocr-deu
> tesseract-ocr-eng
> tesseract-ocr-fra
> tesseract-ocr-ita
> tesseract-ocr-nld
> tesseract-ocr-por
> tesseract-ocr-spa
> tesseract-ocr-vie
> tesseract-training-core
> tesseract-training-deu
> tesseract-training-eng
> tesseract-training-fra
> tesseract-training-ita
> tesseract-training-nld
> tesseract-training-por
> tesseract-training-spa
> tesseract-training-vie
>
> are available in the Cygwin distribution:
>
> Other language specific data are available upstream
> https://github.com/tesseract-ocr/tessdata/
>
> while training data for building new language data are in
> https://github.com/tesseract-ocr/langdata

Hi Marco,

thank you for providing those Tesseract packages.

A hint: I suggest removing the tesseract-training-* packages, as there is currently no training data for Tesseract 4.0.0.

Regards
Stefan Weil
Re: [ANNOUNCEMENT] Test: tesseract-ocr-4.0.0-0.4
On 09.08.2018 at 10:19, Marco Atzeri wrote:
> My understanding is that the trained data "tessdata, tessdata_fast,
> tessdata_best" are coming from the same training data then version 3
>
> https://github.com/tesseract-ocr/langdata
>
> It is not that the languages raw data should be changed.
>
> Regards
> Marco

https://github.com/tesseract-ocr/langdata is valid for Tesseract 3.05.x and earlier versions. Tesseract 4.0.0 still supports the old traineddata format, but it adds new (and typically better) traineddata based on neural networks. There is currently no langdata available for those new traineddata.

tessdata_best contains only the new traineddata. tessdata_fast also contains only new traineddata, but its models are faster and less accurate. tessdata still contains the old traineddata for most languages, plus new traineddata derived from tessdata_best that uses integer instead of float models (which makes them faster).

tessdata_best, tessdata_fast and tessdata contain traineddata not only for many languages, but also for "scripts", for example in https://github.com/tesseract-ocr/tessdata/tree/master/script. Those models support all languages that use the same script, so https://github.com/tesseract-ocr/tessdata/blob/master/script/Latin.traineddata supports all languages which use Latin characters (English, French, Spanish, Italian, German, Danish, ...).

A selection of those script models would be useful for Cygwin, too.

Regards,
Stefan
Re: [ANNOUNCEMENT] icu 63.1-1
On 22.10.2018 at 16:04, Ken Brown wrote:
> The following packages have been uploaded to the Cygwin distribution:
>
> * libicu63-63.1-1
> * libicu-devel-63.1-1
> * icu-doc-63.1-1
>

What about mingw64-x86_64-icu and mingw64-i686-icu? Will they get an update, too? Currently they are still at 57.1-2.

Kind regards
Stefan Weil
mingw64-x86_64-icu / mingw64-i686-icu 64.2
The following experimental updates are not available from Cygwin, but from my personal website https://qemu.weilnetz.de/test/cygwin/local/:

mingw64-x86_64-icu 64.2
mingw64-i686-icu 64.2

I tested the binary packages using a short test program (https://github.com/tesseract-ocr/tesseract/issues/1625#issuecomment-395166694). Maybe someone else has other tests and wants to try the new packages.

Both packages are currently unmaintained in Cygwin (old version 57.1, broken mingw64-x86_64-icu). See issue https://github.com/cygwinports/mingw64-x86_64-icu/issues/1 and pull request https://github.com/cygwinports/mingw64-x86_64-icu/pull/2. See also https://sourceware.org/ml/cygwin/2018-10/msg00209.html and https://cygwin.com/ml/cygwin/2018-06/msg00222.html.

I need those packages to build Tesseract for Windows. I would also like to contribute packages for Leptonica (also needed for Tesseract): mingw64-x86_64-leptonica and mingw64-i686-leptonica. Building such packages is easy, but how can I get them into the official package distribution?

Regards
Stefan Weil
[BUG] Package mingw64-x86_64-icu is broken
Both mingw64-x86_64-icu-57.1-1 and mingw64-x86_64-icu-57.1-2 are broken:

This code always fails:

icu::Normalizer2::getInstance(nullptr, "nfkc", UNORM2_COMPOSE, error_code);

The problem was detected when comparing Tesseract for Windows executables: while the 32 bit version worked fine, the 64 bit version failed. The failure could be localized, and the mingw64-x86_64-icu package was identified to be causing it.

https://github.com/tesseract-ocr/tesseract/issues/1625#issuecomment-395161152 contains details and also a short test code which can be used to reproduce the problem.

The 32 bit package mingw64-i686-icu-57.1-2 works fine.

Kind regards
Stefan Weil