Re: Updated: leptonica-1.74.1-1

2017-01-18 Thread Stefan Weil
Hi Marco,

it would be great to also have cross packages of Leptonica 1.74.x
for MinGW (mingw64-i686-*, mingw64-x86_64-*).

They are needed to build Tesseract for Windows, for example.

As far as I know, all dependencies needed for a MinGW Leptonica
are already available.

I'm not familiar with creating packages for Cygwin,
but would be willing to help as far as I can.

Kind regards
Stefan


--
Problem reports:   http://cygwin.com/problems.html
FAQ:   http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple



Re: [BUG] Package mingw64-x86_64-icu is broken

2018-06-18 Thread Stefan Weil
Am 06.06.2018 um 21:13 schrieb Stefan Weil:
> Both mingw64-x86_64-icu-57.1-1 and mingw64-x86_64-icu-57.1-2 are broken:
> 
> This code always fails:
> 
> icu::Normalizer2::getInstance(nullptr, "nfkc", UNORM2_COMPOSE, error_code);
> 
> The problem was detected when comparing Tesseract for Windows
> executables: while the 32 bit version worked fine, the 64 bit version
> failed. The failure could be localized, and the mingw64-x86_64-icu
> package was identified to be causing it.
> 
> https://github.com/tesseract-ocr/tesseract/issues/1625#issuecomment-395161152
> contains details and also a short test code which can be used to
> reproduce the problem.
> 
> The 32 bit package mingw64-i686-icu-57.1-2 works fine.
> 
> Kind regards
> Stefan Weil

Ping. How can I help to get this issue fixed?




Re: [BUG] Package mingw64-x86_64-icu is broken

2018-06-19 Thread Stefan Weil
Am 19.06.2018 um 11:53 schrieb JonY:
> On 06/18/2018 12:12 PM, Stefan Weil wrote:
>> Am 06.06.2018 um 21:13 schrieb Stefan Weil:
>>> Both mingw64-x86_64-icu-57.1-1 and mingw64-x86_64-icu-57.1-2 are broken:
>>>
>>> This code always fails:
>>>
>>> icu::Normalizer2::getInstance(nullptr, "nfkc", UNORM2_COMPOSE, error_code);
>>>
>>> The problem was detected when comparing Tesseract for Windows
>>> executables: while the 32 bit version worked fine, the 64 bit version
>>> failed. The failure could be localized, and the mingw64-x86_64-icu
>>> package was identified to be causing it.
>>>
>>> https://github.com/tesseract-ocr/tesseract/issues/1625#issuecomment-395161152
>>> contains details and also a short test code which can be used to
>>> reproduce the problem.
>>>
>>> The 32 bit package mingw64-i686-icu-57.1-2 works fine.
>>>
>>> Kind regards
>>> Stefan Weil
>>
>> Ping. How can I help to get this issue fixed?
> 
> I noticed cygport is using llvm version of binutils, not sure if that
> broke things.
> 
> Was the issue there if you built it yourself with gcc/binutils?


No, a local build with x86_64-w64-mingw32-gcc works fine.
Tested with 57.1-2, but also with recent versions of icu.

A good indicator of a broken installation is a small icudata57.dll:

$ ls -l /usr/*/sys-root/mingw/bin/icudata57.dll
-rwxr-xr-x 1 25680896 Nov 10  2016 /usr/i686-w64-mingw32/sys-root/mingw/bin/icudata57.dll
-rwxr-xr-x 115872 Nov 10  2016 /usr/x86_64-w64-mingw32/sys-root/mingw/bin/icudata57.dll

The (good) file for 32 bit is much larger than the (broken) file for 64 bit.
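
The size check above can also be scripted. The following is only a minimal Python sketch and not part of the original report: the function name find_suspicious_icudata and the 1 MiB threshold are assumptions chosen for illustration (the working 32 bit file is about 25 MB), and the default glob pattern mirrors the ls command.

```python
import glob
import os

# Assumed threshold: the working 32 bit DLL is about 25 MB, so anything
# under 1 MiB is treated as suspicious. The exact cutoff is a guess.
MIN_PLAUSIBLE_SIZE = 1024 * 1024


def find_suspicious_icudata(pattern="/usr/*/sys-root/mingw/bin/icudata57.dll",
                            min_size=MIN_PLAUSIBLE_SIZE):
    """Return (path, size) pairs for matching files smaller than min_size."""
    suspicious = []
    for path in sorted(glob.glob(pattern)):
        size = os.path.getsize(path)
        if size < min_size:
            suspicious.append((path, size))
    return suspicious


if __name__ == "__main__":
    for path, size in find_suspicious_icudata():
        print(f"suspiciously small: {path} ({size} bytes)")
```

A small file is only a hint that the data library is a stub; the test program linked in the bug report remains the real check.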

Stefan





Re: UTF-8 character encoding

2018-06-20 Thread Stefan Weil
Am 20.06.2018 um 20:09 schrieb Lee:
> I'm looking at
>   https://cygwin.com/packaging-hint-files.html#pvr.hint
> and it starts off with
>   Use UTF-8 character encoding.
> 
> How do I do that and how do I check that I actually did use UTF-8
> character encoding _without_ using file?
> 
> for whatever it's worth:
> $ file unicode.html
> unicode.html: HTML document, UTF-8 Unicode text
> 
> $ file test.c
> test.c: C source, ASCII text
> 
> I used vi to create both files & I'd like to understand why file says
> one is ascii & the other is utf-8
> 
> Thanks,
> Lee

ASCII is a subset of UTF-8, so that's fine.

The file command will report ASCII as long as your text does not contain
any non-ASCII characters. If you add some (for example ÄÖÜ), it should
report UTF-8.
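
To check the encoding without the file command, one option is to try decoding the bytes as UTF-8. The following is a minimal Python sketch; the function name classify_encoding and its three return labels are made up for this illustration. It only validates the bytes and does not try to guess other encodings.

```python
import sys


def classify_encoding(data: bytes) -> str:
    """Classify raw bytes as 'ASCII', 'UTF-8' or 'not UTF-8'."""
    try:
        # Strict decoding raises an error on any invalid UTF-8 sequence.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not UTF-8"
    # Valid UTF-8 that uses only bytes below 0x80 is also plain ASCII.
    if all(byte < 0x80 for byte in data):
        return "ASCII"
    return "UTF-8"


if __name__ == "__main__":
    # Usage: python3 check_utf8.py FILE...
    for name in sys.argv[1:]:
        with open(name, "rb") as handle:
            print(f"{name}: {classify_encoding(handle.read())}")
```

A file that is reported as ASCII here is also valid UTF-8, so it already satisfies the requirement.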

Regards,
Stefan





Re: [ANNOUNCEMENT] icu 62.1-1

2018-06-22 Thread Stefan Weil
Am 22.06.2018 um 19:02 schrieb Ken Brown:
> The following packages have been uploaded to the Cygwin distribution:
> 
> * libicu62-62.1-1
> * libicu-devel-62.1-1
> * icu-doc-62.1-1
> 
> ICU is a mature, widely used set of C/C++ and Java libraries providing
> Unicode and Globalization support for software applications.  ICU is
> widely portable and gives applications the same results on all
> platforms and between C/C++ and Java software.
> 
> This is an update to the latest upstream release.
> 
> Ken Brown
> Cygwin's ICU maintainer


Hi Ken,

thank you for your work.

It would be great to get a similar update for the mingw64-i686-icu and
mingw64-x86_64-icu packages, too. Those packages are still based on icu
57, so rather out of date.

See https://cygwin.com/ml/cygwin/2018-06/msg00222.html for more reasons.

Regards
Stefan Weil




Re: [ANNOUNCEMENT] Test: tesseract-ocr-4.0.0-0.4

2018-08-09 Thread Stefan Weil
Am 08.08.2018 um 19:27 schrieb Marco Atzeri:
> Version 4.0.0-0.4  of packages
> 
>    libtesseract-ocr_4   (API bump)
>    tesseract-ocr
>    tesseract-ocr-devel
>    tesseract-training-util
> 
> and version 4.00-0.4 of the related language data
> 
>    tesseract-ocr-languages (source only)
>    tesseract-ocr-deu
>    tesseract-ocr-eng
>    tesseract-ocr-fra
>    tesseract-ocr-ita
>    tesseract-ocr-nld
>    tesseract-ocr-por
>    tesseract-ocr-spa
>    tesseract-ocr-vie
>    tesseract-training-core
>    tesseract-training-deu
>    tesseract-training-eng
>    tesseract-training-fra
>    tesseract-training-ita
>    tesseract-training-nld
>    tesseract-training-por
>    tesseract-training-spa
>    tesseract-training-vie
> 
> are available in the Cygwin distribution:
> 
> Other language specific data are available upstream
>   https://github.com/tesseract-ocr/tessdata/
> 
> while training data for building new language data are in
>   https://github.com/tesseract-ocr/langdata


Hi Marco,

thank you for providing those Tesseract packages.

A hint: I suggest removing the tesseract-training-* packages, as there
is currently no training data for Tesseract 4.0.0.

Regards
Stefan Weil




Re: [ANNOUNCEMENT] Test: tesseract-ocr-4.0.0-0.4

2018-08-09 Thread Stefan Weil
Am 09.08.2018 um 10:19 schrieb Marco Atzeri:
> My understanding is that the trained data "tessdata, tessdata_fast,
> tessdata_best" come from the same training data as version 3
> 
> https://github.com/tesseract-ocr/langdata
> 
> It is not that the languages raw data should be changed.
> 
> Regards
> Marco

https://github.com/tesseract-ocr/langdata is valid for Tesseract 3.05.x
and earlier versions.

Tesseract 4.0.0 still supports the old traineddata format, but added new
(and typically better) traineddata based on neural networks. There is
currently no langdata available for those new traineddata.

tessdata_best only contains the new traineddata.

tessdata_fast also contains only new traineddata, but is faster and less
accurate.

tessdata still contains old traineddata for most languages and
additionally new traineddata made from tessdata_best, but using integer
instead of float models (which makes them faster).

tessdata_best, tessdata_fast and tessdata not only contain traineddata
for many languages, but also for "scripts", for example in
https://github.com/tesseract-ocr/tessdata/tree/master/script. Those
models support all languages using the same script, so
https://github.com/tesseract-ocr/tessdata/blob/master/script/Latin.traineddata
supports all languages which use Latin characters (English, French,
Spanish, Italian, German, Danish, ...). A selection of those script
models would be useful for Cygwin, too.

Regards,
Stefan




Re: [ANNOUNCEMENT] icu 63.1-1

2018-10-22 Thread Stefan Weil
Am 22.10.2018 um 16:04 schrieb Ken Brown:
> The following packages have been uploaded to the Cygwin distribution:
> 
> * libicu63-63.1-1
> * libicu-devel-63.1-1
> * icu-doc-63.1-1
> 

What about mingw64-x86_64-icu and mingw64-i686-icu? Will they get an
update, too? Currently they are still at 57.1-2.

Kind regards
Stefan Weil




mingw64-x86_64-icu / mingw64-i686-icu 64.2

2019-04-26 Thread Stefan Weil
The following experimental updates are not available from Cygwin, but
from my personal website https://qemu.weilnetz.de/test/cygwin/local/:

mingw64-x86_64-icu 64.2
mingw64-i686-icu 64.2

I tested the binary packages using a short test program
(https://github.com/tesseract-ocr/tesseract/issues/1625#issuecomment-395166694).
Maybe someone else has other tests and wants to try the new packages.

Both packages are currently unmaintained in Cygwin (old version 57.1,
broken mingw64-x86_64-icu). See issue
https://github.com/cygwinports/mingw64-x86_64-icu/issues/1 and pull
request https://github.com/cygwinports/mingw64-x86_64-icu/pull/2. See
also https://sourceware.org/ml/cygwin/2018-10/msg00209.html and
https://cygwin.com/ml/cygwin/2018-06/msg00222.html.

I need those packages to build Tesseract for Windows.

I would also like to contribute packages for Leptonica (also needed for
Tesseract): mingw64-x86_64-leptonica and mingw64-i686-leptonica.
Building such packages is easy, but how can I get them into the
official package distribution?

Regards
Stefan Weil





[BUG] Package mingw64-x86_64-icu is broken

2018-06-06 Thread Stefan Weil
Both mingw64-x86_64-icu-57.1-1 and mingw64-x86_64-icu-57.1-2 are broken:

This code always fails:

icu::Normalizer2::getInstance(nullptr, "nfkc", UNORM2_COMPOSE, error_code);

The problem was detected when comparing Tesseract for Windows
executables: while the 32 bit version worked fine, the 64 bit version
failed. The failure could be localized, and the mingw64-x86_64-icu
package was identified to be causing it.

https://github.com/tesseract-ocr/tesseract/issues/1625#issuecomment-395161152
contains details and also a short test code which can be used to
reproduce the problem.

The 32 bit package mingw64-i686-icu-57.1-2 works fine.

Kind regards
Stefan Weil
