Re: UTF-8 in jessie

2013-10-14 Thread Johannes Schauer
Hi, Quoting Adam Borowski (2013-08-12 02:51:52) > On Mon, May 06, 2013 at 02:49:57PM +0200, Andreas Beckmann wrote: > > now might be the right time to start a discussion about release goals > > for jessie. > > I would like to propose full UTF-8 support. I don't mean here full > support for all o

Re: UTF-8 in jessie

2013-09-17 Thread Jakub Wilk
* Adam Borowski, 2013-08-12, 02:51: Detecting non-UTF files is easy: * false positives are impossible * false negatives are extremely unlikely: combinations of letters that would happen to match a valid utf character don't happen naturally, and even if they did, every single combination in the
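The detection rule quoted in this message can be illustrated with strict decoding. This is a minimal Python sketch, not something posted in the thread; the helper name is_valid_utf8 is hypothetical. A byte string that fails strict decoding is certainly not UTF-8, which is the sense in which false positives are impossible.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# "m\u00f6o" is the word "möo" used as an example elsewhere in the thread.
print(is_valid_utf8("m\u00f6o".encode("utf-8")))    # True: b'm\xc3\xb6o'
print(is_valid_utf8("m\u00f6o".encode("latin-1")))  # False: b'm\xf6o'
```

Plain ASCII input also passes, since ASCII is a subset of UTF-8, so a check like this cannot tell ASCII from UTF-8 and does not need to.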

Re: UTF-8 in jessie

2013-08-29 Thread Russ Allbery
Ian Jackson writes: > Jonas Smedegaard writes ("Re: UTF-8 in jessie"): >> How about we simply mention explicitly that `arcane quoting' - even if >> arguably related to UTF-8 encoding, should be classified not as >> release-critical bugs but as spelling errors.

Re: UTF-8 in jessie

2013-08-29 Thread Ian Jackson
Jonas Smedegaard writes ("Re: UTF-8 in jessie"): > Quoting Ian Jackson (2013-08-29 18:03:22) > > Jonas Smedegaard writes ("Re: UTF-8 in jessie"): > >> I believe the underlying issue is the one summarized here: > >> https://en.wikipedia.org/wiki/Ty

Re: UTF-8 in jessie

2013-08-29 Thread Jonas Smedegaard
Quoting Ian Jackson (2013-08-29 18:03:22) > Jonas Smedegaard writes ("Re: UTF-8 in jessie"): >> I believe the underlying issue is the one summarized here: >> https://en.wikipedia.org/wiki/Typewriter_apostrophe#ASCII_encoding > > Yes. > >> How about w

Re: UTF-8 in jessie

2013-08-29 Thread Ian Jackson
Jonas Smedegaard writes ("Re: UTF-8 in jessie"): > I believe the underlying issue is the one summarized here: > https://en.wikipedia.org/wiki/Typewriter_apostrophe#ASCII_encoding Yes. > How about we simply mention explicitly that `arcane quoting' - even if > arguabl

Re: UTF-8 in jessie

2013-08-29 Thread Jonas Smedegaard
Quoting Ian Jackson (2013-08-29 13:56:09) > Adam Borowski writes ("Re: UTF-8 in jessie"): > > Let's take a look at some sheets. > > Last time I looked at this I found a copy of the actual ASCII > standards document from 1968 or so and it did mention this usage.

Re: UTF-8 in jessie

2013-08-29 Thread Ian Jackson
Adam Borowski writes ("Re: UTF-8 in jessie"): > Let's take a look at some sheets. Last time I looked at this I found a copy of the actual ASCII standards document from 1968 or so and it did mention this usage. > > I don't think that better UTF-8 support should involv

Re: UTF-8 in jessie

2013-08-28 Thread Adam Borowski
On Wed, Aug 28, 2013 at 04:20:17PM +0100, Ian Jackson wrote: > Adam Borowski writes ("UTF-8 in jessie"): > > I would like to propose full UTF-8 support. I don't mean here full > > support for all of Unicode's finer points, merely complete eradication of > >

Re: UTF-8 in jessie

2013-08-28 Thread Dmitrijs Ledkovs
On 12 August 2013 01:51, Adam Borowski wrote: > > 3. all file names must be valid UTF-8 > Case in point errors from ubuntu UDD package importer: """ Packages containing non-UTF-8, non-ASCII filenames. This is a problem. It is unclear how to sensibly map these into Bazaar. anon-proxy aspell-is as
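A check for sub-goal 3 (all file names must be valid UTF-8) could look roughly like the following Python sketch. It is an illustration only, not the UDD importer's code; the function names are hypothetical. It walks the tree with a bytes path so that names come back undecoded.

```python
import os


def is_utf8_name(name: bytes) -> bool:
    """Return True if a raw file name is valid UTF-8 (hypothetical helper)."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_names(root: bytes):
    """Yield the raw paths under root whose final component is not valid UTF-8."""
    # Passing bytes to os.walk makes it return bytes names, untouched by
    # the locale's filesystem encoding.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_utf8_name(name):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    for path in non_utf8_names(b"."):
        print(path)
```

Run against an unpacked source or binary package, this prints one raw path per offending name and nothing if every name is valid.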

Re: UTF-8 in jessie

2013-08-28 Thread Ian Jackson
Adam Borowski writes ("UTF-8 in jessie"): > I would like to propose full UTF-8 support. I don't mean here full > support for all of Unicode's finer points, merely complete eradication of > mojibake. That is, ensuring that /m.o/ matches "möo", or that "
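The /m.o/ example in the quoted proposal can be reproduced with a short Python sketch (an illustration only, not from the thread). When the text is decoded, "." matches the single character "ö". When the same pattern runs over the raw UTF-8 bytes, "ö" occupies two bytes, so a whole-string match fails.

```python
import re

PATTERN = r"m.o"

text = "m\u00f6o"           # "möo" as three code points
raw = text.encode("utf-8")  # b'm\xc3\xb6o', four bytes

# Matching on decoded text: "." consumes the one character "ö".
decoded_match = re.fullmatch(PATTERN, text) is not None

# Matching on raw bytes: "." consumes one byte, so the pattern covers
# only three of the four bytes and the full match fails.
bytes_match = re.fullmatch(PATTERN.encode("ascii"), raw) is not None

print(decoded_match)  # True
print(bytes_match)    # False
```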

Re: UTF-8 in jessie

2013-08-18 Thread gregor herrmann
On Mon, 12 Aug 2013 02:51:52 +0200, Adam Borowski wrote: > 4a. perl and pod > > Considering perl to be text raises one more issue: pod. By perl's design, > pod without a specified encoding is considered to be ISO-8859-1, even if > the file contains "use utf8;". This is surprising, and many auth

Re: UTF-8 in jessie (debhelper and BOM)

2013-08-13 Thread Adam Borowski
On Tue, Aug 13, 2013 at 01:44:03PM +0900, Osamu Aoki wrote: > But I do not understand goal #5. Why "MUST"? Do you have rationale? > > On Mon, Aug 12, 2013 at 03:50:19PM +0200, Florian Lohoff wrote: > > On Mon, Aug 12, 2013 at 02:51:52AM +0200, Adam Borowski wrote: > > > I propose the following s

Re: UTF-8 in jessie

2013-08-13 Thread Florian Lohoff
On Mon, Aug 12, 2013 at 05:58:20PM +0200, Adam Borowski wrote: > On Mon, Aug 12, 2013 at 03:50:19PM +0200, Florian Lohoff wrote: > > 5. All programs consuming UTF8 Text must understand a BOM. > > I'm afraid I don't agree here: BOMs are nasty stuff that serve no purpose > once you standardize on UT

Re: UTF-8 in jessie

2013-08-13 Thread Vincent Lefevre
On 2013-08-13 10:25:31 +, Thorsten Glaser wrote: > Vincent Lefevre writes: > > > If scripts intend to use LC_ALL=C.UTF-8 to force everything to > > the standard locale with UTF-8 support, then the glibc should > > be modified to regard C.UTF-8 like C w.r.t. $LANGUAGE. I mean: > >

Re: UTF-8 in jessie

2013-08-13 Thread Thorsten Glaser
Vincent Lefevre writes: > If scripts intend to use LC_ALL=C.UTF-8 to force everything to > the standard locale with UTF-8 support, then the glibc should > be modified to regard C.UTF-8 like C w.r.t. $LANGUAGE. I mean: Ouch! Scripts do, and this *is* how C.UTF-8 was intended: to behav

Re: UTF-8 in jessie

2013-08-13 Thread Bastien ROUCARIES
On Mon, Aug 12, 2013 at 5:56 PM, Thorsten Glaser wrote: > Florian Lohoff writes: > >> 5. All programs consuming UTF8 Text must understand a BOM. > > The kernel doesn’t, start there: > > tglase@tglase:~$ mksh -c 'print '\''\ufeff#!/bin/sh\necho foo'\' >x; chmod +x > x; ./x > ./x: line 1: #

Re: UTF-8 in jessie

2013-08-13 Thread Christian PERRIER
Quoting Charles Plessy (ple...@debian.org): > Hi Christian, > > what I am proposing is a task that install all languages. I made a bit of > research earlier, and it is not as simple as installing all the existing > tasks, > as the result on my computer was that some browsers started to display

Re: UTF-8 in jessie

2013-08-12 Thread Charles Plessy
Le Tue, Aug 13, 2013 at 08:12:24AM +0200, Christian PERRIER a écrit : > Quoting Charles Plessy (ple...@debian.org): > > > About display by GUIs, I think that we should have a system to install all > > the > > fonts necessary to display languages that we support at the installation. > > > Such a

Re: UTF-8 in jessie

2013-08-12 Thread Christian PERRIER
Quoting Charles Plessy (ple...@debian.org): > About display by GUIs, I think that we should have a system to install all the > fonts necessary to display languages that we support at the installation. Such as tasksel and its language tasks? :-) In short, we already have that. However, we need p

Re: UTF-8 in jessie (debhelper and BOM)

2013-08-12 Thread Osamu Aoki
Hi, UTF-8 is a good goal indeed as principle. (I agree but I am struggling to update package documentation since Japanese are known to be tough (JIS 2022/EUCJP/SHIFT-JIS/... are used) EUC/SHIFT-JIS mixed case can be confused with LATIN-1 easily. ) But I do not understand goal #5. Why "MUST"?

Re: UTF-8 in jessie

2013-08-12 Thread Charles Plessy
Le Mon, Aug 12, 2013 at 03:55:03PM +0200, Adam Borowski a écrit : > On Mon, Aug 12, 2013 at 09:58:30AM +0200, Niels Thykier wrote: > > For the record, there is a Lintian tag for this now[1], which suggests > > only a handful of packages violates this. > > > > > - Recommend ASCII when possible. >

Re: UTF-8 in jessie

2013-08-12 Thread Vincent Lefevre
On 2013-08-12 20:14:30 +0100, Dmitrijs Ledkovs wrote: > What about locales though? > > * C.utf8 locale should be always available > * C.utf8 locale should be the default/fallback locale > * utf8 locale variants should be default / available / preferred > (where appropriate) If scripts intend to u

Re: UTF-8 in jessie

2013-08-12 Thread Vincent Lefevre
On 2013-08-12 17:58:20 +0200, Adam Borowski wrote: > On Mon, Aug 12, 2013 at 03:50:19PM +0200, Florian Lohoff wrote: > > 5. All programs consuming UTF8 Text must understand a BOM. > > I'm afraid I don't agree here: BOMs are nasty stuff that serve no purpose > once you standardize on UTF8. They mi

Re: UTF-8 in jessie

2013-08-12 Thread Dmitrijs Ledkovs
On 12 August 2013 01:51, Adam Borowski wrote: > On Mon, May 06, 2013 at 02:49:57PM +0200, Andreas Beckmann wrote: > I propose the following sub-goals: > > 1. all programs should, in their default configuration, accept UTF-8 input > and pass it through uncorrupted. Having to manually specify en

Re: UTF-8 in jessie

2013-08-12 Thread Vincent Lefevre
On 2013-08-12 15:16:59 +0200, Adam Borowski wrote: > On Mon, Aug 12, 2013 at 12:50:35PM +0200, Vincent Lefevre wrote: > > On 2013-08-12 02:51:52 +0200, Adam Borowski wrote: > > > Detecting non-UTF files is easy: > > > * false positives are impossible > > > * false negatives are extremely unlikely:

Re: UTF-8 in jessie

2013-08-12 Thread Adam Borowski
On Mon, Aug 12, 2013 at 03:50:19PM +0200, Florian Lohoff wrote: > 5. All programs consuming UTF8 Text must understand a BOM. I'm afraid I don't agree here: BOMs are nasty stuff that serve no purpose once you standardize on UTF8. They might help with exchange with a minority of Windows programs, a

Re: UTF-8 in jessie

2013-08-12 Thread Thorsten Glaser
Florian Lohoff writes: > 5. All programs consuming UTF8 Text must understand a BOM. The kernel doesn’t, start there: tglase@tglase:~$ mksh -c 'print '\''\ufeff#!/bin/sh\necho foo'\' >x; chmod +x x; ./x ./x: line 1: #!/bin/sh: No such file or directory foo That’s running GNU bash, with
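The failure shown in this message comes from the three BOM bytes sitting in front of "#!", so the file no longer begins with the two bytes a shebang check looks for. The following Python sketch illustrates that effect at byte level; it is not a model of the kernel's actual code path, and the helper names are hypothetical.

```python
import codecs


def starts_with_shebang(data: bytes) -> bool:
    """Return True if the file content begins with the two bytes '#!'."""
    return data.startswith(b"#!")


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Same content as the script written by the mksh command in the message.
script = "\ufeff#!/bin/sh\necho foo\n".encode("utf-8")

print(starts_with_shebang(script))                  # False: BOM comes first
print(starts_with_shebang(strip_utf8_bom(script)))  # True
```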

Re: UTF-8 in jessie

2013-08-12 Thread Florian Lohoff
On Mon, Aug 12, 2013 at 02:51:52AM +0200, Adam Borowski wrote: > I propose the following sub-goals: > > 1. all programs should, in their default configuration, accept UTF-8 input > and pass it through uncorrupted. Having to manually specify encoding > is acceptable only in a programmatic in

Re: UTF-8 in jessie

2013-08-12 Thread Adam Borowski
On Mon, Aug 12, 2013 at 09:58:30AM +0200, Niels Thykier wrote: > For the record, there is a Lintian tag for this now[1], which suggests > only a handful of packages violates this. > > > - Recommend ASCII when possible. > > - Require ASCII for files in /bin, /sbin, /usr/bin, /usr/sbin and > > /u

Re: UTF-8 in jessie

2013-08-12 Thread Adam Borowski
On Mon, Aug 12, 2013 at 12:50:35PM +0200, Vincent Lefevre wrote: > On 2013-08-12 02:51:52 +0200, Adam Borowski wrote: > > Detecting non-UTF files is easy: > > * false positives are impossible > > * false negatives are extremely unlikely: combinations of letters that would > > happen to match a va

Re: UTF-8 in jessie

2013-08-12 Thread Vincent Lefevre
On 2013-08-12 02:51:52 +0200, Adam Borowski wrote: > Detecting non-UTF files is easy: > * false positives are impossible > * false negatives are extremely unlikely: combinations of letters that would > happen to match a valid utf character don't happen naturally, and even if > they did, every s

Re: UTF-8 in jessie

2013-08-12 Thread Niels Thykier
On 2013-08-12 04:18, Charles Plessy wrote: > Le Mon, Aug 12, 2013 at 02:51:52AM +0200, Adam Borowski a écrit : >> >> I would like to propose full UTF-8 support. I don't mean here full >> support for all of Unicode's finer points, merely complete eradication of >> mojibake. > > Hi Adam, > Hi, >

Re: UTF-8 in jessie

2013-08-11 Thread Charles Plessy
Le Mon, Aug 12, 2013 at 02:51:52AM +0200, Adam Borowski a écrit : > > I would like to propose full UTF-8 support. I don't mean here full > support for all of Unicode's finer points, merely complete eradication of > mojibake. Hi Adam, this is a great goal. Here are two comments. There is a rel

Re: UTF-8 in jessie

2013-08-11 Thread Chow Loong Jin
On Mon, Aug 12, 2013 at 02:51:52AM +0200, Adam Borowski wrote: > [...] > > On the other hand, detecting text files is hard. The best tool so far, > "file", makes so many errors it's useless for this purpose. One could use > location: like, declaring stuff in /etc/ and /usr/share/doc/ to be text

UTF-8 in jessie

2013-08-11 Thread Adam Borowski
On Mon, May 06, 2013 at 02:49:57PM +0200, Andreas Beckmann wrote: > now might be the right time to start a discussion about release goals > for jessie. I would like to propose full UTF-8 support. I don't mean here full support for all of Unicode's finer points, merely complete eradication of moji