Re: [HACKERS] Extensions, patch v20 (bitrot fixes)

2010-12-20 Thread Itagaki Takahiro
On Tue, Dec 21, 2010 at 08:04, Martijn van Oosterhout wrote: > On Mon, Dec 20, 2010 at 10:15:56PM +0100, Nicolas Barbier wrote: >> From >> <http://en.wikipedia.org/wiki/Japanese_language_and_computers#Character_encodings>: > ISTM that since all the mapping tables are public it should be a SMOP >

Re: [HACKERS] Extensions, patch v20 (bitrot fixes)

2010-12-20 Thread Martijn van Oosterhout
On Mon, Dec 20, 2010 at 10:15:56PM +0100, Nicolas Barbier wrote: > From > <http://en.wikipedia.org/wiki/Japanese_language_and_computers#Character_encodings>: > > "Unicode is supposed to solve all encoding problems in all languages > of the world. [..] There are still controversies. For Japanese,

Re: [HACKERS] Extensions, patch v20 (bitrot fixes)

2010-12-20 Thread Nicolas Barbier
2010/12/20 Martijn van Oosterhout: > On Mon, Dec 20, 2010 at 09:03:56AM +0900, Itagaki Takahiro wrote: > >> UTF-8 is not a superset of all encodings. > > I think you mean Unicode is not a superset of all character sets. I've > heard this before but never found what's missing. [citation needed]?

Re: [HACKERS] Extensions, patch v20 (bitrot fixes)

2010-12-20 Thread Kenneth Marshall
On Mon, Dec 20, 2010 at 03:08:48PM -0500, Tom Lane wrote: > Kenneth Marshall writes: > > On Mon, Dec 20, 2010 at 02:10:39PM -0500, Tom Lane wrote: > >> [citation needed]? Exactly what characters are missing, and why would > >> the Unicode people have chosen to leave them out? It's not like they'

Re: [HACKERS] Extensions, patch v20 (bitrot fixes)

2010-12-20 Thread Tom Lane
Kenneth Marshall writes: > On Mon, Dec 20, 2010 at 02:10:39PM -0500, Tom Lane wrote: >> [citation needed]? Exactly what characters are missing, and why would >> the Unicode people have chosen to leave them out? It's not like they've >> not heard of those encodings, I'm sure. > Here is an intere

Re: [HACKERS] Extensions, patch v20 (bitrot fixes)

2010-12-20 Thread David E. Wheeler
On Dec 20, 2010, at 11:53 AM, Kenneth Marshall wrote: > Here is an interesting description of some of the gotchas: > > http://en.wikipedia.org/wiki/Windows-1252 FWIW, those are gotchas translating between Windows 1252 and Latin-1. Windows 1252's nerbles translate to UTF-8 just fine. David --
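
The point about Windows-1252 can be checked with a minimal sketch like the one below. It assumes Python's standard `cp1252`, `latin-1`, and `utf-8` codecs; the three sample bytes are chosen for illustration and do not come from the thread. Windows-1252 assigns printable characters to part of the 0x80-0x9F range, where ISO 8859-1 (Latin-1) has only control codes, so those characters cannot be encoded as Latin-1, but they all have Unicode code points and survive a round trip through UTF-8.

```python
# Illustrative bytes (an assumption, not taken from the thread):
# 0x80, 0x93, 0x94 are the euro sign and the left/right curly double
# quotes in Windows-1252; Latin-1 has C1 control codes at these positions.
raw = bytes([0x80, 0x93, 0x94])

# Decode as Windows-1252 to get the intended Unicode characters.
text = raw.decode("cp1252")

# Encoding to UTF-8 works, and the round trip back to cp1252 is lossless.
utf8 = text.encode("utf-8")
round_trip = utf8.decode("utf-8").encode("cp1252")

# Encoding the same characters as Latin-1 fails: they are outside U+0000-U+00FF.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(round_trip == raw, latin1_ok)
```

The sketch only covers the Windows-1252 case raised in this message; it says nothing about the ISO-2022-JP-2 or EUC-TW claims made elsewhere in the thread.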

Re: [HACKERS] Extensions, patch v20 (bitrot fixes)

2010-12-20 Thread Kenneth Marshall
On Mon, Dec 20, 2010 at 02:10:39PM -0500, Tom Lane wrote: > David Fetter writes: > > On Mon, Dec 20, 2010 at 08:01:42PM +0100, Martijn van Oosterhout wrote: > >> I think you mean Unicode is not a superset of all character sets. I've > >> heard this before but never found what's missing. [citation

Re: [HACKERS] Extensions, patch v20 (bitrot fixes)

2010-12-20 Thread Tom Lane
David Fetter writes: > On Mon, Dec 20, 2010 at 08:01:42PM +0100, Martijn van Oosterhout wrote: >> I think you mean Unicode is not a superset of all character sets. I've >> heard this before but never found what's missing. [citation needed]? > Windows-1252, ISO-2022-JP-2 and EUC-TW are such encodi

Re: [HACKERS] Extensions, patch v20 (bitrot fixes)

2010-12-20 Thread David Fetter
On Mon, Dec 20, 2010 at 08:01:42PM +0100, Martijn van Oosterhout wrote: > On Mon, Dec 20, 2010 at 09:03:56AM +0900, Itagaki Takahiro wrote: > > On Mon, Dec 20, 2010 at 01:34, Tom Lane wrote: > > >> I agree that "the default encoding is UTF-8", but it should be > > >> configurable by the 'encoding'

Re: [HACKERS] Extensions, patch v20 (bitrot fixes)

2010-12-20 Thread Martijn van Oosterhout
On Mon, Dec 20, 2010 at 09:03:56AM +0900, Itagaki Takahiro wrote: > On Mon, Dec 20, 2010 at 01:34, Tom Lane wrote: > >> I agree that "the default encoding is UTF-8", but it should be > >> configurable by the 'encoding' parameter in control files. > > > > Why is it necessary to have such a paramete

Re: [HACKERS] Extensions, patch v20 (bitrot fixes)

2010-12-19 Thread Itagaki Takahiro
On Mon, Dec 20, 2010 at 01:34, Tom Lane wrote: >> I agree that "the default encoding is UTF-8", but it should be >> configurable by the 'encoding' parameter in control files. > > Why is it necessary to have such a parameter at all? UTF-8 is not a superset of all encodings. -- Itagaki Takahiro

Re: [HACKERS] Extensions, patch v20 (bitrot fixes)

2010-12-19 Thread Dimitri Fontaine
Tom Lane writes: > Why is it necessary to have such a parameter at all? AFAICS it just > adds complexity for little if any gain. Most extension files will > probably be pure ASCII anyway. Dictionary files are *far* more likely > to contain non-ASCII characters. If we've gotten along fine with

Re: [HACKERS] Extensions, patch v20 (bitrot fixes)

2010-12-19 Thread Tom Lane
Itagaki Takahiro writes: >> Oh, I wasn't aware that Itagaki-san had objected to Tom's proposal. > I agree that "the default encoding is UTF-8", but it should be > configurable by the 'encoding' parameter in control files. Why is it necessary to have such a parameter at all? AFAICS it just adds

Re: [HACKERS] Extensions, patch v20 (bitrot fixes)

2010-12-19 Thread Itagaki Takahiro
>>> - Did we decide to ditch the encoding parameter for extension scripts >>> and mandate UTF-8? >> >> No we didn't, we decided that the default encoding is UTF-8 and that any >> contrib script defaults to UTF-8, so that it's not necessary to care >> about the 'encoding' parameter in the control fi

Re: [HACKERS] Extensions, patch v20 (bitrot fixes)

2010-12-19 Thread Robert Haas
On Sun, Dec 19, 2010 at 5:30 AM, Dimitri Fontaine wrote: > Robert Haas writes: >> I spent a little time looking at this tonight. I'm going to give you >> the same general advice that I've given other people who have >> submitted very large patches of this type: it'll be a lot easier to >> get th

Re: [HACKERS] Extensions, patch v20 (bitrot fixes)

2010-12-19 Thread Dimitri Fontaine
Hi, Thanks for your review and your time. Trying to answer some of your points there: Robert Haas writes: > I spent a little time looking at this tonight. I'm going to give you > the same general advice that I've given other people who have > submitted very large patches of this type: it'll be