Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-10 Thread Tatsuo Ishii
> Well, when the preceding comment block contains five references to > xemacs and the link for more information leads to www.xemacs.org, > I don't think it's real helpful to add one sentence saying "oh > by the way we're not actually following xemacs". > > I continue to think that we'd be better o

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-10 Thread Tom Lane
Tatsuo Ishii writes: > Done along with comment that we follow emacs's implementation, not > xemacs's. Well, when the preceding comment block contains five references to xemacs and the link for more information leads to www.xemacs.org, I don't think it's real helpful to add one sentence saying "oh

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-10 Thread Tatsuo Ishii
Tatsuo Ishii writes: >> So far as I can see, the only LCPRVn marker code that is actually in >> use right now is 0x9d --- there are no instances of 9a, 9b, or 9c >> that I can find. >> >> I also read in the xemacs internals doc, at >> http://www.xemacs.org/Documentati

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-08 Thread Tatsuo Ishii
>>> Tatsuo Ishii writes: > So far as I can see, the only LCPRVn marker code that is actually in > use right now is 0x9d --- there are no instances of 9a, 9b, or 9c > that I can find. > > I also read in the xemacs internals doc, at > http://www.xemacs.org/Documentation/21.5

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-07 Thread Tatsuo Ishii
>> Tatsuo Ishii writes: So far as I can see, the only LCPRVn marker code that is actually in use right now is 0x9d --- there are no instances of 9a, 9b, or 9c that I can find. I also read in the xemacs internals doc, at http://www.xemacs.org/Documentation/21.5/html/i

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-05 Thread Robert Haas
On Thu, Jul 5, 2012 at 8:46 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, Jul 5, 2012 at 7:11 PM, Tom Lane wrote: >>> Hm, several of these routines seem to neglect to advance the "from" >>> pointer? > >> Err... yeah. That's not a bug I introduced, but I should have caught >> it... and it

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-05 Thread Tatsuo Ishii
> Tatsuo Ishii writes: >>> So far as I can see, the only LCPRVn marker code that is actually in >>> use right now is 0x9d --- there are no instances of 9a, 9b, or 9c >>> that I can find. >>> >>> I also read in the xemacs internals doc, at >>> http://www.xemacs.org/Documentation/21.5/html/internal

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-05 Thread Tom Lane
Robert Haas writes: > On Thu, Jul 5, 2012 at 7:11 PM, Tom Lane wrote: >> Hm, several of these routines seem to neglect to advance the "from" >> pointer? > Err... yeah. That's not a bug I introduced, but I should have caught > it... and it does make me wonder how well this code was tested. > Do

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-05 Thread Robert Haas
On Thu, Jul 5, 2012 at 7:11 PM, Tom Lane wrote: > Robert Haas writes: >> On Sun, Jul 1, 2012 at 5:11 AM, Alexander Korotkov >> wrote: >>> [ new patch ] > >> With the improved comments in pg_wchar.h, it seemed clear what needed >> to be done here, so I fixed up the MULE conversion and committed

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-05 Thread Tom Lane
Tatsuo Ishii writes: >> So far as I can see, the only LCPRVn marker code that is actually in >> use right now is 0x9d --- there are no instances of 9a, 9b, or 9c >> that I can find. >> >> I also read in the xemacs internals doc, at >> http://www.xemacs.org/Documentation/21.5/html/internals_26.htm

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-05 Thread Tom Lane
Robert Haas writes: > On Sun, Jul 1, 2012 at 5:11 AM, Alexander Korotkov > wrote: >> [ new patch ] > With the improved comments in pg_wchar.h, it seemed clear what needed > to be done here, so I fixed up the MULE conversion and committed this. > I'd appreciate it if someone would check my work

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-04 Thread Tatsuo Ishii
> On Sun, Jul 1, 2012 at 5:11 AM, Alexander Korotkov > wrote: >> [ new patch ] > > With the improved comments in pg_wchar.h, it seemed clear what needed > to be done here, so I fixed up the MULE conversion and committed this. > I'd appreciate it if someone would check my work, but I think it's

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-04 Thread Robert Haas
On Sun, Jul 1, 2012 at 5:11 AM, Alexander Korotkov wrote: > [ new patch ] With the improved comments in pg_wchar.h, it seemed clear what needed to be done here, so I fixed up the MULE conversion and committed this. I'd appreciate it if someone would check my work, but I think it's right. -- Ro

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-03 Thread Tatsuo Ishii
> So far as I can see, the only LCPRVn marker code that is actually in > use right now is 0x9d --- there are no instances of 9a, 9b, or 9c > that I can find. > > I also read in the xemacs internals doc, at > http://www.xemacs.org/Documentation/21.5/html/internals_26.html#SEC145 > that XEmacs think

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-03 Thread Tom Lane
I wrote: > Tatsuo Ishii writes: >>> I have added comments about mule internal encoding by refreshing my >>> memory and from old document found on >>> web(http://mibai.tec.u-ryukyu.ac.jp/cgi-bin/info2www?%28mule%29Buffer%20and%20string). >> Any objection to apply my patch? > It needs a bit of cop

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-03 Thread Tom Lane
Tatsuo Ishii writes: >> I have added comments about mule internal encoding by refreshing my >> memory and from old document found on >> web(http://mibai.tec.u-ryukyu.ac.jp/cgi-bin/info2www?%28mule%29Buffer%20and%20string). > Any objection to apply my patch? It needs a bit of copy-editing, and I

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-03 Thread Tatsuo Ishii
> I have added comments about mule internal encoding by refreshing my > memory and from old document found on > web(http://mibai.tec.u-ryukyu.ac.jp/cgi-bin/info2www?%28mule%29Buffer%20and%20string). Any objection to apply my patch? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-03 Thread Tom Lane
Alexander Korotkov writes: > It's likely we also need to assign some names to all these numbers > (0xf0, 0xf4, 0xfe, 0x9c, 0x9d). But it's hard for me to invent such names. The encoding ID byte values already have names (see pg_wchar.h), but the private prefix bytes don't. I griped about that up

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-03 Thread Alexander Korotkov
On Tue, Jul 3, 2012 at 10:17 AM, Tatsuo Ishii wrote: > > OK. So, in that case, I suggest that if the leading byte is non-zero, > > we emit 0x9d followed by the three available bytes, instead of first > > testing whether the first byte is >= 0xf0. That test seems to serve > > no purpose but to c

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-02 Thread Tatsuo Ishii
> OK. So, in that case, I suggest that if the leading byte is non-zero, > we emit 0x9d followed by the three available bytes, instead of first > testing whether the first byte is >= 0xf0. That test seems to serve > no purpose but to confuse the issue. Probably the code should look like this (see b

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-02 Thread Robert Haas
On Mon, Jul 2, 2012 at 7:33 PM, Tatsuo Ishii wrote: >> Yeah, I did. I think I may be a bit confused here, so let me try to >> understand this a bit better. It seems like pg_mule2wchar_with_len >> uses the following algorithm: >> >> - If the first character IS_LC1 (0x81-0x8d), decode two bytes, s

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-02 Thread Tom Lane
I wrote: > Some inspection of pg_wchar.h suggests that the IS_LCPRV1 and IS_LCPRV2 > cases are unused: the file doesn't define any encoding labels that match > the byte values they accept, nor do the comments suggest that Emacs has > any such labels either. Scratch that --- I was misled by the fon

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-02 Thread Tom Lane
Robert Haas writes: > In the reverse transformation implemented by pg_wchar2mule_with_len, > if the byte stored with shift 16 IS_LC1 or IS_LC2, then we decode 2 or > 3 bytes, respectively, exactly as I would expect. ASCII decoding is > also as I would expect. The case I don't understand is what

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-02 Thread Tatsuo Ishii
> Yeah, I did. I think I may be a bit confused here, so let me try to > understand this a bit better. It seems like pg_mule2wchar_with_len > uses the following algorithm: > > - If the first character IS_LC1 (0x81-0x8d), decode two bytes, stored > with shifts of 16 and 0. > - If the first charact

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-02 Thread Robert Haas
On Mon, Jul 2, 2012 at 4:46 PM, Alexander Korotkov wrote: > So, I provided such transformation in versions 0.3 and 0.4 based on > explanation from Tatsuo Ishii. The problem is that both conversions are > nontrivial and it's not evident that they are mirror (understanding that > they are mirror req

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-02 Thread Alexander Korotkov
On Tue, Jul 3, 2012 at 12:37 AM, Robert Haas wrote: > On Mon, Jul 2, 2012 at 4:33 PM, Alexander Korotkov > wrote: > > On Mon, Jul 2, 2012 at 8:12 PM, Robert Haas > wrote: > >> > >> On Sun, Jul 1, 2012 at 5:11 AM, Alexander Korotkov < > aekorot...@gmail.com> > >> wrote: > >> >> MULE also looks p

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-02 Thread Robert Haas
On Mon, Jul 2, 2012 at 4:33 PM, Alexander Korotkov wrote: > On Mon, Jul 2, 2012 at 8:12 PM, Robert Haas wrote: >> >> On Sun, Jul 1, 2012 at 5:11 AM, Alexander Korotkov >> wrote: >> >> MULE also looks problematic. The code that you've written isn't >> >> symmetric with the opposite conversion, u

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-02 Thread Alexander Korotkov
On Mon, Jul 2, 2012 at 8:12 PM, Robert Haas wrote: > On Sun, Jul 1, 2012 at 5:11 AM, Alexander Korotkov > wrote: > >> MULE also looks problematic. The code that you've written isn't > >> symmetric with the opposite conversion, unlike what you did in all > >> other cases, and I don't understand

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-02 Thread Robert Haas
On Sun, Jul 1, 2012 at 5:11 AM, Alexander Korotkov wrote: >> MULE also looks problematic. The code that you've written isn't >> symmetric with the opposite conversion, unlike what you did in all >> other cases, and I don't understand why. I'm also somewhat baffled by >> the reverse conversion: i

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-01 Thread Alexander Korotkov
On Wed, Jun 27, 2012 at 11:35 PM, Robert Haas wrote: > It looks to me like pg_wchar2utf_with_len will not work, because > unicode_to_utf8 returns its second argument unmodified - not, as your > code seems to assume, the byte following what was already written. > Fixed. > MULE also looks proble

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-06-27 Thread Robert Haas
On Thu, May 24, 2012 at 12:04 AM, Alexander Korotkov wrote: > Thanks. I rewrote inverse conversion from pg_wchar to mule. New version of > patch is attached. Review: It looks to me like pg_wchar2utf_with_len will not work, because unicode_to_utf8 returns its second argument unmodified - not, as
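
The review point above (an encoder that returns its buffer argument unmodified, so the caller must advance the output pointer itself) can be illustrated with a minimal sketch. The names encode_utf8, utf8_seq_len and wchars_to_utf8 are hypothetical stand-ins written for this example, not the PostgreSQL functions; only the return-value behaviour described in the review is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in encoder (not the PostgreSQL routine): writes one
 * code point as UTF-8 and returns the START of the buffer, not the byte
 * following what was written.
 */
static unsigned char *
encode_utf8(uint32_t c, unsigned char *buf)
{
	if (c <= 0x7F)
	{
		buf[0] = (unsigned char) c;
	}
	else if (c <= 0x7FF)
	{
		buf[0] = (unsigned char) (0xC0 | ((c >> 6) & 0x1F));
		buf[1] = (unsigned char) (0x80 | (c & 0x3F));
	}
	else if (c <= 0xFFFF)
	{
		buf[0] = (unsigned char) (0xE0 | ((c >> 12) & 0x0F));
		buf[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
		buf[2] = (unsigned char) (0x80 | (c & 0x3F));
	}
	else
	{
		buf[0] = (unsigned char) (0xF0 | ((c >> 18) & 0x07));
		buf[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
		buf[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
		buf[3] = (unsigned char) (0x80 | (c & 0x3F));
	}
	return buf;
}

/* Length of a UTF-8 sequence, judged from its first byte. */
static int
utf8_seq_len(unsigned char first)
{
	if ((first & 0x80) == 0)
		return 1;
	if ((first & 0xE0) == 0xC0)
		return 2;
	if ((first & 0xF0) == 0xE0)
		return 3;
	return 4;
}

/*
 * Convert n code points to UTF-8. The output pointer is advanced by the
 * length of the sequence just written, NOT by assigning the encoder's
 * return value (which would leave the pointer where it was).
 * Returns the number of bytes written, excluding the terminating NUL.
 */
static size_t
wchars_to_utf8(const uint32_t *from, size_t n, unsigned char *to)
{
	unsigned char *start = to;
	size_t		i;

	for (i = 0; i < n; i++)
	{
		encode_utf8(from[i], to);
		to += utf8_seq_len(*to);
	}
	*to = '\0';
	return (size_t) (to - start);
}
```

Writing `to = encode_utf8(from[i], to);` in the loop would overwrite the same position on every iteration, which is the failure mode the review describes.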

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-28 Thread Tatsuo Ishii
> On Tue, May 22, 2012 at 3:27 PM, Tatsuo Ishii wrote: > >> > Thanks for your comments. They clarify a lot. >> > But I still don't realize how can we distinguish IS_LCPRV2 and IS_LC2? >> > Isn't it possible for them to produce same pg_wchar? >> >> If LB is in 0x90 - 0x99 range, then they are LC2.

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-23 Thread Alexander Korotkov
On Tue, May 22, 2012 at 3:27 PM, Tatsuo Ishii wrote: > > Thanks for your comments. They clarify a lot. > > But I still don't realize how can we distinguish IS_LCPRV2 and IS_LC2? > > Isn't it possible for them to produce same pg_wchar? > > If LB is in 0x90 - 0x99 range, then they are LC2. > If LB

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-22 Thread Tatsuo Ishii
> Thanks for your comments. They clarify a lot. > But I still don't realize how can we distinguish IS_LCPRV2 and IS_LC2? > Isn't it possible for them to produce same pg_wchar? If LB is in 0x90 - 0x99 range, then they are LC2. If LB is in 0xf0 - 0xff range, then they are LCPRV2. -- Tatsuo Ishii SRA
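
The rule in the reply above can be written down as a small classifier. This is only a sketch: the enum and function names are hypothetical, and the two ranges are copied from the message (0x90 - 0x99 for LC2, 0xf0 - 0xff for LCPRV2); everything else is reported as "other".

```c
/* Hypothetical names; the ranges come from the reply quoted above. */
typedef enum
{
	MULE_LB_OTHER,
	MULE_LB_LC2,
	MULE_LB_LCPRV2
} mule_lb_class;

/*
 * Classify the LB (leading byte) value stored in a three-byte pg_wchar.
 * The two ranges do not overlap, so the LB alone tells LC2 from LCPRV2.
 */
static mule_lb_class
classify_lb(unsigned char lb)
{
	if (lb >= 0x90 && lb <= 0x99)
		return MULE_LB_LC2;
	if (lb >= 0xf0)				/* 0xf0 - 0xff */
		return MULE_LB_LCPRV2;
	return MULE_LB_OTHER;
}
```

Because the ranges are disjoint, a reverse conversion can decide from the stored LB whether a private prefix byte has to be re-emitted.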

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-22 Thread Alexander Korotkov
On Tue, May 22, 2012 at 11:50 AM, Tatsuo Ishii wrote: > > I think it's possible. The first characters are defined like this: > > #define IS_LCPRV1(c)((unsigned char)(c) == 0x9a || (unsigned char)(c) > == 0x9b) > #define IS_LCPRV2(c)((unsigned char)(c) == 0x9c || (unsigned char)(c) > == 0x9

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-22 Thread Tatsuo Ishii
Hi Alexander, It was good seeing you in Ottawa! > Hello, Ishii-san! > > We've talked on PGCon that I've questions about mule to wchar > conversion. My questions about pg_mule2wchar_with_len function are > following. In these parts of code: > * > * > else if (IS_LCPRV1(*from) && len >= 3) > { >

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-21 Thread Alexander Korotkov
Hello, Ishii-san! We've talked on PGCon that I've questions about mule to wchar conversion. My questions about pg_mule2wchar_with_len function are following. In these parts of code: * * else if (IS_LCPRV1(*from) && len >= 3) { from++; *to = *from++ << 16; *to |= *from++; len -= 3;
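
For readers without the source at hand, the quoted IS_LCPRV1 branch can be isolated as a standalone sketch. The function name decode_lcprv1 and the use of uint32_t in place of pg_wchar are assumptions made for this example; the macro follows the definition quoted elsewhere in this thread, and the byte handling mirrors the quoted branch.

```c
#include <stdint.h>

/* As quoted in this thread: the LCPRV1 prefix byte is 0x9a or 0x9b. */
#define IS_LCPRV1(c) ((unsigned char)(c) == 0x9a || (unsigned char)(c) == 0x9b)

/*
 * Hypothetical helper: decode one LCPRV1 character.
 * Returns the number of bytes consumed (3), or 0 if the input is too
 * short or does not start with an LCPRV1 prefix.
 */
static int
decode_lcprv1(const unsigned char *from, int len, uint32_t *to)
{
	if (len < 3 || !IS_LCPRV1(*from))
		return 0;

	from++;							/* skip the prefix byte; it is not stored */
	*to = (uint32_t) *from++ << 16;	/* second byte goes to bits 16-23 */
	*to |= *from++;					/* third byte goes to bits 0-7 */
	return 3;
}
```

Since the prefix byte is skipped rather than stored, inputs that differ only in the prefix (0x9a versus 0x9b) decode to the same value, so an inverse conversion has to recover the prefix from the remaining bytes.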

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-02 Thread Alexander Korotkov
On Wed, May 2, 2012 at 5:48 PM, Robert Haas wrote: > On Wed, May 2, 2012 at 9:35 AM, Alexander Korotkov > wrote: > > Imagine we've two queries: > > 1) SELECT * FROM tbl WHERE col LIKE '%abcd%'; > > 2) SELECT * FROM tbl WHERE col LIKE '%abcdefghijk%'; > > > > The first query require reading post

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-02 Thread Robert Haas
On Wed, May 2, 2012 at 9:35 AM, Alexander Korotkov wrote: >> I was thinking you could perhaps do it just based on the *number* of >> trigrams, not necessarily their frequency. > > Imagine we've two queries: > 1) SELECT * FROM tbl WHERE col LIKE '%abcd%'; > 2) SELECT * FROM tbl WHERE col LIKE '%abc

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-02 Thread Alexander Korotkov
On Wed, May 2, 2012 at 4:50 PM, Robert Haas wrote: > On Tue, May 1, 2012 at 6:02 PM, Alexander Korotkov > wrote: > > Right. When number of trigrams is big, it is slow to scan posting list of > > all of them. The solution is this case is to exclude most frequent > trigrams > > from index scan. B

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-02 Thread Robert Haas
On Tue, May 1, 2012 at 6:02 PM, Alexander Korotkov wrote: > Right. When number of trigrams is big, it is slow to scan posting list of > all of them. The solution is this case is to exclude most frequent trigrams > from index scan. But, it require some kind of statistics of trigrams > frequencies

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-01 Thread Alexander Korotkov
On Tue, May 1, 2012 at 1:48 AM, Kevin Grittner wrote: > My biggest complaint is related to setting the threshold for the % > operator. It seems to me that there should be a GUC to control the > default, and that there should be a way to set the threshold for > each % operator in a query (if there

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-01 Thread Alexander Korotkov
On Mon, Apr 30, 2012 at 10:07 PM, Robert Haas wrote: > On Sun, Apr 29, 2012 at 8:12 AM, Erik Rijkers wrote: > > Perhaps I'm too early with these tests, but FWIW I reran my earlier test > program against three > > instances. (the patches compiled fine, and make check was without > problem). > >

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-01 Thread Alexander Korotkov
Hi Erik On Sun, Apr 29, 2012 at 4:12 PM, Erik Rijkers wrote: > Perhaps I'm too early with these tests, but FWIW I reran my earlier test > program against three > instances. (the patches compiled fine, and make check was without > problem). > > -- 3 instances: > HEAD port 6542 >

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-04-30 Thread Kevin Grittner
Robert Haas wrote: > Hopefully that's not too hard to fix; the basic approach seems > quite promising. After playing with trigram searches for name searches against copies of production database with appropriate indexing, our shop has chosen it as the new way to do name searches here. It's re

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-04-30 Thread Robert Haas
On Sun, Apr 29, 2012 at 8:12 AM, Erik Rijkers wrote: > Perhaps I'm too early with these tests, but FWIW I reran my earlier test > program against three > instances.  (the patches compiled fine, and make check was without problem). These tests results seem to be more about the pg_trgm changes tha

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-04-29 Thread Erik Rijkers
Hi Alexander, Perhaps I'm too early with these tests, but FWIW I reran my earlier test program against three instances. (the patches compiled fine, and make check was without problem). -- 3 instances: HEAD port 6542 trgm_regex port 6547 HEAD + trgm-regexp patch (22 N

[HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-04-23 Thread Alexander Korotkov
Hackers, attached patch adds conversion from pg_wchar string to multibyte string. This functionality is needed for my patch on index support for regular expression search http://archives.postgresql.org/pgsql-hackers/2011-11/msg01297.php . Analyzing conversion from multibyte to pg_wchar I found fol