Re: [HACKERS] PATCH: CITEXT 2.0 v4

2008-07-22 Thread David E. Wheeler
On Jul 18, 2008, at 01:39, Michael Paesold wrote: Calling regex functions with the case-insensitivity option would be great. It should also be possible to rewrite replace() into regexp_replace() by first escaping the regex meta characters. Actually re-implementing those functions in a case

Re: [HACKERS] PATCH: CITEXT 2.0 v4

2008-07-21 Thread David E. Wheeler
On Jul 18, 2008, at 09:53, David E. Wheeler wrote: However, if someone with a lot more C and Pg core knowledge wanted to sit down with me for a couple hours next week and help me bang out these functions, that would be great. I'd love to have the implementation be that much more complete.

Re: [HACKERS] PATCH: CITEXT 2.0 v4

2008-07-18 Thread David E. Wheeler
On Jul 18, 2008, at 01:39, Michael Paesold wrote: Calling regex functions with the case-insensitivity option would be great. It should also be possible to rewrite replace() into regexp_replace() by first escaping the regex meta characters. Actually re-implementing those functions in a case

Re: [HACKERS] PATCH: CITEXT 2.0 v4

2008-07-18 Thread Michael Paesold
David E. Wheeler writes: On Jul 17, 2008, at 03:45, Michael Paesold wrote: Wouldn't it be possible to create a variant of regexp_replace, i.e. regexp_replace(citext,citext,text), which would again lower-case the first two arguments before passing the input to regexp_replace(text,text,text

Re: [HACKERS] PATCH: CITEXT 2.0 v4

2008-07-17 Thread David E. Wheeler
On Jul 17, 2008, at 03:45, Michael Paesold wrote: Wouldn't it be possible to create a variant of regexp_replace, i.e. regexp_replace(citext,citext,text), which would again lower-case the first two arguments before passing the input to regexp_replace(text,text,text)? Sure, but then you end

Re: [HACKERS] PATCH: CITEXT 2.0 v4

2008-07-17 Thread Michael Paesold
Am 16.07.2008 um 20:38 schrieb David E. Wheeler: The trouble is that, right now: template1=# select regexp_replace( 'fxx'::citext, 'X'::citext, 'o'); regexp_replace fxx (1 row) So there's an inconsistency there. I don't know how to make that work case-insensitively. Would

Re: [HACKERS] PATCH: CITEXT 2.0 v4

2008-07-16 Thread David E. Wheeler
On Jul 16, 2008, at 11:20, Robert Treat wrote: I was thinking about this a bit last night and wanted to fill things out a bit. As a programmer, I find Donald Fraser's hindsight to be more appealing, because at least that way I have the option to do matching against CITEXT strings case-sensitive

Re: [HACKERS] PATCH: CITEXT 2.0 v4

2008-07-16 Thread Robert Treat
On Wednesday 16 July 2008 13:54:25 David E. Wheeler wrote: > On Jul 15, 2008, at 22:23, David E. Wheeler wrote: > > * The README for citext 1.0 on pgFoundry says: > >> I had to make a decision on casting between types for regular > >> expressions and > >> decided that if any parameter is of citext

Re: [HACKERS] PATCH: CITEXT 2.0 v4

2008-07-16 Thread David E. Wheeler
On Jul 15, 2008, at 22:23, David E. Wheeler wrote: * The README for citext 1.0 on pgFoundry says: I had to make a decision on casting between types for regular expressions and decided that if any parameter is of citext type then case insensitive applies. For example applying regular express

[HACKERS] PATCH: CITEXT 2.0 v4

2008-07-15 Thread David E. Wheeler
Howdy, I've attached a new patch with the latest revisions of for the citext contrib module patch. The changes include: * Using strlen() to pass string lengths to the comparison function, since lowercasing the value can change the length. Per Tom Lane. * Made citextcmp consistently return i

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-15 Thread David E. Wheeler
On Jul 15, 2008, at 20:26, Tom Lane wrote: "David E. Wheeler" <[EMAIL PROTECTED]> writes: So I guess my question is: what is wrong with the properties for citextsend/citextrecv [ checks catalogs... ] textsend and textrecv are marked STABLE not IMMUTABLE. I am not totally sure about the reas

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-15 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > So I guess my question is: what is wrong with the properties for > citextsend/citextrecv [ checks catalogs... ] textsend and textrecv are marked STABLE not IMMUTABLE. I am not totally sure about the reasoning offhand --- it might be because thei

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-15 Thread David E. Wheeler
On Jul 15, 2008, at 12:56, Tom Lane wrote: Don't run the tests in a read-only directory, perhaps. Yes, I changed the owner to the postgres system user and that did the trick. Or do they matter for sanity-checking citext? Hard to tell --- I'd suggest trying to get a clean run. As for

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-15 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > Well now that was cool to see. I got some failures, of course, but > nothing stands out to me as an obvious bug. I attach the diffs file > (with the citext.sql failure removed) for your perusal. What would be > the best way for me to resolve th

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-15 Thread David E. Wheeler
On Jul 15, 2008, at 07:09, Tom Lane wrote: Yeah, probably. I don't think the "make check" path will support it because it doesn't install contrib into the temp installation. (You'd also need to have put the extra entry in parallel_schedule not serial_schedule, but it's not gonna work anyway.)

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-15 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > Okay, I copied citext.sql into src/test/regress/sql and then added > "test: citext" to the top of src/test/regress/serial_schedule. Then I > ran `make check`. All tests passed, but I don't think that citext was > tested. > Do I need to install

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 14, 2008, at 07:08, Tom Lane wrote: You're really making it into another test. Just copy the citext.sql file into the sql/ subdirectory and add a "citext" entry to the schedule file. The last time I did this, I had to at least "touch" a corresponding expected file (expected/citext.ou

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 12, 2008, at 15:13, David E. Wheeler wrote: 2. It's ridiculously slow; at least a factor of ten slower than doing equivalent tests directly in SQL. This is a very bad thing. Speed of regression tests matters a lot to those of us who run them a dozen times per day --- and I do not w

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 14, 2008, at 11:49, Tom Lane wrote: You don't seem to understand what I'm suggesting: I'm not talking about testing strcoll. I'm talking about making sure that citext *uses* strcoll. This seems pointless, as well as essentially impossible to enforce by black-box testing. Nobody is like

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > On Jul 14, 2008, at 11:06, Tom Lane wrote: >> Let me put it another way: if we test on another platform and find out >> that strcoll's behavior is different there, are you going to fix that >> version of strcoll? No, you're not. So you might as wel

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 14, 2008, at 11:06, Tom Lane wrote: [ shrug... ] Seems pretty useless to me: we already know that it works for you. The point of a regression test in my mind is to make sure that it works for everybody. Given the platform variations involved in strcoll's behavior, the definition of

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > On Jul 14, 2008, at 07:24, Tom Lane wrote: >> The fallacy in that proposal is the assumption that there are only two >> behaviors out there. > Well, no, that's not the assumption at all. The assumption is that the > type works properly with multib

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 14, 2008, at 10:58, Tom Lane wrote: Well, unless the display of the characters would be broken (the build farm databases use UTF-8, no?), No, they use C. Um, you mean SQL_ASCII? Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to you

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > Well, unless the display of the characters would be broken (the build > farm databases use UTF-8, no?), No, they use C. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make chan

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 14, 2008, at 10:55, Andrew Dunstan wrote: No. --no-locale is the same as --locale=C which means the encoding is SQL_ASCII. I've used --no-locale --encoding UTF-8 many times. But if the regressions run with SQL_ASCII…well, I'm just amazed that there haven't been more unexpected si

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread Andrew Dunstan
David E. Wheeler wrote: On Jul 14, 2008, at 06:05, Andrew Dunstan wrote: You can certainly add any tests you like. But the buildfarm does all its regression tests using an install done with --no-locale. Any test that requires some locale to work, or make sense, should probably not be in you

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 14, 2008, at 07:26, Tom Lane wrote: I'd like to keep these tests, since they ensure not just that the functions work but that they work with citext. It might be reasonable to test a couple of them for that purpose. If your agenda is "test every function in the system that comes or might

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 14, 2008, at 07:24, Tom Lane wrote: "David E. Wheeler" <[EMAIL PROTECTED]> writes: Could I supply two comparison files, one for Mac OS X with en_US.UTF-8 and one for everything else, as described in the last three paragraphs here? The fallacy in that proposal is the assumption that

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 14, 2008, at 06:05, Andrew Dunstan wrote: You can certainly add any tests you like. But the buildfarm does all its regression tests using an install done with --no-locale. Any test that requires some locale to work, or make sense, should probably not be in your Makefile's REGRESS tar

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > On Jul 12, 2008, at 14:57, Tom Lane wrote: >> 4. A lot of the later test cases are equally uselessly testing whether >> piggybacking over text functions works. > I'd like to keep these tests, since they ensure not just that the > functions work bu

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > Could I supply two comparison files, one for Mac OS X with en_US.UTF-8 > and one for everything else, as described in the last three paragraphs > here? The fallacy in that proposal is the assumption that there are only two behaviors out there.

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > On Jul 13, 2008, at 10:19, Tom Lane wrote: >> BTW, actually a better idea would be to put citext.sql at the front of >> the list and just run the whole main regression series with it >> present. >> typ_sanity and oidjoins might possibly find issues

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread Andrew Dunstan
David E. Wheeler wrote: (If we get to having per-database collations in 8.4 then integrating a test with a non-default collation would get a lot easier, of course; but for the moment I'm afraid you've got to work with what's there.) Could I supply two comparison files, one for Mac OS X with

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-14 Thread David E. Wheeler
On Jul 12, 2008, at 14:57, Tom Lane wrote: 4. A lot of the later test cases are equally uselessly testing whether piggybacking over text functions works. The odds of ever finding anything with those tests are not distinguishable from zero (unless the underlying text function is busted, which

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-13 Thread David E. Wheeler
On Jul 13, 2008, at 10:31, Tom Lane wrote: Grr. Kind of defeats the purpose. Is there no infrastructure for testing multibyte functionality? There's some stuff under src/test/locale and src/test/mb, though it doesn't get exercised regularly. The contrib tests are a particularly bad place for

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-13 Thread David E. Wheeler
On Jul 13, 2008, at 10:19, Tom Lane wrote: You might try running the opr_sanity regression test on this module to see if it finds any other silliness. (Procedure: insert the citext definition script into the serial_schedule list just ahead of opr_sanity, run tests, make sure you understand the

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-13 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > To judge by the User-Defined Types docs, I was close. :-) I just > changed the argument to citextrecv to "internal". Right. The APIs for send and recv aren't inverses. (Oh, the sins we'll commit in the name of performance ...)

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-13 Thread David E. Wheeler
On Jul 13, 2008, at 16:06, David E. Wheeler wrote: Should those return bytea and citext, respectively? IOW, are these right? CREATE OR REPLACE FUNCTION citextrecv(bytea) RETURNS citext AS 'textrecv' LANGUAGE 'internal' IMMUTABLE STRICT; CREATE OR REPLACE FUNCTION citextsend(citext) RETURNS b

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-13 Thread David E. Wheeler
On Jul 13, 2008, at 10:19, Tom Lane wrote: I'm confused. Is that not what the citextin and citextout functions are? No, those are text I/O. You need analogues of textsend and textrecv too. Should those return bytea and citext, respectively? IOW, are these right? CREATE OR REPLACE FUNCT

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-13 Thread David E. Wheeler
On Jul 13, 2008, at 10:19, Tom Lane wrote: "David E. Wheeler" <[EMAIL PROTECTED]> writes: On Jul 12, 2008, at 12:17, Tom Lane wrote: * You should provide binary I/O (send/receive) functions, if you want this to be an industrial-strength module. It's easy since you can piggyback on text's.

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-13 Thread David E. Wheeler
On Jul 13, 2008, at 10:16, Tom Lane wrote: Hmm. I think what that actually means is that the cast from citext to bpchar should be AS ASSIGNMENT rather than IMPLICIT. What is happening is that the system can't figure out whether to use length(text) or length(bpchar) when presented with a cit

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-13 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > On Jul 12, 2008, at 14:57, Tom Lane wrote: >> Sadly, I think you have to give up >> attempts to check the interesting multibyte cases and confine yourself >> to tests using ASCII strings. > Grr. Kind of defeats the purpose. Is there no infrastructur

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-13 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > On Jul 12, 2008, at 12:17, Tom Lane wrote: >> * You should provide binary I/O (send/receive) functions, if you want >> this to be an industrial-strength module. It's easy since you can >> piggyback on text's. > I'm confused. Is that not what the ci

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-13 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > When I deleted any of the others, I got errors like this: > psql:sql/citext.sql:865: ERROR: function length(citext) is not unique > LINE 1: SELECT is( length( name ), length(name::text), 'length("' ||... > ^ > HINT: Could not c

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-12 Thread David E. Wheeler
On Jul 12, 2008, at 14:50, David E. Wheeler wrote: * An explicit comment explaining that you're piggybacking on the I/O functions (and some others) for "text" wouldn't be out of place. I've added SQL comments. Were you talking about a COMMENT? * Lose the GRANT EXECUTEs on the I/O functions;

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-12 Thread David E. Wheeler
On Jul 12, 2008, at 15:13, David E. Wheeler wrote: Sadly, I think you have to give up attempts to check the interesting multibyte cases and confine yourself to tests using ASCII strings. Grr. Kind of defeats the purpose. Is there no infrastructure for testing multibyte functionality? Are

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-12 Thread David E. Wheeler
On Jul 12, 2008, at 14:57, Tom Lane wrote: There was some discussion earlier about whether the proposed regression tests for citext are suitable for use in contrib or not. After playing with them for awhile, I have to come down very firmly on the side of "not". I have these gripes: You'r

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-12 Thread Tom Lane
There was some discussion earlier about whether the proposed regression tests for citext are suitable for use in contrib or not. After playing with them for awhile, I have to come down very firmly on the side of "not". I have these gripes: 1. The style is gratuitously different from every other

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-12 Thread David E. Wheeler
On Jul 12, 2008, at 12:17, Tom Lane wrote: BTW, I looked at the SQL file (citext.sql.in) a bit. Some comments: Thanks a million for these, Tom. I greatly appreciate it. * An explicit comment explaining that you're piggybacking on the I/O functions (and some others) for "text" wouldn't be ou

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-12 Thread Tom Lane
BTW, I looked at the SQL file (citext.sql.in) a bit. Some comments: * An explicit comment explaining that you're piggybacking on the I/O functions (and some others) for "text" wouldn't be out of place. * Lose the GRANT EXECUTEs on the I/O functions; they're redundant. (If you needed them, you'd

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread David E. Wheeler
On Jul 11, 2008, at 20:22, Tom Lane wrote: Here's an updated version of citext.c. Some changes cosmetic, some not so much ... Thanks! Good catch on my missing all of the s/int/int32/ stuff related to citextcmp(). Stupid oversites on my part. I'll submit a new patch with these changes shor

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread David E. Wheeler
On Jul 11, 2008, at 20:10, Tom Lane wrote: Oh, got one of those too ... [ time passes ... ] Nope, no leak seen here either, though you could possibly fry an egg on my laptop right now ... Yeah, gotta love that, right? So the issue must be with my version for 8.3.x, meaning, likely, my cust

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread Tom Lane
Here's an updated version of citext.c. Some changes cosmetic, some not so much ... regards, tom lane /* * PostgreSQL type definitions for CITEXT 2.0. */ #include "postgres.h" #include "access/hash.h" #include "fmgr.h" #include "utils/builtins.h" #include "utils/format

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > On Jul 11, 2008, at 19:18, Tom Lane wrote: >> I tried it here and didn't see any apparent memory leak, on two >> different systems. What platform are you testing on, and with >> what encoding/locale settings? > That's Mac OS X 10.5.4 "Leopard." Usi

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread David E. Wheeler
On Jul 11, 2008, at 19:18, Tom Lane wrote: I tried it here and didn't see any apparent memory leak, on two different systems. What platform are you testing on, and with what encoding/locale settings? PostgreSQL 8.3.3 on i386-apple-darwin9.4.0, compiled by GCC i686- apple-darwin9-gcc-4.0.1 (

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > Unfortunately, CITEXT seems to have a memory leak somewhere, I tried it here and didn't see any apparent memory leak, on two different systems. What platform are you testing on, and with what encoding/locale settings? reg

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread David E. Wheeler
On Jul 11, 2008, at 16:45, Tom Lane wrote: Not sure about a memory leak, but this code is clearly wrong if str_tolower results in a change of physical string length (clearly possible in Turkish, for instance, and probably in some other locales too). You need to do strlen's on the lowercased str

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > lcstr = str_tolower(VARDATA_ANY(left), VARSIZE_ANY_EXHDR(left)); > rcstr = str_tolower(VARDATA_ANY(right), VARSIZE_ANY_EXHDR(right)); > result = varstr_cmp( > lcstr, > VARSIZE_ANY_EXHDR(left), > rcstr, >

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread David E. Wheeler
On Jul 11, 2008, at 13:02, Zdenek Kotala wrote: Thank you, Zdenek. Have you had a chance to try citext yet? Or did you just read the source? I tested version two on Solaris/SPARC and Sun studio compiler. I checked last version only quickly (comparing your changes). Thanks. I just update

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread Zdenek Kotala
David E. Wheeler napsal(a): On Jul 11, 2008, at 12:37, Zdenek Kotala wrote: I'm sorry for late response. I'm little bit busy this week. I quickly look on your latest changes and it seems to me that everything is OK. I'm going to change status to ready for commit. After discussion with Pavel S

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread David E. Wheeler
On Jul 11, 2008, at 12:37, Zdenek Kotala wrote: I'm sorry for late response. I'm little bit busy this week. I quickly look on your latest changes and it seems to me that everything is OK. I'm going to change status to ready for commit. After discussion with Pavel Stehule I changed meaning a

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-11 Thread Zdenek Kotala
David E. Wheeler napsal(a): Attached is a new version of a patch to add a CITEXT contrib module. Changes since v2: * Optimized citext_eq() and citext_ne() (the = and <> operators, respectively) by having them use strncmp() instead of varstr_cmp(). Per discussion. * Added `RESTRICT` and `JOI

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-09 Thread Alvaro Herrera
David E. Wheeler wrote: > On Jul 9, 2008, at 13:40, Alvaro Herrera wrote: >> One thing that jumps at me is pgTAP usage, as Zdenek said. I >> understand that it's neat and all that, but we can't include the >> tests because they won't run unless one installs pgTAP which seems a >> nonstarter. So

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-09 Thread David E. Wheeler
On Jul 9, 2008, at 13:40, Alvaro Herrera wrote: The problem is that we're in the middle of a commitfest, so everybody is busy reviewing other patches (in theory at least). Of course. One thing that jumps at me is pgTAP usage, as Zdenek said. I understand that it's neat and all that, but

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-09 Thread Alvaro Herrera
David E. Wheeler wrote: > I guess you're all just blown away by the perfection of this patch? ;-) The problem is that we're in the middle of a commitfest, so everybody is busy reviewing other patches (in theory at least). One thing that jumps at me is pgTAP usage, as Zdenek said. I understand th

Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-09 Thread David E. Wheeler
I guess you're all just blown away by the perfection of this patch? ;-) Best, David On Jul 7, 2008, at 18:03, David E. Wheeler wrote: Attached is a new version of a patch to add a CITEXT contrib module. Changes since v2: * Optimized citext_eq() and citext_ne() (the = and <> operators, re

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-09 Thread David E. Wheeler
On Jul 7, 2008, at 12:06, David E. Wheeler wrote: I understand it but there is parallel project which should solve this problem completely I guess in "close" future (2-3years). Afterward this module will be obsolete and it will takes times to remove it from contrib. It seems to me that have

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-08 Thread Zdenek Kotala
Martijn van Oosterhout napsal(a): On Mon, Jul 07, 2008 at 12:06:08PM -0700, David E. Wheeler wrote: I guess that'd be the reason to keep it on pgFoundry, but I have two comments: * 2-3 years is a *long* time in Internet time. There have been patches over the years, but they tend not to get

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread Martijn van Oosterhout
On Mon, Jul 07, 2008 at 12:06:08PM -0700, David E. Wheeler wrote: > I guess that'd be the reason to keep it on pgFoundry, but I have two > comments: > > * 2-3 years is a *long* time in Internet time. There have been patches over the years, but they tend not to get looked at. Recently someone pu

[HACKERS] PATCH: CITEXT 2.0 v3

2008-07-07 Thread David E. Wheeler
Attached is a new version of a patch to add a CITEXT contrib module. Changes since v2: * Optimized citext_eq() and citext_ne() (the = and <> operators, respectively) by having them use strncmp() instead of varstr_cmp(). Per discussion. * Added `RESTRICT` and `JOIN` clauses to the comparis

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread David E. Wheeler
Thanks to help from RhodiumToad on IRC, I got some things improved here: On Jul 7, 2008, at 16:24, David E. Wheeler wrote: So for some reason, after adding the indexes, the queries against the CITEXT column aren't using them. Furthermore, the `lower(text) LIKE lower(?)` query isn't using *it

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 17:18, Tom Lane wrote: No, but you were: you proposed using strncmp for everything. Yes, that's right. I was trying to understand why I wouldn't use just one or the other. And the answer turned out to be, you wouldn't, except that strncmp() is an desirable optimization f

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > On Jul 7, 2008, at 16:58, Tom Lane wrote: >> If that's so, you certainly can't use strncmp, because that would >> result >> in sort orderings totally different from lower()'s result. Even >> without >> that argument, for most multibyte cases you

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 16:58, Tom Lane wrote: "David E. Wheeler" <[EMAIL PROTECTED]> writes: Hrm. So in your opinion, strncmp() could be used for all comparisons by citext, rather than varstr_cmp()? I thought the charter of this type was to work like lower(textcol). Correct. If that's so, you

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > Hrm. So in your opinion, strncmp() could be used for all comparisons > by citext, rather than varstr_cmp()? I thought the charter of this type was to work like lower(textcol). If that's so, you certainly can't use strncmp, because that would resul

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread David E. Wheeler
No, *really* Sheesh, sorry. David try.sql Description: Binary data On Jul 7, 2008, at 16:26, David E. Wheeler wrote: And here is the script. D'oh! Thanks, David On Jul 7, 2008, at 16:24, David E. Wheeler wrote: On Jul 7, 2008, at 08:01, Andrew Dunstan wrote: What does still bot

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread David E. Wheeler
And here is the script. D'oh! Thanks, David try.sql Description: Binary data On Jul 7, 2008, at 16:24, David E. Wheeler wrote: On Jul 7, 2008, at 08:01, Andrew Dunstan wrote: What does still bother me is its performance. I'd like to know if any measurement has been done of using cite

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 08:01, Andrew Dunstan wrote: What does still bother me is its performance. I'd like to know if any measurement has been done of using citext vs. a functional index on lower(foo). Okay, here's a start. The attached script inserts random strings of 1-10 space-delimited wo

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 13:59, Gregory Stark wrote: Of course the obvious case of two equivalent strings with different bytes would be two strings which differ only in case in a collation which doesn't distinguish based on case. So you obviously can't take this route for citext. Well, to be fa

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Gregory Stark
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > On Jul 7, 2008, at 12:21, David E. Wheeler wrote: > >> My question is: why? Shouldn't they all use the same function for >> comparison? I'm happy to dupe this implementation for citext, but I don't >> understand it. Should not all comparisons be ex

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 12:46, Zdenek Kotala wrote: So, the upshot is that the = and <> operators are not locale-aware, yes? They just do byte comparisons. Is that really the way it should be? I mean, could there not be strings that are equivalent but have different bytes? Correct. The problem

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 13:10, Tom Lane wrote: We intentionally force such strings to be considered non-equal. See varstr_cmp, and if you like see the archives from back when that was put in. The = and <> operators are in fact consistent with the behavior of varstr_cmp (and had better be!); they're

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes: > So, the upshot is that the = and <> operators are not locale-aware, > yes? They just do byte comparisons. Is that really the way it should > be? I mean, could there not be strings that are equivalent but have > different bytes? We intentionall

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Zdenek Kotala
David E. Wheeler napsal(a): On Jul 7, 2008, at 12:21, David E. Wheeler wrote: My question is: why? Shouldn't they all use the same function for comparison? I'm happy to dupe this implementation for citext, but I don't understand it. Should not all comparisons be executed consistently? Let me

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Pavel Stehule
2008/7/7 David E. Wheeler <[EMAIL PROTECTED]>: > On Jul 7, 2008, at 12:36, Pavel Stehule wrote: > >>> * Does it need to be locale-aware or not? >> >> yes! > > texteq() in varlena.c does not seem to be. > it's case sensitive and it's +/- binary compare Regards Pavel Stehule > Best, > > David > -

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 12:36, Pavel Stehule wrote: * Does it need to be locale-aware or not? yes! texteq() in varlena.c does not seem to be. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpr

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Pavel Stehule
2008/7/7 David E. Wheeler <[EMAIL PROTECTED]>: > On Jul 7, 2008, at 11:54, Alvaro Herrera wrote: > >>> Then why shouldn't I use strncmp() for all comparisons? >> >> I have no idea :-) -- because it's not locale-aware perhaps. > > Could someone who does have an idea answer these questions: > > * Doe

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 12:21, David E. Wheeler wrote: My question is: why? Shouldn't they all use the same function for comparison? I'm happy to dupe this implementation for citext, but I don't understand it. Should not all comparisons be executed consistently? Let me try to answer my own que

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 12:10, Pavel Stehule wrote: Maybe we can have some "locale" test outside our regress tests - I think that would be useful. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mai

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 12:13, Zdenek Kotala wrote: I'm sorry. I meant bttext() http://doxygen.postgresql.org/varlena_8c-source.html#l01270 bttext should use in citextcmp function. It uses strcol function. I've no idea what bttext has to do with anything. Sorry if I'm being slow here. And ci

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Zdenek Kotala
David E. Wheeler napsal(a): On Jul 7, 2008, at 11:54, Alvaro Herrera wrote: Then why shouldn't I use strncmp() for all comparisons? I have no idea :-) -- because it's not locale-aware perhaps. Could someone who does have an idea answer these questions: * Does it need to be locale-aware or

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread Pavel Stehule
2008/7/7 Zdenek Kotala <[EMAIL PROTECTED]>: > David E. Wheeler napsal(a): >> >> On Jul 7, 2008, at 07:41, Zdenek Kotala wrote: >> >>> However, It seems to me that code is ok now (exclude citex_eq). I think >>> there two open problems/questions: >>> >>> 1) regression test - >>> >>> a) I think that

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 11:54, Zdenek Kotala wrote: Hmm, it is complex area and I'm not sure if anybody on planet know correct answer :-). I think that best approach is to keep it as is and when a problem occur then it will be fixed. Regression tests are really important, though. b) pgTap is so

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 11:54, Alvaro Herrera wrote: Then why shouldn't I use strncmp() for all comparisons? I have no idea :-) -- because it's not locale-aware perhaps. Could someone who does have an idea answer these questions: * Does it need to be locale-aware or not? * Should I use strncmp()

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread Zdenek Kotala
Andrew Dunstan napsal(a): Zdenek Kotala wrote: 2) contrib vs. pgFoundry There is unresolved answer if we want to have this in contrib or not. Good to mention that citext type will be obsoleted with full collation implementation in a future. I personally prefer to keep it on pgFoundry bec

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread Zdenek Kotala
David E. Wheeler napsal(a): On Jul 7, 2008, at 07:41, Zdenek Kotala wrote: However, It seems to me that code is ok now (exclude citex_eq). I think there two open problems/questions: 1) regression test - a) I think that regresion is not correct. It depends on en_US locale, but it uses chara

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Alvaro Herrera
David E. Wheeler wrote: > On Jul 7, 2008, at 11:44, Alvaro Herrera wrote: > >>> I'm confused. What's the difference between strncmp() and >>> varstr_cmp()? >>> And why would I use one in one place and the other elsewhere? >>> Shouldn't >>> they be used consistently? >> >> Not necessarily -- see

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 11:44, Alvaro Herrera wrote: I'm confused. What's the difference between strncmp() and varstr_cmp()? And why would I use one in one place and the other elsewhere? Shouldn't they be used consistently? Not necessarily -- see texteq. I think the point is that varstr_cmp()

Re: [HACKERS] PATCH: CITEXT 2.0

2008-07-07 Thread Alvaro Herrera
David E. Wheeler wrote: > On Jul 7, 2008, at 00:46, Zdenek Kotala wrote: > >> You have to use varstr_cmp in citextcmp. Your code is correct, >> because for >> < <= >= > operators you need collation sensible function. >> >> You need to change only citext_cmp function to use strncmp() or call >>

Re: [HACKERS] PATCH: CITEXT 2.0 v2

2008-07-07 Thread David E. Wheeler
On Jul 7, 2008, at 08:01, Andrew Dunstan wrote: What does still bother me is its performance. I'd like to know if any measurement has been done of using citext vs. a functional index on lower(foo). That's a good question. I can whip up a quick test by populating a column full of data and

  1   2   >