Re: [BUGS] BUG #5748: Invalid oidvector data during binary recv

2010-11-15 Thread Yeb Havinga

On 2010-11-11 17:02, Heikki Linnakangas wrote:

On 11.11.2010 17:48, Tom Lane wrote:

"Yeb Havinga"  writes:

postgres=# create table a as select ''::oidvector;
SELECT 1
postgres=# copy a to '/tmp/test' with binary;
COPY 1
postgres=# copy a from '/tmp/test' with binary;
ERROR:  invalid oidvector data


The problem seems to be that array_recv passes back a zero-dimensional
array, *not* a 1-D array, when it observes that the input has no
elements.  A zero-D array is not part of the subset of possible arrays
that we allow for oidvector.


Yeah, I just reached that conclusion too..
The patch below changes array_recv, so that it returns an empty 1-D 
array when an empty 1-D array was written binary. No changes in 
oidvectorrecv or int2vectorrecv are needed.


diff --git a/src/backend/utils/adt/arrayfuncs.c 
b/src/backend/utils/adt/arrayfuncs.c

index 4ceb256..98e725a 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -1288,7 +1288,7 @@ array_recv(PG_FUNCTION_ARGS)
my_extra->element_type = element_type;
}

-   if (nitems == 0)
+   if (ndim == 0)
{
/* Return empty array ... but not till we've validated 
element_type */

PG_RETURN_ARRAYTYPE_P(construct_empty_array(element_type));

Make check passes.

regards,
Yeb Havinga


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5748: Invalid oidvector data during binary recv

2010-11-15 Thread Heikki Linnakangas

On 15.11.2010 12:08, Yeb Havinga wrote:

On 2010-11-11 17:02, Heikki Linnakangas wrote:

On 11.11.2010 17:48, Tom Lane wrote:

"Yeb Havinga" writes:

postgres=# create table a as select ''::oidvector;
SELECT 1
postgres=# copy a to '/tmp/test' with binary;
COPY 1
postgres=# copy a from '/tmp/test' with binary;
ERROR: invalid oidvector data


The problem seems to be that array_recv passes back a zero-dimensional
array, *not* a 1-D array, when it observes that the input has no
elements. A zero-D array is not part of the subset of possible arrays
that we allow for oidvector.


Yeah, I just reached that conclusion too..

The patch below changes array_recv, so that it returns an empty 1-D
array when an empty 1-D array was written binary. No changes in
oidvectorrecv or int2vectorrecv are needed.


That seems like a bad idea. array_in() represents an empty array with 
zero dimensions, we're not going to change the generic array_recv() 
function used for all arrays to behave differently because of some 
corner-case in oidvectorrecv.


If we want do anything about this, the right fix is to add a special 
case in oidvectorrecv/int2vectorrecv to handle empty array.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5748: Invalid oidvector data during binary recv

2010-11-15 Thread Yeb Havinga

On 2010-11-15 11:40, Heikki Linnakangas wrote:

On 15.11.2010 12:08, Yeb Havinga wrote:

On 2010-11-11 17:02, Heikki Linnakangas wrote:

On 11.11.2010 17:48, Tom Lane wrote:

"Yeb Havinga" writes:

postgres=# create table a as select ''::oidvector;
SELECT 1
postgres=# copy a to '/tmp/test' with binary;
COPY 1
postgres=# copy a from '/tmp/test' with binary;
ERROR: invalid oidvector data


The problem seems to be that array_recv passes back a zero-dimensional
array, *not* a 1-D array, when it observes that the input has no
elements. A zero-D array is not part of the subset of possible arrays
that we allow for oidvector.


Yeah, I just reached that conclusion too..

The patch below changes array_recv, so that it returns an empty 1-D
array when an empty 1-D array was written binary. No changes in
oidvectorrecv or int2vectorrecv are needed.


That seems like a bad idea. array_in() represents an empty array with 
zero dimensions, we're not going to change the generic array_recv() 
function used for all arrays to behave differently because of some 
corner-case in oidvectorrecv.
It's possible to trigger it with any element type, as long as the use 
manages to construct a 1-D array. E.g. with contrib/_int.sql loaded, 
array substraction is possible. Now:

postgres=# select ARRAY[1]::int4[] - ARRAY[1]::int4[];
 ?column?
--
 {}
(1 row)

postgres=# select array_ndims(ARRAY[1]::int4[] - ARRAY[1]::int4[]);
 array_ndims
-
   1
(1 row)

Binary send and recv of the value at hand would change it into a 0-D vector.



If we want do anything about this, the right fix is to add a special 
case in oidvectorrecv/int2vectorrecv to handle empty array.
Judging from the fact that it also happens with int4 arrays, this seems 
like the wrong place.


regards,
Yeb Havinga




--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5748: Invalid oidvector data during binary recv

2010-11-15 Thread Tom Lane
Yeb Havinga  writes:
> Binary send and recv of the value at hand would change it into a 0-D vector.

The reason for that is that in general we don't make a distinction
between zero-size arrays of different dimensions.  oidvector and
int2vector are different though.  Which is why this should be fixed
locally to those two types, rather than changing the behavior of
regular arrays.

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5748: Invalid oidvector data during binary recv

2010-11-15 Thread Tom Lane
I wrote:
> Yeb Havinga  writes:
>> Binary send and recv of the value at hand would change it into a 0-D vector.

> The reason for that is that in general we don't make a distinction
> between zero-size arrays of different dimensions.

Actually, after consuming a bit more caffeine, I see what Yeb is on about.
Even though the system in general doesn't make much of a distinction
between zero-element arrays of different dimensionalities, there *are*
functions that can distinguish --- array_ndims() being the most obvious
one.  Shouldn't we ensure that binary dump and reload of an array value
doesn't change the value in any SQL-observable way?  If so, I think his
patch is correct, even though it's changing more than just the
originally-complained-of behavior.

While I'm looking at this ... why is it that array_ndims returns NULL
and not 0 for a zero-dimensional array?  0-D arrays might have been
unsupported at one time, but they're certainly considered valid now.

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5748: Invalid oidvector data during binary recv

2010-11-15 Thread Greg Stark
On Mon, Nov 15, 2010 at 4:51 PM, Tom Lane  wrote:
> Actually, after consuming a bit more caffeine, I see what Yeb is on about.
> Even though the system in general doesn't make much of a distinction
> between zero-element arrays of different dimensionalities, there *are*
> functions that can distinguish --- array_ndims() being the most obvious
> one.  Shouldn't we ensure that binary dump and reload of an array value
> doesn't change the value in any SQL-observable way?  If so, I think his
> patch is correct, even though it's changing more than just the
> originally-complained-of behavior.

We went to a lot of effort to preserve lower bounds for dumped arrays
so I would agree. I was actually one of the few people that actually
ran into this prior to the fix. We had arrays generated by the intagg
functions which were 0-based and after dumping and reloading were
1-based causing our functions which calculated the array sizes to
misbehave.

>
> While I'm looking at this ... why is it that array_ndims returns NULL
> and not 0 for a zero-dimensional array?  0-D arrays might have been
> unsupported at one time, but they're certainly considered valid now.

Is this the same question as split() on enmpty strings? Do we have a
problem distinguishing between a 0-dimensional array and a
1-dimensional empty array?

-- 
greg

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5748: Invalid oidvector data during binary recv

2010-11-15 Thread Tom Lane
Greg Stark  writes:
> Is this the same question as split() on enmpty strings? Do we have a
> problem distinguishing between a 0-dimensional array and a
> 1-dimensional empty array?

Internally they're certainly different.  I've just been sniffing around
the code a bit, and the main problem I see with trying to be more
rigorous about this is that array_out just prints '{}' for any
zero-element array, regardless of the recorded dimensionality.  We could
reserve that notation to mean a 0-D array (which is what array_in reads
it as).  But then we have to figure out what to print for other cases.
The only good idea that comes to mind is
[1:0]={}
(or variants of that depending on what the stored dimensions actually
are).  array_in currently rejects that:

regression=# select '[1:0]={}'::int[]; 
ERROR:  upper bound cannot be less than lower bound

but perhaps it wouldn't be too painful to tweak it to allow the case.

It appears to be fairly hard to actually get a non-zero-D empty array
into the system, so while this is pretty ugly I think it wouldn't affect
common usage.

Comments?

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] Bug in concurrency control in temporary GiST indexes

2010-11-15 Thread Heikki Linnakangas
While hacking on the GiST insertion algorithm, I noticed a bug with 
temporary GiST indexes. The scan algorithm uses LSNs to detect a 
concurrent page split, but for a temporary index, we just use a constant 
LSN on the assumption that there can't be anyone else modifying the 
index concurrently. But that assumption is not true. While a temporary 
index can't be modified by other backends, it's entirely possible to 
insert new tuples to the index within the same backend while a scan is 
in progress.


Here's a test case, using btree_gist. If you modify it to use a 
non-temporary table, it works fine, but with a temporary table the 2nd 
SELECT misses some rows, when new rows are inserted in the middle of the 
scan.


Operations on temporary tables are not WAL-logged, so we don't have LSNs 
to use, but we just need a monotonically increasing sequence of numbers 
for this purpose. Attached patch provides a function to generate such 
fake LSNs.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
-- Initialize a temporary table with 1000 rows and a gist index.
CREATE TEMPORARY TABLE tt (a integer);
CREATE INDEX i_tt ON tt USING gist (a);
INSERT INTO tt SELECT generate_series(1,1000);

-- Fetch n tuples from "tt", and write them to table called "resultset"
CREATE OR REPLACE FUNCTION fetchfunc(n int) RETURNS bool AS $$
declare
  r record;
  c refcursor;
begin
  c:= 'foocur';
  FOR i IN 1..n LOOP
FETCH FROM c INTO r;
IF r IS NULL THEN RETURN true; END IF;
INSERT INTO resultset VALUES (r.a);
  END LOOP;
  RETURN false;
end;
$$ LANGUAGE plpgsql;

-- ensure that we use a plain index scan.
set enable_bitmapscan=off;
set enable_seqscan=off;

CREATE TEMPORARY TABLE resultset (a integer);

-- Fetch all rows from tt into resultset, with no inserts in the middle
BEGIN;
DECLARE  foocur CURSOR FOR SELECT a FROM tt WHERE a > 100;
SELECT fetchfunc(100);
SELECT fetchfunc(1000);
COMMIT;
SELECT COUNT(*) FROM resultset; -- Display result

truncate resultset;

-- Do the same, but insert new tuples in the middle of the scan. The result
-- should be the same.
BEGIN;
DECLARE  foocur CURSOR FOR SELECT a FROM tt WHERE a > 100;
SELECT fetchfunc(100);
INSERT INTO tt SELECT a% 10 FROM generate_series(1,1000) a;
SELECT fetchfunc(1000);
COMMIT;
SELECT COUNT(*) FROM resultset; -- Display result
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 3054f98..8c2dbc9 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -22,8 +22,6 @@
 #include "storage/indexfsm.h"
 #include "utils/memutils.h"
 
-const XLogRecPtr XLogRecPtrForTemp = {1, 1};
-
 /* Working state for gistbuild and its callback */
 typedef struct
 {
@@ -132,7 +130,7 @@ gistbuild(PG_FUNCTION_ARGS)
 		PageSetTLI(page, ThisTimeLineID);
 	}
 	else
-		PageSetLSN(page, XLogRecPtrForTemp);
+		PageSetLSN(page, GetXLogRecPtrForTemp());
 
 	UnlockReleaseBuffer(buffer);
 
@@ -423,7 +421,7 @@ gistplacetopage(GISTInsertState *state, GISTSTATE *giststate)
 		{
 			for (ptr = dist; ptr; ptr = ptr->next)
 			{
-PageSetLSN(ptr->page, XLogRecPtrForTemp);
+PageSetLSN(ptr->page, GetXLogRecPtrForTemp());
 			}
 		}
 
@@ -491,7 +489,7 @@ gistplacetopage(GISTInsertState *state, GISTSTATE *giststate)
 			PageSetTLI(state->stack->page, ThisTimeLineID);
 		}
 		else
-			PageSetLSN(state->stack->page, XLogRecPtrForTemp);
+			PageSetLSN(state->stack->page, GetXLogRecPtrForTemp());
 
 		if (state->stack->blkno == GIST_ROOT_BLKNO)
 			state->needInsertComplete = false;
@@ -1027,7 +1025,7 @@ gistnewroot(Relation r, Buffer buffer, IndexTuple *itup, int len, ItemPointer ke
 		PageSetTLI(page, ThisTimeLineID);
 	}
 	else
-		PageSetLSN(page, XLogRecPtrForTemp);
+		PageSetLSN(page, GetXLogRecPtrForTemp());
 
 	END_CRIT_SECTION();
 }
diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index 28ab575..9de08f9 100644
--- a/src/backend/access/gist/gistutil.c
+++ b/src/backend/access/gist/gistutil.c
@@ -677,3 +677,22 @@ gistoptions(PG_FUNCTION_ARGS)
 		PG_RETURN_BYTEA_P(result);
 	PG_RETURN_NULL();
 }
+
+/*
+ * Temporary GiST indexes are not WAL-logged, but we need LSNs to detect
+ * concurrent page splits anyway. GetXLogRecPtrForTemp() provides a fake
+ * monotonically increasing sequence of LSNs for that purpose.
+ */
+XLogRecPtr
+GetXLogRecPtrForTemp(void)
+{
+	static XLogRecPtr counter = {0, 1};
+
+	counter.xrecoff++;
+	if (counter.xrecoff == 0)
+	{
+		counter.xlogid++;
+		counter.xrecoff++;
+	}
+	return counter;
+}
diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c
index 0ff5ba8..dbe9406 100644
--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c
@@ -268,7 +268,7 @@ gistbulkdelete(PG_FUNCTION_ARGS)
 	pfree(rdata);
 }
 else
-	PageSetLSN(page, XLogRecPtrForTemp);
+	PageSetLSN(page, GetXLogRecPtrForTemp());
 
 END_CRIT_SECTION();
 			}
diff --git a/src/include/access/gist_private.h b/src/include/acce

[BUGS] BUG #5754: CTE optimization fails to account for side effects

2010-11-15 Thread David Fetter

The following bug has been logged online:

Bug reference:  5754
Logged by:  David Fetter
Email address:  da...@fetter.org
PostgreSQL version: 8.4+
Operating system:   All
Description:CTE optimization fails to account for side effects
Details: 

Here's how to reproduce:

BEGIN;
CREATE SEQUENCE my_seq;
WITH t AS (SELECT nextval('my_seq')) VALUES(1);
SELECT currval('my_seq');

ERROR:  currval of sequence "my_seq" is not yet defined in this session

What's happened is that the optimization didn't account for the idea that a
SELECT might have a side effect, and if we're going with the "CTEs execute
exactly once and (equivalent to) fully," this is a bug.

CTEs should not optimize away (parts of) SELECTs that have side effects.

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5754: CTE optimization fails to account for side effects

2010-11-15 Thread Tom Lane
"David Fetter"  writes:
> CREATE SEQUENCE my_seq;
> WITH t AS (SELECT nextval('my_seq')) VALUES(1);
> SELECT currval('my_seq');

> ERROR:  currval of sequence "my_seq" is not yet defined in this session

> What's happened is that the optimization didn't account for the idea that a
> SELECT might have a side effect, and if we're going with the "CTEs execute
> exactly once and (equivalent to) fully," this is a bug.

The reason it's not a bug is that we have not adopted that position.
There is a proposal to make it so for wCTEs, but that doesn't mean
we should change the existing, documented and useful behavior of regular
CTEs.

(If you're wondering where it's documented, I cite section 7.8's
statement that only as much of a CTE query is evaluated as is read by
the parent query.  The limiting case of that is no reference -> no rows
read.)

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs