[SQL] running aggregates

2003-09-04 Thread Glenn
Given a column f1, I'd like to produce a running sum, a running
average, and a row number, like this:

f1   sum_f1   avg_f1   pseudo_recno
10       10       10              1
20       30       15              2
30       60       20              3
40      100       25              4
50      150       30              5
60      210       35              6

I know I can do this by iterating in a plpgsql routine, but there must
be a way to do it in plain SQL.
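
For the record, one pure-SQL way -- a sketch, assuming the data lives
in a single-column table t(f1) with unique, ascending values (names
made up):

-- Self-join: for each row, aggregate over every row at or below it.
SELECT a.f1,
       sum(b.f1)   AS sum_f1,
       avg(b.f1)   AS avg_f1,
       count(b.f1) AS pseudo_recno
FROM t a
JOIN t b ON b.f1 <= a.f1
GROUP BY a.f1
ORDER BY a.f1;

The self-join is O(n^2) in the number of rows, so past a few thousand
rows the plpgsql loop may well be faster.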

Thanks in advance
Glenn
ps sorry if this isn't the appropriate use for this forum



[SQL] Cascade delete question

2004-04-26 Thread Glenn MacGregor
Hi All,
I am using a cascade delete on the following table:
vid  vname              parentname
1    [EMAIL PROTECTED]
2    [EMAIL PROTECTED]  [EMAIL PROTECTED]
3    [EMAIL PROTECTED]  [EMAIL PROTECTED]
4    [EMAIL PROTECTED]  [EMAIL PROTECTED]
5    [EMAIL PROTECTED]  [EMAIL PROTECTED]
The cascade delete is set up so that when I delete a row, it also
deletes the rows whose parentname matches that row's vname, and it
works fine.
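
Presumably the constraint looks something like this -- a guess, since
the DDL isn't shown, and vname would need a UNIQUE constraint to be
the referenced column:

ALTER TABLE table1
    ADD FOREIGN KEY (parentname) REFERENCES table1 (vname)
    ON DELETE CASCADE;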

I just changed the values of parentname (but the vname values stay the same)
vid  vname              parentname
1    [EMAIL PROTECTED]
2    [EMAIL PROTECTED]  n1
3    [EMAIL PROTECTED]  n1
4    [EMAIL PROTECTED]  n1
5    [EMAIL PROTECTED]  n3
Is there a way to do the same cascade delete with these values? I can 
select the correct info from vname in a query:

select substring(vname from 0 for position('@' in vname)) from table1;
This works, so I tried to put that in the cascade but it failed.
Is there any way to accomplish this?
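
One workaround is to drop the FK's cascade and emulate it with a
trigger that applies the expression -- a sketch, untested, using the
names above (with a 7.x-era string-quoted function body):

CREATE FUNCTION delete_children() RETURNS trigger AS '
BEGIN
    -- Delete the rows whose parentname is the deleted row''s vname
    -- truncated at the @.
    DELETE FROM table1
    WHERE parentname = substring(OLD.vname from 0
                                 for position(''@'' in OLD.vname));
    RETURN OLD;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER table1_cascade AFTER DELETE ON table1
    FOR EACH ROW EXECUTE PROCEDURE delete_children();

The trigger re-fires for each row it deletes, giving the same
recursive behavior the original ON DELETE CASCADE provided.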
Thanks
    Glenn MacGregor


[SQL] Bit by "commands ignored until end of transaction block" again

2009-07-22 Thread Glenn Maynard
Postgres kills a transaction when an error happens.  This is a pain;
it assumes that all SQL errors are unexpected and fatal to the
transaction.

There's a very simple case where it's not: UNIQUE checks.  I'm
generating a cache, with a simple flow:
 - Search for the cache key; if it exists, return its value.
 - If it didn't exist, create the data based on the key, insert it
into the table, and return it.

This has an obvious race: another thread looks up the same key and
creates it between the search and the insert.  Both threads will
create the cached data, thread A will insert it into the table, and
thread B will get an integrity error when it tries to insert it, since
it duplicates the unique key.

Here, by far the simplest fix is simply to ignore the integrity error.
 Both threads generated the same data; the failed insert is expected
and harmless.  Postgres is turning this into a fatal error.
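
(For reference, the standard server-side way to swallow the error is a
plpgsql wrapper -- a sketch, assuming a cache(key, data) table and a
made-up function name:

CREATE FUNCTION cache_insert(k TEXT, v TEXT) RETURNS VOID AS $$
BEGIN
    INSERT INTO cache (key, data) VALUES (k, v);
EXCEPTION WHEN unique_violation THEN
    NULL;  -- another transaction inserted the same key first; harmless
END;
$$ LANGUAGE plpgsql;

but that means hand-maintaining a function for what should be a plain
INSERT.)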

There's so much that could make this trivially easy:

 - SQLite has the handy ON CONFLICT IGNORE, but Postgres has nothing
like that.  (ON CONFLICT REPLACE is great, too.)
 - Let me use SAVEPOINT outside of a transaction, with the effect of
starting a transaction with the savepoint and ending it when it's
committed.  Then, I could use savepoints without needing to know
whether I'm already in a transaction or not; one would simply be
started and committed for me if necessary.  (That's by far my biggest
issue with savepoints: they force me to either specify "a transaction
must be open when this function is called", or need to be able to
query whether one is running to decide whether to start a transaction
or a savepoint.  My function's use of transactions should be invisible
to the caller.)
 - Let me disable this error.  I don't want it.  (We're grownups; we
can decide for ourselves which errors are fatal.)

The first two are cleaner, since ignoring the error means I might
ignore some other integrity error from the same statement, but I can
live with that.

Lacking anything better, I'll probably end up dropping out of the ORM
and using some uglier SQL to work around this, but this is so trivial
that it's silly to have to do that.  I can't do it within the ORM; it
doesn't have the vocabulary.

Any tricks I'm missing?  It feels like Postgres is fighting me at
every turn with this one, and this isn't the first time I've had this
problem.

-- 
Glenn Maynard



Re: [SQL] Bit by "commands ignored until end of transaction block" again

2009-07-22 Thread Glenn Maynard
On Thu, Jul 23, 2009 at 1:31 AM, Richard Huxton wrote:
>>  - Let me use SAVEPOINT outside of a transaction,
>
> You are never outside a transaction. All queries are executed within a
> transaction.

"Transaction block", then, if you insist.

> I think this is the root of your problem - all queries are within a
> transaction so either:
> 1. You have a transaction that wraps a single statement. If you get an error
> then only that statement was affected.
> 2. You have an explicit BEGIN...COMMIT transaction which could use a
> savepoint.

Savepoints can only be used inside transaction blocks.  My function
has no idea whether it's being called inside a transaction block.

From inside a transaction block, my function would need to call
SAVEPOINT/RELEASE SAVEPOINT.

If it's not in a transaction block, it needs to call BEGIN/COMMIT
instead.  SAVEPOINT will fail with "SAVEPOINT can only be used in
transaction blocks".

This would be very simple and clean if the SAVEPOINT command
transparently issued BEGIN if executed outside of a transaction block,
marking the savepoint so it knows that when the savepoint is released
or rolled back, the associated transaction block needs to be committed
or rolled back, too.  At that point, you could stop using
BEGIN/COMMIT/ROLLBACK entirely, and just let savepoints do it, if you
wanted--with this, the transaction commands are essentially redundant.

I can't count the number of times I've wished for this.

> Typically, if you're in a plpgsql function you would just catch "unique"
> exception codes from your insert. Or, update, see if any rows were affected,
> if not try an insert and if that gives a duplicate go back and try the
> update. You might want the second approach if 99% of the time the cache is
> already populated.

It's just a simple INSERT, generated from a Model.objects.create() in Django.

>> Lacking anything better, I'll probably end up dropping out of the ORM
>> and using some uglier SQL to work around this, but this is so trivial
>> that it's silly to have to do that.  I can't do it within the ORM; it
>> doesn't have the vocabulary.
>
> The ORM can't control transactions, can't call functions or can't set
> savepoints?

It can't write the necessary SQL to say "insert this unless it already
exists", namely:

INSERT INTO cache (key, data) (
    -- 'key' and 'data' stand for the literal values being inserted.
    SELECT i.key, 'data'
    FROM (VALUES ('key')) AS i(key)
    LEFT JOIN cache prior_entry ON (prior_entry.key = i.key)
    WHERE prior_entry.key IS NULL
)

It--Django--also doesn't have a mature transaction/savepoint system;
in fact, its transaction handling is an absolute mess.  I've written
helpers for my main codebase that simply say "wrap this in a
transaction block if one isn't already started, otherwise wrap it in a
savepoint".  I don't want to use that code here, because it's nitty
code: it needs to poke at Django internals to figure out whether it's
in a transaction block or not, and to deal with other API
compatibility issues.

-- 
Glenn Maynard



Re: [SQL] Bit by "commands ignored until end of transaction block" again

2009-07-23 Thread Glenn Maynard
On Thu, Jul 23, 2009 at 2:41 AM, Richard Huxton wrote:
> Ah [cue light-bulb effect], I think I understand. Your function isn't in the
> database is it? Surely your application knows if it's issuing BEGIN..COMMIT?

I'm writing a Python library call.  It has no idea whether the caller
happens to be inside a transaction already, and I don't want to
specify something like "always run this inside a transaction".
(Callers are equally likely to want to do either, and it's bad API to
force them to start a transaction--the fact that I'm using the
database at all should be transparent.)

> You'll have people with torches and pitchforks after you if you change
> RELEASE SAVEPOINT to mean COMMIT. I might even lend them my pitchfork.

RELEASE SAVEPOINT would only COMMIT the transaction *if* the savepoint
that it's releasing started it.  Every currently-valid case requires
that a transaction is already started, so no existing code would be
affected by this.

SAVEPOINT a; -- implicitly issues BEGIN because one wasn't started
RELEASE SAVEPOINT a; -- implicitly issues COMMIT because savepoint "a"
issued the BEGIN, not the user


BEGIN;
SAVEPOINT a;
RELEASE SAVEPOINT a; -- will not commit, because savepoint "a" didn't
start the transaction

Of course, there are other details--it probably shouldn't allow
ROLLBACK or COMMIT on an implicit transaction block, for example.

> Could it generate: "SELECT ensure_cache_contains(key,data)"? Then ten lines
> of plpgsql will neatly encapsulate the problem. That plpgsql can be
> automatically generated easily enough too.

I don't think so, at least not without digging into internals.  Django
is built around knowing all data types, so it'd need to be given types
explicitly--for example, to know whether a timestamp should be
formatted as a timestamp, date or time.  (I do have a couple other
columns here--timestamps for cache expiration, etc.)  I'll have to ask
Django-side if there's a public API to do this, but I don't think
there is.

> Ah, the joys of badly designed ORMs. The nice thing is that there seem to be
> plenty of bad ones to choose from too. If your ORM doesn't handle
> transactions well, the more you use it the more difficult your life will
> become. I'd be tempted to tidy up your existing fixes and wrap Django's ORM
> as cleanly as you can. That's assuming they're not interested in patches.

The ORM on a whole is decent, but there are isolated areas where it's
very braindamaged--this is one of them.  They have a stable-release
API-compatibility policy, which I think just gets them stuck with some
really bad decisions for a long time.

-- 
Glenn Maynard



Re: [SQL] Need magical advice for counting NOTHING

2009-07-23 Thread Glenn Maynard
On Thu, Jul 23, 2009 at 1:01 AM, Andreas wrote:
> SELECT user_name, log_type_fk, COUNT(log_type_fk)
> FROM log
> JOIN users ON (user_id = user_fk)
> WHERE (ts BETWEEN  sometime  AND   another)
> GROUP BY user_name, log_type_fk
> ORDER BY user_name, log_type_fk

create table users (user_id integer, user_name varchar);
create table log_type (log_type_id integer, log_type integer);
create table log (log_id integer, log_type_fk integer, user_fk integer);
insert into log_type (log_type_id, log_type) values (1, 1);
insert into log_type (log_type_id, log_type) values (2, 2);
insert into log_type (log_type_id, log_type) values (3, 3);  -- implied by the output below
insert into users (user_id, user_name) values (1, 'a');
insert into users (user_id, user_name) values (2, 'b');
insert into log (log_id, log_type_fk, user_fk) values (1, 1, 1);
insert into log (log_id, log_type_fk, user_fk) values (2, 2, 1);
insert into log (log_id, log_type_fk, user_fk) values (3, 2, 1);
insert into log (log_id, log_type_fk, user_fk) values (4, 1, 2);

SELECT user_name, log_type.log_type,
       sum((log_type_fk IS NOT NULL)::integer) AS count
FROM users
JOIN log_type ON (true)
LEFT JOIN log ON (user_id = user_fk AND log.log_type_fk = log_type.log_type)
GROUP BY user_name, log_type.log_type
ORDER BY user_name, log_type.log_type;


 user_name | log_type | count
-----------+----------+-------
 a         |        1 |     1
 a         |        2 |     2
 a         |        3 |     0
 b         |        1 |     1
 b         |        2 |     0
 b         |        3 |     0

-- 
Glenn Maynard



Re: [SQL] Bit by "commands ignored until end of transaction block" again

2009-07-26 Thread Glenn Maynard
> The above situation only arises if you run in autocommit mode which is the 
> default for psql (which I have *never* understood).

This is the short answer, in practice--assume that either a
transaction is started or will be started by the SAVEPOINT command,
and that if a COMMIT is needed (whether because of the SAVEPOINT or
because one was already pending), the caller will do it.

(I hate non-autocommit.  It defies basic code design instincts, which
tell me that whoever starts a transaction should finish it.  I
shouldn't be issuing a non-autocommit SAVEPOINT/RELEASE, and then
assuming the caller will COMMIT the transaction that was started
automatically.  I'm stuck with it in Django.  Yuck, but oh well;
battling the framework's idioms isn't going to help anything.)

On Thu, Jul 23, 2009 at 4:06 AM, Richard Huxton wrote:
>> I'm writing a Python library call.  It has no idea whether the caller
>> happens to be inside a transaction already, and I don't want to
>> specify something like "always run this inside a transaction".
>> (Callers are equally likely to want to do either, and it's bad API to
>> force them to start a transaction--the fact that I'm using the
>> database at all should be transparent.)
>
> That last bit is never going to work. There always needs to be some basic
> level of understanding between systems and transactions really have to be
> part of that for talking to a RDBMS. There will have to be a piece of code
> responsible for managing transactions somewhere in the
> middleware/application layers.

It's never 100% transparent--the case of making calls during a
transaction and then rolling the whole thing back still needs to be
documented.  The point, though, is that this isn't a database-centric
operation, so it shouldn't have usage restrictions like "must always"
or "must never be inside a transaction".

> All you're doing here is moving the point of confusion around, surely? At
> some point you still need to know whether you can issue
> BEGIN/ROLLBACK/COMMIT etc.

Not at all--I don't need to use any of these commands.  I just do this:

SAVEPOINT s;
INSERT INTO table ...;
RELEASE SAVEPOINT s;

to guarantee that my code's effect on the database is atomic.

someone else wrote:
> So, what you're really asking for boils down to nestable transactions?

That's how I've thought of savepoints from day one.  When I use them
in Python code, I use a with_transaction wrapper, which transparently
uses a transaction or a savepoint.

-- 
Glenn Maynard



[SQL] Can COPY update or skip existing records?

2008-09-30 Thread Glenn Gillen
Hey all,

I've got a table with a unique constraint across a few fields, into
which I need to regularly import a batch of data. Is there a way to do
it with COPY without getting conflicts on the unique constraint? I
have no way of being certain that some of the data I'm trying to load
isn't in the table already.

Ideally I'd like it to operate like MySQL's on_duplicate_key_update
option, but for now I'll settle for just ignoring existing rows and
proceeding with everything else.
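
The usual workaround -- a sketch, with made-up table and column names
-- is to COPY into a constraint-free staging table and merge from
there:

-- Stage the batch where duplicates can't violate anything.
CREATE TEMP TABLE staging (LIKE target INCLUDING DEFAULTS);
COPY staging FROM '/path/to/batch.dat';

-- Insert only the rows not already present on the unique key (a, b).
INSERT INTO target (a, b, c)
SELECT s.a, s.b, s.c
FROM staging s
WHERE NOT EXISTS (
    SELECT 1 FROM target t
    WHERE t.a = s.a AND t.b = s.b
);

If the batch can also contain duplicates of itself, dedupe the SELECT
with DISTINCT ON (a, b) first.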

Thanks,

-- 
Glenn



[SQL] Nested selects

2009-04-07 Thread Glenn Maynard
I'm deriving high scores from two tables: one containing data for each
time a user played (rounds), and one containing a list of stages:

CREATE TABLE stage (id SERIAL NOT NULL PRIMARY KEY, name VARCHAR);
CREATE TABLE round (id SERIAL NOT NULL PRIMARY KEY, score REAL,
stage_id INTEGER REFERENCES stage (id));
INSERT INTO stage (name) VALUES ('stage 1'), ('stage 2'), ('stage 3');
INSERT INTO round (stage_id, score) VALUES
  (1, 100), (1, 150), (1, 175),
  (2, 250), (2, 275), (2, 220),
  (3, 350), (3, 380), (3, 322);

SELECT r.* FROM round r
WHERE r.id IN (
    -- Get the high-scoring round ID for each stage:
    SELECT (
        -- Get the high score for stage s:
        SELECT r.id FROM round r
        WHERE r.stage_id = s.id
        ORDER BY r.score DESC LIMIT 1
    )
    FROM stage s
);

This works fine, and with a (stage_id, score DESC) index, is
reasonably fast with around 1000 stages.  round may expand to millions
of rows.

Unfortunately, it doesn't generalize to getting the top N scores for
each stage; LIMIT 2 isn't valid ("more than one row returned by a
subquery used as an expression").

I fiddled with putting the inner results in an array, without much
luck, and I'm not sure how well that'd optimize.  Having the results
in any particular order isn't important.  (In practice, the inner
select will often be more specific--"high scores on the west coast",
"high scores this month", and so on.)

This seems embarrassingly simple: return the top rounds for each
stage--but I'm banging my head on it for some reason.
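
For what it's worth, window functions (new in 8.4) generalize this
directly -- a sketch using the tables above:

-- Top 2 rounds per stage: rank within each stage by score.
SELECT id, stage_id, score
FROM (
    SELECT r.*,
           rank() OVER (PARTITION BY r.stage_id
                        ORDER BY r.score DESC) AS pos
    FROM round r
) ranked
WHERE pos <= 2;

(rank() keeps ties; use row_number() for exactly N per stage.)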

-- 
Glenn Maynard



Re: [SQL] changing multiple pk's in one update

2009-04-13 Thread Glenn Maynard
(JMdict?  I was playing with importing that into a DB a while back,
but the attributes in that XML are such a pain--and then my email died
while I was trying to get those changed, and I never picked it up
again.)

On Mon, Apr 13, 2009 at 1:20 PM, Stuart McGraw  wrote:
> 1 to the number of sentences in the entry) and the sentence text.  The pk is
> of course the entry id and the sense number.
> There are other tables that have fk's to the senses.

Your PK is a composite of (entry, order)?  Won't your foreign keys
elsewhere all break when you shift the order around?

> I guess I could add an "order"[1] column and use the sense number as a
> surrogate partial key to avoid the need for key renumbering,
> but all the api's (and the normal human way of thinking) are based
> on "sense number 1 of entry x", "sense number 2 of entry y", so
> one would need to maintain "order" as a gapless sequence (or add a new
> mapping layer to map from/to a arbitrary monotonic sequence
> to a 1,2,3,... sequence) -- the gain doesn't seem that big.

Why not do this in the straightforward way: three separate fields: a
regular, sequential PK; an FK to the entry; and an order number.  Add
an index on (entry_key, order_number).  It's a little more expensive
since you have a new column and index (the PK), but in a table with a
lot of plain text that's probably insignificant.  Now you can use the
plain PK for your FK's.
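
A sketch of that layout (names assumed; "entry" is the parent table):

CREATE TABLE sense (
    sense_id     SERIAL PRIMARY KEY,  -- stable surrogate key for FKs
    entry_key    INTEGER NOT NULL REFERENCES entry,
    order_number INTEGER NOT NULL,    -- 1, 2, 3, ... within the entry
    sentence     TEXT
);
CREATE INDEX sense_entry_order ON sense (entry_key, order_number);

Reordering senses then only updates order_number; sense_id, and every
FK pointing at it, never changes.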

I'd anticipate other problems with changing your PKs.  If you're
referring to sense (100,3), and a concurrent user adds a sense, you
may suddenly find yourself referring to a different sense due to them
being reordered out from under you.  You'd have similar problems if
your sense is deleted entirely: instead of the row simply ceasing to
exist (and resulting in predictable, checkable errors), you may end up
silently referring to another sense.

Maybe I'm misunderstanding what you're doing, though.

You'd have to have no UNIQUE constraint on the (entry, order)
composite index, though, or you'll have the same problem when you
reorder them.

-- 
Glenn Maynard



Re: [SQL] changing multiple pk's in one update

2009-04-13 Thread Glenn Maynard
On Mon, Apr 13, 2009 at 5:18 PM, Richard Broersma wrote:
>> Your PK is a composite of (entry, order)?  Won't your foreign keys
>> elsewhere all break when you shift the order around?
>
> If there really are foreign keys, then an update will not be allowed
> to shift a primary key unless the foreign key is set with ON UPDATE
> CASCADE then the shifts will be cascaded to all references
> automatically.

Right.  I'm still dusting off my SQL after a long period of corrosion.  :)

The same applies, though, if you have external references to the PK;
for example, if you have a web interface where "/sense/10/3" refers to
entry 10, sense 3.  If your senses are reordered by someone else, and
you operate on /10/3, you may suddenly find yourself viewing or
modifying (or deleting!) a different sense.  This could even happen
within the same transaction, if you're not very careful with
locking...

-- 
Glenn Maynard



Re: [SQL] changing multiple pk's in one update

2009-04-13 Thread Glenn Maynard
On Mon, Apr 13, 2009 at 5:43 PM, Richard Broersma wrote:
> From what I've seen, this problem can affect both surrogate and
> natural key designs.  In both cases, care must be taken to ensure that
> an underling tuple hasn't been changed by any other clients before it
> attempts to commit its changed.   Probably the most common solution is
> to use optimistic locking, another solution that I know of is to use
> serialized transaction isolation.

Right, but if you have a delayed UI, you probably don't want to hold a
lock open--if the user is viewing "/100/3" and clicks "delete", you
need to make sure that the one you delete is the same /100/3 that the
user was viewing at the time.  That's harder to do...
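
One way to get that guarantee without a held lock -- a sketch,
assuming a last-modified timestamp column and made-up names -- is to
make the DELETE conditional on the version the user saw:

-- Only delete if the row is still the version the page displayed.
DELETE FROM sense
WHERE entry_key = 100
  AND order_number = 3
  AND last_modified = '2009-04-13 12:00:00';
-- Zero rows affected means the sense changed (or was reordered) since
-- the page was rendered: report a conflict instead of deleting blindly.

That's optimistic locking applied across user think-time.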

-- 
Glenn Maynard



Re: [SQL] changing multiple pk's in one update

2009-04-15 Thread Glenn Maynard
On Wed, Apr 15, 2009 at 8:43 AM, Jasen Betts  wrote:
> the update takes a long time too if it's updating all the rows.
> and updating the index piecewise at the same time.
> with the index extant I get from 20 seconds (if the start and end
> ranges don't overlap) to 28 seconds (maximum overlap) for a table
> with 100 (short) rows

He's using a compound key as his PK, eg. (entry, order), where there
are only a few order values for each entry, so I think the real case
is only updating order for a specific set of entries.

>> I have continued searching for other ways to do this
>> but most of my google results are floating in a sea
>> of "update if insert fails, like mysql" results so
>> I still have a little hope it is possible.
>> I thought I remember seeing, a year or two ago, an
>> update statement with an ordered subquery that avoided
>> duplicate key errors but I am probably misremembering.

Bear in mind that the update is failing based on the order the data is
in the table, not the PK order.

create table test (id integer primary key);
insert into test (id) values (2), (1), (3);
update test set id=id+1;
ERROR:  duplicate key value violates unique constraint "test_pkey"
update test set id=id-1;
ERROR:  duplicate key value violates unique constraint "test_pkey"

Both fail, because it tries to update 2 first.

I suppose in a real pinch, you could renumber in two steps.  For
example, if you wanted to delete id 6 and move everything else down:

insert into test (id) values (2), (7), (3), (1), (4), (5), (6), (8), (9);
begin;
set transaction isolation level serializable;
delete from test where id=6;
update test set id = id+100 where id >= 6;
update test set id = id-101 where id >= 6;
commit;

Not very nice, but if "id" is really a sequence number starting at 1
in your case and not an always-increasing generated regular serial (so
there's no chance of it actually reaching the arbitrarily large number
100), it should work.  (It'd probably be workable for real
serials, too, with a much larger offset.)

If someone else creates a new sense for that entry after the first
update, it'll sit on the order number you were about to use and the
operation will fail.  Serialize so nobody else will insert until
you're done.

-- 
Glenn Maynard



Re: [SQL] Distinct oddity

2009-05-09 Thread Glenn Maynard
On Sat, May 9, 2009 at 10:33 AM, Tom Lane  wrote:
> That only proves that adding the 'e' changes the sort order, which is
> completely unsurprising for any non-C locale.  What you need to do is
> dump out the *entire* results of the DISTINCT queries and look for the
> unmatched lines.  I'd try dumping to two files, stripping the 'e' with
> sed, and then sort/diff.

How could adding an "e" change the sorting of "Österreich/Welt (Ltg.)"
compared to "Österreichisches Verkehrsbüro AG" in de_DE or en_US (or
any locale)?

It's also odd that the "1. Mittelschule ..." line is getting sorted after those.

-- 
Glenn Maynard



Re: [SQL] Array: comparing first N elements?

2009-05-12 Thread Glenn Maynard
On Tue, May 12, 2009 at 4:05 AM, David Garamond  wrote:
> Previously I use TEXT column for parents, but arrays look interesting and
> convenient so I'm considering migrating to arrays. However, how do I rewrite
> this using arrays?
> SELECT * FROM product
> WHERE parents LIKE '0001/0010/%';
> In other words, testing against the first N elements in an array.

SELECT * FROM product
WHERE parents[1] = 1 AND parents[2] = 2;

I'd expect there to be a way to index this, on individual components
or a slice, eg.

CREATE INDEX parents_1 ON product(parents[1]);
CREATE INDEX parents_2to4 ON product(parents[2], parents[3], parents[4]);

... but this throws a parse error.  I don't have an immediate need for
this, but I'm curious if this is possible--it seems a natural part of
having a native array type.
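
For what it's worth, the parse error may just be the expression-index
syntax: expressions need their own set of parentheses. Worth a try (a
sketch, untested here):

CREATE INDEX parents_1 ON product ((parents[1]));
CREATE INDEX parents_2to4 ON product ((parents[2]), (parents[3]), (parents[4]));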

-- 
Glenn Maynard



Re: [SQL] Distinct oddity

2009-05-13 Thread Glenn Maynard
For purposes of DISTINCT, I'd expect any sort order should do; all it
needs is for equal values to be grouped together.  If strcoll() ever
fails to do that, I'd call it a critical bug--even throwing total
garbage at it should result in a consistent ordering, even if the
ordering itself is totally meaningless.  Many sort functions depend on
this.

On Wed, May 13, 2009 at 8:37 AM, Maximilian Tyrtania wrote:
> Opened those files (with textwrangler, as I've never used sed), stripped
> off the '$', sorted and looked at the differences (using textwranglers
> "compare documents"-feature).

Can you narrow down what triggers this?  Try copying off the table,
and running this:

**
CREATE FUNCTION is_inconsistent() RETURNS BOOLEAN LANGUAGE SQL AS $$
    SELECT (SELECT count(distinct f.bezeichnung) FROM firmen_copy f)
        <> (SELECT count(distinct (f.bezeichnung||'1')) FROM firmen_copy f)
$$;

-- If deleting [first,last] leaves the results inconsistent, return
-- true; otherwise roll back the deletion and return false.
CREATE FUNCTION test_delete_range(first BIGINT, last BIGINT)
RETURNS BOOLEAN LANGUAGE plpgsql AS $$
BEGIN
    DELETE FROM firmen_copy WHERE id BETWEEN first AND last;
    IF is_inconsistent() THEN RETURN true; END IF;
    PERFORM 1/0;  -- force an error so the DELETE is rolled back
EXCEPTION
    WHEN division_by_zero THEN
        RETURN false;
END;
$$;

CREATE FUNCTION test_func()
RETURNS INTEGER LANGUAGE plpgsql AS $$
DECLARE
    total bigint;
BEGIN
    IF NOT is_inconsistent() THEN RETURN -1; END IF;

    LOOP
        total := (SELECT MAX(id) FROM firmen_copy);

        IF test_delete_range(0, total/2) THEN CONTINUE; END IF;
        IF test_delete_range(total*1/4, total*3/4) THEN CONTINUE; END IF;
        IF test_delete_range(total/2, total) THEN CONTINUE; END IF;

        RETURN 0;
    END LOOP;
END;
$$;

SELECT test_func();
**

This assumes you have a primary key named "id", and that your IDs
start around 0 and are vaguely monotonic (renumber them in the copy if
necessary).  I can't easily test this code, of course, but it's a
simple binary search.  Depending on what's triggering this, it may or
may not be able to narrow in on a test case.

Tangentially, is there a better way of rolling back a function than a
dumb hack like "PERFORM 1/0"?

-- 
Glenn Maynard
