Re: [SQL] upsert doesn't seem to work..

2013-02-17 Thread Ben Morrow
Quoth [email protected] (Bert):
> 
> We continuously load data from flat files in our database.
> We first insert the data into unlogged tables (in the loadoltp schema), and
> then we use the 'upsert' statement to transfer the data from the load table
> into the tables we are going to use.
> 
> The load tables are unlogged, and don't have indexes / pk's on them. All
> our 'real tables', which contains the data, always have a pk consisting out
> of 2 fields. In the example those are 'tick_server_id' and 'item_id'.
> 
> At first everything seems to run ok, however it seems that new fields
> aren't always inserted as desired.
> 
> 
> This is an example query which causes troubles:

That query is basically equivalent to something like

create table "st_item" (
server_id   integer,
item_id integer,
item_desc   text,
primary key (server_id, item_id)
);
create table "st_item_insert" (
server_id   integer,
item_id integer,
item_desc   text
);

with "upsert" as (
update "st_item" et
set "item_desc" = e.item_desc
from "st_item_insert" e
where et.server_id = e.server_id
and et.item_id = e.item_id
returning et.server_id, et.item_id
)
insert into "st_item" 
("server_id", "item_id", "item_desc")
select et.server_id, et.item_id, et.item_desc
from "st_item_insert" et
where et.server_id not in (
select et.server_id
from "upsert" b) 
and et.item_id not in (
select et.item_id
from "upsert" b)

There are three problems here. The first is that the NOT IN subselect
selects from et instead of from b. In the context of this subselect "et"
is a table reference from outside the subselect, so it's treated as a
constant for each run of the subselect. That means that the subselect
will return the value you are testing against for every row in "upsert",
so if there were any updates at all you will make no insertions.

The second is that you are making two separate subselects. This means
that a row in st_item_insert will not be inserted if there is a row in
"upsert" with a matching server_id and a row in "upsert" with a matching
item_id, *even if they are different rows*. For instance, suppose
st_item_insert has

2   1   foo
1   2   bar
2   2   baz

and the 'foo' and 'bar' entries get updated. The 'baz' entry will then
not get inserted, because the first subselect will find the 'foo' row
and the second will find the 'bar' row. What you need is a single row
subselect, like this:

where (et.server_id, et.item_id) not in (
select "server_id", "item_id"
from "upsert")

The third is that upsert is not as simple as you think. It isn't
possible (at least, not in Postgres) to take a lock on a row which
doesn't exist, so it's possible that a concurrent transaction could
insert a row with a conflicting key between the time the UPDATE runs and
the time the INSERT runs. You need to either lock the whole table or use
the retry strategy documented in the 'Trapping Errors' section of the
PL/pgSQL documentation. Annoyingly, even 9's serializable transactions
don't seem to help here, at least not by my experiments.

Ben



-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Perform Function When The Rows Of A View Change

2013-02-17 Thread Ben Morrow
Quoth [email protected] (Adam):
> 
> I have a rather complicated view that is dependent upon multiple
> tables, consisting of several windowing and aggregate functions, as
> well as some time intervals. I would like to be able to perform a
> function, i.e. pg_notify(), whenever a row is added, changed, or
> removed from the view's result set.
> 
> I think the kicker is the fact that the set of results returned by the
> view is dependent on the current time.
> 
> Here's a simplified version of what's going on:
> 
> CREATE VIEW view2 AS (
>  SELECT view1.id, view1.ts
>FROM view1
>   WHERE view1.ts > (now() - '1 day'::interval)
> );
> 
> As such, even if there are no inserts, deletes, or updates performed
> on any of the tables that view1 depends on, the data contained in
> view2 will change as a function of time  (i.e. rows will disappear
> from the view as time elapses).  I have been unable to come up with a
> trigger or rule that can detect this situation and provide the
> notification I'm looking for.
> 
> I could just query the view over and over again, and look for changes
> as they occur. But I'm hoping to find a more elegant (and less
> resource-intensive) solution. Any ideas?

Well, in principle you could calculate the next time the view will
change assuming the tables don't change first, and have a client sit
there sleeping until that time. For instance, the view you posted will
next change at 

select min(t.ts)
from (
select view1.ts + '1 day'::interval "ts"
from view1
) t
where t.ts > now()

unless the tables view1 is based on change first.

Apart from the potential difficulty calculating that time, you would
need to be able to wake up that client early if one of the tables
changed. Setting triggers on the tables to send a notify to that client
(probably a different notify from the one that client then sends out to
other clients) should be sufficient, as long as that client uses
select(2) and PQconsumeInput to make sure it receives the notifications
in a timely fashion.

Ben



-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


[SQL] How to reject overlapping timespans?

2013-02-17 Thread Andreas

Hi,

I need to store data that has a valid timespan with start and enddate.

objects ( id, name, ... )
object_data ( object_id referencs objects(id), startdate, enddate, ... )

nothing special, yet

How can I have PG reject a data record where the new start- or enddate 
lies between the start- or enddate of another record regarding the same 
object_id?


Also it isn't allowed that such timespans lie within another.

There can be times where no valid data exists but there can't be more 
than one valid data-record for the same point in time.




--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] How to reject overlapping timespans?

2013-02-17 Thread Andreas Kretschmer


Andreas  hat am 17. Februar 2013 um 18:02 geschrieben:
> Hi,
>
> I need to store data that has a valid timespan with start and enddate.
>
> objects ( id, name, ... )
> object_data ( object_id referencs objects(id), startdate, enddate, ... )
>
> nothing special, yet
>
> How can I have PG reject a data record where the new start- or enddate
> lies between the start- or enddate of another record regarding the same
> object_id?

With 9.2 you can use DATERANGE and exclusion constraints

test=# create table maps(id int, duration daterange, exclude using gist(id with
=, duration with &&));
NOTICE:  CREATE TABLE / EXCLUDE will create implicit index
"maps_id_duration_excl" for table "maps"
CREATE TABLE
test=*# insert into maps values (1,'(2013-01-01,2013-01-10]');
INSERT 0 1
test=*# insert into maps values (1,'(2013-01-05,2013-01-15]');
ERROR:  conflicting key value violates exclusion constraint
"maps_id_duration_excl"
DETAIL:  Key (id, duration)=(1, [2013-01-06,2013-01-16)) conflicts with existing
key (id, duration)=(1, [2013-01-02,2013-01-11)).
test=*#


Regards, Andreas


-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] How to reject overlapping timespans?

2013-02-17 Thread Andreas

Am 17.02.2013 19:20, schrieb Andreas Kretschmer:

Andreas  hat am 17. Februar 2013 um 18:02 geschrieben:

I need to store data that has a valid timespan with start and enddate.

objects ( id, name, ... )
object_data ( object_id referencs objects(id), startdate, enddate, ... )

nothing special, yet

How can I have PG reject a data record where the new start- or enddate
lies between the start- or enddate of another record regarding the same
object_id?

With 9.2 you can use DATERANGE and exclusion constraints

test=# create table maps(id int, duration daterange, exclude using gist(id with
=, duration with &&));
NOTICE:  CREATE TABLE / EXCLUDE will create implicit index
"maps_id_duration_excl" for table "maps"
CREATE TABLE
test=*# insert into maps values (1,'(2013-01-01,2013-01-10]');
INSERT 0 1
test=*# insert into maps values (1,'(2013-01-05,2013-01-15]');
ERROR:  conflicting key value violates exclusion constraint
"maps_id_duration_excl"
DETAIL:  Key (id, duration)=(1, [2013-01-06,2013-01-16)) conflicts with existing
key (id, duration)=(1, [2013-01-02,2013-01-11)).
test=*#



though I still have a 9.1.x as productive server so I'm afraid I have to 
find another way.


Thanks, Andreas   :)



--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] How to reject overlapping timespans?

2013-02-17 Thread Ben Morrow
Quoth [email protected] (Andreas):
> Am 17.02.2013 19:20, schrieb Andreas Kretschmer:
> > Andreas  hat am 17. Februar 2013 um 18:02 geschrieben:
> >> I need to store data that has a valid timespan with start and enddate.
> >>
> >> objects ( id, name, ... )
> >> object_data ( object_id referencs objects(id), startdate, enddate, ... )
> >>
> >> nothing special, yet
> >>
> >> How can I have PG reject a data record where the new start- or enddate
> >> lies between the start- or enddate of another record regarding the same
> >> object_id?
> >
> > With 9.2 you can use DATERANGE and exclusion constraints
> 
> though I still have a 9.1.x as productive server so I'm afraid I have to 
> find another way.

If you don't fancy implementing or backporting a GiST operator class for
date ranges using OVERLAPS, you can fake one with the geometric types.
You will need contrib/btree_gist to get GiST indexes on integers.

create extension btree_gist;

create function point(date) 
returns point
immutable language sql
as $$ 
select point(0, ($1 - date '2000-01-01')::double precision)
$$;

create function box(date, date)
returns box
immutable language sql
as $$ 
select box(point($1), point($2))
$$;

create table objects_data (
object_id   integer references objects,
startdate   date,
enddate date,
exclude using gist
(object_id with =, box(startdate, enddate) with &&)
);

You have to use 'box' rather than 'lseg' because there are no indexes
for lsegs. I don't know how efficient this will be, and of course the
unique index will probably not be any use for anything else.

Ben



-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql