[SQL] Whole-row comparison ?

2007-05-17 Thread christian.roche.ext
Hi there,
 
I'm trying to implement a "versionned" storage for a datawarehouse
system, meaning I have tables which store daily information about
objects and I would like to have a row inserted (i.e. a new version)
only if it differs from the most recent existing row.  For instance
instead of storing
 
versionattribute1attribute2
1xy
2xy
3xy
4zy
5zy
6zt
 
we would only keep the diffs :
 
versionattribute1attribute2
1xy
4zy
6zt
 
This would save lots of space and the full table could be easily
accessed through a view.
 
However as the tables have a large number of rows (several hundreds) I
would like to avoid having to write conditions like WHERE (old.att1 !=
new.attr1 or old.attr2 != new.attr2 or ... or old.attr245 != new.attr245
)
 
So my question is: is there a way to do some kind of whole-row
comparison ? For instance a hash of the full row or something similar ?
Is there a standard way of solving this problem ?
 
Thanks a lot for any hind !
Christian


Re: [SQL] Whole-row comparison ?

2007-05-17 Thread christian.roche.ext

Ok I understand now that this row(t.*) syntax is new to postgres 8.2

As explained in the documentation, ยง4.2.11. Row Constructors:

A row constructor can include the syntax rowvalue.*, which will be expanded to 
a list of the elements of the row value, just as occurs when the .* syntax is 
used at the top level of a SELECT list. For example, if table t has columns f1 
and f2, these are the same:
SELECT ROW(t.*, 42) FROM t;
SELECT ROW(t.f1, t.f2, 42) FROM t;

This would obviously simplify the syntax of my queries a lot since I have so 
many columns.  I'm going to try and install the newest version and check it.  
I'll keep you informed.

Thanks to all!
Christian

--- 
> select * from temp."BSC_Table" t, public.bsc_view p where t.id = p.id 
> and row(t) <> row(p);
> 
> ERROR: operator does not exist: "temp"."BSC_Table" <> bsc_view SQL 
> state: 42883
> Hint: No operator matches the given name and argument type(s). You may 
> need to add explicit type casts.
> 

Don't forget to reply all so that everyone on the list can participate.  Also, 
do this is good since it doesn't limit you to my limited knowledge. ;)

row() doesn't do what you think it does.

you have to specify every column that you want to compare, so:

row(t.col1, t.col2, t.col3, t.col4) <> row(p.col1, p.col2, p.col3, p.col4)

where the datatypes of each respective column match. i.e. t.col1 maps to 
p.col1, ...

Regards,
Richard Broersma Jr.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[SQL] Versionning (was: Whole-row comparison)

2007-06-01 Thread christian.roche.ext

Hi all,

first let me tell you that this nice "whole-row comparison" feature of
postgres 8.2 allowed me to create a versionned database model very
neatly.  The SQL statement that inserts in the destination table only
the one rows that are new or that have changed since last time is very
simply written:

INSERT INTO bsc_table
SELECT nextval('version_seq'), 
FROM load.bsc_table AS ld LEFT JOIN bsc_view AS nt USING (obj_id)
WHERE nt.obj_id IS NULL OR row(nt.*) <> row(ld.*);

bsc_view is a view that returns the latest version of each object in the
bsc table:

CREATE VIEW bsc_view AS
SELECT  
FROM bsc_table
WHERE (obj_id, ver_id) IN (SELECT obj_id, max(ver_id) FROM bsc_table
GROUP BY obj_id);

This is all nice as long as I only want to access the very last version
of the table.  However what should be done if I now need to access an
earlier version ?  The most elegant way would be to pass a ver_id
parameter to bsc_view, something like :

CREATE VIEW bsc_view(int) AS
SELECT  
FROM bsc_table
WHERE (obj_id, ver_id) IN
  (SELECT obj_id, max(ver_id)
 FROM bsc_table
   WHERE ver_id <= $1
 GROUP BY obj_id));

However postgres doesn't allow parameters in views as far as I know.  I
guess I could create a function returning a set of rows, but then I
would lose most advantages of rewritten views, especially optimization,
right ?

I've contemplated reusing an awful hack from my Access era, namely using
a single-rowed table to store the parameter and joining the view on it.
The parameter would be updated before the view is called; this would
work but would definitely be ugly.  Can someone think of a better way to
do that ?

Thanks a lot,
Christian

-Original Message-
 
I'm trying to implement a "versionned" storage for a datawarehouse
system, meaning I have tables which store daily information about
objects and I would like to have a row inserted (i.e. a new version)
only if it differs from the most recent existing row.  For instance
instead of storing
 
versionattribute1attribute2
1xy
2xy
3xy
4zy
5zy
6zt
 
we would only keep the diffs :
 
versionattribute1attribute2
1xy
4zy
6zt

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [SQL] Versionning (was: Whole-row comparison)

2007-06-01 Thread christian.roche.ext
 
Hi Andrew,

I must admit I don't really understand what you say.  I don't know what
SRF stand for, and what you say about generic case is not clear to me,
sorry.

My idea is that using a parameter table allows me to keep using a view,
which is optimized for instance when used against a WHERE condition.

For example, I could write :

CREATE VIEW bsc_view AS
SELECT  
FROM bsc_table
WHERE (obj_id, ver_id) IN
  (SELECT obj_id, max(ver_id) FROM bsc_table, param_table
 WHERE ver_id <= param_table.ver_id
   GROUP BY obj_id));

and the following statement would be optimized:

UPDATE param_table SET ver_id = xxx;
SELECT * FROM bsc_view WHERE obj_id = yyy; 

which would not be the case would I have used a multi-row function.

Does this make sense ?

Thanks a lot,
Christian

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of ext Andrew Sullivan
Sent: Friday, June 01, 2007 17:47
To: [email protected]
Subject: Re: [SQL] Versionning (was: Whole-row comparison)

On Fri, Jun 01, 2007 at 08:07:46PM +0300, [EMAIL PROTECTED]
wrote:
> I've contemplated reusing an awful hack from my Access era, namely 
> using a single-rowed table to store the parameter and joining the view
on it.
> The parameter would be updated before the view is called; this would 
> work but would definitely be ugly.  Can someone think of a better way 
> to do that ?

I sort of don't see how that hack would be any different from a SRF. 
You'd lose the planner benefits anyway, I think, because you'd have to
plan for the generic case where the data could be anything, no?

A


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] Versionning (was: Whole-row comparison)

2007-06-04 Thread christian.roche.ext
 
Hi Andrew,

what is worrying me is that if I use a SRF, any additional WHERE
condition would not be taken into account before executing the
underlying query, e.g., in this request using a view, the WHERE
condition would be considered in the final query :

UPDATE params
SET version = ver_id;

SELECT *
FROM bsc_list_view
WHERE obj_id = 'xxx';

because the bsc_list_view would be expanded to the underlying request,
while using a SRF, the whole table would be scaned before the WHERE
condition is applied:

SELECT *
FROM bsc_list_srf(ver_id)
WHERE obj_id = 'xxx';

This is what I mean when I say that the optimization would be lost when
using a SRF.  Now what is the "Right Thing To Do" in this particular
case ?  The nicest thing would really to have parametrized view.  Is
there any fundamental reason why such a beast does not exist, or is it
only postgres (compared to higher-level RDBMS) ?

Thanks a lot !
Christian


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of ext Andrew Sullivan
Sent: Friday, June 01, 2007 18:51
To: [email protected]
Subject: Re: [SQL] Versionning (was: Whole-row comparison)


Yes, but I don't think it's true.  Because you change the value of
ver_id all the time, the actual result can't be collapsed to a constant,
so you end up having to execute the query with the additional value, and
you still have to plan that.  The same thing is true of a function,
which will have its plan prepared the first time you execute it.  (I
could be wrong about this; I suppose the only way would be to try it.)


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings