[SQL] converting varchar to integer

2005-08-16 Thread tv
Hi,

   I have a varchar column, and I need to

   1) check the value in it is an integer
   2) get the integer value (as integer)

   The problem is I can't suppose the're only correct
   values - ie there can be something like 'xssdkjsd',
   '230kdd' or even an empty string etc.

   I've been looking through the documentation but I've
   found no functions doing this. Are there such functions?

   I've been using to_number(...) function, but it raises
   an exception on an empty string.

   I've written two on my own (see the functions below),
   but maybe there's something faster?

   Tomas

-
-- converts the varchar value to integer
-- the value has to be already checked using the is_integer function
-
CREATE OR REPLACE FUNCTION to_integer(VARCHAR) RETURNS INTEGER AS '
DECLARE
str ALIAS FOR $1;
BEGIN
RETURN to_number(str,99);
END;
' LANGUAGE plpgsql;

-
-- checks whether the value is an integer (int4)
-
CREATE OR REPLACE FUNCTION is_integer(VARCHAR) RETURNS BOOLEAN AS '
DECLARE
str ALIAS FOR $1;
pos INT8 := 0;
BEGIN
-- only 0,1,...,9 (least one)
IF NOT str ~* ''^[0-9]+$'' THEN
RETURN false;
END IF;

SELECT INTO pos to_number($1,99);

-- check the boundaries
IF (-2147483648 <= pos) AND (+2147483647 >= pos) THEN
RETURN true;
ELSE
RETURN false;
END IF;
END;
' LANGUAGE plpgsql;

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [SQL] Query Performance

2009-12-07 Thread tv
Yes, the problem is the nested loop scan - it's scanning users 609070
times, which is awful.

Could you provide explain plan that executed fast? Was it executed with
the same parameter values or did the parameters change (maybe it's slow
for some parameters values only)?

Have you tried to rewrite the subselect to a join? I.e. something like this

select user_name,A.user_id, dnd_window_start, dnd_window_stop, B.subs as
subs, B.city_id as city_id, B.source_type as source_type from
users A left join user_subscriptions B on (A.user_id=B.user_id)
join subs_feed C ON (A.user_id = C.user_id)
where feed_id=1411 and f_sms='t'

But I guess it won't solve the issue (it seems PostgreSQL did this rewrite
on it's own).

Tomas

> Hello List,
>
> I have a query which use to run very fast now has turn into show stopper .
>
> PostgreSQL:8.2
>
> explain analyze select user_name,A.user_id, dnd_window_start,
> dnd_window_stop, B.subs as subs, B.city_id as city_id, B.source_type as
> source_type from
> users A left join user_subscriptions B on (A.user_id=B.user_id)
> where A.user_id in (select user_id from subs_feed where feed_id=1411 and
> f_sms='t')
> ;
>
>Nested Loop Left Join  (cost=986325.88..1094601.47 rows=11148
> width=55) (actual time=132635.994..1590487.280 rows=609070
> loops=1)
>->  Nested Loop  (cost=986325.88..1062280.53 rows=11148 width=26)
> (actual time=132630.057..1398299.117 rows=609070 loops=1)
>  ->  HashAggregate  (cost=986325.88..986437.36 rows=11148 width=4)
> (actual time=132591.648..133386.651 rows=609070 loops=1)
>->  Bitmap Heap Scan on subs_feed
> (cost=16316.71..985194.44 rows=452576 width=4) (actual
> time=20199.571..131566.494 rows=609070 loops=1)
>  Recheck Cond: (feed_id = 1411)
>  Filter: f_sms
>  ->  Bitmap Index Scan on feed_user_id
> (cost=0.00..16203.57 rows=681933 width=0) (actual
> time=19919.512..19919.512 rows=616900 loops=1)
>Index Cond: (feed_id = 1411)
>  ->  Index Scan using users_pkey on users a  (cost=0.00..6.79
> rows=1 width=26) (actual time=2.073..2.074 rows=1 loops=609070)
>Index Cond: (a.user_id = subs_feed.user_id)
>->  Index Scan using user_subscriptions_user_id_pk on
> user_subscriptions b  (cost=0.00..2.89 rows=1 width=33) (actual
> time=0.312..0.313 rows=1 loops=609070)
>  Index Cond: (a.user_id = b.user_id)
>  Total runtime: 1590755.918 ms
> (13 rows)
>
>
>
> This query runs almost half an hour. It is evident that nested loop is
> taking most of the time (approx 27 minutes).
>
> Any tips would be very useful.
>
> Also these table have below count:
>
> select relname,reltuples from pg_class where relname in
> ('users','user_subscriptions','subs_feed');
>   relname   |  reltuples
> +-
>  user_subscriptions |  3758304
>   users  | 1.95481e+07
>  subs_feed  | 2.96492e+07
>
>
> select n_tup_ins,n_tup_upd,n_tup_del,last_vacuum,last_analyze from
> pg_stat_user_tables where relname='user_subscriptions';
>  n_tup_ins | n_tup_upd | n_tup_del |   last_vacuum|
>last_analyze
> ---+---+---+--+--
>   86371397 |  25865942 | 0 | 2009-12-06 23:00:36.355251+05:30 |
> 2009-12-06 23:00:36.355251+05:30
>
>
>
> Thanks in advance for help ...
>
>
>



-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Howto "insert or update" ?

2010-11-22 Thread tv
> Currently I have a trigger function that should store a value in tableX
> whenever a certain column in tableY gets changed.
> I do it with:
> a)   delete from tableX where key = ( A, B )  ( regardless if there is one
> )
> b)   insert into tableX
>
> This seems not very efficient though it works.
> Is there a better way?

Efficiency matters only if the solution is correct, and that's not the
case of your function - there's a quite trivial race condition. Imagine
there are two transactions running at the same time, executing the
function concurrently.

The first one will succeed, while the other one will fail because of
unique constraint violation. A correct solution is something like this

BEGIN
  INSERT INTO ...
EXCEPTION
  -- the key already exists, so let's update
  WHEN unique_violation THEN
 UPDATE ...
END;

Regarding efficiency - I'm not aware of a better solution. There are plans
to implement true MERGE but that's in the future. All you can do right now
is to make sure the key is indexed (I guess it's a PK anyway) so that the
INSERT/UPDATE are fast.

Well, actually there's one other thing you could do - you can do a BEFORE
INSERT trigger that checks if the key already exists, and in case it does
switch to UPDATE. Something like

CREATE OR REPLACE FUNCTION my_trigger() RETURNS trigger as $$
BEGIN
   UPDATE my_table SET  WHERE key = NEW.key;
   IF (FOUND) THEN
  -- updated, do not execute the insert
  RETURN NULL;
   END IF;

   -- execute the insert
   RETURN NEW;
END;
$$ language plpgsql;

CREATE TRIGGER merge_trigger BEFORE INSERT ON my_table FOR EACH ROW
EXECUTE PROCEDURE my_trigger();

This is probably more elegant - just execute INSERT statements and it will
handle all the work. The only problem is it does not report the number of
updated rows (it just returns 0 in that case).

But generally it's just as efficient as the solution described above.

regards
Tomas


-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


[SQL] storing access rights in a postgres database

2006-10-10 Thread tv
Hi,

we are developping a web application in which we need to manage access to
several types of objects, the most important objects are 'company', 'projects',
'subproject', 'module' (and several others but that's not important for now).
In general these objects constitute a tree, as for example each company owns
several projects and each project belongs to exactly one company. So it's
relatively easy to store these objects and relations between them as it's a
simple 1:M relationship.

The funny part begins with the access rights on these objects - we want to store
them in a database in such a way to get:

(a) good performance - there will be several hundreds of users and objects etc.

(b) easy administering - all of that will be administered by humans, so it
should be as easy as possible

There'll be two basic types of questions:

(1) Does the user X have an access to the object Y? (i.e. Does the user have
access to the 'project X'?)

(2) To which objects at the level X can the user Y access? (i.e. 'To which
projects does the user have an access?')

The (b) in general means some kind of 'inheritance' is used, that is each node
in the tree inherits the access right from the node above him in case there's
no access right set directly on it. So the first step when deciding 'Does the
user X have an access to 'project Y?' would be to determine whether there's an
access right right on the project, and if not then the same question ('Does he
hava an access?' would be asked for the node above project (a 'firm' for
example).

I came up with a table

CREATE TABLE rights (
   user_id   INT NOT NULL,
   allowed   BOOLEAN NOT NULL,
   firm_id   INT,
   project_idINT,
   subproject_id INT,
   module_id INT
);

Where all the columns are references to the related tables (not important here).
The table is filled from 'left to right' that is if a column is NULL then all
the columns to right from it are NULL as well, thus each row has a meaning of a
path in the tree. For example

INSERT INTO rights(allowed,user_id,firm_id,project_id,subproject_id,module_id)
VALUES ('t',1,4,33,12,24);
INSERT INTO rights(allowed,user_id,firm_id,project_id,subproject_id,module_id)
VALUES ('t',1,4,NULL,NULL,NULL);

are valid rows, while

INSERT INTO rights(allowed,user_id,firm_id,project_id,subproject_id,module_id)
VALUES ('t',1,4,NULL,34,NULL);

is not valid as there's a 'gap' between '4' and '34'.

The question 'Does the user X have an access to object Y?' is then realized by
an SQL query (let the object be a project with id 3, belonging to firm with id
4, and let the user have id 1):

SELECT allowed FROM rights WHERE user_id = 1 AND (
 (firm_id = 4 AND project_id = 3 AND subproject_id IS NULL)
  OR (firm_id = 4 AND project_id IS NULL)
)
ORDER BY firm_id, project_id, subproject_id, module_id, allowed DESC LIMIT 1;

where the 'ORDER BY' clause sorts the results the rows so the most specific are
'at the top' and then choose 't' in prior to 'f'.

The problem is with the second type of queries (all objects the user has access
rights to) as all the ways to find that using SQL are very slow. For example to
get a list of all such projects for user with id 1 we use this:

SELECT id, (
SELECT allowed FROM rights WHERE user_id = 1 AND (
 (firm_id = projects.firm_id AND project_id = projects.id AND
subproject_id IS NULL)
  OR (firm_id = projects.firm_id AND project_id IS NULL)
)
ORDER BY firm_id, project_id, subproject_id, module_id, allowed DESC LIMIT 1
) as allowed
FROM projects;

The problem is in the 'LIMIT 1' clause - that's the reason I can't write that as
a join.

Does someone else has an idea how to solve this? If needed I can send more
complex examples and some testing data, explain plans, etc.

I've been thinking about some 'intermediate table' with results of the
subselect, updated by a set of triggers, but maybe there's some better
solution.

thanks for all your advices
Tomas

PS: We're not granting right directly to users of course - we are using roles,
but it's not necessary here. Just imagine role_id instead of user_id in all the
text.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [SQL] SQL tree duplication

2008-01-24 Thread tv
> Hi,
>
> I have the following kind of sql structure, in 3 levels:
>
> ---
> create table documents (
>   id serial,
>   name varchar(50),
>   primary key (id)
> );
>
> create table lines (
>   id serial,
>   name varchar(50),
>   document_id integer,
>   primary key (id),
>   foreign key (document_id) references documents (id)
> );
>
> create table line_details (
>   id serial,
>   name varchar(50),
>   line_id integer,
>   primary key (id),
>   foreign key (line_id) references lines (id)
> );
> ---
>
> I'd like to be able to "duplicate" a document, with all of its lines and
> line details.
>
> Is there any easy way to do that with Postgresql? The only solution I
> can think of at the moment is to loop through all lines and line
> details, and replace foreign keys properly with values fetch with
> "currval". It should work just fine, but I was wondering if some
> advanced features of Postgresql could help in this situation.

You will have to do that in 3 steps (one for each table), but looping may
not be necessary - just use INSERT ... SELECT ... syntax. Something like

INSERT INTO Lines SELECT FROM Lines WHERE document_id = OLD_ID;

But it depends on primary keys in the Lines and Line_details tables - if
the primary keys are composed (and the document_id is part of them) then
there is no problem with duplicities. Otherwise you'll have to solve it
somehow, and looping may be necessary.

Tomas


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [SQL] Subsorting GROUP BY data

2008-11-10 Thread tv
What about replacing the table by

SELECT * FROM my_table ORDER BY num

i.e. something like

SELECT cat, LAST(id), LAST(num) FROM (SELECT * FROM my_table ORDER BY num)
AS foo GROUP_BY cat;

Hope it works, just guessing it might help :-)

regards
Tomas

> SELECT cat, MAX(num) FROM my_table GROUP_BY cat;
>
> But if I add the "id" column, of course it doesn't work, since it's not
> in an aggregate function or in the GROUP_BY clause.  So I found a post
> at http://archives.postgresql.org/pgsql-hackers/2006-03/msg01324.php
> which describes how to add a "FIRST" and "LAST" aggregate function to
> PGSQL.  However, first and last don't seem to help unless you are able
> to "subsort" the grouping by the # (ie, group by cat, then subsort on
> num, and select the "last" one of the group).
>
> I would think something like the following would work, except that PGSQL
> does not like the SQL generated (it basically says I can't have a
> GROUP_BY after an ORDER_BY).  And if I move the "ORDER_BY" to the end,
> that just orders the returned groupings, so that doesn't help me either.
>
> SELECT cat, LAST(id), LAST(num) FROM my_table ORDER_BY num GROUP_BY cat;
>
>
> So does anyone know how to sort *within* a grouping so that FIRST and
> LAST return meaningful results?


-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql