Re: [GENERAL] Views and permissions

2008-01-22 Thread Albe Laurenz
Christian Schröder wrote:
> Indeed, you are right! Granting select permission to the "ts_frontend" 
> user (more precisely: granting membership to the "zert_readers" role) 
> solved the problem.
>
>> This is strange because ts_frontend can select from "EDITORS" because
>> of the membership to role zert_readers.
>
> No, the user "ts_frontend" is (was) not a member of the group 
> "zert_readers", but the user "www" who uses the view is.

My mistake, I got confused in your role hierarchy.
Anyway, the problem is solved, and the cause is clear :^)

Yours,
Laurenz Albe



Re: [GENERAL] Stored procedures when and how: was: Sun acquires MySQL

2008-01-22 Thread Merlin Moncure
On Jan 22, 2008 2:24 AM, Ivan Sergio Borgonovo <[EMAIL PROTECTED]> wrote:
> > > I doubt that what you were measuring there was either procedure
> > > call overhead or java computational speed; more likely it was the
> > > cost of calling back out of java, through pl/java's JDBC
> > > emulation, down through SPI, to re-execute the same INSERT that
> > > you then decided to execute directly.  In particular, if
> > > pl/java's JDBC doesn't know anything about caching query plans,
> > > performance for simple inserts could be expected to go into the
> > > tank just because of that.  (Whether it actually does or not, I
> > > have no idea --- but I would expect it to be a lot less mature
> > > than the mainstream JDBC driver for PG, and that took years to
> > > get smart about prepared queries ...)
>
> > > Without knowing where the bottleneck actually is, it's
> > > unreasonable to assume that it would hurt a different use-case.
>
> > Tom,
> > I have read several of your posts on stored procedure performance.
> > Why not give us your take on what works and what doesn't?
>
> Yep, the more I read, the more I get confused.
> Java loading overhead is a common myth (I can't say if it's true or false),
> and what Tom writes above can find a tentative place in my mind.
> But I still can't understand where plpgsql should or shouldn't be
> used.

It's fairly trivial to test the performance of functions vs. raw
statements, or just about anything going on with the server.  The
benchmarking tool, pgbench, allows custom SQL, which is great for
things like this.  It would have shown you that functions themselves
are not the reason why your application was not running quickly.  My
seat-of-the-pants guess (I don't do Java) was that your problem was in
the JDBC driver somewhere.  When using a high-level database framework
like JDBC or ADO.NET, it can be difficult to figure out exactly what
is going on with the database at times... I tend to avoid them.
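
For illustration, a minimal sketch of such a comparison (the table,
function, script files, and database name are all hypothetical; the only
pgbench feature assumed is its custom-script mode):

CREATE TABLE messages (body text);

CREATE OR REPLACE FUNCTION insert_msg(text) RETURNS void AS $$
BEGIN
    -- the same statement the raw script will run directly
    INSERT INTO messages (body) VALUES ($1);
END;
$$ LANGUAGE plpgsql;

-- fn.sql contains:   SELECT insert_msg('hello');
-- raw.sql contains:  INSERT INTO messages (body) VALUES ('hello');
-- Then compare the two transaction rates:
--   pgbench -n -f fn.sql  -c 4 -t 1000 mydb
--   pgbench -n -f raw.sql -c 4 -t 1000 mydb

If the two rates are close, the function-call overhead is negligible and
the bottleneck is elsewhere, e.g. in the driver.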

merlin



Re: [GENERAL] create a limited user in Postgre

2008-01-22 Thread Sim Zacks
That's because the definitions of the functions are not stored in the 
schema.


Functions are stored in the pg_catalog schema, in the pg_proc table, and 
view definitions are accessible via the several pg_get_viewdef functions in 
the same schema.


If you block access to that table and those functions, then pgAdmin would 
not be able to show the function or view definitions.
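
A rough sketch of that blocking (an illustration only: revoking catalog
access affects every non-superuser and can break client tools such as
pgAdmin and psql's \df, so test on a scratch cluster first; the
pg_get_viewdef signatures vary by version):

REVOKE SELECT ON pg_catalog.pg_proc FROM PUBLIC;
REVOKE EXECUTE ON FUNCTION pg_catalog.pg_get_viewdef(oid) FROM PUBLIC;
REVOKE EXECUTE ON FUNCTION pg_catalog.pg_get_viewdef(oid, boolean) FROM PUBLIC;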



However, as long as the user has access to a backup, they have access to 
all the definitions, no matter what you do.



Sim



-------- Original Message --------
Subject: [GENERAL] create a limited user in Postgre
From: Willy-Bas Loos <[EMAIL PROTECTED]>
To: Sim Zacks <[EMAIL PROTECTED]>
Date: Tuesday, January 22, 2008 09:53:13 AM


> and another that can only run queries and can neither see nor
> modify the functions.
Up until 8.1 it is possible to see all the functions (plpgsql code) in 
pgAdmin, even if you don't have access rights to the schema.

I'm not sure how this works in 8.2 or 8.3.

WBL

On Jan 22, 2008 7:02 AM, Sim Zacks <[EMAIL PROTECTED]> wrote:


That's not very open-source of you. LOL.

Aside from database permission issues, which are not a problem,
you would also have to avoid giving them superuser access to their
own database server (something that a lot of companies will not be
happy with), especially if they want the possibility of other
databases on the server. You would also have to prevent them from
holding onto the backups, something that most companies would not
tolerate at all.

sim



Guillermo Arias wrote:
> Hi, I have a question:
>
> I have a database and I want to create an administrator user with
> total control, and another that can only run queries and can neither
> see nor modify the functions.
>
> The reason is that the database I will distribute contains functions
> with a big part of the software logic, and I do not want these to be
> seen by my customers.
>
> Thanks in advance, Guillermo





Re: [GENERAL] Stored procedures when and how: was: Sun acquires MySQL

2008-01-22 Thread Pavel Stehule
>
> Yep, the more I read, the more I get confused.
> Java loading overhead is a common myth (I can't say if it's true or false),
> and what Tom writes above can find a tentative place in my mind.
> But I still can't understand where plpgsql should or shouldn't be
> used.
>
> I'd really enjoy seeing some general guideline on how to choose.
>

1. procedure runs a lot of SQL statements --> use plpgsql
2. procedure needs some untrusted functionality --> use an untrusted language
3. procedure contains only expressions
3.a) isn't too important --> use plpgsql, and don't forget the IMMUTABLE flag
3.b) is important and is a bottleneck --> try Perl
3.c) is most important or is widely used --> use C
3.d) is simple to implement in C (sometimes true for string functions) --> use C

Learn a trick:

create or replace function list(int)
returns varchar as $$
declare s varchar = '';
begin
 for i in 1..$1 loop
   s := s || '' || i || '';
 end loop;
 return s;
end; $$ language plpgsql;

postgres=# select list(10);
    list
-------------
 12345678910
(1 row)

Time: 0,927 ms -- fine

number, time
100, 5 ms
1000, 75 ms ... usable
10000, 4 s ... slow

So if I call the function list with a parameter < 1000, I can use plpgsql
without any problems. With bigger values I have a problem. But I forgot
IMMUTABLE, ok, try again:

100, 4 ms
1000, 70 ms
10000, 3.8 s ... ok, IMMUTABLE doesn't help here

What is the bottleneck? The FOR loop?

create or replace function list(int)
returns varchar as $$
declare s varchar = '';
 begin
   for i in 1..$1 loop
 perform  '' || i || '';
   end loop;
   return s;
end; $$ language plpgsql immutable;

10000, 443 ms

The bottleneck is the repeated assignment s := s || ...

I will try a trick:

create or replace function list(int)
returns varchar as $$
 begin
   return array_to_string(array(select '' || i || ''
from generate_series(1, $1) g(i)), '');
 end$$ language plpgsql immutable;

test:
100, 1.3 ms
1000, 7.64 ms
10000, 63 ms -- nice, I don't need C
100000, 350 ms
Regards

Pavel Stehule


> thanks
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it



[GENERAL] Selecting max(pk) is slow on empty set

2008-01-22 Thread Alexander Staubo
This is on a fresh pg_restore copy that I have additionally vacuumed
and analyzed. These queries, on a table containing 2.8 million rows,
are very fast:

# select count(*) from user_messages where user_id = 13604;
 count
---
 0
(1 row)
Time: 0.604 ms

# select * from user_messages where user_id = 13604;
 id | user_id | sender_id | sent_at | dismissed_at | message
+-+---+-+--+-
(0 rows)
Time: 0.678 ms

But doing a max() on this empty set takes a long time to run:

# explain analyze select max(id) from user_messages where user_id = 13604;

 QUERY PLAN

--
Result  (cost=633.19..633.20 rows=1 width=0) (actual
time=339160.704..339160.704 rows=1 loops=1)
   InitPlan
 ->  Limit  (cost=0.00..633.19 rows=1 width=4) (actual
time=339160.700..339160.700 rows=0 loops=1)
     ->  Index Scan Backward using user_messages_pkey on
user_messages  (cost=0.00..633188.12 rows=1000 width=4) (actual
time=339160.697..339160.697 rows=0 loops=1)
           Filter: ((id IS NOT NULL) AND (user_id = 13604))
 Total runtime: 339160.770 ms
(6 rows)

Note that it's using the correct index -- user_messages_pkey is on the
id attribute. (Why rows=1000 here?)

PostgreSQL 8.2.5 on Linux and OS X Leopard.

Alexander.



[GENERAL] Errors with run_build.pl - 8.3RC2

2008-01-22 Thread cinu
Hi All, 

I was running the run_build.pl script that is specific
to the buildfarm and encountered errors. I am listing
the names of the logfiles and the errors that I have
seen. Even though these errors appear, the latest
version is downloaded at the end, and when I run the
regression tests they pass. Can anyone explain why
these errors occur?

The logfiles with the errors are listed below; these
errors are for version 8.3RC2:

check.log
==
pgsql.3235/src/test/regress/log/postmaster.log
===
LOG:  database system was shut down at 2008-01-19
17:13:48 EST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
ERROR:  value too long for type character varying(1)
STATEMENT:  INSERT INTO VARCHAR_TBL (f1) VALUES
('cd');
ERROR:  value too long for type character(1)
STATEMENT:  INSERT INTO CHAR_TBL (f1) VALUES ('cd');
ERROR:  invalid input syntax for type boolean: "  tru
e "
STATEMENT:  SELECT '  tru e '::text::boolean AS
invalid;
ERROR:  invalid input syntax for type boolean: ""


config.log
conftest.c: In function `main':
conftest.c:62: error: `not' undeclared (first use in
this function)
conftest.c:62: error: (Each undeclared identifier is
reported only once
conftest.c:62: error: for each function it appears
in.)
conftest.c:62: error: parse error before "big"


contrib-install-check.log
=
/home/CINU/new_build/HEAD/inst/logfile
===
LOG:  database system was shut down at 2008-01-19
17:16:39 EST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
NOTICE:  database "contrib_regression" does not exist,
skipping
NOTICE:  type "gbtreekey4" is not yet defined
DETAIL:  Creating a shell type definition.
NOTICE:  argument type gbtreekey4 is only a shell
NOTICE:  type "gbtreekey8" is not yet defined
DETAIL:  Creating a shell type definition.
NOTICE:  argument type gbtreekey8 is only a shell
NOTICE:  type "gbtreekey16" is not yet defined
DETAIL:  Creating a shell type definition.
NOTICE:  argument type gbtreekey16 is only a shell
NOTICE:  type "gbtreekey32" is not yet defined
DETAIL:  Creating a shell type definition.
NOTICE:  argument type gbtreekey32 is only a shell
NOTICE:  type "gbtreekey_var" is not yet defined
DETAIL:  Creating a shell type definition.
NOTICE:  argument type gbtreekey_var is only a shell
NOTICE:  type "cube" is not yet defined
DETAIL:  Creating a shell type definition.
NOTICE:  return type cube is only a shell
NOTICE:  return type cube is only a shell
NOTICE:  argument type cube is only a shell
ERROR:  bad cube representation
DETAIL:  syntax error at end of input
STATEMENT:  SELECT ''::cube AS cube;
ERROR:  bad cube representation
DETAIL:  syntax error at or near "A"




ecpg-check.log
==
pgsql.3235/src/interfaces/ecpg/test/log/postmaster.log
===
LOG:  database system was shut down at 2008-01-19
17:18:15 EST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
NOTICE:  CREATE TABLE / PRIMARY KEY will create
implicit index "test_pkey" for table "test"
ERROR:  duplicate key value violates unique constraint
"test_pkey"
STATEMENT:  insert into test ( i  , j  ) values ( 7 ,
12 )
ERROR:  more than one row returned by a subquery used
as an expression
STATEMENT:  select  i  from test where j = ( select  j
 from test)
LOG:  unexpected EOF on client connection
ERROR:  relation "nonexistant" does not exist
STATEMENT:  select  *  from nonexistant



initdb.log
WARNING: enabling "trust" authentication for local
connections
You can change this by editing pg_hba.conf or using
the -A option the
next time you run initdb.



install-check.log
==
/home/CINU/new_build/HEAD/inst/logfile
==
LOG:  database system was shut down at 2008-01-19
17:15:24 EST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
NOTICE:  database "regression" does not exist,
skipping
ERROR:  invalid input syntax for type boolean: "  tru
e "
STATEMENT:  SELECT '  tru e '::text::boolean AS
invalid;
ERROR:  invalid input syntax for type boolean: ""
STATEMENT:  SELECT ''::text::boolean AS invalid;



pl-install-check.log
=
/home/CINU/new_build/HEAD/inst/logfile
===
LOG:  database system was shut down at 2008-01-19
17:16:27 EST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
NOTICE:  database "pl_regression" does not exist,
skipping
ERROR:  set-valued function called in context that
cannot accept a set
STATEMENT:  SELECT perl_set_int(5);
ERROR:  set-valued function called in context that
cannot accept a set
STATEMENT:  SELECT perl_set_int(5);


Any suggestions are appreciated.

Thanks in Advance
Regards
Cinu Kuriakose.



Re: [GENERAL] Selecting max(pk) is slow on empty set

2008-01-22 Thread Richard Huxton

Alexander Staubo wrote:

# explain analyze select max(id) from user_messages where user_id = 13604;

 QUERY PLAN

--
Result  (cost=633.19..633.20 rows=1 width=0) (actual
time=339160.704..339160.704 rows=1 loops=1)
   InitPlan
 ->  Limit  (cost=0.00..633.19 rows=1 width=4) (actual
time=339160.700..339160.700 rows=0 loops=1)
   ->  Index Scan Backward using user_messages_pkey on
user_messages  (cost=0.00..633188.12 rows=1000 width=4) (actual
time=339160.697..339160.697 rows=0 loops=1)
         Filter: ((id IS NOT NULL) AND (user_id = 13604))
 Total runtime: 339160.770 ms
(6 rows)

Note that it's using the correct index -- user_messages_pkey is on the
id attribute. (Why rows=1000 here?)


1000 looks suspiciously like a default estimate if the planner knows no 
better. Odd since you say that you've just analysed.


Do you have an index on user_id? Presumably that's what's being used in 
the case of SELECT * or count(*).


What cost does the count(*) come up with?

Can you trick it with a sub-query (to see the explain)?
SELECT max(id) FROM (SELECT id FROM user_messages WHERE user_id = 13604) 
AS foo;


--
  Richard Huxton
  Archonet Ltd



Re: [GENERAL] Selecting max(pk) is slow on empty set

2008-01-22 Thread Alexander Staubo
On 1/22/08, Richard Huxton <[EMAIL PROTECTED]> wrote:
> Alexander Staubo wrote:
> > # explain analyze select max(id) from user_messages where user_id = 13604;
> >
> >  QUERY PLAN
> > 
> > --
> > Result  (cost=633.19..633.20 rows=1 width=0) (actual
> > time=339160.704..339160.704 rows=1 loops=1)
> >InitPlan
> >  ->  Limit  (cost=0.00..633.19 rows=1 width=4) (actual
> > time=339160.700..339160.700 rows=0 loops=1)
> >    ->  Index Scan Backward using user_messages_pkey on
> > user_messages  (cost=0.00..633188.12 rows=1000 width=4) (actual
> > time=339160.697..339160.697 rows=0 loops=1)
> >          Filter: ((id IS NOT NULL) AND (user_id = 13604))
> >  Total runtime: 339160.770 ms
> > (6 rows)
> >
> > Note that it's using the correct index -- user_messages_pkey is on the
> > id attribute. (Why rows=1000 here?)
>
> 1000 looks suspiciously like a default estimate if the planner knows no
> better. Odd since you say that you've just analysed.
>
> Do you have an index on user_id? Presumably that's what's being used in
> the case of SELECT * or count(*).

Yes, I do. However, for some reason it's not being used here. The
index is clustered -- but I haven't run "cluster" on it recently. Does
that matter?

> What cost does the count(*) come up with?

# explain analyze select count(*) from user_messages where user_id = 13604;

QUERY PLAN

 Aggregate  (cost=3646.04..3646.05 rows=1 width=0) (actual
time=39.448..39.448 rows=1 loops=1)
   ->  Index Scan using user_messages_user on user_messages
(cost=0.00..3643.53 rows=1000 width=0) (actual time=39.410..39.410
rows=0 loops=1)
 Index Cond: (user_id = 13604)
 Total runtime: 39.648 ms
(4 rows)

So here it's using the right index.

> Can you trick it with a sub-query (to see the explain)?
> SELECT max(id) FROM (SELECT id FROM user_messages WHERE user_id = 13604)
> AS foo;

No, I tried that as well; PostgreSQL is clever enough to optimize it
into exactly the same query as the original.

Alexander.



Re: [GENERAL] Online Oracle to Postgresql data migration

2008-01-22 Thread Josh Harrison
On Jan 18, 2008 4:14 AM, Dorren <[EMAIL PROTECTED]> wrote:

> Terabytes of data: this is a lot of Oracle data to migrate. You would
> need high-performance tools capable of handling a heterogeneous
> environment.
> People suggested links here, so I will add some that could be very
> appropriate to your case:
>
> The PostgreSQL loader is limited, by the way. For instance, if you have
> an end-of-line character within your data, then the load into PostgreSQL
> will fail.
> Check this pdf:
> http://www.wisdomforce.com/dweb/resources/docs/OracleToNetezzaWithFastReader.pdf
>
> A few tools to consider:
>
> FastReader:  http://www.wisdomforce.com/dweb/index.php?id=23 -
> extracts data from Oracle into ASCII flat files or a pipe and creates
> input for the PostgreSQL loader. Many people use it for fast initial
> synchronization. FastReader performs a bulk data extract, so terabytes
> of data can be migrated in hours.
>
> Database Sync - http://www.wisdomforce.com/dweb/index.php?id=1001  -
> also a fast data transfer tool that operates as change data capture.
> It captures all the latest transactions and could be used for data
> warehouse incremental feeds with OLTP Oracle data. You may need it if
> you don't want to move terabytes of data each time, but only the
> changed data.


Thanks. I'll check out those options. I also have another question in mind:
how good (or fast) would it be to use Java with JDBC to transfer these
terabytes of data from Oracle to PostgreSQL? This worked okay for small
datasets, but I'm not sure how it will behave for large data.
And can I also keep track of the changes in the Oracle production system
using triggers?

Thanks
josh


Re: [GENERAL] Selecting max(pk) is slow on empty set

2008-01-22 Thread Alexander Staubo
On 1/22/08, Richard Huxton <[EMAIL PROTECTED]> wrote:
> Hmm, but with an estimated cost of 3646 (vs. 633 for the max(id), which
> uses the wrong index). That explains why it's walking backwards through
> the pkey index; it thinks that it's nearly six times cheaper.
[...]
> Have a look at most_common_vals,most_common_freqs in pg_stats for
> tbl=user_messages, att=user perhaps.

# select histogram_bounds from pg_stats where
tablename='user_messages' and attname='user_id';
                    histogram_bounds
--------------------------------------------------------
 {1,489,1097,1824,2555,3452,4488,5679,6879,8637,13448}
(1 row)

# select null_frac, n_distinct, most_common_vals, most_common_freqs
from pg_stats where tablename='user_messages' and attname='user_id';
 null_frac | n_distinct |                 most_common_vals                 |                           most_common_freqs
-----------+------------+--------------------------------------------------+------------------------------------------------------------------------
         0 |       2652 | {5826,484,1206,780,823,4085,4157,5852,1962,6453} | {0.0093,0.0077,0.0067,0.0063,0.006,0.0057,0.0057,0.0053,0.005,0.0047}
(1 row)

> Then see if an ALTER TABLE SET
> STATISTICS 100 makes a difference.

So it does:

# explain analyze select max(id) from user_messages where user_id = 13604;

QUERY PLAN
---
 Aggregate  (cost=1320.52..1320.53 rows=1 width=4) (actual
time=13.640..13.640 rows=1 loops=1)
   ->  Index Scan using user_messages_user on user_messages
(cost=0.00..1319.62 rows=358 width=4) (actual time=13.631..13.631
rows=0 loops=1)
 Index Cond: (user_id = 13604)
 Total runtime: 13.712 ms

Thank you! That solves my performance problem, at least.

But it's worrying that PostgreSQL should be so off in planning the
query. Does this behaviour qualify as a bug, or is this -- that is,
the need to tweak statistics parameters -- just your garden-variety
application-specific optimization?

Alexander.



Re: [GENERAL] (un)grouping question

2008-01-22 Thread Rhys Stewart
Great, this does the trick, thanks!!

Um... somevalue+random() is a simplified version of what I really wanted to
do; I just wanted the general idea of what the query would look like.




2008/1/21, Andrei Kovalevski <[EMAIL PROTECTED]>:
>
> Maybe this is what you need:
>
> select
>   test.uid, coalesce(t.somevalue + a.max + t.uid, test.somevalue)
> from
>   test
> left outer join
>   (select
>   *
> from
>   test
> where
>   (uid, somevalue) not in
>   (select min(uid), somevalue from test group by somevalue)
>   ) t on (test.uid = t.uid),
>   (select max(somevalue) from test) a
>
> Rhys Stewart wrote:
> > Ok, let me clarify: I don't want to remove them, just want them changed,
> > but I need to keep the uid. However, I would like just one somevalue to
> > remain the same. So for example, uids 2, 4 and 8 have somevalue 44;
> > afterwards I would like 2 to remain 44, but uids 4 and 8 would be changed.
> > 2008/1/21, Jeff Davis <[EMAIL PROTECTED] >:
> >
> > On Mon, 2008-01-21 at 12:36 -0500, Rhys Stewart wrote:
> > > Hi list,
> > >
> > > have the following table
> > >
> > > uid|somevalue
> > > 
> > > 1|11
> > > 2|44
> > > 3|31
> > > 4|44
> > > 5|71
> > > 6|33
> > > 7|33
> > > 8|44
> > > 9|14
> > >
> > > would like to remove the duplicate values in the column somevalue.
> > > doing this by just adding a random number  is perfectly fine,
> > however
> > > i want to retain at least one of the original values of
> > somevalue. Any
> > > ideas how to do this in in a query?
> >
> > Would something like this help?
> >
> > SELECT MIN(uid), somevalue FROM mytable GROUP BY somevalue;
> >
> > Also consider just doing:
> >
> > SELECT DISTINCT somevalue FROM mytable;
> >
> > ...if you don't need uid in the result set.
> >
> > Regards,
> > Jeff Davis
> >
> >
>
>
> --
> Andrei Kovalevski
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> Managed Services, Shared and Dedicated Hosting
> Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/
>
>


Re: [GENERAL] Selecting max(pk) is slow on empty set

2008-01-22 Thread Richard Huxton

Alexander Staubo wrote:

On 1/22/08, Richard Huxton <[EMAIL PROTECTED]> wrote:

Alexander Staubo wrote:

# explain analyze select max(id) from user_messages where user_id = 13604;

 QUERY PLAN

--
Result  (cost=633.19..633.20 rows=1 width=0) (actual
time=339160.704..339160.704 rows=1 loops=1)



Do you have an index on user_id? Presumably that's what's being used in
the case of SELECT * or count(*).


Yes, I do. However, for some reason it's not being used here. The
index is clustered -- but I haven't run "cluster" on it recently. Does
that matter?


The index is still an index...


What cost does the count(*) come up with?


# explain analyze select count(*) from user_messages where user_id = 13604;

QUERY PLAN

 Aggregate  (cost=3646.04..3646.05 rows=1 width=0) (actual
time=39.448..39.448 rows=1 loops=1)
   ->  Index Scan using user_messages_user on user_messages
(cost=0.00..3643.53 rows=1000 width=0) (actual time=39.410..39.410
rows=0 loops=1)
 Index Cond: (user_id = 13604)
 Total runtime: 39.648 ms
(4 rows)

So here it's using the right index.


Hmm, but with an estimated cost of 3646 (vs. 633 for the max(id), which 
uses the wrong index). That explains why it's walking backwards through 
the pkey index; it thinks that it's nearly six times cheaper.


It looks like it thinks that because the estimated cost of scanning the 
whole index backwards is 633188 for 1000 rows, and you only want one row, 
so it charges 1/1000 of that cost.


But why 1000 rows? Actually, it thinks 1000 rows above too. Could it be 
inadequate stats on the user_id column? If the users it gathered stats on 
all have > 1000 rows then it might use the default.


Have a look at most_common_vals and most_common_freqs in pg_stats for 
tablename=user_messages, attname=user_id, perhaps. Then see if an ALTER 
TABLE ... SET STATISTICS 100 makes a difference.
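
Concretely, that presumably means something like:

ALTER TABLE user_messages ALTER COLUMN user_id SET STATISTICS 100;
ANALYZE user_messages;  -- re-gather stats so the new target takes effect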



Can you trick it with a sub-query (to see the explain)?
SELECT max(id) FROM (SELECT id FROM user_messages WHERE user_id = 13604)
AS foo;


No, I tried that as well; PostgreSQL is clever enough to optimize it
into exactly the same query as the original.


Damn :-)



--
  Richard Huxton
  Archonet Ltd



Re: [GENERAL] Selecting max(pk) is slow on empty set

2008-01-22 Thread Richard Huxton

Alexander Staubo wrote:

On 1/22/08, Richard Huxton <[EMAIL PROTECTED]> wrote:

Then see if an ALTER TABLE SET
STATISTICS 100 makes a difference.


So it does:

# explain analyze select max(id) from user_messages where user_id = 13604;

QUERY PLAN
---
 Aggregate  (cost=1320.52..1320.53 rows=1 width=4) (actual
time=13.640..13.640 rows=1 loops=1)
   ->  Index Scan using user_messages_user on user_messages
(cost=0.00..1319.62 rows=358 width=4) (actual time=13.631..13.631
rows=0 loops=1)
 Index Cond: (user_id = 13604)
 Total runtime: 13.712 ms

Thank you! That solves my performance problem, at least.


Although the row-estimate still seems quite high. You might want to 
increase it even further (maximum is 1000). If this is a common query, 
I'd look at an index on (user_id, id) rather than just (user_id) perhaps.
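
Something like this, presumably (the index name is just illustrative; it
happens to match the one that shows up in the later plans):

CREATE INDEX user_messages_user_id_id ON user_messages (user_id, id);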



But it's worrying that PostgreSQL should be so off in planning the
query. Does this behaviour qualify as a bug, or is this -- that is,
the need to tweak statistics parameters -- just your garden-variety
application-specific optimization?


Well, it's data-specific rather than application-specific, I suppose. The 
issue is that there is a cost to tracking 100 values and you don't 
want to pay that on every column in every table. If user 13604 isn't in 
the list of most-common users then all it can really do is fix an upper 
bound on how many matches it can have. Of course you and I can reason 
outside of the data and guess that many users won't have more than a 
handful of messages, but that's not something PG can do.


In theory, PG could auto-tune itself for various parameters. The problem 
then is, do you:
1. Learn constantly, meaning you constantly pay the cost of checking 
your decisions and never get consistent plans.
2. Learn once, in which case a change in data frequencies or usage 
patterns renders your learning out of date.


You might find http://pgfoundry.org/ useful with the fouine / pqa 
projects to analyse query logs.


--
  Richard Huxton
  Archonet Ltd



Re: [GENERAL] Selecting max(pk) is slow on empty set

2008-01-22 Thread Alexander Staubo
On 1/22/08, Richard Huxton <[EMAIL PROTECTED]> wrote:
> Although the row-estimate still seems quite high. You might want to
> increase it even further (maximum is 1000). If this is a common query,
> I'd look at an index on (user,id) rather than just (user) perhaps.

Actually that index (with the same statistics setting as before)
yields slightly worse performance:

# explain analyze select max(id) from user_messages where user_id = 13604;

  QUERY PLAN

 Result  (cost=3.86..3.87 rows=1 width=0) (actual time=0.051..0.052
rows=1 loops=1)
   InitPlan
 ->  Limit  (cost=0.00..3.86 rows=1 width=4) (actual
time=0.045..0.045 rows=0 loops=1)
   ->  Index Scan Backward using user_messages_user_id_id on
user_messages  (cost=0.00..1486.79 rows=385 width=4) (actual
time=0.042..0.042 rows=0 loops=1)
 Index Cond: (user_id = 13604)
 Filter: (id IS NOT NULL)
 Total runtime: 0.128 ms

Compare with the plain index on the one attribute:

# explain analyze select max(id) from user_messages where user_id = 13604;
 QUERY
PLAN
-
 Aggregate  (cost=1388.34..1388.35 rows=1 width=4) (actual
time=0.034..0.035 rows=1 loops=1)
   ->  Index Scan using user_messages_user on user_messages
(cost=0.00..1387.40 rows=374 width=4) (actual time=0.030..0.030 rows=0
loops=1)
 Index Cond: (user_id = 13604)
 Total runtime: 0.085 ms

> > But it's worrying that PostgreSQL should be so off in planning the
> > query. Does this behaviour qualify as a bug, or is this -- that is,
> > the need to tweak statistics parameters -- just your garden-variety
> > application-specific optimization?
>
> Well, it's data-specific rather than application-specific, I suppose. The
> issue is that there is a cost to tracking 100 values and you don't
> want to pay that on every column in every table. If user 13604 isn't in
> the list of most-common users then all it can really do is fix an upper
> bound on how many matches it can have. Of course you and I can reason
> outside of the data and guess that many users won't have more than a
> handful of messages, but that's not something PG can do.

Absolutely. Thanks for the pointers.

Alexander.



Re: [GENERAL] Selecting max(pk) is slow on empty set

2008-01-22 Thread Richard Huxton

Alexander Staubo wrote:

On 1/22/08, Richard Huxton <[EMAIL PROTECTED]> wrote:

Although the row-estimate still seems quite high. You might want to
increase it even further (maximum is 1000). If this is a common query,
I'd look at an index on (user_id, id) rather than just (user_id) perhaps.


Actually that index (with the same statistics setting as before)
yields slightly worse performance:

# explain analyze select max(id) from user_messages where user_id = 13604;
 Total runtime: 0.128 ms

Compare with the plain index on the one attribute:

# explain analyze select max(id) from user_messages where user_id = 13604;
 Total runtime: 0.085 ms


Ah, but:
1. Those times are so small, I'm not sure you can reliably separate 
them. Certainly not from one run.
2. For a range of different user_ids I'd expect the user_id_id index to 
maintain a near-constant time regardless of the number of messages for 
that user.
3. You might be able to reduce your statistics on the user_id column and 
still keep the fast plan.


--
  Richard Huxton
  Archonet Ltd



Re: [GENERAL] [HACKERS] Errors with run_build.pl - 8.3RC2

2008-01-22 Thread Andrew Dunstan




cinu wrote:
Hi All, 


I was running the run_build.pl script that is specific
to the buildfarm and encountered errors. I am listing
the names of the logfiles and the errors that I have
seen. Even though these errors appear, the latest
version is downloaded at the end, and when I run the
regression tests they pass. Can anyone explain why
these errors occur?

The logfiles with the errors are listed below; these
errors are for version 8.3RC2:


  

[snip]

Any suggestions are appreciated.


  


Errors in the logfiles are expected. Many of these errors are there by 
design - the regression tests include many tests for error conditions. 
For example, look at the corresponding logs for a green run here: 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=dungbeetle&dt=2008-01-21%2012:44:01


If the buildfarm script gets through to the end without complaining, 
then it succeeded.


Please note that in general questions regarding use of the buildfarm 
client belong on its mailing list, not on the pgsql lists. For more 
information about the list see here: 
http://pgfoundry.org/mailman/listinfo/pgbuildfarm-members.


cheers

andrew



Re: [GENERAL] Selecting max(pk) is slow on empty set

2008-01-22 Thread Alexander Staubo
On 1/22/08, Richard Huxton <[EMAIL PROTECTED]> wrote:
> Alexander Staubo wrote:
> > On 1/22/08, Richard Huxton <[EMAIL PROTECTED]> wrote:
> >> Although the row-estimate still seems quite high. You might want to
> >> increase it even further (maximum is 1000). If this is a common query,
> >> I'd look at an index on (user_id, id) rather than just (user_id) perhaps.
> >
> > Actually that index (with the same statistics setting as before)
> > yields slightly worse performance:
> >
> > # explain analyze select max(id) from user_messages where user_id = 13604;
> >  Total runtime: 0.128 ms
> >
> > Compare with the plain index on the one attribute:
> >
> > # explain analyze select max(id) from user_messages where user_id = 13604;
> >  Total runtime: 0.085 ms
>
> Ah, but:
> 1. Those times are so small, I'm not sure you can reliably separate
> them. Certainly not from one run.
> 2. For a range of different user_ids I'd expect the user_id_id index to
> maintain a near-constant time regardless of the number of messages for
> that user.
> 3. You might be able to reduce your statistics on the user_id column and
> still keep the fast plan.

Actually, I wasn't looking at the time, but at the cost and estimated
number of rows, which are both lower for the original index, and the
complexity of the plan, which looks (at least to me) simpler than the
backwards scan.

But you're right. With the combined index I can set the granularity
back to 1000, and empty queries as well as non-empty queries perform
well. The row estimate is still way off, though.

What are the drawbacks of making the statistics buckets finer-grained?

Alexander.



Re: [GENERAL] Selecting max(pk) is slow on empty set

2008-01-22 Thread Pavel Stehule
>
> But you're right. With the combined index I can set the granularity
> back to 1000, and empty queries as well as non-empty queries perform
> well. The row estimate is still way off, though.

A bigger value --> a slower ANALYZE. The practical maximum is about 200-300. So be careful.

Regards

Pavel



Re: [GENERAL] Stored procedures when and how: was: Sun acquires MySQL

2008-01-22 Thread brian

Pavel Stehule wrote:



> ...


bottleneck is in repeated assign s := s || ..

I will try trick:

create or replace function list(int)
returns varchar as $$
 begin
   return array_to_string(array(select '' || i || ''
from generate_series(1, $1) g(i)), '');
 end$$ language plpgsql immutable;

test:
100, 1.3 ms
1000, 7.64 ms
10000, 63 ms -- nice, I don't need C
100000, 350 ms
Regards

Pavel Stehule



That's some trick! Thanks for the lessons, Pavel.

b



[GENERAL] altering foreign keys

2008-01-22 Thread Adam Rich
In my database, I have a core table that nearly all other tables
key against.  Now I need to adjust all of those foreign keys to
add a "on update cascade" action.  Is there a way to alter the 
existing keys? (it didn't jump out at me in the manual)

If not, is there a serious issue preventing this feature?

If I have to drop and re-create all of the foreign keys, is it
possible to wrap the whole operation in a transaction without
risking invalid inserts in the referring tables? (I come from
an Oracle background, where DDL causes an implicit commit)







[GENERAL] PostgreSQL professionals group at LinkedIn.com

2008-01-22 Thread Gevik Babakhani
Dear all,

I have created a group for PostgreSQL professionals at LinkedIn.com
Feel free to join if you like.

http://www.linkedin.com/e/gis/51776/760A11717C03

Regards,
Gevik Babakhani

PostgreSQL NL   http://www.postgresql.nl
TrueSoftware BV http://www.truesoftware.nl





Re: [GENERAL] altering foreign keys

2008-01-22 Thread Erik Jones


On Jan 22, 2008, at 1:11 PM, Adam Rich wrote:


In my database, I have a core table that nearly all other tables
key against.  Now I need to adjust all of those foreign keys to
add a "on update cascade" action.  Is there a way to alter the
existing keys? (it didn't jump out at me in the manual)

If not, is there a serious issue preventing this feature?

If I have to drop and re-create all of the foreign keys, is it
possible to wrap the whole operation in a transaction without
risking invalid inserts in the referring tables? (I come from
an Oracle background, where DDL causes an implicit commit)


DDL in Postgres is transactional, so your second option will work just 
fine.
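
A minimal sketch, with hypothetical table and constraint names; concurrent
writes against the child table block until the COMMIT, so no invalid rows
can sneak in between the DROP and the ADD (the ADD also re-validates the
existing rows):

BEGIN;
ALTER TABLE child_table DROP CONSTRAINT child_table_core_id_fkey;
ALTER TABLE child_table
  ADD CONSTRAINT child_table_core_id_fkey
  FOREIGN KEY (core_id) REFERENCES core_table (id)
  ON UPDATE CASCADE;
COMMIT;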


Erik Jones

DBA | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com






Re: [GENERAL] altering foreign keys

2008-01-22 Thread Richard Broersma Jr
--- On Tue, 1/22/08, Adam Rich <[EMAIL PROTECTED]> wrote:

> Is there a way to alter the 
> existing keys? (it didn't jump out at me in the manual)

ALTER TABLE your_table
  DROP CONSTRAINT your_column_fkey_constraint,
  ADD CONSTRAINT your_column_fkey_constraint
    FOREIGN KEY (your_column)
    REFERENCES parent_table (primary_column)
    ON UPDATE CASCADE;

Regards,
Richard Broersma Jr.



Re: [GENERAL] altering foreign keys

2008-01-22 Thread Adam Rich
> In my database, I have a core table that nearly all other tables
> key against.  Now I need to adjust all of those foreign keys to
> add a "on update cascade" action.  Is there a way to alter the
> existing keys? (it didn't jump out at me in the manual)
> 

Would it be possible to modify confupdtype in pg_constraint ?
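
For what it's worth, the action codes can at least be inspected safely
(contype 'f' marks foreign keys; confupdtype is 'a' no action, 'r' restrict,
'c' cascade, 'n' set null, 'd' set default). Updating the catalog directly
is unsupported, though -- the enforcement is wired into the RI triggers as
well, so DROP/ADD CONSTRAINT is the safe route:

SELECT conname, confupdtype
FROM pg_constraint
WHERE contype = 'f';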





[GENERAL] CSV options in COPY

2008-01-22 Thread hjenkins
Hello, all,

On the subject of the COPY command
(http://www.postgresql.org/docs/current/interactive/sql-copy.html), is it
the case that the HEADER, QUOTE, ESCAPE, FORCE QUOTE, and FORCE NOT NULL
options can only be used in CSV mode? If so, why? A tab-delimited table
with a header line and quoted strings seems a perfectly reasonable format.
If not, how?

Regards,
Hazel Jenkins





Re: [GENERAL] CSV options in COPY

2008-01-22 Thread Tom Lane
"hjenkins" <[EMAIL PROTECTED]> writes:
> On the subject of the COPY command
> (http://www.postgresql.org/docs/current/interactive/sql-copy.html), is it
> the case that the HEADER, QUOTE, ESCAPE, FORCE QUOTE, and FORCE NOT NULL
> options can only be used in CSV mode? If so, why? A tab-delimited table
> with a header line and quoted strings seems a perfectly reasonable format.
> If not, how?

So set the CSV delimiter to TAB, if that's what you want.
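
With the pre-9.0 COPY option order, that would look something like this
(hypothetical table and file names):

COPY mytable TO '/tmp/mytable.tsv' WITH DELIMITER E'\t' CSV HEADER;
COPY mytable FROM '/tmp/mytable.tsv' WITH DELIMITER E'\t' CSV HEADER;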

regards, tom lane



[GENERAL] Installing PGSQL

2008-01-22 Thread Jamiil Abduqadir
I am trying to install PostgreSQL using postgresql-8.3-dev1 on my Windows XP
machine, but I get a message that reads:

Fail to create a temporary directory

Does anyone know why I am getting this error message?

Thanks in advance
-- 
Happiness has many doors, and when one of them closes another opens, yet we
spend so much time looking at the one that is shut that we don't see the
one that just opened.


[GENERAL] Postgresql + digital signature

2008-01-22 Thread Luis Alberto Pérez Paz
I'm working on a project which is using Postgres (great database! I love
it).

We're at a stage where I need to implement a mechanism to detect data
modification.

I'm thinking of 'digital signatures' (maybe RSA) on each row. If there's a
modification, the signature doesn't verify.


However, before I start, I need your help to know:

Is there any functionality like this in Postgres?
Does anyone know if there's something similar in another database system?
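
For reference, a sketch of the detection half using an HMAC from the
contrib pgcrypto module (hypothetical table, column, and key; unlike an RSA
signature, anyone who learns the key can re-sign, so the key must be kept
outside the database):

ALTER TABLE documents ADD COLUMN sig bytea;

UPDATE documents
SET sig = hmac(id::text || '|' || body, 'secret-key', 'sha256');

-- verification: rows whose current contents no longer match their signature
SELECT id
FROM documents
WHERE sig IS DISTINCT FROM hmac(id::text || '|' || body, 'secret-key', 'sha256');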


thanks in advance!



Luis Alberto Pérez Paz