[GENERAL] memory optimization

2007-08-15 Thread Sabin Coanda
Hi there,

I have a procedure which uses temporary objects (a table and a sequence). I 
tried to optimize it by using common variables (an array and a long varchar) 
instead. I didn't find any difference in performance, but I'd like to 
choose the best option from other points of view. One of them is memory.

So, what is better from the postgres memory point of view: to use temporary 
objects, or to use common variables?

Can you suggest other points of view to be taken into consideration in my 
case?
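
For context, a minimal sketch of the two alternatives being compared here 
(all names are hypothetical):

-- option 1: temporary objects (created inside a transaction so ON COMMIT DROP applies)
CREATE TEMP TABLE tmp_work(id int) ON COMMIT DROP;
CREATE TEMP SEQUENCE tmp_seq;

-- option 2: plain plpgsql variables held in backend-local memory
CREATE OR REPLACE FUNCTION demo_vars() RETURNS int AS $$
DECLARE
    work_ids int[] := ARRAY(SELECT generate_series(1, 100)); -- stands in for the temp table
    counter  int   := 0;                                     -- stands in for the temp sequence
BEGIN
    counter := counter + 1;
    RETURN array_upper(work_ids, 1) + counter;
END
$$ LANGUAGE plpgsql;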

TIA,
Sabin 





Re: [GENERAL] Transactional DDL

2007-08-15 Thread Greg Williamson

Harpreet Dhaliwal wrote:
And this feature i.e. transactional DDL is not there in other major 
RDBMS like sql server, oracle etc?
 
thanks

~Harpreet


<...snipped earlier postings...>

It surprised me when I saw Oracle's behavior. Informix supports DDL 
within transactions quite happily:


create table foo22 (id int,myvalue varchar);
create table foo23 (id int);
begin;
alter table foo22 drop (myvalue);
alter table foo23 add (mynewvalue varchar);
rollback;

INFO - foo22:   Columns  Indexes  Privileges  References  Status  ...
Display column names and data types for a table.
--- [EMAIL PROTECTED] -- Press CTRL-W for Help 


Column name  Type          Nulls
id           integer       yes
myvalue      varchar(1,1)  yes

AND
INFO - foo23:   Columns  Indexes  Privileges  References  Status  ...
Display column names and data types for a table.
--- [EMAIL PROTECTED] -- Press CTRL-W for Help 


Column name  Type          Nulls
id           integer       yes

QED.

It's a strong point in PostgreSQL's favor that it behaves in what I 
regard as a sane manner. That Oracle stuff makes me shudder -- it's unclean.


Greg Williamson
Senior DBA
GlobeXplorer LLC, a DigitalGlobe company

Confidentiality Notice: This e-mail message, including any attachments, 
is for the sole use of the intended recipient(s) and may contain 
confidential and privileged information and must be protected in 
accordance with those provisions. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all 
copies of the original message.


(My corporate masters made me say this.)




Re: [GENERAL] Persistent connections in PHP

2007-08-15 Thread Hannes Dorbath

On 14.08.2007 23:13, Dmitry Koterov wrote:

Pconnects are absolutely necessary if we use tsearch2, because it
initializes its dictionaries on a first query in a session. It's a very
heavy process (500 ms and more). So, if we do not use pconnect, we waste
about 500 ms on each DB connection. Too much pain.


We've been using pconnect for exactly the same reason, though the startup 
time for our dictionary is even higher (around 2 seconds). The problem 
is that persistent connections in PHP are not cleanly implemented; they 
can get randomly garbage collected. The problem seems well known, though 
I'm unaware of any fix. I think it's best to use pgbouncer and plain 
connect ATM. Additionally, as mentioned earlier, using pconnect under 
the Apache webserver is not a good idea at all, at least not with its 
current architecture.
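
For reference, a minimal pgbouncer setup of the kind meant here -- a sketch 
only; the database name, addresses and pool settings are assumptions:

[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = trust
; session pooling keeps each server connection (and its loaded dictionaries) alive
pool_mode = session

The application then does a plain pg_connect() against port 6432 and pays the 
dictionary startup cost only once per pooled server connection.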



--
Regards,
Hannes Dorbath



[GENERAL] check if database is correctly created

2007-08-15 Thread Alain Roger
Hi,

I would like to check (via PHP or C#) whether my database has been correctly
created.
For that I use the following SQL:
select * from pg_tables where tablename = 'xxx' AND schemaname = 'yyy';
This I repeat until I have checked all the tables.

But how do I check sequences, indexes, functions, and so on?

thanks a lot,

-- 
Alain

Windows XP SP2
PostgreSQL 8.2.3
Apache 2.2.4
PHP 5.2.3


Re: [GENERAL] check if database is correctly created

2007-08-15 Thread Karsten Hilbert
On Wed, Aug 15, 2007 at 10:08:36AM +0200, Alain Roger wrote:

> i would like to check (via PHP or C#) if my database has been correctly
> created.
> for that i use the following SQL :
> select * from pg_tables where tablename = 'xxx' AND schemaname = 'yyy';
> this i repeat till i check all tables.
> 
> But how to check sequences, index, functions, and so on ?

Use psql with -E and then issue any variety of \d style
commands to find out what psql does to display indexes,
functions, sequences etc.
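
For instance, the catalog queries psql issues under the hood look roughly 
like this (a sketch; 'yyy' stands for your schema name):

-- tables, indexes and sequences all live in pg_class, distinguished by relkind
SELECT c.relname, c.relkind  -- 'r' = table, 'i' = index, 'S' = sequence
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE n.nspname = 'yyy'
   AND c.relkind IN ('r', 'i', 'S');

-- functions live in pg_proc
SELECT p.proname
  FROM pg_proc p
  JOIN pg_namespace n ON n.oid = p.pronamespace
 WHERE n.nspname = 'yyy';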

However, be aware that checking for the existence of an
appropriately named table/function doesn't give any
guarantee about what they really *are*.

We do something similar during database upgrade migrations:
we calculate a hash over our tables with columns and column
datatypes (tables only as they hold the real data). Only if
the hash matches an expected value do we migrate (change)
the tables themselves. Views, functions, indexes,
constraints can all be re-run from scratch upon failure
without affecting the data in the tables.

http://cvs.savannah.gnu.org/viewvc/gnumed/gnumed/server/sql/gmSchemaRevisionViews.sql?root=gnumed&view=markup

and now

http://cvs.savannah.gnu.org/viewvc/gnumed/gnumed/server/sql/v5-v6/dynamic/gm-schema.sql?root=gnumed&view=markup
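
For illustration, the hashing idea boils down to something like this (a 
sketch, not GNUmed's actual code; the schema name is an assumption):

SELECT md5(array_to_string(ARRAY(
    SELECT table_name || ':' || column_name || ':' || data_type
      FROM information_schema.columns
     WHERE table_schema = 'public'
     ORDER BY table_name, ordinal_position), ','));

Compare the result against the value recorded for the schema revision you 
expect before migrating.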

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346



Re: [GENERAL] Persistent connections in PHP

2007-08-15 Thread Hannes Dorbath

On 15.08.2007 10:53, Torsten Zühlsdorff wrote:
If the dictionary is not too large, you should store it directly in the 
memory of the server. Therefore you can use Shared Memory 
(http://www.php.net/shmop, http://de3.php.net/manual/en/ref.sem.php).


Uhm, but how does TSearch get it from there? And even if it does, it 
will still hold its own copy?



--
Regards,
Hannes Dorbath



Re: [GENERAL] Transactional DDL

2007-08-15 Thread Jan de Visser
On Wednesday 15 August 2007 1:58:07 am Harpreet Dhaliwal wrote:
> So you mean to say something like this as far as oracle is concerned:
>
> BEGIN
>   DDL 1 (commits right after its execution)
>   DDL 2 (commits right after its execution)
> END
>
> That means there's no concept of putting DDL statements in a transaction in
> oracle basically, right?

Even worse: In certain transaction isolation levels (READ COMMITTED and 
SERIALIZABLE IIRC) it yells at you and doesn't perform the DDL command.

Which makes it virtually impossible to do dynamic DDL in any serious J2EE 
application.


God, how I hate Oracle.


>
> Thanks,
> ~Harpreet

jan

>
> On 8/15/07, Scott Marlowe <[EMAIL PROTECTED]> wrote:
> > On 8/14/07, Harpreet Dhaliwal <[EMAIL PROTECTED]> wrote:
> > > So you mean to say DDL statements can't be put in one single
> > > transaction
> >
> > in
> >
> > > Oracle ?
> >
> > You can put them in, but then they will cause the previous DML to be
> > silently committed





Re: [GENERAL] Transactional DDL

2007-08-15 Thread Alexander Staubo
On 8/15/07, Harpreet Dhaliwal <[EMAIL PROTECTED]> wrote:
> And this feature i.e. transactional DDL is not there in other major RDBMS
> like sql server, oracle etc?

The subject of transactional DDL and its prevalence was discussed in a
May thread, "why postgresql over other RDBMS"
(http://archives.postgresql.org/pgsql-general/2007-05/msg01151.php).

It was pointed out that Ingres, Firebird, InterBase and NonStop SQL
also supported transactional DDL, as did a couple of legacy database
systems now sold by Oracle -- no other products support transactional
DDL.

(You keep asking the list about novel features that supposedly make
PostgreSQL unique or superior. Still haven't found a thesis topic?)

Alexander.



Re: [GENERAL] Unable to connect to PostgreSQL server via PHP

2007-08-15 Thread John Coulthard

It was SELinux denying apache permission to make TCP connections!

I thought I had SELinux turned off, but it wasn't. To be sure it is, do
  /usr/sbin/sestatus | grep SELinux
and if it comes back with anything other than "SELinux status: disabled", it's 
still running.
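
If SELinux turns out to be the culprit and you'd rather keep it enabled, the 
relevant switch on Red Hat-style systems should be the 
httpd_can_network_connect boolean rather than disabling SELinux entirely -- 
a sketch:

# allow apache (and mod_php) to open outbound TCP connections; -P persists across reboots
setsebool -P httpd_can_network_connect on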


While I was talking to the php board I discovered you can configure php to 
run from the command line (mine could by default).  So I wrote this bit of php 
(obviously change the database name, and you may need a user and passwd)...


<?php
// reconstructed connect line -- change dbname, and add user=/password= as needed
$dbconn = pg_connect("host=localhost dbname=lumbribase");
if (!$dbconn) { echo "failed <br>" ;
   printf("%s", pg_errormessage( $dbconn ) );
   exit(); }
else {echo "connected", "\n";}
?>

saved it as test.php and at a shell prompt ran...

  [john@ tmp]$ php test.php
  connected

If you get 'connected' you know php and postgres are talking via tcp and 
some secondary process (firewall/SELinux) is preventing the apache 
connection.


Thanks for the help.



From: "John Coulthard" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
CC: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Unable to connect to PostgreSQL server via PHP
Date: Mon, 13 Aug 2007 15:34:19 +





From: Tom Lane <[EMAIL PROTECTED]>
To: "John Coulthard" <[EMAIL PROTECTED]>
CC: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Unable to connect to PostgreSQL server via PHP 
Date: Mon, 13 Aug 2007 10:09:15 -0400


"John Coulthard" <[EMAIL PROTECTED]> writes:
> That's not my problem though, this is "could not connect to server:
> Permission denied".  If it's denying permission I must have the permissions
> set wrong, but where do I start looking for them?

"Permission denied" is a pretty strange error for a TCP connect failure,
as that is not a filesystem operation.

Are you able to connect with psql, or some other non-php client?
Use "psql -h localhost" to make sure it tries a TCP connection not
a Unix-socket connection.



Thanks. You mean like this?  This connects without an error.

[EMAIL PROTECTED] john]# su webuser
[EMAIL PROTECTED] john]$ psql -h localhost lumbribase
Welcome to psql 8.0.8, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
  \h for help with SQL commands
  \? for help with psql commands
  \g or terminate with semicolon to execute query
  \q to quit

lumbribase=>











[GENERAL] is this trigger safe and effective? - locking (caching via triggers)

2007-08-15 Thread Pavel Stehule
Hello,

I am sorry, this mail was meant to be sent only to pgsql-general.

Have a nice day
Pavel Stehule

-- Forwarded message --
From: Pavel Stehule <[EMAIL PROTECTED]>
Date: 15.8.2007 8:01
Subject: is this trigger safe and effective? - locking (caching via triggers)
To: PostgreSQL Hackers <[EMAIL PROTECTED]>


Hello

I am writing a sample about triggers and I have a question: is my solution
correct, and does a better solution exist?

Regards
Pavel Stehule

DROP SCHEMA safecache CASCADE;

CREATE SCHEMA safecache;

CREATE TABLE safecache.source_tbl(category int, int_value int);

CREATE TABLE safecache.cache(category int, sum_val int);

CREATE OR REPLACE FUNCTION safecache.source_tbl_trg_fce()
RETURNS trigger AS
$$
BEGIN
  IF TG_OP = 'INSERT' THEN
    -- row cannot yet exist in cache -- a complication
    -- I want to finish this transaction without a conflict
IF NOT EXISTS(SELECT category
 FROM safecache.cache
WHERE category = NEW.category) THEN
  LOCK TABLE safecache.cache IN SHARE ROW EXCLUSIVE MODE;
  -- I have to repeat test
  IF NOT EXISTS(SELECT category
   FROM safecache.cache
  WHERE category = NEW.category) THEN
INSERT INTO safecache.cache
   VALUES(NEW.category, NEW.int_value);
  END IF;
ELSE
  -- simple
  UPDATE safecache.cache
 SET sum_val = sum_val + NEW.int_value
WHERE category = NEW.category;
END IF;
  ELSEIF TG_OP = 'UPDATE' THEN
    -- if the category is unchanged, simple
IF NEW.category = OLD.category THEN
  UPDATE safecache.cache
 SET sum_val = sum_val + (NEW.int_value - OLD.int_value)
WHERE category = OLD.category;
ELSE
      -- old category has to exist
  UPDATE safecache.cache
 SET sum_val = sum_val - OLD.int_value
WHERE category = OLD.category;
      -- the new category may be a problem
  IF NOT EXISTS(SELECT category
   FROM safecache.cache
  WHERE category = NEW.category) THEN
LOCK TABLE safecache.cache IN SHARE ROW EXCLUSIVE MODE;
-- I have to repeat test
IF NOT EXISTS(SELECT category
 FROM safecache.cache
WHERE category = NEW.category) THEN
  INSERT INTO safecache.cache
 VALUES(NEW.category, NEW.int_value);
END IF;
  ELSE
    -- simple, new category exists
    UPDATE safecache.cache
       SET sum_val = sum_val + NEW.int_value
      WHERE category = NEW.category;
  END IF;
END IF;
  ELSE -- DELETE
    -- value has to exist in cache, simple
UPDATE safecache.cache
   SET sum_val = sum_val - OLD.int_value
  WHERE category = OLD.category;
  END IF;
  RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER actualise_cache
   AFTER INSERT OR UPDATE OR DELETE
   ON safecache.source_tbl
   FOR EACH ROW EXECUTE PROCEDURE safecache.source_tbl_trg_fce();



[GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Ivan Zolotukhin
Hello,

Imagine a web application that processes text search queries from
clients. If one types a text search query in a browser, it sends
proper UTF-8 characters, and the application, after all needed processing
(escaping, checks, etc), passes it to the database. But if one modifies the
URL of the query, adding some trash non-UTF-8 characters, the database raises
an error: invalid byte sequence for encoding "UTF8".

What is the best practice for processing such broken strings before
passing them to PostgreSQL? Iconv from utf-8 to utf-8, dropping bad
characters?

--
Regards,
 Ivan



[GENERAL] How to trap exceptions inside PL/Perl functions?

2007-08-15 Thread Dmitry Koterov
Hello.

In PL/PGSQL I could write:

BEGIN
  SELECT * FROM non_existed_table;
EXCEPTION
  WHEN ... THEN ...
END;

How to do it in PL/Perl? I tried the standard for Perl trapping method:

eval {
  spi_exec_query("SELECT * FROM non_existed_table");
};
if ($@) { ... }

but it does not work - it says "eval is not safe" or something like
that. But I use eval with {}, not with quotes, so it should be safe.
So, how do I trap errors in this case?


Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Martijn van Oosterhout
On Wed, Aug 15, 2007 at 03:41:30PM +0400, Ivan Zolotukhin wrote:
> Hello,
> 
> Imagine a web application that process text search queries from
> clients. If one types a text search query in a browser it then sends
> proper UTF-8 characters and application after all needed processing
> (escaping, checks, etc) passes it to database. But if one modifies URL
> of the query adding some trash non-UTF-8 characters, database raises
> an error: invalid byte sequence for encoding "UTF8".
> 
> What is the best practice to process such a broken strings before
> passing them to PostgreSQL? Iconv from utf-8 to utf-8 dropping bad
> characters?

Well, the query as given by the user is invalid, so returning an error
message complaining about the invalid byte sequence seems entirely
reasonable.

I don't see any reason to try and be smart. There's no way you can
"fix" the query.

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.




Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Phoenix Kiula
On 15/08/07, Ivan Zolotukhin <[EMAIL PROTECTED]> wrote:
> Hello,
>
> Imagine a web application that process text search queries from
> clients. If one types a text search query in a browser it then sends
> proper UTF-8 characters and application after all needed processing
> (escaping, checks, etc) passes it to database. But if one modifies URL
> of the query adding some trash non-UTF-8 characters, database raises
> an error: invalid byte sequence for encoding "UTF8".
>
> What is the best practice to process such a broken strings before
> passing them to PostgreSQL? Iconv from utf-8 to utf-8 dropping bad
> characters?



Could you not do this from the application itself, before sending
it to the DB server? mb_convert_encoding() in PHP, for instance?



Re: [GENERAL] Database Select Slow

2007-08-15 Thread carter ck

Hi,

Thanks for the clarification. It helped resolve the problem. Now the page 
can be fully loaded within 2 seconds.


Thanks.



From: "Scott Marlowe" <[EMAIL PROTECTED]>
To: "carter ck" <[EMAIL PROTECTED]>
CC: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Database Select Slow
Date: Fri, 10 Aug 2007 10:57:19 -0500

On 8/10/07, carter ck <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I am facing a performance issue here. Whenever I do a count(*) on a table
> that contains about 300K records, it takes a few minutes to complete.
> Whereas my other application which is counting > 500K records just takes
> less than 10 seconds to complete.
>
> I have indexed all the essential columns and still it does not improve
> the speed.

As previously mentioned, indexes won't help with a count(*) with no
where clause.

They might help with a where clause, if it's quite selective, but if
you're grabbing a noticeable percentage of a table, pgsql will rightly
switch to a seq scan.

Here's some examples from my goodly sized stats db here at work:

\timing
explain select * from businessrequestsummary;
                             QUERY PLAN
--------------------------------------------------------------------------
 Seq Scan on businessrequestsummary  (cost=0.00..3280188.63
rows=67165363 width=262)
Time: 0.441 ms

gives me an approximate value of 67,165,363 rows.

explain select * from businessrequestsummary where lastflushtime >
now() - interval '1 day';
                             QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using businessrequestsummary_lastflushtime_dx on
businessrequestsummary  (cost=0.00..466.65 rows=6661 width=262)
   Index Cond: (lastflushtime > (now() - '1 day'::interval))

says 6661 rows, takes 0.9 ms, and would use the index.

To run the real queries I get much slower times. :)

Now, to run the real count(*) queries:

 select count(*) from businessrequestsummary where lastflushtime >
now() - interval '1 day';
  count
--------
 274192
(1 row)

Time: 546.528 ms

(data in the buffers makes it fast)

select count(*) from businessrequestsummary where lastflushtime >
now() - interval '1 week';
  count
---------
 1700050
(1 row)

Time: 26291.155 ms

second run (data now in buffer)

select count(*) from businessrequestsummary where lastflushtime >
now() - interval '1 week';
  count
---------
 1699689
(1 row)

Time: 2592.573 ms

Note the number changed, because this db is constantly being updated
in real time with production statistics.

I'm not going to run a select count(*) on that db, because it would
take about 30 minutes to run.  It's got about 67 million rows in it.








[GENERAL] Writing most code in Stored Procedures

2007-08-15 Thread Rohit
I have a few questions regarding the use of Stored Procedures, Functions
and Triggers in an RDBMS.

(1) When to use Stored Procedure? Writing an INSERT query in a Stored
Procedure is better or firing it from the application level?

(2) Can a Trigger call a Stored Procedure?

(3) What type of code must reside in application and what type in
RDBMS?

(4) Is it faster to work at application level or at the database level?




Re: [GENERAL] Insert or Replace or \copy (bulkload)

2007-08-15 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes:

> Ow Mun Heng <[EMAIL PROTECTED]> writes:
>> Ps : Is it this list's norm to have the OP/sender in the "to" list and
>> mailing list on the "CC" list?
>
> Yes.  If you don't like that you can try including a "Reply-To: "
> header in what you send to the list; or perhaps better, I think there's
> a way to tell the list bot to not send you copies of messages you're
> cc'd on.  

Beware that setting Reply-To is actually "incorrect". It will cause people who
think they're sending you private email to unexpectedly send their private
mail to the list.

There's a correct header to set, I think Mail-Followup-To or something like
that. But sadly not all mail readers support it.

> I for one have a reputation of running spam filters that eat pets and small
> children ... so if you want to be sure to get through to me, don't forget to
> cc: the list.

They eat all my emails, but I'm sure that's intentional :)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [GENERAL] Blobs in Postgresql

2007-08-15 Thread Gregory Stark
"Ron Olson" <[EMAIL PROTECTED]> writes:

> Hi all-
>
> I am evaluating databases for use in a large project that will hold image
> data as blobs. I know, everybody says to just store pointers to files on the
> disk...

Well not everyone. I usually do, but if you're not handling these blobs under
heavy load independent of the database (like web servers) then either approach
works.

> So turning to Postgresql, can I get any recommendations, suggestions and
> tips on blob handling in the database? The image sizes will be pretty
> variable, from a few kilobytes to several hundred megabytes, so I need
> something that will handle the various file sizes, hopefully transparently.

There are basically two options. If you are not handling data that are too
large to copy around in memory, and you don't need to upload and download the
data in chunks (usually these are the same issue), then you can just store
your images in a bytea column. Postgres transparently treats *all* large
variable-sized data, whether text, bytea, or arrays, like a blob: it stores
it in a separate table outside the main table.

If your data can sometimes be so large that you cannot manipulate the whole
thing in memory at once (keep in mind that Postgres expects to be able to
handle a few copies of the data at the same time; conservatively, expect 5
simultaneous copies to have to fit in memory), then you'll have to look into
the large object interface, which is a set of functions starting with lo_*.
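
A minimal sketch of the two options (table names are made up; note that the 
server-side lo_import/lo_export used here read and write the *server's* 
filesystem and require superuser -- client interfaces offer lo_* equivalents):

-- bytea: the image is an ordinary column value, moved out-of-line when large
CREATE TABLE images (id serial PRIMARY KEY, data bytea);

-- large objects: the table stores only an OID handle; data can move in chunks
CREATE TABLE big_images (id serial PRIMARY KEY, data_oid oid);
INSERT INTO big_images (data_oid) VALUES (lo_import('/tmp/huge.tiff'));
SELECT lo_export(data_oid, '/tmp/huge_copy.tiff') FROM big_images WHERE id = 1;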

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [GENERAL] Writing most code in Stored Procedures

2007-08-15 Thread Richard Huxton

Rohit wrote:

I have few queries regarding the use of Stored Procedures, Functions
and Triggers in an RDBMS.


These are all easy questions to answer: "it depends".

OK, so you might want some reasons...


(1) When to use Stored Procedure? Writing an INSERT query in a Stored
Procedure is better or firing it from the application level?


Some people like to have all access to the database through stored 
procedures. This can make it simpler to control access to the data and 
seems popular with MSSQL developers.


Personally, I use SQL as my interface - that's its purpose. Actually, 
that's not quite true, I use various wrappers that generate SQL for most 
queries.



(2) Can a Trigger call a Stored Procedure?


It has to -- although they're just called functions in PostgreSQL, they can 
be written in a range of procedural languages and take place within a 
transaction the same as any other function call.



(3) What type of code must reside in application and what type in
RDBMS?


Anything that's to do with data integrity I put in the database.
Anything that's to do with process I put in the application (or 
middle-ware layer).
The tricky bit is deciding what to do when you have something that's a 
bit of both.



(4) Is it faster to work at application level or at the database level?


Probably faster in the database, assuming you have only one machine. If 
you have more than one machine then you can have each machine designed 
for its purpose. Of course, faster to run might be at the cost of more 
difficult to write. It might also mean your process code can crash the 
database server.


--
  Richard Huxton
  Archonet Ltd



Re: [GENERAL] Compound Indexes

2007-08-15 Thread Gregory Stark
"Phoenix Kiula" <[EMAIL PROTECTED]> writes:

>> >SELECT * from trades where id = 9
>> >and c_id = 
>> >ORDER by s_id;
>> >
>> >SELECT * from trades where id = 9
>> >and s_id = 0
>> >ORDER by created_on desc ;
>> >
>> >SELECT * from trades where id = 9
>> >and s_id = 0
>> >and t_brief ~* 'more|than|one|word'
>> >ORDER by created_on desc ;

Well I would start with testing:

trades(id, c_id, s_id)
trades(id, s_id, created_on)

However you may (and actually probably will, i expect) find that the third
column is not helping. That's especially true if the result of matching id and
either c_id or s_id is always a small enough set of records that sorting them
is quick (I would expect the point where an extra column in the index would
start to save you anything to be somewhere around 100-1,000, possibly even as
much as 10,000 or more).

Note that in released versions getting an index which is useful for ORDER BY
created_on *DESC* is actually quite difficult. So unless these queries are
returning thousands of records I would suggest ignoring the ORDER BY clauses
and just looking at the WHERE clauses.

If id,s_id and id,c_id are selective enough to return only a few records I
would actually expect you to end up with just

trades(id, s_id)
trades(id, c_id)
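
In SQL terms, a sketch of the above (the index names are made up):

CREATE INDEX trades_id_sid_idx ON trades (id, s_id);
CREATE INDEX trades_id_cid_idx ON trades (id, c_id);
-- only if the two-column forms prove insufficiently selective:
-- CREATE INDEX trades_id_sid_created_idx ON trades (id, s_id, created_on);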

You might also be able to build some kind of index to help the ~* clause. If
you do a lot of queries like that and the id,s_id restriction isn't very
selective you might look into tsearch2 which can index that type of query.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Ivan Zolotukhin
Hello,

Well, PostgreSQL is entirely correct; I would have posted this message to
the -hackers list otherwise :) The question was about the application's
processing of user input, not about changing the database's reaction to a
broken UTF-8 string. But I am 100% sure one should fix the input in this
case, since otherwise the web site user may see some ugly error (even if the
application caught this SQL exception, for instance).

--
Regards,
 Ivan


On 8/15/07, Martijn van Oosterhout <[EMAIL PROTECTED]> wrote:
> On Wed, Aug 15, 2007 at 03:41:30PM +0400, Ivan Zolotukhin wrote:
> > Hello,
> >
> > Imagine a web application that process text search queries from
> > clients. If one types a text search query in a browser it then sends
> > proper UTF-8 characters and application after all needed processing
> > (escaping, checks, etc) passes it to database. But if one modifies URL
> > of the query adding some trash non-UTF-8 characters, database raises
> > an error: invalid byte sequence for encoding "UTF8".
> >
> > What is the best practice to process such a broken strings before
> > passing them to PostgreSQL? Iconv from utf-8 to utf-8 dropping bad
> > characters?
>
> Well, the query as given by the user is invalid, so returning an error
> message complaining about the invalid byte sequence seems entirely
> reasonable.
>
> I don't see any reason to try and be smart. There's no way you can
> "fix" the query.
>
> Have a nice day,
> --
> Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> > From each according to his ability. To each according to his ability to 
> > litigate.



Re: [GENERAL] Transactional DDL

2007-08-15 Thread Martin Gainty
you can use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED to acquire the 
dirty reads

From your perspective how *should* the DB handle this?


M
This email message and any files transmitted with it contain confidential
information intended only for the person(s) to whom this email message is
addressed.  If you have received this email message in error, please notify
the sender immediately by telephone or email and destroy the original
message without making a copy.  Thank you.

- Original Message - 
From: "Jan de Visser" <[EMAIL PROTECTED]>

To: 
Cc: "Harpreet Dhaliwal" <[EMAIL PROTECTED]>; "Scott Marlowe" 
<[EMAIL PROTECTED]>

Sent: Wednesday, August 15, 2007 5:15 AM
Subject: Re: [GENERAL] Transactional DDL



On Wednesday 15 August 2007 1:58:07 am Harpreet Dhaliwal wrote:

So you mean to say something like this as far as oracle is concerned:

BEGIN
  DDL 1 (commits right after its execution)
  DDL 2 (commits right after its execution)
END

That means there's no concept of putting DDL statements in a transaction in
oracle basically, right?


Even worse: In certain transaction isolation levels (READ COMMITTED and
SERIALIZABLE IIRC) it yells at you and doesn't perform the DDL command.

Which makes it virtually impossible to do dynamic DDL in any serious J2EE
application.


God, how I hate Oracle.




Thanks,
~Harpreet


jan



On 8/15/07, Scott Marlowe <[EMAIL PROTECTED]> wrote:
> On 8/14/07, Harpreet Dhaliwal <[EMAIL PROTECTED]> wrote:
> > So you mean to say DDL statements can't be put in one single
> > transaction
>
> in
>
> > Oracle ?
>
> You can put them in, but then they will cause the previous DML to be
> silently committed










Re: [GENERAL] Compound Indexes

2007-08-15 Thread Phoenix Kiula
> you do a lot of queries like that and the id,s_id restriction isn't very
> selective you might look into tsearch2 which can index that type of query.
>


Thanks. Does tsearch2 come installed with 8.2.3? I am not techie
enough to do all the compiling stuff so I'm hoping it does! How can I
check?



Re: [GENERAL] Insert or Replace or \copy (bulkload)

2007-08-15 Thread Scott Marlowe
On 8/15/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
> "Tom Lane" <[EMAIL PROTECTED]> writes:
> > I for one have a reputation of running spam filters that eat pets and small
> > children ... so if you want to be sure to get through to me, don't forget to
> > cc: the list.
>
> They eat all my emails, but I'm sure that's intentional :)

I was wondering why my kids were so quiet...



Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Ivan Zolotukhin
Hello,

Actually I tried something like $str = @iconv("UTF-8", "UTF-8//IGNORE",
$str); when preparing a string for an SQL query, and it worked. There's
probably a better way in PHP to achieve this: simply change the default
values in php.ini for these parameters:

mbstring.encoding_translation = On
mbstring.substitute_character = none

and broken symbols will be automatically stripped from the input
and output. But I am interested in a general solution and better
practices anyway...

--
Regards,
 Ivan


On 8/15/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> On 15/08/07, Ivan Zolotukhin <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > Imagine a web application that process text search queries from
> > clients. If one types a text search query in a browser it then sends
> > proper UTF-8 characters and application after all needed processing
> > (escaping, checks, etc) passes it to database. But if one modifies URL
> > of the query adding some trash non-UTF-8 characters, database raises
> > an error: invalid byte sequence for encoding "UTF8".
> >
> > What is the best practice to process such a broken strings before
> > passing them to PostgreSQL? Iconv from utf-8 to utf-8 dropping bad
> > characters?
>
>
>
> Could you do not do this from the application itself, before sending
> it to the DB server? mb_convert_encoding() in PHP for instance?
>



[GENERAL] Yet Another COUNT(*)...WHERE...question

2007-08-15 Thread Phoenix Kiula
I'm grappling with a lot of reporting code for our app that relies on
queries such as:

 SELECT COUNT(*) FROM TABLE WHERE (conditions)...

And I still do not find, from the discussions on this thread, any
truly viable solution for this. The one suggestion is to have a
separate counts table, which is fine for total aggregates related to,
say, an ID. E.g., a table with:

trader_id, trade_count

But this is an overall count for the trader (in my example). What if I
need a count of all his trades in the last one week. Then I need a
timestamp condition in there as well. The number of such possibilities
for multiple WHERE conditions is infinite...how should we account for
all these avenues?

Would love to hear experiences of others and what compromises they
have made. From a reporting perspective, waiting for 10 minutes for a
simple count to return seems untenable.

TIA!



Re: [GENERAL] Yet Another COUNT(*)...WHERE...question

2007-08-15 Thread Gregory Stark
"Phoenix Kiula" <[EMAIL PROTECTED]> writes:

> I'm grappling with a lot of reporting code for our app that relies on
> queries such as:
>
>  SELECT COUNT(*) FROM TABLE WHERE (conditions)...
>...
> The number of such possibilities for multiple WHERE conditions is
> infinite...

Depends on the "conditions" bit. You can't solve all of the infinite
possibilities -- well you can, just run the query above -- but if you want to
do better it's all about understanding your data.
-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [GENERAL] Yet Another COUNT(*)...WHERE...question

2007-08-15 Thread Phoenix Kiula
On 15/08/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
> "Phoenix Kiula" <[EMAIL PROTECTED]> writes:
>
> > I'm grappling with a lot of reporting code for our app that relies on
> > queries such as:
> >
> >  SELECT COUNT(*) FROM TABLE WHERE (conditions)...
> >...
> > The number of such possibilities for multiple WHERE conditions is
> > infinite...
>
> Depends on the "conditions" bit. You can't solve all of the infinite
> possibilities -- well you can, just run the query above -- but if you want > 
> to do better it's all about understanding your data.


I am not sure what the advice here is. The WHERE condition comes from
the indices. So if the query was not "COUNT(*)" but just a couple of
columns, the query executes in less than a second. Just that COUNT(*)
becomes horribly slow. And since the file system based query caching
feature of PG is unclear to me (I am just moving from MySQL where the
cache is quite powerful) I don't quite know what to do to speed up
these queries!



Re: [GENERAL] Yet Another COUNT(*)...WHERE...question

2007-08-15 Thread Scott Marlowe
On 8/15/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> I'm grappling with a lot of reporting code for our app that relies on
> queries such as:
>
>  SELECT COUNT(*) FROM TABLE WHERE (conditions)...
>
> And I still do not find, from the discussions on this thread, any
> truly viable solution for this. The one suggestion is to have a
> separate counts table, which is fine for total aggregates related to,
> say, an ID. E.g., a table with:
>
> trader_id, trade_count
>
> But this is an overall count for the trader (in my example). What if I
> need a count of all his trades in the last one week. Then I need a
> timestamp condition in there as well. The number of such possibilities
> for multiple WHERE conditions is infinite...how should we account for
> all these avenues?
>
> Would love to hear experiences of others and what compromises they
> have made. From a reporting perspective, waiting for 10 minutes for a
> simple count to return seems untenable.

Generally, for these kinds of things it's often best to use
materialized views / rollup tables so that you aren't re-aggregating
the same data over and over.
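
A sketch of such a rollup table for the trade example (all names hypothetical; 
the refresh could run from cron):

CREATE TABLE trade_counts_daily (
    trader_id   integer,
    trade_day   date,
    trade_count bigint,
    PRIMARY KEY (trader_id, trade_day)
);

-- periodic refresh: re-aggregate only the current day; older rows never change
DELETE FROM trade_counts_daily WHERE trade_day = current_date;
INSERT INTO trade_counts_daily
SELECT trader_id, created_on::date, count(*)
  FROM trades
 WHERE created_on >= current_date
 GROUP BY trader_id, created_on::date;

-- "trades in the last week" then scans at most seven small rows
SELECT sum(trade_count)
  FROM trade_counts_daily
 WHERE trader_id = 42 AND trade_day > current_date - 7;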



Re: [GENERAL] Yet Another COUNT(*)...WHERE...question

2007-08-15 Thread Richard Huxton

Phoenix Kiula wrote:


 SELECT COUNT(*) FROM TABLE WHERE (conditions)...


I am not sure what the advice here is. The WHERE condition comes from
the indices. So if the query was not "COUNT(*)" but just a couple of
columns, the query executes in less than a second. Just that COUNT(*)
becomes horribly slow.


The count(*) shouldn't slow things down compared to running the query to 
fetch columns. It should be at least as fast, or faster if the columns 
you fetch are large.

1. Do you have an example?
2. You're not running a query to get the columns, then a separate 
count(*) to get a rowcount are you?


> And since the file system based query caching
> feature of PG is unclear to me (I am just moving from MySQL where the
> cache is quite powerful) I don't quite know what to do to speed up
> these queries!


There isn't a "file system based query caching" feature; there's your 
operating system's file cache and PG's buffers. Neither of these caches 
query results; they cache disk pages instead.


--
  Richard Huxton
  Archonet Ltd



Re: [GENERAL] Yet Another COUNT(*)...WHERE...question

2007-08-15 Thread Scott Marlowe
On 8/15/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> On 15/08/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
> > "Phoenix Kiula" <[EMAIL PROTECTED]> writes:
> >
> > > I'm grappling with a lot of reporting code for our app that relies on
> > > queries such as:
> > >
> > >  SELECT COUNT(*) FROM TABLE WHERE (conditions)...
> > >...
> > > The number of such possibilities for multiple WHERE conditions is
> > > infinite...
> >
> > Depends on the "conditions" bit. You can't solve all of the infinite
> > possibilities -- well you can, just run the query above -- but if you want 
> > > to do better it's all about understanding your data.
>
>
> I am not sure what the advice here is. The WHERE condition comes from
> the indices. So if the query was not "COUNT(*)" but just a couple of
> columns, the query executes in less than a second. Just that COUNT(*)
> becomes horribly slow.

Sorry, but I don't believe you.  If you're running a query on the same
dataset that returns in < 1 second, then a count(*) with the same where
clause will also run in < 1 second.  I haven't seen pgsql do anything else.


> And since the file system based query caching
> feature of PG is unclear to me

There is no "query caching" in pgsql.  There is data caching.  Each
query has to get planned and executed though (unless prepared, then
just executed)

> (I am just moving from MySQL where the
> cache is quite powerful)

As long as nothing is changing behind the query, and invalidating the
query cache.  It is useful for reporting apps, but in a constantly
updating db pretty much useless.

> I don't quite know what to do to speed up
> these queries!

Post them with explain analyze output, i.e.

explain analyze yourqueryhere

Cut and paste the query and the output, as well as the table schema.



Re: [GENERAL] check if database is correctly created

2007-08-15 Thread Tom Lane
"Alain Roger" <[EMAIL PROTECTED]> writes:
> i would like to check (via PHP or C#) if my database has been correctly
> created.
> for that i use the following SQL :
> select * from pg_tables where tablename = 'xxx' AND schemaname = 'yyy';
> this i repeat till i check all tables.
> But how to check sequences, index, functions, and so on ?

Instead of re-inventing the wheel, why not run "pg_dump -s" and diff its
output against that from a known good database?

I think there is a more sophisticated "database diff" tool out there,
too ... maybe on pgfoundry.
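
For instance (host and database names hypothetical):

pg_dump -s -h refbox  mydb > expected_schema.sql
pg_dump -s -h prodbox mydb > actual_schema.sql
diff -u expected_schema.sql actual_schema.sql && echo "schemas match"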

regards, tom lane



[GENERAL] Customizing psql console to show execution times

2007-08-15 Thread Phoenix Kiula
In some examples posted to this forum, it seems to me that when people
execute queries in the psql window, they also see "90 ms taken"
(milliseconds), which denotes the time taken to execute the query.
Where can I set this option because I'm not seeing it in my psql
window on both Win XP and Linux. Thanks!



Re: [GENERAL] Yet Another COUNT(*)...WHERE...question

2007-08-15 Thread Scott Marlowe
On 8/15/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> On 15/08/07, Scott Marlowe <[EMAIL PROTECTED]> wrote:
> > On 8/15/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> > > On 15/08/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
> > > > "Phoenix Kiula" <[EMAIL PROTECTED]> writes:
> > > >
> > > > > I'm grappling with a lot of reporting code for our app that relies on
> > > > > queries such as:
> > > > >
> > > > >  SELECT COUNT(*) FROM TABLE WHERE (conditions)...
> > > > >...
> > > > > The number of such possibilities for multiple WHERE conditions is
> > > > > infinite...
> > > >
> > > > Depends on the "conditions" bit. You can't solve all of the infinite
> > > > possibilities -- well you can, just run the query above -- but if you 
> > > > want > to do better it's all about understanding your data.
> > >
> > >
> > > I am not sure what the advice here is. The WHERE condition comes from
> > > the indices. So if the query was not "COUNT(*)" but just a couple of
> > > columns, the query executes in less than a second. Just that COUNT(*)
> > > becomes horribly slow.
> >
> > Sorry, but I don't believe you.  if you're doing a count(*) on the
> > same dataset that returns in < 1 second, then the count(*) with the
> > same where clause will run in < 1 second.  I haven't seen pgsql do
> > anything else.
>
>
>
> Sorry I was not clear. Imagine an Amazon.com search results page. It
> has about 15 results on Page 1, then it shows "Page 1 of 190".
>
> To show each page, the query probably has a "LIMIT 15 OFFSET 0" for
> Page 1. However, to calculate the total number of pages, they probably
> do a separate counts query, because doing a "select *" and then
> counting the number of rows returned would be even more inefficient
> than a count(*).

When I go to amazon.com I only ever get three pages of results.  Ever.
Because they know that returning 190 pages is not that useful, as
hardly anyone is going to wander through that many pages.

Google, you'll notice says "Results 1 - 10 of about 5,610,000 for
blacksmith"  i.e. it's guesstimating as well.  no reason for google to
look at every single row for blacksmith to know that there's about 5.6
million.

> So, in reporting, two queries are fairly common I would think, unless
> I am missing something?

Yes, optimization. :)  You don't need an exact count to tell someone
that there's more data and they can go to it.  Note that if you are
planning on doing things google sized, you'll need to do what they
did, invent your own specialized database.

For us mere mortals, it's quite likely that you can do something like:

explain select * from table where field like 'abc%';

and then parse the explain output for an approximate number.



Re: [GENERAL] Yet Another COUNT(*)...WHERE...question

2007-08-15 Thread Richard Broersma Jr
--- Scott Marlowe <[EMAIL PROTECTED]> wrote:
> Generally, for these kinds of things it's often best to use
> materialized views / rollup tables so that you aren't re-aggregating
> the same data over and over.

I don't know if this was already mentioned, but here is one of the links
that describes the method of implementing a materialized view.
http://www.jonathangardner.net/PostgreSQL/materialized_views/matviews.html

other useful docs like this one can be found here:

http://www.postgresql.org/docs/techdocs.2

Regards,
Richard Broersma Jr.



Re: [GENERAL] Yet Another COUNT(*)...WHERE...question

2007-08-15 Thread Richard Broersma Jr

--- Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> Sorry I was not clear. Imagine an Amazon.com search results page. It
> has about 15 results on Page 1, then it shows "Page 1 of 190".

I don't think that amazon or google really need to give an accurate count 
when determining an estimated number of pages...

You could determine the number of pages quickly from postgresql:

[ row count estimate ] / [ number of rows you want per page ]

The estimated row count is updated every time you vacuum your tables, and 
getting the estimate takes very little time.
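
A sketch of fetching that estimate (the table name is assumed):

SELECT reltuples::bigint AS estimated_rows
  FROM pg_class
 WHERE relname = 'your_table';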


> To show each page, the query probably has a "LIMIT 15 OFFSET 0" for
> Page 1.

The "LIMIT 15 OFFSET 1500" technique can be a performance killer, since 
offset does not use an index: the server still has to read and discard all 
the skipped rows.

It is better to use the last entry of each page in the query for the next 
page, so you can write your query this way:

  SELECT *
    FROM your_table
   WHERE item_nbr > [: last item on previous page :]
ORDER BY item_nbr
   LIMIT 15;

This method was discussed on the list a couple of months ago.

Regards,
Richard Broersma Jr.



Re: [GENERAL] Yet Another COUNT(*)...WHERE...question

2007-08-15 Thread Phoenix Kiula
On 15/08/07, Scott Marlowe <[EMAIL PROTECTED]> wrote:
> On 8/15/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> > On 15/08/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
> > > "Phoenix Kiula" <[EMAIL PROTECTED]> writes:
> > >
> > > > I'm grappling with a lot of reporting code for our app that relies on
> > > > queries such as:
> > > >
> > > >  SELECT COUNT(*) FROM TABLE WHERE (conditions)...
> > > >...
> > > > The number of such possibilities for multiple WHERE conditions is
> > > > infinite...
> > >
> > > Depends on the "conditions" bit. You can't solve all of the infinite
> > > possibilities -- well you can, just run the query above -- but if you 
> > > want > to do better it's all about understanding your data.
> >
> >
> > I am not sure what the advice here is. The WHERE condition comes from
> > the indices. So if the query was not "COUNT(*)" but just a couple of
> > columns, the query executes in less than a second. Just that COUNT(*)
> > becomes horribly slow.
>
> Sorry, but I don't believe you.  if you're doing a count(*) on the
> same dataset that returns in < 1 second, then the count(*) with the
> same where clause will run in < 1 second.  I haven't seen pgsql do
> anything else.



Sorry I was not clear. Imagine an Amazon.com search results page. It
has about 15 results on Page 1, then it shows "Page 1 of 190".

To show each page, the query probably has a "LIMIT 15 OFFSET 0" for
Page 1. However, to calculate the total number of pages, they probably
do a separate counts query, because doing a "select *" and then
counting the number of rows returned would be even more inefficient
than a count(*).

So, in reporting, two queries are fairly common I would think, unless
I am missing something?



Re: [GENERAL] Customizing psql console to show execution times

2007-08-15 Thread Richard Broersma Jr

--- Phoenix Kiula <[EMAIL PROTECTED]> wrote:

> In some examples posted to this forum, it seems to me that when people
> execute queries in the psql window, they also see "90 ms taken"
> (milliseconds), which denotes the time taken to execute the query.
> Where can I set this option because I'm not seeing it in my psql
> window on both Win XP and Linux. Thanks!

You can get the timing from psql two ways:

EXPLAIN ANALYZE your_query;

Or you can use one of the psql internal commands; to see the list of psql 
internal commands, type "\?" at the psql command prompt.  The one you want 
is "\timing".  However, notice that "\timing" and explain analyze do not 
exactly agree on the results they produce.

IIRC, "\timing" also counts its own overhead.

Regards,
Richard Broersma Jr.



Re: [GENERAL] Yet Another COUNT(*)...WHERE...question

2007-08-15 Thread Phoenix Kiula
> Yes, optimization. :)  You don't need an exact count to tell someone
> that there's more data and they can go to it.


In general, I agree. But my example of Amazon was only to illustrate
the point about two queries and why they may be needed. I seem to see
many more pages than you do, but in any case, Google and Amazon can
afford to be less precise.

Thanks for the suggestion of using EXPLAIN and parsing an
approximation, but when you need to show a trader how many trades he
has made, for instance, then approximation is not a possibility at
all. Especially not if the numbers sway so wildly --


FIRSTDB=# explain select * from trades where t_id = 'kXjha';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Bitmap Heap Scan on trades  (cost=15.77..1447.12 rows=374 width=224)
   Recheck Cond: ((t_id)::text = 'kXjha'::text)
   ->  Bitmap Index Scan on trades_tid_date  (cost=0.00..15.67 rows=374 width=0)
         Index Cond: ((t_id)::text = 'kXjha'::text)
(4 rows)

FIRSTDB=# select count(*) from trades where t_id = 'kXjha';
 count
-------
  3891
(1 row)



Could I do something so that EXPLAIN comes up with slightly more
accurate stats? The above query was run just after a "vacuum
analyze"!

Much appreciate the suggestions.



Re: [GENERAL] Customizing psql console to show execution times

2007-08-15 Thread Leif B. Kristensen
On Wednesday 15. August 2007, Phoenix Kiula wrote:
>In some examples posted to this forum, it seems to me that when people
>execute queries in the psql window, they also see "90 ms taken"
>(milliseconds), which denotes the time taken to execute the query.
>Where can I set this option because I'm not seeing it in my psql
>window on both Win XP and Linux. Thanks!

pgslekt=> select count(*) from persons;
 count
---
 16704
(1 row)

pgslekt=> \timing
Timing is on.
pgslekt=> select count(*) from persons;
 count
---
 16704
(1 row)

Time: 3,127 ms
pgslekt=> \timing
Timing is off.
pgslekt=>
-- 
Leif Biberg Kristensen | Registered Linux User #338009
http://solumslekt.org/ | Cruising with Gentoo/KDE
My Jazz Jukebox: http://www.last.fm/user/leifbk/



Re: [GENERAL] Customizing psql console to show execution times

2007-08-15 Thread Scott Marlowe
On 8/15/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> In some examples posted to this forum, it seems to me that when people
> execute queries in the psql window, they also see "90 ms taken"
> (milliseconds), which denotes the time taken to execute the query.
> Where can I set this option because I'm not seeing it in my psql
> window on both Win XP and Linux. Thanks!

If you're on an OS with a time command (linux, windows with the gnu
utils ported to it) you can do:

time psql -c "select ..."

OR you can do

echo -e "\\timing\nselect ..." | psql dbname

This is all from linux.  I don't know how this "windows" you speak of
works really.  Seriously, I stopped using it so long ago I'm lost when
I'm on it.



Re: [GENERAL] Customizing psql console to show execution times

2007-08-15 Thread Tom Lane
"Phoenix Kiula" <[EMAIL PROTECTED]> writes:
> In some examples posted to this forum, it seems to me that when people
> execute queries in the psql window, they also see "90 ms taken"
> (milliseconds), which denotes the time taken to execute the query.
> Where can I set this option because I'm not seeing it in my psql
> window on both Win XP and Linux. Thanks!

I think you're looking for the \timing command?
http://www.postgresql.org/docs/8.2/static/app-psql.html
(under meta-commands, about halfway down the page)

regards, tom lane



Re: [GENERAL] is this trigger safe and effective? - locking (caching via triggers)

2007-08-15 Thread Erik Jones

On Aug 15, 2007, at 4:57 AM, Pavel Stehule wrote:


I write sample about triggers and i have question. is my solution
correct and exists better solution?

Regards
Pavel Stehule

DROP SCHEMA safecache CASCADE;

CREATE SCHEMA safecache;

CREATE TABLE safecache.source_tbl(category int, int_value int);

CREATE TABLE safecache.cache(category int, sum_val int);

CREATE OR REPLACE FUNCTION safecache.source_tbl_trg_fce()
RETURNS trigger AS
$$
BEGIN
  IF TG_OP = 'INSERT' THEN
    -- row cannot yet exist in cache -- a complication
    -- I want to finish this transaction without a conflict
IF NOT EXISTS(SELECT category
 FROM safecache.cache
WHERE category = NEW.category) THEN
  LOCK TABLE safecache.cache IN SHARE ROW EXCLUSIVE MODE;
  -- I have to repeat test
  IF NOT EXISTS(SELECT category
   FROM safecache.cache
  WHERE category = NEW.category) THEN
INSERT INTO safecache.cache
   VALUES(NEW.category, NEW.int_value);
  END IF;
ELSE
  -- simple
  UPDATE safecache.cache
 SET sum_val = sum_val + NEW.int_value
WHERE category = NEW.category;
END IF;
  ELSEIF TG_OP = 'UPDATE' THEN
    -- if the category is unchanged, simple
IF NEW.category = OLD.category THEN
  UPDATE safecache.cache
 SET sum_val = sum_val + (NEW.int_value - OLD.int_value)
WHERE category = OLD.category;
ELSE
      -- old category has to exist
  UPDATE safecache.cache
 SET sum_val = sum_val - OLD.int_value
WHERE category = OLD.category;
      -- the new category may be a problem
  IF NOT EXISTS(SELECT category
   FROM safecache.cache
  WHERE category = NEW.category) THEN
LOCK TABLE safecache.cache IN SHARE ROW EXCLUSIVE MODE;
-- I have to repeat test
IF NOT EXISTS(SELECT category
 FROM safecache.cache
WHERE category = NEW.category) THEN
  INSERT INTO safecache.cache
 VALUES(NEW.category, NEW.int_value);
END IF;
  ELSE
    -- simple, new category exists
    UPDATE safecache.cache
       SET sum_val = sum_val + NEW.int_value
      WHERE category = NEW.category;
  END IF;
END IF;
  ELSE -- DELETE
    -- value has to exist in cache, simple
UPDATE safecache.cache
   SET sum_val = sum_val - OLD.int_value
  WHERE category = OLD.category;
  END IF;
  RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER actualise_cache
   AFTER INSERT OR UPDATE OR DELETE
   ON safecache.source_tbl
   FOR EACH ROW EXECUTE PROCEDURE safecache.source_tbl_trg_fce();


From what I can tell from your example it's "correct" and should
work under light loads.  However, if that trigger will fire a lot,
you might see those updates "stacking" due to the necessary locking
(both your explicit locks and those taken out by the updates).  What
I've done in that case (this is actually a pretty standard setup) is
to have the trigger just make inserts into another table recording the
category that needs to be updated and by how much.  Then you have some
other (probably user-land) process periodically sweep that table,
aggregate the updates to the cache table, then delete the interim
entries just processed.  Oh yeah, you could simplify that function a
lot by simply initializing your cache table with a row for each
category with sum_val = 0.  Then it's all updates and you don't need
those locks to determine if the category exists there.
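A minimal sketch of that interim-table approach, with hypothetical names;
the enqueue line is shown for the INSERT case only, and the sweep briefly
blocks writers so no queued delta can slip past unaggregated:

CREATE TABLE safecache.cache_queue(category int, delta int);

-- inside the trigger: just append, no locks on the cache table
INSERT INTO safecache.cache_queue VALUES(NEW.category, NEW.int_value);

-- periodic sweep (cron or similar), all in one transaction
BEGIN;
LOCK TABLE safecache.cache_queue IN EXCLUSIVE MODE;
UPDATE safecache.cache c
   SET sum_val = c.sum_val + q.delta
  FROM (SELECT category, sum(delta) AS delta
          FROM safecache.cache_queue
         GROUP BY category) q
 WHERE c.category = q.category;
INSERT INTO safecache.cache
SELECT category, sum(delta)
  FROM safecache.cache_queue qq
 WHERE NOT EXISTS (SELECT 1 FROM safecache.cache c
                    WHERE c.category = qq.category)
 GROUP BY category;
DELETE FROM safecache.cache_queue;
COMMIT;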


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com





Re: [GENERAL] Permission ALTER PASSWORD

2007-08-15 Thread Decibel!
On Wed, Aug 08, 2007 at 06:35:51PM -0300, Anderson Alves de Albuquerque  wrote:
>  After user $USER execute this ALTER, it get change PASSWORD. Could I block
> command ALTER password to user $USER?

No, there's no way to do that. You might want to look at using
ident-based authentication for that user instead.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [GENERAL] Interpreting statistics collector output

2007-08-15 Thread Decibel!
On Thu, Aug 09, 2007 at 09:14:55PM -0400, Steve Madsen wrote:
> On Aug 8, 2007, at 6:08 PM, Decibel! wrote:
> >Something else I like to look at is pg_stat_all_tables seq_scan and
> >seq_tup_read. If seq_scan is a large number and seq_tup_read/ 
> >seq_scan is
> >also large, that indicates that you could use an index on that table.
> 
> If seq_tup_read / seq_scan is large relative to the number of rows in  
> the table, wouldn't that imply that those sequential scans are often  
> returning most of the rows in the table?  In that case, would an  
> index help much or is a sequential scan the expected result?

I can't really think of a case where a seqscan wouldn't return all the
rows in the table... that's what it's meant to do.

What I was driving at by looking at seq_tup_read is that a small table
isn't going to use indexes anyway, so for the small tables it's
generally not worth worrying about indexes. If you wanted to be more
accurate you could look at reltuples or maybe relpages in pg_class
instead.
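For example, a quick pass over the stats views along those lines, with
NULLIF guarding against tables that have never been seqscanned:

select t.relname, t.seq_scan, t.seq_tup_read,
       t.seq_tup_read / nullif(t.seq_scan, 0) as avg_tup_per_scan,
       c.reltuples
  from pg_stat_all_tables t
  join pg_class c on c.oid = t.relid
 where t.seq_scan > 0
 order by t.seq_tup_read desc;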
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [GENERAL] Select time jump after adding filter; please help me figure out what I'm doing wrong.

2007-08-15 Thread Andrew Edson
I apologize about the CC; I thought I had done so.
   
  There are fourteen (14) distinct values in rcrd_cd.  And I don't know if this
counts as something odd, but I got the following values by doing a vacuum full
analyze, running the query set with the index in place, then dropping the index
and running the set without it.
   
  Values with index:
> 
> attest=# select count(*) from ptrans;
>  16488578
> 
> attest=# select count(*) from ptrans where rcrd_cd = '0A';
>  6701655
> 
> attest=# explain analyze select cntrct_id, tran_dt from ptrans where rcrd_cd = '0A';
>  Bitmap Heap Scan on ptrans  (cost=1223.86..151183.39 rows=87439 width=21) (actual time=2255.640..70371.304 rows=6701655 loops=1)
>    ->  Bitmap Index Scan on ptrans_cid_trandt_idx  (cost=0.00..1223.86 rows=87439 width=0) (actual time=2216.856..2216.856 rows=204855 loops=1)
>  Total runtime: 89964.419 ms
> 
  Values without index:
> 
> attest=# select count(*) from ptrans;
>  16488578
> 
> attest=# select count(*) from ptrans where rcrd_cd = '0A';
>  204855
> 
> attest=# explain analyze select cntrct_id, tran_dt from ptrans where rcrd_cd = '0A';
>  Seq Scan on ptrans  (cost=0.00..384813.22 rows=87439 width=21) (actual time=20.286..65330.049 rows=204855 loops=1)
>Filter: (rcrd_cd = '0A'::bpchar)
>  Total runtime: 65945.160 ms
   
  Again, someone mind pointing out to me where I've managed to mess this one up?


Richard Huxton <[EMAIL PROTECTED]> wrote:
Don't forget to CC: the list.

Andrew Edson wrote:
> I apologize; You are correct in that I mistyped my original structure. Here 
> is the information for the correct explain and explain analyze statements.
> 
> attest=# explain select cntrct_id, tran_dt from ptrans where rcrd_cd = '0A';

No need for the simple explain - explain analyse includes all the 
information.

> attest=# explain analyze select cntrct_id, tran_dt from ptrans where rcrd_cd 
> = '0A';
> QUERY PLAN
> 
> Bitmap Heap Scan on ptrans (cost=1223.86..149853.23 rows=85741 width=21) 
> (actual time=2302.363..70321.838 rows=6701655 loops=1)
> -> Bitmap Index Scan on ptrans_cid_trandt_idx (cost=0.00..1223.86 rows=85741 
> width=0) (actual time=2269.064..2269.064 rows=204855 loops=1)
> Total runtime: 89854.843 ms

Well, it's taking 90 seconds to return 6.7 million rows. Depending on 
your system and memory settings, that might not be unreasonable.

It *is* getting the estimate of returned rows wrong (it thinks 85,741 
will match) which is hugely out of line. Is there something odd with 
this table/column or haven't you analysed recently? How many unique 
values does rcrd_cd have, and how many rows does the table have?

I don't know that you'll get this down to sub-second responses though, 
not if you're trying to return 6 million rows from an even larger table.

-- 
Richard Huxton
Archonet Ltd



Re: [GENERAL] Customizing psql console to show execution times

2007-08-15 Thread Phoenix Kiula
> I think you're looking for the \timing command?
> http://www.postgresql.org/docs/8.2/static/app-psql.html
> (under meta-commands, about halfway down the page)


Thanks everyone. "\timing" it is!

Happy camper.



Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Vivek Khera


On Aug 15, 2007, at 7:41 AM, Ivan Zolotukhin wrote:


What is the best practice to process such broken strings before
passing them to PostgreSQL? Iconv from utf-8 to utf-8, dropping bad
characters?


This rings of GIGO... if your user enters garbage, how do you know  
what they wanted?  You don't.  You tell them they entered garbage,  
else the result will be garbage.  Data validation... learn from   
microsoft's mistakes :-)





Re: [GENERAL] Customizing psql console to show execution times

2007-08-15 Thread Tom Lane
Richard Broersma Jr <[EMAIL PROTECTED]> writes:
> However, notice that "\timing" and
> explain analyze do not exactly agree on the results they produce.

\timing reports the total elapsed time as seen at the client.  EXPLAIN
ANALYZE tells you about the query execution path inside the server; so
it omits the costs of parsing, planning, and network data transmission.

EXPLAIN ANALYZE also has much higher measurement overhead (typically 2
gettimeofday() calls per row, rather than 2 per query as for \timing).
So it's not unheard of for E.A. to report a number *larger* than the
actual execution time, especially on cheap PC hardware which tends to
have dog-slow gettimeofday().

They're both useful, but you have to keep in mind what you're measuring
and for what purpose.

regards, tom lane



Re: [GENERAL] Cluster and MVCC

2007-08-15 Thread Decibel!
On Fri, Aug 10, 2007 at 06:34:03PM +0100, Simon Riggs wrote:
> On Fri, 2007-08-10 at 10:02 -0400, Brad Nicholson wrote:
> > I just want to confirm that the cluster/MVCC issues are due to
> > transaction visibility.  Assuming that no concurrent access is happening
> > to a given table when the cluster command is issued (when takes it
> > visibility snapshot), it is safe to cluster that table.  Correct?
> 
> Yes, as long as pre-existing transactions do not then access the
> clustered table. If they do, rows they should have seen will now not be
> visible, yet you won't get an error message to say so.

Don't you also need to be in a serialized transaction?
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [GENERAL] Yet Another COUNT(*)...WHERE...question

2007-08-15 Thread Bill Moran
In response to "Phoenix Kiula" <[EMAIL PROTECTED]>:

> > Yes, optimization. :)  You don't need an exact count to tell someone
> > that there's more data and they can go to it.
> 
> 
> In general, I agree. But my example of Amazon was only to illustrate
> the point about two queries and why they may be needed. I seem to see
> many more pages than you do, but in any case, Google and Amazon can
> afford to be less precise.
> 
> Thanks for the suggestion of using EXPLAIN and parsing an
> approximation, but when you need to show a trader how many trades he
> has made, for instance, then approximation is not a possibility at
> all. Especially not if the numbers sway so wildly --
> 
> 
> FIRSTDB=# explain select * from trades where t_id = 'kXjha';
> QUERY PLAN
> ---
>  Bitmap Heap Scan on trades (cost=15.77..1447.12 rows=374 width=224)
>Recheck Cond: ((t_id)::text = 'kXjha'::text)
>->  Bitmap Index Scan on trades_tid_date  (cost=0.00..15.67 rows=374 
> width=0)
>  Index Cond: ((t_id)::text = 'kXjha'::text)
> (4 rows)
> 
> FIRSTDB=# select count(*) from trades where t_id = 'kXjha';
>  count
> ---
>   3891
> (1 row)
> 
> 
> 
> Could I do something so that the EXPLAIN showed up with slightly more
> close-to-accurate stats? The above query is just after a "vacuum
> analyze"!

In the above case, you could probably materialize the data with a trigger
that updates a counter in a separate table every time a new trade is
added.  This will give you 100% accurate results with _very_ fast
response time.
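A minimal sketch of that, with hypothetical names for the counter table and
assuming trades are only ever inserted or deleted, never re-keyed:

CREATE TABLE trade_counts(t_id varchar PRIMARY KEY, n bigint NOT NULL);

CREATE OR REPLACE FUNCTION trades_count_trg() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE trade_counts SET n = n + 1 WHERE t_id = NEW.t_id;
    IF NOT FOUND THEN
      INSERT INTO trade_counts VALUES (NEW.t_id, 1);
    END IF;
  ELSE  -- DELETE
    UPDATE trade_counts SET n = n - 1 WHERE t_id = OLD.t_id;
  END IF;
  RETURN NULL;  -- return value of an AFTER row trigger is ignored
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER trades_count
  AFTER INSERT OR DELETE ON trades
  FOR EACH ROW EXECUTE PROCEDURE trades_count_trg();

After that, "SELECT n FROM trade_counts WHERE t_id = 'kXjha'" is a
single-row lookup instead of a count over thousands of rows.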

Part of the problem is that there's no one answer to your question; there
are multiple approaches to solving it, depending on the details of the
problem and the acceptable time/accuracy of the answers.  Some basic
approaches:

1) Materialize the data.  MySQL actually does this automatically for you
   with MyISAM tables, which is why count(*) is so fast.  But if you
   absolutely need fast, accurate counts, you can build your own
   triggers in PG.  This is unlikely to be practical with all queries.
2) Estimate.  The accuracy of estimates can vary wildly by query and
   how often the database is analyzed, etc.  For something like,
   "show results 1 - 10 of about 50,000", estimates are great and fast,
   but for other cases, not acceptable.  The good news is you can get
   a fast estimate from any query with no up-front work.
3) Accept that sometimes to get accurate answers it's going to take
   time.  Around here, we call it the "Orbitz" technique, because when
   we discuss it, everyone thinks of the "please wait while I process
   your query" page you get from orbitz.com.  You'd be surprised how
   willing your users are to wait, as long as they know they have to
   wait.
4) Throw more hardware at it.  If you absolutely _must_have_ super-
   accurate results faster, then you may need to buy more RAM, faster
   disks and faster CPUs to accomplish it.
5) Come up with something revolutionary that nobody's ever thought of
   before.  Good luck with this one.

Of course, all of these ideas are only practical if you've already
ensured that your system is properly tuned.  Crappy values for
shared_buffers and other tuning will lead you to waste time trying
to redesign something that should work just fine, so verify all your
configuration first.  You may be able to get more acceptable estimates
by increasing your statistics targets, for example.

-- 
Bill Moran
http://www.potentialtech.com



Re: [GENERAL] Yet Another COUNT(*)...WHERE...question

2007-08-15 Thread Erik Jones

On Aug 15, 2007, at 9:36 AM, Phoenix Kiula wrote:


I'm grappling with a lot of reporting code for our app that relies on
queries such as:

 SELECT COUNT(*) FROM TABLE WHERE (conditions)...

And I still do not find, from the discussions on this thread, any
truly viable solution for this. The one suggestion is to have a
separate counts table, which is fine for total aggregates related to,
say, an ID. E.g., a table with:

trader_id, trade_count

But this is an overall count for the trader (in my example). What if I
need a count of all his trades in the last one week. Then I need a
timestamp condition in there as well. The number of such possibilities
for multiple WHERE conditions is infinite...how should we account for
all these avenues?


There is no general solution.  While theoretically the possible WHERE
conditions are infinite, in reality they're limited to your actual use
cases, and the solutions are thereby dictated by those.  Using a
separate cache table is often a viable option in situations where
constantly up-to-date realtime values are needed.  Another common
option is smart usage of indexes, i.e., remember that you can index on
the results of a function applied to row values, as well as partial
indexes.  Another is table partitioning.  Asking how to optimize
"SELECT COUNT(*) FROM TABLE WHERE (conditions)" is not a good question,
as the solution is dependent on those conditions.  Pick your most
common conditions and optimize for those.
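To make that concrete for the "trades in the last week" case mentioned
above (column names hypothetical): a composite index covers the common
equality-plus-range predicate, and a partial index can serve a hot subset.
A partial index predicate must be immutable, though, so you cannot bake
now() into one; sliding time windows are the composite index's job.

-- equality on trader, range on timestamp
CREATE INDEX trades_tid_ts_idx ON trades (t_id, trade_ts);
SELECT count(*) FROM trades
 WHERE t_id = 'kXjha'
   AND trade_ts > now() - interval '7 days';

-- only index the rows a frequent report cares about
CREATE INDEX trades_large_idx ON trades (t_id) WHERE amount > 10000;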


Also, in many cases for reporting apps, 10 minutes is not long at  
all.  If you have reports that you can't make happen faster, schedule  
and automate them.


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com





Re: [GENERAL] Deadlocks caused by a foreign key constraint

2007-08-15 Thread Decibel!
On Fri, Aug 10, 2007 at 09:38:36PM +0400, Dmitry Koterov wrote:
> Hello.
> 
> I have a number of deadlock because of the foreign key constraint:
> 
> Assume we have 2 tables: A and B. Table A has a field fk referenced to
> B.idas a foreign key constraint.
> 
> 
> -- transaction #1
> BEGIN;
> ...
> INSERT INTO A(x, y, fk) VALUES (1, 2, 666);
> ...
> END;
> 
> 
> -- transaction #2
> BEGIN;
> UPDATE B SET z = z + 1 WHERE id = 666;
> ...
> UPDATE B SET z = z + 1 WHERE id = 666;
> ...
> UPDATE B SET z = z + 1 WHERE id = 666;
> END;
> 
> 
> You see, table A is only inserted, and table B is only updated their field z
> on its single row.
> If we execute a lot of these transactions concurrently using multiple
> parellel threads, sometimes we have a deadlock:
> 
> DETAIL:  Process 6867 waits for ShareLock on transaction 1259392; blocked by
> process 30444.
> Process 30444 waits for ShareLock on transaction 1259387; blocked by
> process 6867.
> CONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."B" x WHERE "id" = $1
> FOR SHARE OF x"
> 
> If I delete the foreign key constraint, all begins to work fine.
> Seems Postgres thinks that "UPDATE B SET z = z + 1 WHERE id = 666" query may
> modify B.id field and touch A.fk, so it holds the shareable lock on it.

What version are you running? I'm pretty sure that recent versions check
to see if the key actually changed.

> The question is: is it possible to KEEP this foreign key constraint, but
> avoid deadlocks?

I'm pretty sure that the deadlock is actually being caused by your
application code, likely because you're doing multiple updates within
one transaction, but not being careful about the id order you do them
in.
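If so, the usual fix is to take the row locks in a deterministic order at
the top of the transaction, e.g. (sketch; the ids are hypothetical):

BEGIN;
-- every transaction locks its B rows in ascending id order, so two
-- transactions can never hold them in opposite orders
SELECT 1 FROM B WHERE id IN (665, 666) ORDER BY id FOR UPDATE;
-- ... the inserts/updates follow ...
COMMIT;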
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [GENERAL] Performance check of my database

2007-08-15 Thread Decibel!
On Sun, Aug 12, 2007 at 05:40:26PM -0400, Harpreet Dhaliwal wrote:
> Hi,
> 
> Lately I completed the business logic of my application and all related
> database work.
> 
> Now i need to check the performance of my database, how much load it can
> bear, perfomance to different queries and stored procedures.
> 
> Basically i need to do the performance testing of my database and based on
> that I need to take a call whether i should go for clustering or not.
> 
> Please let me know the best practices in postgres for such an activity.

Best bet is to write something that drives a fake workload through your
complete system; that way you can see exactly how everything in the
system will handle load.
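For a first sanity check of the hardware and configuration, pgbench (in
contrib) is handy, with the caveat that it drives its own TPC-B-style
schema rather than your application's (database name hypothetical):

pgbench -i -s 10 mydb          # create and populate the test tables
pgbench -c 10 -t 1000 mydb     # 10 clients, 1000 transactions each

The application-level driver described above is still what answers the
clustering question, though.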
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [GENERAL] is this trigger safe and effective? - locking (caching via triggers)

2007-08-15 Thread Pavel Stehule
2007/8/15, Erik Jones <[EMAIL PROTECTED]>:
> <...snipped earlier postings...>

I know it, but I don't know all possible category numbers in advance.
I sent a sample with the minimum (one) pk attribute.

>
> Erik Jones

I have 98% of SELECTs and 2% of INSERTs and UPDATEs



Re: [GENERAL] Select time jump after adding filter; please help me figure out what I'm doing wrong.

2007-08-15 Thread Richard Huxton

Andrew Edson wrote:

I apologize about the CC; I thought I had done so.


no problem


There are fourteen (14) distinct values in rcrd_cd.  And I don't know
if this counts as something odd, but I got the following values by
doing a vacuum full analyze, then running the set with index,
dropping index, running set without.


Might want to do ALTER TABLE ... ALTER COLUMN rcrd_cd SET STATISTICS 14
(or a few more than that if you think it might be useful) - won't
help you with this though.

So - are you saying that with these two queries...

attest=# select count(*) from ptrans where rcrd_cd = '0A'; 

>> 6701655

attest=# select count(*) from ptrans where rcrd_cd = '0A'; 

>> 204855

...the only difference is that you've dropped an index?

Because that's just strange - and I don't think it's anything you're doing.

Do you have other partial indexes for different values of rcrd_cd, and
do they have similar problems? If this can be reproduced it might point 
to something odd with bitmap scans.


Oh, remind me what version of PostgreSQL you're running?

--
  Richard Huxton
  Archonet Ltd



Re: [GENERAL] is this trigger safe and effective? - locking (caching via triggers)

2007-08-15 Thread Erik Jones

On Aug 15, 2007, at 11:14 AM, Pavel Stehule wrote:


<...snipped earlier postings...>


I know it, but I don't know all possible category numbers in advance.
I sent a sample with the minimum (one) pk attribute.


One workaround is to make an ON INSERT trigger that fires before this
one, checks whether this is a new category, and if so sets up the row
with value 0 in the cache table, along the lines of the sketch below.
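A sketch of that (with the same race caveat as before: two sessions can
still pass the NOT EXISTS test together unless category is constrained
unique):

CREATE OR REPLACE FUNCTION safecache.ensure_category() RETURNS trigger AS $$
BEGIN
  IF NOT EXISTS (SELECT 1 FROM safecache.cache
                  WHERE category = NEW.category) THEN
    INSERT INTO safecache.cache VALUES (NEW.category, 0);
  END IF;
  RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER ensure_cache_row
  BEFORE INSERT OR UPDATE ON safecache.source_tbl
  FOR EACH ROW EXECUTE PROCEDURE safecache.ensure_category();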




Erik Jones


I have 98% of SELECTs and 2% of INSERTs and UPDATEs


Sounds like you should be ok then and you may not need to go with the  
suggestions I've outlined.  However, be sure to keep a close eye on  
pg_locks when you push that trigger into production.


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com





Re: [GENERAL] Select time jump after adding filter; please help me figure out what I'm doing wrong.

2007-08-15 Thread Andrew Edson
Yes.  The only difference between the two selects was that the index on the 
table in question was dropped.  As far as I know, that was the only partial 
index on there, although since it's a test db, I could probably go in and 
experiment on a few more if needed.
   
  This problem may have already been solved; I'm using an older version of
Postgres, 8.1.3.  My boss has requested that it not be upgraded just yet,
however, so I'm stuck with it for the moment.
   
  Richard Huxton <[EMAIL PROTECTED]> wrote:
<...snipped earlier postings...>


   

Re: [GENERAL] Select time jump after adding filter; please help me figure out what I'm doing wrong.

2007-08-15 Thread Tom Lane
Andrew Edson <[EMAIL PROTECTED]> writes:
>   This problem may have already been solved; I'm using an older
>   version of Postgres; 8.1.3.

Ah.  I think your result is explained by this 8.1.4 bug fix:

2006-05-18 14:57  tgl

* src/backend/optimizer/plan/createplan.c (REL8_1_STABLE): When a
bitmap indexscan is using a partial index, it is necessary to
include the partial index predicate in the scan's "recheck
condition".  Otherwise, if the scan becomes lossy for lack of
bitmap memory, we would fail to enforce that returned rows satisfy
the predicate.  Noted while studying bug #2441 from Arjen van der
Meijden.

IOW, once the bitmap goes lossy, we'd return *every* row on any page
that the index fingered as having *any* relevant row.

>   My boss has requested that it not be
>   upgraded just yet, however, so I'm stuck with it for the moment. 

Better press him a little harder.  There are a lot of known bugs in
8.1.3, and not any very good reason not to update to a later 8.1.x.

As a really short-term measure, you could possibly avoid this bug by
increasing work_mem enough that the bitmap doesn't get compressed.
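For instance (8.1 reads work_mem as an integer number of kilobytes; the
value here is only a starting point to tune against the bitmap's real size):

SET work_mem = 262144;   -- 256 MB, for the current session only
EXPLAIN ANALYZE SELECT cntrct_id, tran_dt FROM ptrans WHERE rcrd_cd = '0A';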

regards, tom lane



Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Phoenix Kiula
On 15/08/07, Ivan Zolotukhin <[EMAIL PROTECTED]> wrote:
> Hello,
>
> Actually I tried smth like $str = @iconv("UTF-8", "UTF-8//IGNORE",
> $str); when preparing string for SQL query and it worked. There's
> probably a better way in PHP to achieve this: simply change default
> values in php.ini for these parameters:
>
> mbstring.encoding_translation = On
> mbstring.substitute_character = none
>
> and broken symbols will be automatically stripped off from the input
> and output.


Sadly, they don't always do that, not with Asian scripts.

And I do not completely agree, like the other poster suggested, with
the concept of GIGO. Sometimes you want the end-user's experience to
be seamless. For example, in one of our web sites, we allow users to
submit text through a bookmarklet, where the title of the webpage
comes in rawurlencoded format. We try to rawurldecode() it on our end
but most of the time the Asian interpretation is wrong. We have all
the usual mbstring settings in php.ini. In this scenario, the user did
not enter any garbage. Our application should have the ability to
recognize the text. We do what we can with mb_convert...etc, but the
database just throws an error.

PGSQL really needs to get with the program when it comes to utf-8 input.



Re: [GENERAL] Select time jump after adding filter; please help me figure out what I'm doing wrong.

2007-08-15 Thread Andrew Edson


Tom Lane <[EMAIL PROTECTED]> wrote:
<...snipped earlier postings...>
I'll press as I can...thanks for pointing that out to me.  How would I 
determine how much to increase work_mem by (or to) until then?

   

Re: [GENERAL] Select time jump after adding filter; please help me figure out what I'm doing wrong.

2007-08-15 Thread Richard Huxton

Andrew Edson wrote:

Yes.  The only difference between the two selects was that the index
on the table in question was dropped.  As far as I know, that was the
only partial index on there, although since it's a test db, I could
probably go in and experiment on a few more if needed.

This problem may have already been solved; I'm using an older version
of Postgres; 8.1.3.  My boss has requested that it not be upgraded
just yet, however, so I'm stuck with it for the moment.


You should probably upgrade to 8.1.9 - there are a couple of bugfixes 
that mention bitmap scans, although nothing that describes your problem 
in-depth.


http://www.postgresql.org/docs/8.1/static/release.html#RELEASE-8-1-9

You don't need to dump/restore the database, it's just a matter of 
upgrading the binaries within the 8.1 series. You should plan on doing 
point-release updates fairly frequently.


I'd put the partial index back, see if you can reproduce it and if so 
tell your boss you should upgrade. If you can't talk him around, give 
him the URL for the release notes along with the three bugs you think 
are most dangerous. If that doesn't work, I'm sure we can get someone 
from the core developers to tell you the same thing I just have, and we 
can forward that to your boss :-)


--
  Richard Huxton
  Archonet Ltd



[GENERAL] pg_dump on local Windows, pg_restore on Linux?

2007-08-15 Thread Phoenix Kiula
Couple of questions with porting:

1. I have been playing around with my databases locally on Win XP so
as not to hurt our website traffic. Now I would like to move the
database to a Linux CentOS server. Can I use pg_dump on Windows and
pg_restore it on Linux? If so, any tips on what I should keep in mind
(e.g., manual seems to suggest that pg_restore prefers tar gzipped
format...but I'm not sure if Windows can generate this?)

2. I would like my database to be UTF-8. I issue the command

CREATE DATABASE mydb OWNER me ENCODING 'utf8';

Should I add anything else to it, such as collation? I did not find
any option for that in here:
http://www.postgresql.org/docs/8.2/static/sql-createdatabase.html

3. Also, just to confirm, if I have utf-8 database, then all tables in
it should be able to take utf-8 data? I would just like these tables
to take whatever I send to them. No error checking or encoding
checking. Can I disable the painful error that PG keeps throwing if
even a single erroneous byte shows up? I'd rather have 'garbage data'
than not go through with the query.

Thanks for any tips!



Re: [GENERAL] Interpreting statistics collector output

2007-08-15 Thread Steve Madsen

On Aug 15, 2007, at 11:52 AM, Decibel! wrote:

I can't really think of a case where a seqscan wouldn't return all the
rows in the table... that's what it's meant to do.


Isn't a sequential scan the only option if an appropriate index does  
not exist?  E.g., for a query with a WHERE clause, but none of the  
referenced columns are indexed.


Put another way: consider a large table with no indexes.
seq_tup_read / seq_scan is the average number of rows returned per
scan, and if this is a small percentage of the row count, then it
seems reasonable to say an index should help query performance.
(With the understanding that the workload is a few common queries
rather than many unique ones.)



What I was driving at by looking at seq_tup_read is that a small table
isn't going to use indexes anyway, so for the small tables it's
generally not worth worrying about indexes.


OK, so it sounds like there is a threshold to be determined where  
this sort of analysis isn't very interesting.


In the interests of stirring up more interest, earlier this week I  
released a plugin for Ruby on Rails that extracts these statistics  
from Postgres and provides helper methods to format them nicely in  
web page views.  I'm using it to put DB monitors on an administrative  
dashboard for a site I run.


I'd love to add some analysis logic to the plugin.  If useful  
suggestions can be drawn from interpreting the numbers, small badges  
can show up in the web page, making the tuning process a little less  
painful.


(The plugins project home page is at http://groups.google.com/group/pgsql_stats.)


--
Steve Madsen <[EMAIL PROTECTED]>
Light Year Software, LLC  http://lightyearsoftware.com
ZingLists: Stay organized, and share lists online.  http://zinglists.com





[GENERAL] User-Friendly TimeZone List

2007-08-15 Thread Naz Gassiep

Hi all,
   I am still, after quite some time, wrangling over the time zone 
system in my app. I have sorted out all the internal handling, however I 
am still uncertain as to what the best way to get the user to select 
their time zone is.


   I was thinking of having users just select their timezones from a 
list which was populated with the contents of the query:

select * from pg_timezone_names ;
which would look something like this.
http://mrnaz.com/tztest.html

This however is problematic for a number of reasons:

1. The timezone list there isn't exactly user friendly, there are many 
Etc/* timezones there, as well as others that would be potentially 
confusing for users who are trying to select the timezone they are in.
2. If a timezone is removed from the list for whatever reason, then the 
system will be left with users who have selected a timezone that is no 
longer a valid choice in the list.


   The ideal situation would be to maintain my own persistent list of 
timezones (the way Microsoft maintains their own user-friendly list that 
they use for Windows) that maps to the timezones embedded in the PG back 
end, but I haven't the resources to pull this off. Has anyone else 
worked on a mechanism to allow users to supply the timezone they are in, 
and if so, do you have any comments on how best to handle the apparently 
mutually exclusive problems of simplicity for users and accuracy in the 
back end?
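For reference, the mapping-table approach might look like this (labels
hypothetical), including a query that flags entries which have dropped out
of the server's zone list, which addresses problem 2 above:

CREATE TABLE app_timezones (
    label   text PRIMARY KEY,  -- what the user sees
    pg_name text NOT NULL      -- what gets stored, e.g. 'America/New_York'
);
INSERT INTO app_timezones VALUES ('US Eastern',     'America/New_York');
INSERT INTO app_timezones VALUES ('Central Europe', 'Europe/Berlin');

SELECT a.label
  FROM app_timezones a
  LEFT JOIN pg_timezone_names t ON t.name = a.pg_name
 WHERE t.name IS NULL;   -- these mappings need fixing before users do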


   At the simple end of the scale I can't just have users select from a
list going from GMT-12 to GMT+12. At the complex end of the scale I'd
just give them the output from the list, and any timezones that are
deprecated will result in the user reverting to UTC until they set a
new timezone.


Comments on this usability problem are appreciated,
- Naz.



Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Scott Marlowe
On 8/15/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> > What, exactly, does that mean?
> >
> > That PostgreSQL should take things in invalid utf-8 format and just store 
> > them?
> > Or that PostgreSQL should autoconvert from invalid utf-8 to valid
> > utf-8, guessing the proper codes?
> >
> > Seriously, what do you want pgsql to do with these invalid inputs?
>
>
> PG should let me, as the administrator of the database, decide whether
> I mind my DB to have an option to:
>
> 1. Either allow the "invalid" input
>
> 2. Or to continue storing the other information in the table even if
> an exception was thrown for the utf-8 column (which may be an
> unrequired column, for instance, so I may want it not to block the
> storage of other valid input which is more important)
>
> I am not advocating what others should do. But I know what I need my
> DB to do. If I want it to store data that does not match puritanical
> standards of textual storage, then it should allow me to...
>
> It's just a wishlist item from me, but I realize my voice may be
> inconsequential.

Can't you just create your db as SQL_ASCII and let it store anything
then?  I mean if you create a database in UTF-8 you're asking it to
carefully check the input.  If you set it to SQL_ASCII, you're telling
it to store anything.
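For example (TEMPLATE template0 is required when the requested encoding
differs from the template database's):

CREATE DATABASE rawdb TEMPLATE template0 ENCODING 'SQL_ASCII';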



Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Ben

On Thu, 16 Aug 2007, Phoenix Kiula wrote:


I am not advocating what others should do. But I know what I need my
DB to do. If I want it to store data that does not match puritanical
standards of textual storage, then it should allow me to...


It does allow that: store it as a BLOB, and then treat it as text in your 
app.


Me personally, if I expect something to be text, I expect it to be valid 
text.




Re: [GENERAL] User-Friendly TimeZone List

2007-08-15 Thread Scott Marlowe
On 8/15/07, Naz Gassiep <[EMAIL PROTECTED]> wrote:
> Hi all,
> I am still, after quite some time, wrangling over the time zone
> system in my app. I have sorted out all the internal handling, however I
> am still uncertain as to what the best way to get the user to select
> their time zone is.
>
> I was thinking of having users just select their timezones from a
> list which was populated with the contents of the query:
>  select * from pg_timezone_names ;
> which would look something like this.
>  http://mrnaz.com/tztest.html
>
> This however is problematic for a number of reasons:
>
> 1. The timezone list there isn't exactly user friendly, there are many
> Etc/* timezones there, as well as others that would be potentially
> confusing for users who are trying to select the timezone they are in.
> 2. If a timezone is removed from the list for whatever reason, then the
> system will be left with users who have selected a timezone that is no
> longer a valid choice in the list.

Do the views in 8.2 pg_timezone_abbrevs and pg_timezone_names help at all?



Re: [GENERAL] pg_dump on local Windows, pg_restore on Linux?

2007-08-15 Thread Phoenix Kiula
On 16/08/07, Scott Marlowe <[EMAIL PROTECTED]> wrote:
> On 8/15/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> > Couple of questions with porting:
> >
> > 1. I have been playing around with my databases locally on Win XP so
> > as not to hurt our website traffic. Now I would like to move the
> > database to a Linux CentOS server. Can I use pg_dump on Windows and
> > pg_restore it on Linux? If so, any tips on what I should keep in mind
> > (e.g., manual seems to suggest that pg_restore prefers tar gzipped
> > format...but I'm not sure if Windows can generate this?)
> >
> > 2. I would like my database to be UTF-8. I issue the command
> >
> > CREATE DATABASE mydb OWNER me ENCODING 'utf8';
> >
> > Should I add anything else to it, such as collation? I did not find
> > any option for that in here:
> > http://www.postgresql.org/docs/8.2/static/sql-createdatabase.html
> >
> > 3. Also, just to confirm, if I have utf-8 database, then all tables in
> > it should be able to take utf-8 data? I would just like these tables
> > to take whatever I send to them. No error checking or encoding
> > checking. Can I disable the painful error that PG keeps throwing if
> > even a single erroneous byte shows up? I'd rather have 'garbage data'
> > than not go through with the query.
>
> Requirements 2 and 3 are exclusive.  Either you want your database to
> be UTF-8, which means that invalid byte sequences should be rejected
> because they AREN'T utf-8, or you want your database to swallow
> whatever you throw at it, in which case, that's not UTF-8.



Thanks. Is there an encoding that is so flexible that it will silently
accept whatever I send to it without throwing an exception?



Re: [GENERAL] pg_dump on local Windows, pg_restore on Linux?

2007-08-15 Thread Leif B. Kristensen
On Wednesday 15. August 2007, Phoenix Kiula wrote:
>Couple of questions with porting:
>
>1. I have been playing around with my databases locally on Win XP so
>as not to hurt our website traffic. Now I would like to move the
>database to a Linux CentOS server. Can I use pg_dump on Windows and
>pg_restore it on Linux? If so, any tips on what I should keep in mind
>(e.g., manual seems to suggest that pg_restore prefers tar gzipped
>format...but I'm not sure if Windows can generate this?)

This is how I import my output from pg_dump:

createdb --encoding=UNICODE $DB
psql -U postgres -d $DB -f $INFILE > restore.log 2>&1

This way, you just feed psql with an uncompressed dump file. I've never 
used pg_restore.
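pg_dump's archive formats are platform-independent, so a Windows-to-Linux
move can also look like this (names hypothetical; -Fc is the custom format
pg_restore expects):

pg_dump -Fc -U postgres -f mydb.dump mydb     (run on the Windows box)
pg_restore -U postgres -d mydb mydb.dump      (run on the Linux box, after createdb)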

>2. I would like my database to be UTF-8. I issue the command
>
>CREATE DATABASE mydb OWNER me ENCODING 'utf8';
>
>Should I add anything else to it, such as collation? I did not find
>any option for that in here:
>http://www.postgresql.org/docs/8.2/static/sql-createdatabase.html

At least on a *nix system, collation is based on the value of the LC_ALL
environment variable at initdb time. There's nothing you can do about
it in a live database. IMO that's a little awkward, and is what finally
made me change the global from ISO-8859-1 to UTF-8 on my three Gentoo
Linux machines.
-- 
Leif Biberg Kristensen | Registered Linux User #338009
http://solumslekt.org/ | Cruising with Gentoo/KDE
My Jazz Jukebox: http://www.last.fm/user/leifbk/



Re: [GENERAL] pqlib in c++: PQconnectStart PQconnectPoll

2007-08-15 Thread madhtr


- Original Message - 
From: "madhtr" <[EMAIL PROTECTED]>

To: "Tom Lane" <[EMAIL PROTECTED]>
Cc: 
Sent: Wednesday, February 14, 2007 22:33
Subject: Re: [GENERAL] pqlib in c++: PQconnectStart PQconnectPoll


Another line of thought, given the reading-between-the-lines conclusion
that you are trying to use PG 7.3 libraries on Windows, is that there
was something broken in the async-connect code back then on that
platform.  If you really are trying to do that, do yourself a favor and
move to 8.0 or later.  Nobody's going to be very interested in fixing
7.3.  (I did try your program with 7.3 on Unix, though, and it seemed
fine except the error message was spelled a bit differently.)



Ty, I'll check that ... :)



Rats ... my source version is 8.2.3, psql returns 8.1.4 for select 
version();


back to the drawing board ... It worked for you, I just have to make it work 
for me ... mebbe I'll go through the source a bit, ty:)


madhtr 






Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Phoenix Kiula
> What, exactly, does that mean?
>
> That PostgreSQL should take things in invalid utf-8 format and just store 
> them?
> Or that PostgreSQL should autoconvert from invalid utf-8 to valid
> utf-8, guessing the proper codes?
>
> Seriously, what do you want pgsql to do with these invalid inputs?


PG should let me, as the administrator of the database, decide whether
I mind my DB to have an option to:

1. Either allow the "invalid" input

2. Or to continue storing the other information in the table even if
an exception was thrown for the utf-8 column (which may be an
unrequired column, for instance, so I may want it not to block the
storage of other valid input which is more important)

I am not advocating what others should do. But I know what I need my
DB to do. If I want it to store data that does not match puritanical
standards of textual storage, then it should allow me to...

It's just a wishlist item from me, but I realize my voice may be
inconsequential.



Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Scott Marlowe
On 8/15/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> On 15/08/07, Ivan Zolotukhin <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > Actually I tried smth like $str = @iconv("UTF-8", "UTF-8//IGNORE",
> > $str); when preparing string for SQL query and it worked. There's
> > probably a better way in PHP to achieve this: simply change default
> > values in php.ini for these parameters:
> >
> > mbstring.encoding_translation = On
> > mbstring.substitute_character = none
> >
> > and broken symbols will be automatically stripped off from the input
> > and output.
>
>
> Sadly, they don't always do that, not with Asian scripts.
>
> And I do not completely agree, like the other poster suggested, with
> the concept of GIGO. Sometimes you want the end-user's experience to
> be seamless. For example, in one of our web sites, we allow users to
> submit text through a bookmarklet, where the title of the webpage
> comes in rawurlencoded format. We try to rawurldecode() it on our end
> but most of the times the Asian interpretation is wrong. We have all
> the usual mbstring settings in php.ini. In this scenario, the user did
> not enter any garbage. Our application should have the ability to
> recognize the text. We do what we can with mb_convert...etc, but the
> database just throws an error.
>
> PGSQL really needs to get with the program when it comes to utf-8 input.

What, exactly, does that mean?

That PostgreSQL should take things in invalid utf-8 format and just store them?
Or that PostgreSQL should autoconvert from invalid utf-8 to valid
utf-8, guessing the proper codes?

Seriously, what do you want pgsql to do with these invalid inputs?



Re: [GENERAL] pg_dump on local Windows, pg_restore on Linux?

2007-08-15 Thread Scott Marlowe
On 8/15/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> Couple of questions with porting:
>
> 1. I have been playing around with my databases locally on Win XP so
> as not to hurt our website traffic. Now I would like to move the
> database to a Linux CentOS server. Can I use pg_dump on Windows and
> pg_restore it on Linux? If so, any tips on what I should keep in mind
> (e.g., manual seems to suggest that pg_restore prefers tar gzipped
> format...but I'm not sure if Windows can generate this?)
>
> 2. I would like my database to be UTF-8. I issue the command
>
> CREATE DATABASE mydb OWNER me ENCODING 'utf8';
>
> Should I add anything else to it, such as collation? I did not find
> any option for that in here:
> http://www.postgresql.org/docs/8.2/static/sql-createdatabase.html
>
> 3. Also, just to confirm, if I have utf-8 database, then all tables in
> it should be able to take utf-8 data? I would just like these tables
> to take whatever I send to them. No error checking or encoding
> checking. Can I disable the painful error that PG keeps throwing if
> even a single erroneous byte shows up? I'd rather have 'garbage data'
> than not go through with the query.

Requirements 2 and 3 are exclusive.  Either you want your database to
be UTF-8, which means that invalid byte sequences should be rejected
because they AREN'T utf-8, or you want your database to swallow
whatever you throw at it, in which case, that's not UTF-8.



Re: [GENERAL] Copy command and duplicate items (Support Replace?)

2007-08-15 Thread Decibel!
On Tue, Aug 14, 2007 at 10:50:33AM +0800, Ow Mun Heng wrote:
> Hi,
> 
> Writing a script to pull data from SQL server into a flat-file (or just
> piped in directly to PG using Perl DBI)
> 
> Just wondering if the copy command is able to do a replace if there are
> existing data in the Db already. (This is usually in the case of updates
> to specific rows and there be a timestamp indicating it has been changed
> etc.)
> 
> In MySQL, the mysqlimport util has the --replace function which will
> replace the data if there is any event of a duplicate.
> 
> Does PG support this?

No; you'll need to COPY into a temporary or staging table and then
proceed from there. Alternatively, you could use
http://pgfoundry.org/projects/pgloader/.
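The staging route can look like this from psql (sketch; table and column
names hypothetical, and \copy reads the file client-side):

BEGIN;
CREATE TEMP TABLE staging (LIKE target);
\copy staging from 'extract.csv' with csv
-- apply changed rows first, then the genuinely new ones
UPDATE target t
   SET val = s.val, changed_at = s.changed_at
  FROM staging s
 WHERE t.id = s.id;
INSERT INTO target
SELECT s.* FROM staging s
 WHERE NOT EXISTS (SELECT 1 FROM target t WHERE t.id = s.id);
COMMIT;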
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [GENERAL] pg_dump on local Windows, pg_restore on Linux?

2007-08-15 Thread Phoenix Kiula
> At least on a *nix system, collation is based on the value of the LC_ALL
> environment variable at dbinit time. There's nothing you can do about
> it in a live database. IMO that's a little awkward, and is what finally
> made me change the global from ISO-8859-1 to UTF-8 on my three Gentoo
> Linux machines.


This is great info, thanks. Could you let me know how I could change
the global values of "LC_ALL"?  I am on Linux too, just CentOS, but I
suppose it should be the same or similar?



Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Ben

On Thu, 16 Aug 2007, Phoenix Kiula wrote:


1. Even if it were bytea, would it work with regular SQL operators
such as regexp and LIKE?

2. Would tsearch2 work with bytea in the future as long as the stuff
in it was text?


As far as I know, regexp, [i]like, tsearch2, etc. all require valid text 
encodings, and so only work with text, not blobs. But I might be wrong. 
I'm sure the manual would be enlightening. :)




Re: [GENERAL] pg_dump on local Windows, pg_restore on Linux?

2007-08-15 Thread Tom Lane
"Phoenix Kiula" <[EMAIL PROTECTED]> writes:
> Thanks. Is there an encoding that is so flexible that it will silently
> accept whatever I send to it without throwing an exception?

SQL_ASCII does that.  Whether it's a good idea to use it is
questionable.  One thing to think about is that you will be unable to
provide translation to and from different client encodings --- the
database will always just regurgitate the bytes it was given, since it
doesn't really know what encoding they are in.

regards, tom lane



Re: [GENERAL] Interpreting statistics collector output

2007-08-15 Thread Decibel!
On Wed, Aug 15, 2007 at 01:26:02PM -0400, Steve Madsen wrote:
> On Aug 15, 2007, at 11:52 AM, Decibel! wrote:
> >I can't really think of a case where a seqscan wouldn't return all the
> >rows in the table... that's what it's meant to do.
> 
> Isn't a sequential scan the only option if an appropriate index does  
> not exist?  E.g., for a query with a WHERE clause, but none of the  
> referenced columns are indexed.

Yes, and that seqscan is going to read the entire table and then apply a
filter.

> Put another way: consider a large table with no indexes.   
> seq_tup_read / seq_scan is the average number of rows returned per  
> scan, and if this is a small percentage of the row count, then it  
> seems reasonable to say an index should help query performance.   
> (With the understanding that it's fewer common rather than many  
> unique queries.)
 
decibel=# select * into i from generate_series(1,9) i;
SELECT
decibel=# select seq_scan, seq_tup_read from pg_stat_all_tables where relname='i';
 seq_scan | seq_tup_read 
----------+--------------
        0 |            0
(1 row)

decibel=# select * from i where i=1;
 i 
---
 1
(1 row)

decibel=# select seq_scan, seq_tup_read from pg_stat_all_tables where relname='i';
 seq_scan | seq_tup_read 
----------+--------------
        1 |            9
(1 row)
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Phoenix Kiula
On 16/08/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> On 16/08/07, Ben <[EMAIL PROTECTED]> wrote:
> > On Thu, 16 Aug 2007, Phoenix Kiula wrote:
> >
> > > I am not advocating what others should do. But I know what I need my
> > > DB to do. If I want it to store data that does not match puritanical
> > > standards of textual storage, then it should allow me to...
> >
> > It does allow that: store it as a BLOB, and then treat it as text in your
> > app.
> >
> > Me personally, if I expect something to be text, I expect it to be valid
> > text.
> >
>
>
> This is very useful, thanks. This would be "bytea"? Quick questions:
>
> 1. Even if it were bytea, would it work with regular SQL operators
> such as regexp and LIKE?
>
> 2. Would tsearch2 work with bytea in the future as long as the stuff
> in it was text?


I get the following error when I try to change a column that was
hitherto "varchar(500)" to bytea:

SQL error:
ERROR:  column "title" cannot be cast to type "bytea"

In statement:
ALTER TABLE "links" ALTER COLUMN "title" TYPE bytea


Any thoughts on how I can overcome this limitation?
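
One hedged workaround (an untested sketch): ALTER TABLE's USING clause can
supply the conversion that no implicit cast provides. decode(..., 'escape')
gives backslashes special meaning, hence the replace() to double them first:

ALTER TABLE links ALTER COLUMN title TYPE bytea
    USING decode(replace(title, E'\\', E'\\\\'), 'escape');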

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Phoenix Kiula
On 16/08/07, Ben <[EMAIL PROTECTED]> wrote:
> On Thu, 16 Aug 2007, Phoenix Kiula wrote:
>
> > I am not advocating what others should do. But I know what I need my
> > DB to do. If I want it to store data that does not match puritanical
> > standards of textual storage, then it should allow me to...
>
> It does allow that: store it as a BLOB, and then treat it as text in your
> app.
>
> Me personally, if I expect something to be text, I expect it to be valid
> text.
>


This is very useful, thanks. This would be "bytea"? Quick questions:

1. Even if it were bytea, would it work with regular SQL operators
such as regexp and LIKE?

2. Would tsearch2 work with bytea in the future as long as the stuff
in it was text?

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Yet Another COUNT(*)...WHERE...question

2007-08-15 Thread Andrew Armstrong
I don't know how PostgreSQL does it, but MySQL has an SQL_CALC_FOUND_ROWS
extension which allows the query to also return how many rows would exist
without the LIMIT clause. Perhaps there is something similar for PostgreSQL
(check the LIMIT docs?)

- Andrew

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Scott Marlowe
Sent: Thursday, 16 August 2007 1:24 AM
To: Phoenix Kiula
Cc: Gregory Stark; Postgres General
Subject: Re: [GENERAL] Yet Another COUNT(*)...WHERE...question

On 8/15/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> On 15/08/07, Scott Marlowe <[EMAIL PROTECTED]> wrote:
> > On 8/15/07, Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> > > On 15/08/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
> > > > "Phoenix Kiula" <[EMAIL PROTECTED]> writes:
> > > >
> > > > > I'm grappling with a lot of reporting code for our app that relies on
> > > > > queries such as:
> > > > >
> > > > >  SELECT COUNT(*) FROM TABLE WHERE (conditions)...
> > > > >...
> > > > > The number of such possibilities for multiple WHERE conditions is
> > > > > infinite...
> > > >
> > > > Depends on the "conditions" bit. You can't solve all of the infinite
> > > > possibilities -- well you can, just run the query above -- but if
> > > > you want to do better it's all about understanding your data.
> > >
> > >
> > > I am not sure what the advice here is. The WHERE condition comes from
> > > the indices. So if the query was not "COUNT(*)" but just a couple of
> > > columns, the query executes in less than a second. Just that COUNT(*)
> > > becomes horribly slow.
> >
> > Sorry, but I don't believe you.  if you're doing a count(*) on the
> > same dataset that returns in < 1 second, then the count(*) with the
> > same where clause will run in < 1 second.  I haven't seen pgsql do
> > anything else.
>
>
>
> Sorry I was not clear. Imagine an Amazon.com search results page. It
> has about 15 results on Page 1, then it shows "Page 1 of 190".
>
> To show each page, the query probably has a "LIMIT 15 OFFSET 0" for
> Page 1. However, to calculate the total number of pages, they probably
> do a separate counts query, because doing a "select *" and then
> counting the number of rows returned would be even more inefficient
> than a count(*).

When I go to amazon.com I only ever get three pages of results, ever.
That's because they know that returning 190 pages is not that useful, as
hardly anyone is going to wander through that many pages.

Google, you'll notice, says "Results 1 - 10 of about 5,610,000 for
blacksmith", i.e. it's guesstimating as well.  There's no reason for Google
to look at every single row for blacksmith to know that there are about 5.6
million.

> So, in reporting, two queries are fairly common I would think, unless
> I am missing something?

Yes, optimization. :)  You don't need an exact count to tell someone
that there's more data and they can go to it.  Note that if you are
planning on doing things Google-sized, you'll need to do what they
did and invent your own specialized database.

For us mere mortals, it's quite likely that you can do something like:

explain select * from table where field like 'abc%';

and then parse the explain output for an approximate number.
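
A hedged sketch of both approximations, using a hypothetical "links" table;
the first asks the planner, the second reads the statistics that ANALYZE
maintains:

EXPLAIN SELECT * FROM links WHERE title LIKE 'abc%';
-- parse the "rows=..." figure out of the plan text

SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'links';  -- whole-table estimate, refreshed by ANALYZE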

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] memory optimization

2007-08-15 Thread Decibel!
On Wed, Aug 15, 2007 at 10:21:31AM +0300, Sabin Coanda wrote:
> Hi there,
> 
> I have a procedure which uses temporary objects (table and sequence). I 
> tried to optimize it, using common variables (array and long varchar) 
> instead. I didn't found any difference in performance, but I'd like to 
> choose the best option from other points of view. One of them is the memory.
> 
> So, what is better from the postgres memory point of view: to use temporary 
> objects, or to use common variables ?

A temp table might take *slightly* more room than variables...

> Can you suggest me other point of views to be taken into consideration in my 
> case ?

Code maintenance. I can't think of any way to replace a temp table with
variables that isn't a complete nightmare.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




[GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-15 Thread D. Dante Lorenso

All,

I have a stored procedure that I use to manage a queue.  I want to pop 
an item off the queue to ensure that only one server is processing the 
queue item, so inside PL/pgSQL I use SELECT ... FOR UPDATE to lock the row. 
 Here's how I pop the queue item:


--
CREATE OR REPLACE FUNCTION "public"."reserve_next_tcqueue" (in_hostname 
varchar, in_status char, in_new_status char) RETURNS bigint AS

$body$
DECLARE
my_reserved_id BIGINT;
BEGIN
 /* find and lock a row with the indicated status */
 SELECT tcq_id
 INTO my_reserved_id
 FROM queue q
 WHERE q.status = in_status
 ORDER BY tcq_id ASC
 LIMIT 1
 FOR UPDATE;

/* we didn't find anything matching */
IF NOT FOUND THEN
RETURN 0;
END IF;

/* change the status to the new status */
UPDATE queue SET
  status = in_new_status,
  ts_start = NOW(),
  ts_end = NULL,
  hostname = COALESCE(in_hostname, hostname)
WHERE tcq_id = my_reserved_id;

/* send back our reserved ID */
RETURN my_reserved_id;
END;
$body$
LANGUAGE 'plpgsql' VOLATILE RETURNS NULL ON NULL INPUT SECURITY INVOKER;
--

When my server is under severe load, however, this function begins to 
take a long time to execute and I begin to suspect that the FOR UPDATE 
lock might be locking the whole table and not just the row.


How do I know if this is the case, how can I optimize this procedure, 
and how should I be doing this differently?  I'm guessing stacks and 
queues would be common patterns handled in the PostgreSQL community.


Thoughts?

-- Dante

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] MVCC cons

2007-08-15 Thread Jeff Davis
On Wed, 2007-08-15 at 07:06 +0530, Merlin Moncure wrote:
> You were half right.  Inserts in PostgreSQL perform similar to other
> databases (or at least, use similar mechanisms).  It's the updates
> that suffer, because this translates to delete + insert essentially.
> Databases that use simple locking strategies can simply update the
> record in place.

I think in some databases that use locking, an INSERT can actually block
a SELECT, and vice-versa. So wouldn't that mean PostgreSQL MVCC is
better for INSERT performance?

Regards,
Jeff Davis


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Trigger Procedure Error: NEW used in query that is not in a rule

2007-08-15 Thread Decibel!
On Sat, Aug 11, 2007 at 02:43:30AM -0500, Javier Fonseca V. wrote:
> Hello.
> 
> I'm doing a Trigger Procedure in pl/pgSQL.  It makes some kind of auditing.
> 
> I think that it's working alright except for the next line:
> 
> EXECUTE 'INSERT INTO ' || quote_ident(somedynamictablename) || ' SELECT
> new.*';
> 
> PostgreSQL keeps telling me: "ERROR: NEW used in query that is not in a
> rule".  I think that this NEW problem is because of the scope of the EXECUTE
> statement (outside the scope of the trigger), so it doesn't recognize the
> NEW record.

Sort-of... the issue is that EXECUTE hands the string off to the
backend, which has no clue what "NEW" is; only the trigger procedure
understands NEW.

> Maybe I could fix it concatenating column names and the 'new' values but I
> want to do my trigger as flexible as possible (I have several tables to
> audit).
> 
> Somebody has any suggestion?

You could theoretically make the trigger entirely dynamic by having it
pull the needed info out of the system catalogs... but I wouldn't want
to see the performance of that... If you care about performance *at
all*, I'd suggest writing some code that will generate the triggers for
a given table for you. I don't expect it'd be much harder than writing a
completely dynamic trigger.
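
A hedged sketch of what such a generated, per-table trigger might look like
(hypothetical names; mytable_audit is assumed to mirror mytable's columns).
Because the INSERT is static SQL rather than an EXECUTEd string, plpgsql can
resolve NEW directly:

CREATE OR REPLACE FUNCTION audit_mytable() RETURNS trigger AS $$
BEGIN
    -- static SQL, not an EXECUTEd string, so NEW is visible here
    INSERT INTO mytable_audit SELECT NEW.*;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mytable_audit_trg AFTER INSERT OR UPDATE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE audit_mytable();
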
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [GENERAL] Deadlocks caused by a foreign key constraint

2007-08-15 Thread Dmitry Koterov
No.

I have tested all cases, the code I quoted is complete and minimal. All
operations are non-blocking (count incrementation is non-blocking, insertion
with a foreign key is non-blocking too), but it still generates a deadlock
from time to time. Deletion of the foreign key constraint completely solves the
problem.

I am using the latest version of Postgres.

You said "I'm pretty sure that recent versions check to see if the key
actually changed", but how could it be if Postgres uses a row-level locking,
not field-level locking? Seems it cannot check what fields are changed, it
locks the whole row.


On 8/15/07, Decibel! <[EMAIL PROTECTED]> wrote:
>
> On Fri, Aug 10, 2007 at 09:38:36PM +0400, Dmitry Koterov wrote:
> > Hello.
> >
> > I have a number of deadlock because of the foreign key constraint:
> >
> > Assume we have 2 tables: A and B. Table A has a field fk referenced to
> > B.idas a foreign key constraint.
> >
> >
> > -- transaction #1
> > BEGIN;
> > ...
> > INSERT INTO A(x, y, fk) VALUES (1, 2, 666);
> > ...
> > END;
> >
> >
> > -- transaction #2
> > BEGIN;
> > UPDATE B SET z = z + 1 WHERE id = 666;
> > ...
> > UPDATE B SET z = z + 1 WHERE id = 666;
> > ...
> > UPDATE B SET z = z + 1 WHERE id = 666;
> > END;
> >
> >
> > You see, table A is only inserted into, and table B only has its field z
> > updated on its single row.
> > If we execute a lot of these transactions concurrently using multiple
> > parallel threads, sometimes we have a deadlock:
> >
> > DETAIL:  Process 6867 waits for ShareLock on transaction 1259392;
> blocked by
> > process 30444.
> > Process 30444 waits for ShareLock on transaction 1259387; blocked by
> > process 6867.
> > CONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."B" x WHERE "id" =
> $1
> > FOR SHARE OF x"
> >
> > If I delete the foreign key constraint, all begins to work fine.
> > Seems Postgres thinks that "UPDATE B SET z = z + 1 WHERE id = 666" query
> may
> > modify B.id field and touch A.fk, so it holds the shareable lock on it.
>
> What version are you running? I'm pretty sure that recent versions check
> to see if the key actually changed.
>
> > The question is: is it possible to KEEP this foreign key constraint, but
> > avoid deadlocks?
>
> I'm pretty sure that the deadlock is actually being caused by your
> application code, likely because you're doing multiple updates within
> one transaction, but not being careful about the id order you do them
> in.
> --
> Decibel!, aka Jim Nasby[EMAIL PROTECTED]
> EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
>
>


Re: [GENERAL] pg_dump on local Windows, pg_restore on Linux?

2007-08-15 Thread Leif B. Kristensen
On Wednesday 15. August 2007, Phoenix Kiula wrote:
>> At least on a *nix system, collation is based on the value of the
>> LC_ALL environment variable at initdb time. There's nothing you can
>> do about it in a live database. IMO that's a little awkward, and is
>> what finally made me change the global from ISO-8859-1 to UTF-8 on
>> my three Gentoo Linux machines.
>
>This is great info, thanks. Could you let me know how I could change
>the global values of "LC_ALL"?  I am on Linux too, just CentOS, but I
>suppose it should be the same or similar?

I don't have the foggiest idea how it's done in CentOS. In Gentoo, you 
just edit the contents of /etc/env.d/02locale:

balapapa ~ # cat /etc/env.d/02locale
LC_ALL="nb_NO.UTF-8"
LANG=""
LC_CTYPE="nb_NO.UTF-8"
LC_NUMERIC="nb_NO.UTF-8"
LC_TIME="nb_NO.UTF-8"
LC_COLLATE="nb_NO.UTF-8"
LC_MONETARY="nb_NO.UTF-8"
LC_PAPER="nb_NO.UTF-8"
LC_NAME="nb_NO.UTF-8"
LC_ADDRESS="nb_NO.UTF-8"
LC_TELEPHONE="nb_NO.UTF-8"
LC_MEASUREMENT="nb_NO.UTF-8"
LC_IDENTIFICATION="nb_NO.UTF-8"

To update the environment settings globally, the Gentoo method is to 
issue the command

"env-update && source /etc/profile"

from root. But you should really Read The Fine Manual about this. If you 
for instance have filenames with non-ASCII characters in them, prepare 
yourself for some interesting challenges.
-- 
Leif Biberg Kristensen | Registered Linux User #338009
http://solumslekt.org/ | Cruising with Gentoo/KDE
My Jazz Jukebox: http://www.last.fm/user/leifbk/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Best practice for: ERROR: invalid byte sequence for encoding "UTF8"

2007-08-15 Thread Martijn van Oosterhout
On Thu, Aug 16, 2007 at 01:56:52AM +0800, Phoenix Kiula wrote:
> This is very useful, thanks. This would be "bytea"? Quick questions:
> 
> 1. Even if it were bytea, would it work with regular SQL operators
> such as regexp and LIKE?

bytea is specifically designed for binary data, as such it has all
sorts of quoting rules for dealing with embedded nulls and such. It's
not quite a drop in replacement.

The earlier suggestion of SQL_ASCII is probably closer to what you
want. It does do regexes and LIKE; however, postgres will treat all your
data as bytes. If you want your regexes to match Unicode character
classes, that's too bad; you can't have it both ways. Sorting goes in
byte order; you don't have a lot of choice there either.
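
For instance, in the escape format of this era an embedded NUL has to be
spelled out, and length() counts octets rather than characters (a small
hedged example):

SELECT E'ab\\000cd'::bytea;          -- five bytes: a, b, NUL, c, d
SELECT length(E'ab\\000cd'::bytea);  -- returns 5, counting octets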

> 2. Would tsearch2 work with bytea in the future as long as the stuff
> in it was text?

Doubt it, SQL_ASCII would work though.

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.




Re: [GENERAL] MVCC cons

2007-08-15 Thread Scott Marlowe
On 8/15/07, Jeff Davis <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-08-15 at 07:06 +0530, Merlin Moncure wrote:
> > You were half right.  Inserts in PostgreSQL perform similar to other
> > databases (or at least, use similar mechanisms).  It's the updates
> > that suffer, because this translates to delete + insert essentially.
> > Databases that use simple locking strategies can simply update the
> > record in place.
>
> I think in some databases that use locking, an INSERT can actually block
> a SELECT, and vice-versa. So wouldn't that mean PostgreSQL MVCC is
> better for INSERT performance?

For certain values of better, yes.

For inserting 10,000,000 rows, it's probably not as fast as some other
databases.  But, you can insert those 10,000,000 rows while 100 users
run select statements and none of them will block.  So, even if it
takes twice as long as a table locking db, that's still probably ok,
because the readers can keep right on reading.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Interpreting statistics collector output

2007-08-15 Thread Gregory Stark
"Decibel!" <[EMAIL PROTECTED]> writes:

> On Wed, Aug 15, 2007 at 01:26:02PM -0400, Steve Madsen wrote:
>> On Aug 15, 2007, at 11:52 AM, Decibel! wrote:
>> >I can't really think of a case where a seqscan wouldn't return all the
>> >rows in the table... that's what it's meant to do.

LIMIT

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-15 Thread Erik Jones

On Aug 15, 2007, at 1:09 PM, D. Dante Lorenso wrote:


All,

I have a stored procedure that I use to manage a queue.  I want to  
pop an item off the queue to ensure that only one server is  
processing the queue item, so inside PGSQL, use SELECT ... FOR  
UPDATE to lock the row.  Here's how I pop the queue item:


--
CREATE OR REPLACE FUNCTION  
"public"."reserve_next_tcqueue" (in_hostname varchar, in_status  
char, in_new_status char) RETURNS bigint AS

$body$
DECLARE
my_reserved_id BIGINT;
BEGIN
 /* find and lock a row with the indicated status */
 SELECT tcq_id
 INTO my_reserved_id
 FROM queue q
 WHERE q.status = in_status
 ORDER BY tcq_id ASC
 LIMIT 1
 FOR UPDATE;

/* we didn't find anything matching */
IF NOT FOUND THEN
RETURN 0;
END IF;

/* change the status to the new status */
UPDATE queue SET
  status = in_new_status,
  ts_start = NOW(),
  ts_end = NULL,
  hostname = COALESCE(in_hostname, hostname)
WHERE tcq_id = my_reserved_id;

/* send back our reserved ID */
RETURN my_reserved_id;
END;
$body$
LANGUAGE 'plpgsql' VOLATILE RETURNS NULL ON NULL INPUT SECURITY  
INVOKER;

--

When my server is under severe load, however, this function begins  
to take a long time to execute and I begin to suspect that the FOR  
UPDATE lock might be locking the whole table and not just the row.


How do I know if this is the case, how can I optimize this  
procedure, and how should I be doing this differently?  I'm  
guessing stacks and queues would be common patterns handled in the  
PostgreSQL community.


Thoughts?


SELECT ... FOR UPDATE should only be locking the rows returned by  
your select statement, in this case the one row.  You can check  
what locks exist on a table (and their type) with the pg_locks system  
view.


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-15 Thread btober

Erik Jones wrote:

On Aug 15, 2007, at 1:09 PM, D. Dante Lorenso wrote:

...to ensure that only one server is processing the queue item, so 
inside PGSQL, use SELECT ... FOR UPDATE to lock the row... 

When my server is under severe load, however, this function begins to 
take a long time to execute and I begin to suspect that the FOR 
UPDATE lock might be locking the whole table and not just the row.  
How do I know if this is the case, how can I optimize this procedure, 
and how should I be doing this differently?  ...


Thoughts?


SELECT ... FOR UPDATE should only be locking the rows returned by your 
select statement, in this case the one row.  You can check what 
locks exist on a table (and their type) with the pg_locks system view.




Is that correct? Documentation section 12.3.1. Table-Level Locks states 
'The list below shows the available lock modes ...Remember that all of 
these lock modes are table-level locks, even if the name contains the 
word "row"'.


I wonder why bother with the SELECT statement at all. Why not just go 
straight to the UPDATE statement with something like


   UPDATE queue SET
 status = in_new_status,
 ts_start = NOW(),
 ts_end = NULL,
 hostname = COALESCE(in_hostname, hostname)
   WHERE tcq_id = (SELECT tcq_id  FROM queue q WHERE q.status = 
in_status ORDER BY tcq_id ASC LIMIT 1);


He may need to trap an exception for the "not found" case, but what's 
the big deal with that?


UPDATE statements acquire a ROW EXCLUSIVE lock on the table; that mode does 
not conflict with itself, but two UPDATEs that touch the same row will still 
block each other at the row level.




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] User-Friendly TimeZone List

2007-08-15 Thread A.M.


On Aug 15, 2007, at 13:27 , Naz Gassiep wrote:


Hi all,
   I am still, after quite some time, wrangling over the time zone  
system in my app. I have sorted out all the internal handling;  
however, I am still uncertain as to the best way to get the  
user to select their time zone.


   I was thinking of having users just select their timezones from  
a list which was populated with the contents of the query:

select * from pg_timezone_names ;
which would look something like this.
http://mrnaz.com/tztest.html

This however is problematic for a number of reasons:

1. The timezone list there isn't exactly user friendly, there are  
many Etc/* timezones there, as well as others that would be  
potentially confusing for users who are trying to select the  
timezone they are in.
2. If a timezone is removed from the list for whatever reason, then  
the system will be left with users who have selected a timezone  
that is no longer a valid choice in the list.


   The ideal situation would be to maintain my own persistent list  
of timezones (the way Microsoft maintains their own user-friendly  
list that they use for Windows) that maps to the timezones embedded  
in the PG back end, but I haven't the resources to pull this off.  
Has anyone else worked on a mechanism to allow users to supply the  
timezone they are in, and if so, do you have any comments on how  
best to handle the apparently mutually exclusive problems of  
simplicity for users and accuracy in the back end?


   At the simple end of the scale I can't just have users select only  
from a list going from GMT-12 to GMT+12. At the complex end of the  
scale I'd just give them the output from the list, and any that are  
deprecated will result in the user reverting to UTC until they  
set a new timezone.


Don't forget that not all timezones are offset by integer hours from  
GMT. I, too, am battling timezone handling- I have been using various  
Perl modules to fill in the missing components such as better/multi- 
lingual timestamp parsing.


Cheers,
M
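
A hedged starting point for trimming the raw catalog toward a friendlier
list, assuming 8.2 or later where pg_timezone_names exists:

SELECT name, utc_offset
FROM pg_timezone_names
WHERE name LIKE '%/%'          -- keep Area/Location style names
  AND name NOT LIKE 'Etc/%'    -- drop the confusing Etc/* aliases
ORDER BY utc_offset, name;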

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] SELECT ... FOR UPDATE performance costs? alternatives?

2007-08-15 Thread Erik Jones


On Aug 15, 2007, at 2:39 PM, [EMAIL PROTECTED] wrote:


Erik Jones wrote:

On Aug 15, 2007, at 1:09 PM, D. Dante Lorenso wrote:

...to ensure that only one server is processing the queue item,  
so inside PGSQL, use SELECT ... FOR UPDATE to lock the row...
When my server is under severe load, however, this function  
begins to take a long time to execute and I begin to suspect that  
the FOR UPDATE lock might be locking the whole table and not just  
the row.  How do I know if this is the case, how can I optimize  
this procedure, and how should I be doing this differently?  ...


Thoughts?


SELECT ... FOR UPDATE should only be locking the rows returned by  
your select statement, in this case the one row.  You can  
check what locks exist on a table (and their type) with the  
pg_locks system view.




Is that correct? Documentation section 12.3.1. Table-Level Locks  
states 'The list below shows the available lock modes ...Remember  
that all of these lock modes are table-level locks, even if the  
name contains the word "row"'.


You will notice that SELECT ... FOR UPDATE is not in that list.  It's  
covered in the next section on row level locks.


I wonder why bother with the SELECT statement at all. Why not just  
go straight to the UPDATE statement with something like


   UPDATE queue SET
 status = in_new_status,
 ts_start = NOW(),
 ts_end = NULL,
 hostname = COALESCE(in_hostname, hostname)
   WHERE tcq_id = (SELECT tcq_id  FROM queue q WHERE q.status =  
in_status ORDER BY tcq_id ASC LIMIT 1);


He may need to trap an exception for the "not found" case, but  
what's the big deal with that?


UPDATE statements acquire a ROW EXCLUSIVE lock on the table; that mode  
does not conflict with itself, but two UPDATEs that touch the same row  
will still block each other at the row level.


That won't work because the update won't lock the row until the  
select returns.  So, if two processes execute that at the same time  
they will both execute the subquery and return the same result, the  
first will update it and the second will then (redundantly) update it.
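
A hedged aside: PostgreSQL releases from 9.5 onward added FOR UPDATE SKIP
LOCKED, which serializes exactly this pop-a-row pattern in one statement
(the status values below are the function's own parameters):

UPDATE queue
SET status = in_new_status, ts_start = now()
WHERE tcq_id = (
    SELECT tcq_id FROM queue
    WHERE status = in_status
    ORDER BY tcq_id
    LIMIT 1
    FOR UPDATE SKIP LOCKED     -- competing workers skip the locked row
)
RETURNING tcq_id;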


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] pg_dump on local Windows, pg_restore on Linux?

2007-08-15 Thread James B. Byrne

On Wednesday 15. August 2007, Phoenix Kiula wrote:
>
>This is great info, thanks. Could you let me know how I could change
>the global values of "LC_ALL"?  I am on Linux too, just CentOS, but I
>suppose it should be the same or similar?

See: man locale and man 1p locale and man 1p localedef and /etc/sysconfig/i18n

Do NOT manually set LC_ALL on a CentOS system unless you know exactly what
you are doing and what programs will be impacted.  If set then LC_ALL
overrides all of the individual LC_ settings.


Regards,

-- 
***  E-Mail is NOT a SECURE channel  ***
James B. Byrnemailto:[EMAIL PROTECTED]
Harte & Lyne Limited  http://www.harte-lyne.ca
9 Brockley Drive  vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada  L8E 3C3


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] language interface in postgresql

2007-08-15 Thread Josh Tolley
On 8/15/07, Magnus Hagander <[EMAIL PROTECTED]> wrote:
> Trevor Talbot wrote:
> > On 8/14/07, Jasbinder Singh Bali <[EMAIL PROTECTED]> wrote:
> >
> >> Let me fine-tune my question here. What I mean is that in Postgres we can
> >> write stored procedures in C, Perl, etc., specifying the language
> >> parameter at the end of the stored procedure; compared to that, in SQL
> >> Server 2000 I've only seen SPs written in pure SQL.
> >> Can you write Stored Procedures in SQL Server 2000 or Oracle in different
> >> programming languages as well?
> >
> > AFAIK SQL Server 2000 only has a C interface as the other option; CLR
> > hosting was added in SQL Server 2005.  Because the CLR is a virtual
> > machine that runs compiled bytecode, and compilers for all of the
> > available languages are not necessarily available at runtime, it
> > doesn't make sense to specify such code in source form.  The process
> > is more like creating a function in C in PostgreSQL (compile and load
> > a shared library).  Details here, if you're curious:
> > http://msdn2.microsoft.com/en-us/library/ms345136.aspx
> >
> > I don't know what Oracle supports.
>
> I believe Oracle support Java in the same way MSSQL supports .net, give
> or take.

I don't know specifics of what exactly you can do with it nor how
exactly you go about it, but I know at least Oracle 10g supports
extension in Java. We had grand plans to improve some sort of
processing by writing a comparison function in Java for one of the Oracle
databases I've been cursed to associate with. I don't know of any
other languages supported by Oracle for stored procedures and the
like, though it does have an ECPG-like system for C, C++, COBOL,
FORTRAN, and PL/1 of all things.

- Josh

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Writing most code in Stored Procedures

2007-08-15 Thread Josh Tolley
On 8/15/07, Rohit <[EMAIL PROTECTED]> wrote:
> I have a few queries regarding the use of Stored Procedures, Functions
> and Triggers in an RDBMS.
>
> (1) When to use a Stored Procedure? Is writing an INSERT query in a Stored
> Procedure better, or firing it from the application level?
>
> (2) Can a Trigger call a Stored Procedure?
>
> (3) What type of code must reside in application and what type in
> RDBMS?
>
> (4) Is it faster to work at application level or at the database level?

One of the major advantages of stored procedures over application code
is that stored procedures can get to the data much more quickly than
an application can, in general. An application needs to talk to
PostgreSQL through some sort of driver (JDBC, libpq, etc.) and
typically data need to traverse a network to get from the database
machine to the application server. Stored procedures, on the other
hand, don't have the overhead of either the network or the driver.
However, stored procedures can be more difficult to debug, profile,
etc., so they might not be the best for really complex logic. They
tend to be really useful if you're doing something that requires lots
and lots of queries to the database, and don't need anything else
(data from other sources, user interaction, etc.), because that takes
greatest advantage of their quick connection to the data. Richard
Huxton's point that stored procedures are typically best for data
integrity types of functions, whereas business logic should often be
in application code is an excellent rule of thumb.
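
A minimal hedged sketch of that round-trip saving, with a hypothetical
"events" table: many INSERTs inside the function cost a single network call
from the application:

CREATE OR REPLACE FUNCTION log_events(in_msgs text[]) RETURNS void AS $$
DECLARE
    i integer;
BEGIN
    FOR i IN 1 .. coalesce(array_upper(in_msgs, 1), 0) LOOP
        INSERT INTO events (msg, created_at) VALUES (in_msgs[i], now());
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- one round-trip from the application:
SELECT log_events(ARRAY['started', 'loaded 10k rows', 'finished']);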

-Josh

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Trigger Procedure Error: NEW used in query that is not in a rule

2007-08-15 Thread Javier Fonseca V.
The first time that I read it, the work-around sounded a little odd
(extreme!) to me... but actually it would really work!

Thanks for the tip :)

*Javier*


On 8/15/07, Decibel! <[EMAIL PROTECTED]> wrote:
>
> On Sat, Aug 11, 2007 at 02:43:30AM -0500, Javier Fonseca V. wrote:
> > Hello.
> >
> > I'm doing a Trigger Procedure in pl/pgSQL.  It makes some kind of
> auditing.
> >
> > I think that it's working alright except for the next line:
> >
> > EXECUTE 'INSERT INTO ' || quote_ident(somedynamictablename) || ' SELECT
> > new.*';
> >
> > PostgreSQL keeps telling me: "ERROR: NEW used in query that is not in a
> > rule".  I think that this NEW problem is because of the scope of the
> EXECUTE
> > statement (outside the scope of the trigger), so it doesn't recognize
> the
> > NEW record.
>
> Sort-of... the issue is that EXECUTE hands the string off to the
> backend, which has no clue what "NEW" is; only the trigger procedure
> understands NEW.
>
> > Maybe I could fix it concatenating column names and the 'new' values but
> I
> > want to do my trigger as flexible as possible (I have several tables to
> > audit).
> >
> > Somebody has any suggestion?
>
> You could theoretically make the trigger entirely dynamic by having it
> pull the needed info out of the system catalogs... but I wouldn't want
> to see the performance of that... If you care about performance *at
> all*, I'd suggest writing some code that will generate the triggers for
> a given table for you. I don't expect it'd be much harder than writing a
> completely dynamic trigger.
> --
> Decibel!, aka Jim Nasby[EMAIL PROTECTED]
> EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
>
>


Re: [GENERAL] Writing most code in Stored Procedures

2007-08-15 Thread Steve Manes

On 8/15/07, Rohit <[EMAIL PROTECTED]> wrote:

I have a few queries regarding the use of Stored Procedures, Functions
and Triggers in an RDBMS.

(1) When to use a Stored Procedure? Is writing an INSERT query in a Stored
Procedure better, or firing it from the application level?

(2) Can a Trigger call a Stored Procedure?

(3) What type of code must reside in application and what type in
RDBMS?

(4) Is it faster to work at application level or at the database level?


I'm fairly hardcore about keeping as much business logic as I can in the 
database.  In fact, I only do SELECTs from the application, and usually 
via Views.  All inserts, updates and deletes are via procs.  I'm a 
proponent of separating application code from presentation and 
application code from database code.  And HTML from layout style, for 
that matter.


In addition to the other reasons you've gotten:

It lets me blackbox commonly used functions, such as a function to 
insert a normalized demographic record for a customer, an organization, 
a user, a guarantor, a physician, etc.


It isolates database logic so it can be debugged separately from the 
application.


It reduces the application's vulnerability to SQL injection, especially 
if another developer (never me!) forgets to massage potentially 
tainted user input.


Another is because I typically do my web application programming in PHP5 
but the offline scripts in Perl.  Both can call the same stored 
procedures so I don't have multiple copies of database code to maintain 
and debug.


Another is because I want transactions to start and end in the database, 
not in external application code which might crash before a COMMIT.


Another is because I'm a freelancer and generally hand off my 
applications to the client's tech department, which is often a junior 
level grasshopper who knows just enough SQL to be dangerous. Using 
stored procedures raises the bar on what they need to know about RDBMSes 
 before they start hacking working code.


And, yes, it's faster.  Particularly if business logic decisions have to 
be made in context with a transaction.
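
A hedged sketch of that split, with hypothetical names: the application reads
through a view and writes through a procedure (customer_id is assumed to be a
serial column):

CREATE VIEW customer_summary AS
    SELECT customer_id, full_name, created_at
    FROM customer;                 -- application SELECTs go through the view

CREATE OR REPLACE FUNCTION add_customer(in_name text) RETURNS integer AS $$
BEGIN
    INSERT INTO customer (full_name, created_at) VALUES (in_name, now());
    RETURN currval('customer_customer_id_seq');  -- id of the row just inserted
END;
$$ LANGUAGE plpgsql;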


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

