[BUGS] Re: BUG #5480: Autovacuum interferes with operations (e.g. truncate) on very large databases

2010-05-29 Thread Greg Stark
On Fri, May 28, 2010 at 2:02 PM, Sakari Maaranen  wrote:
> The documentation says that the best way to use autovacuum is to let the
> database vacuum often. However, when the tables are very large, this can take
> a very long time.
>

In addition to Tom's comments, as of 8.4 this is no longer true
either. The running time of VACUUM should be related to the amount of
the table that has been dirtied, not to the total size of the table.

It's not perfect (yet), though. If you have large indexes they have to
be scanned as well, and periodically a vacuum freeze has to happen,
which does have to scan the whole table, but that runs at a much longer
interval than the frequent vacuums.
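
For illustration, a minimal sketch of the relevant knobs (8.4 and up;
the table name and value are made up and workload-dependent):

    -- Have autovacuum visit big_table after ~1% of it changes,
    -- instead of the default 20% of the table:
    ALTER TABLE big_table SET (autovacuum_vacuum_scale_factor = 0.01);
    -- The whole-table freeze scans are governed separately, by the
    -- vacuum_freeze_table_age and autovacuum_freeze_max_age settings.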

-- 
greg



Re: [BUGS] BUG #5477: CREATE DOMAIN NOT NULL constraints not always enforced for INSERT with subquery

2010-05-29 Thread Kevin Grittner
Tom Lane  wrote:
 
> NOT NULL constraints at the domain level suck. Don't use 'em.
 
+1
 
As someone who uses domains very heavily, I can attest that the
semantics of that constraint are very weak.  Whether a domain is
nullable depends almost entirely on the context of its use, which you
can't (and shouldn't try to) anticipate at declaration time.
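
For instance, a minimal sketch (made-up names) of how an outer join
quietly yields NULL in a column whose declared type is a NOT NULL
domain:

    CREATE DOMAIN d_id AS int NOT NULL;
    CREATE TABLE parent (id int PRIMARY KEY);
    CREATE TABLE child  (id int PRIMARY KEY, parent_id d_id);
    INSERT INTO parent VALUES (1);
    -- child is empty, so the join emits NULL for c.parent_id even
    -- though its type is the NOT NULL domain; no error is raised:
    SELECT p.id, c.parent_id
      FROM parent p LEFT JOIN child c ON c.id = p.id;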
 
-Kevin



Re: [BUGS] BUG #5477: CREATE DOMAIN NOT NULL constraints not always enforced for INSERT with subquery

2010-05-29 Thread Bruce Momjian

Applied.

---

Bruce Momjian wrote:
> Tom Lane wrote:
> > "Matt Nourse"  writes:
> > > CREATE DOMAIN test_id_domain INT NOT NULL; 
> > > CREATE TABLE test_state(id test_id_domain PRIMARY KEY, display_value
> > > varchar(20) NOT NULL);
> > > CREATE TABLE test_city(state_id test_id_domain REFERENCES test_state(id));
> > 
> > > This produces an error as expected:
> > 
> > > INSERT INTO test_city(state_id) VALUES (NULL);
> > 
> > > This successfully inserts a NULL value into the state_id field:
> > 
> > > INSERT INTO test_city(state_id) VALUES ((SELECT id FROM test_state WHERE
> > > display_value = 'Nonexistent state'));
> > 
> > There are any number of ways you can get a similar result, for example
> > a LEFT JOIN.  To my mind, this demonstrates why not-null constraints
> > associated with datatypes are a fundamentally flawed concept.  If the
> > SELECT or LEFT JOIN can produce a null value, as it clearly can, then
> > it's nonsensical to think that the output column should be considered
> > to be of a NOT NULL domain type.  But what else should it be?  If we
> > smash domains to their base types when assigning result types of
> > queries, that will make many people unhappy.
> > 
> > Moral: NOT NULL constraints at the domain level suck.  Don't use 'em.
> 
> I have written up the following documentation patch to document this
> behavior.  It doesn't seem like something we want to fix, so I am not
> making it a TODO item.
> 
> -- 
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com


> Index: doc/src/sgml/ref/create_domain.sgml
> ===================================================================
> RCS file: /cvsroot/pgsql/doc/src/sgml/ref/create_domain.sgml,v
> retrieving revision 1.34
> diff -c -c -r1.34 create_domain.sgml
> *** doc/src/sgml/ref/create_domain.sgml   3 Apr 2010 07:22:58 -0000   1.34
> --- doc/src/sgml/ref/create_domain.sgml   28 May 2010 17:19:35 -0000
> ***************
> *** 121,127 ****
> NOT NULL
> 
>  
> ! Values of this domain are not allowed to be null.
>  
> 
>
> --- 121,132 ----
> NOT NULL
> 
>  
> ! Values of this domain are normally prevented from being null.
> ! It is still possible for a domain with this constraint
> ! to take a null value if it is assigned a matching domain type
> ! that has become null, e.g. via a LEFT OUTER JOIN, or
> ! INSERT INTO tab (domcol) VALUES ((SELECT domcol FROM
> ! tab WHERE false)).
>  
> 
>


-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com



[BUGS] BUG #5483: PQescapeStringConn behaviour ??

2010-05-29 Thread Pascal Van Puymbroeck

The following bug has been logged online:

Bug reference:  5483
Logged by:  Pascal Van Puymbroeck
Email address:  p...@zenonpro.be
PostgreSQL version: 8.3.5 and 8.3.8
Operating system:   CentOS 5.4
Description:        PQescapeStringConn behaviour ??
Details: 

Don't know whether this bug/behaviour is normal or fixed in newer versions. 
Didn't have the time to prepare a new installation just to test this.

PQescapeStringConn is intended to facilitate the insertion of, say, a
street name with a single quote like " d'alembert ".  Using this function
on normal and special strings like " tëxt " (an e with two dots on it,
quite common in Dutch and French) is no problem.

However, if the " ë " character is at the end of a string, like the name
of a country in Dutch such as "Italië", this PQescapeStringConn function
adds two spaces, so "België" becomes "België  ".  The error flag is set
to 1, which indicates we can't trust the outcome.  But I don't see why it
has a problem with that character at the end of the string and not in the
middle!



Re: [BUGS] BUG #5469: regexp_matches() has poor behaviour and more poor documentation

2010-05-29 Thread Bruce Momjian

I have updated the patch, attached, to clarify that this returns text
arrays, and that you can force it to always return one row using
COALESCE() and a '|' pattern (the latter suggested by Daniele Varrazzo).

---

Bruce Momjian wrote:
> Daniele Varrazzo wrote:
> > "If there is no match to the pattern, the function returns no rows" is
> > easily overlooked as "it returns null", or some other behaviour that
> > don't change the returned set. The point is, because the function is
> > listed in the string function, you would expect the function to
> > manipulate text, not the dataset. The function as it is is not safe to
> > be used in a construct
> > 
> > SELECT foo, bar, regexp_matches(bar, pattern) FROM table;
> > 
> > unless you really wanted:
> > 
> > SELECT foo, bar, regexp_matches(bar, pattern) FROM table WHERE bar
> > ~ pattern;
> > 
> > otherwise you have to take measures to be able to deal with records in
> > which the pattern is not matched, for example:
> > 
> > SELECT foo, bar, regexp_matches(bar, pattern || '|') FROM table;
> > 
> > the latter still doesn't work when bar is NULL: in this case the
> > record is dropped anyway, so I don't think it can be proposed as a
> > general solution.
> > 
> > The characteristic of returning a set of text[] is useful when the
> > user wants all the matches, not only the first one; that behaviour is
> > selected by specifying the flag 'g' as the third argument.
> > 
> > From this point of view, I hope it can be stated that in its current
> > form regexp_matches() does not have the most optimal interface. Please
> > accept my apology for the tone of my previous message being too rude.
> 
> I found the description in the documentation quite confusing also.  I
> have created the attached documentation patch, which is clearer about
> the behavior of regexp_matches().

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com
Index: doc/src/sgml/func.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/func.sgml,v
retrieving revision 1.513
diff -c -c -r1.513 func.sgml
*** doc/src/sgml/func.sgml	7 Apr 2010 06:12:52 -0000	1.513
--- doc/src/sgml/func.sgml	29 May 2010 20:55:23 -0000
***************
*** 3445,3463 ****
 
  
  
!  The regexp_matches function returns all of the captured
!  substrings resulting from matching a POSIX regular expression pattern.
!  It has the syntax
   regexp_matches(string, pattern
   , flags ).
!  If there is no match to the pattern, the function returns
!  no rows.  If there is a match, the function returns a text array whose
   n'th element is the substring matching the
   n'th parenthesized subexpression of the pattern
   (not counting non-capturing parentheses; see below for
!  details).  If the pattern does not contain any parenthesized
!  subexpressions, then the result is a single-element text array containing
!  the substring matching the whole pattern.
   The flags parameter is an optional text
   string containing zero or more single-letter flags that change the
   function's behavior.  Flag g causes the function to find
--- 3445,3466 ----
 
  
  
!  The regexp_matches function returns a text array of
!  all of the captured substrings resulting from matching a POSIX
!  regular expression pattern.  It has the syntax
   regexp_matches(string, pattern
   , flags ).
!  The function can return no rows, one row, or multiple rows (see
!  the g flag below).  If the pattern
!  does not match, the function returns no rows.  If the pattern
!  contains no parenthesized subexpressions, then each row
!  returned is a single-element text array containing the substring
!  matching the whole pattern.  If the pattern contains parenthesized
!  subexpressions, the function returns a text array whose
   n'th element is the substring matching the
   n'th parenthesized subexpression of the pattern
   (not counting non-capturing parentheses; see below for
!  details).
   The flags parameter is an optional text
   string containing zero or more single-letter flags that change the
   function's behavior.  Flag g causes the function to find
***************
*** 3490,3495 ****
--- 3493,3509 ----
  
 
  
+
+ It is possible to force regexp_matches() to always
+ return one row by using COALESCE() and an empty
+ | pattern;  this is particularly useful in a
+ SELECT target list  when you want all rows returned,
+ even non-matching ones:
+ 
+ SELECT col1, regexp_matches(COALESCE(col2, ''), '(bar)(beque)|') FROM tab;
+ 
+
+ 
  
   The regexp_split_to_table function splits a string using a POSIX
   regular expression pattern as a delimiter.  It has the syntax


Re: [BUGS] BUG #5483: PQescapeStringConn behaviour ??

2010-05-29 Thread Tom Lane
"Pascal Van Puymbroeck"  writes:
> However, if the " ë " character is at the end of a string, like the name
> of a country in Dutch such as "Italië", this PQescapeStringConn function
> adds two spaces, so "België" becomes "België  ".  The error flag is set
> to 1, which indicates we can't trust the outcome.  But I don't see why it
> has a problem with that character at the end of the string and not in the
> middle!

This isn't tremendously surprising if you're feeding data to the
function that is not in the encoding it expects (which is what the error
flag almost certainly is telling you).  It may be able to detect the
problem in some cases and not others.
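
A minimal check, assuming (as seems plausible here) that the client is
sending Latin-1 bytes while the connection's client_encoding says UTF8;
the setting shown is illustrative:

    -- See which encoding libpq will assume for your input:
    SHOW client_encoding;
    -- If the strings passed to PQescapeStringConn are Latin-1
    -- (where "ë" is the single byte 0xEB), say so explicitly:
    SET client_encoding = 'LATIN1';

In UTF-8, 0xEB opens a three-byte sequence, so a trailing "ë" looks like
a truncated multibyte character, while mid-string the following bytes can
mask the problem; that would match the behaviour reported here.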

regards, tom lane



[BUGS] Re: BUG #5480: Autovacuum interferes with operations (e.g. truncate) on very large databases

2010-05-29 Thread Sakari A. Maaranen
2010/5/29 Greg Stark :
> On Fri, May 28, 2010 at 2:02 PM, Sakari Maaranen  wrote:
>> The documentation says that the best way to use autovacuum is to let the
>> database vacuum often. However, when the tables are very large, this can take
>> a very long time.
>
> In addition to Tom's comments, as of 8.4 this is no longer true
> either. The running time of VACUUM should be related to the amount of
> the table that has been dirtied, not to the total size of the table.
>
> It's not perfect (yet), though. If you have large indexes they have to
> be scanned as well, and periodically a vacuum freeze has to happen,
> which does have to scan the whole table, but that runs at a much longer
> interval than the frequent vacuums.
>
> --
> greg
>

In my case (at this time) the bulk import actually did dirty basically
the whole table. The bulk load was one huge transaction with tens of
millions of rows, and that transaction failed because the server
process died.

I discussed this with Tom in detail, and he helped me find and
understand the cause of the crash. That was PostgreSQL running out of
memory because of large bulk operations on existing tables with
foreign key constraints. I was doing it in a way that PostgreSQL is
not really designed to support. Couldn't have solved it without Tom's
advice.

Tom Lane wrote:  "...you're doing a data-only restore into a table
with pre-existing foreign key constraints.  Each new row will create a
pending AFTER trigger event (to fire the FK check trigger) and after
enough rows get loaded you'll be out of memory.  Even if you didn't
run out of memory, firing all those triggers will take forever, so
this isn't the way to approach the task."

So, now I'm altering my target tables to drop the foreign key
constraints before the update, and will alter them again to rebuild the
constraints after the operation. Not very convenient, but that doesn't
matter as long as it works.
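
A sketch of that pattern with made-up names (the constraint name follows
PostgreSQL's default table_column_fkey convention):

    ALTER TABLE city DROP CONSTRAINT city_state_id_fkey;
    -- ... run the bulk COPY / UPDATE here, with no per-row FK checks ...
    ALTER TABLE city
        ADD CONSTRAINT city_state_id_fkey
        FOREIGN KEY (state_id) REFERENCES state (id);

Re-adding the constraint validates the whole table in a single pass
instead of queuing one AFTER trigger event per row, which is what avoids
the memory blowup.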

Later on, my planned use case is:

 -  a big database in use 24/7 (mostly reads) with hundreds of millions of rows

 -  large bulk updates applied weekly, potentially with millions (or at
least hundreds of thousands) of changes. Or maybe smaller updates more
often, if that's an easier load profile for PostgreSQL.

This means there will be millions of dirty rows whenever the bulk
updates happen. Still, the database needs to be in constant use, so I
can't really remove its foreign keys for the bulk loads. It would be
great if PostgreSQL could support very large batch updates without the
need to remove foreign keys. In my case a performance penalty is
acceptable, but a crash is not.

According to Tom, the current implementation can't really handle very
large batch updates on existing tables with foreign keys. Trying this
makes the PostgreSQL server run out of memory when the dataset gets
large enough. I think a performance penalty would be acceptable (it
can't be avoided), but the server shouldn't run out of memory like it
does now. Some people might consider this a feature; some see it as a
bug...

For now, I can work around this on the client side by splitting the
updates into a million separate transactions instead of a single big
one. Will be slow, but it should work.

In conclusion, this bug was apparently not so much about vacuuming
after all. It was more about very large batch updates on pre-existing
tables with foreign key constraints creating too many pending AFTER
trigger events, thus exhausting all available memory and taking forever
to process. The hanging lock was probably just a side effect of the
system running out of memory.

--
Br, Sakari



Re: [BUGS] BUG #5469: regexp_matches() has poor behaviour and more poor documentation

2010-05-29 Thread Robert Haas
On Sat, May 29, 2010 at 5:00 PM, Bruce Momjian  wrote:
> I have updated the patch, attached, to clarify that this returns text
> arrays, and that you can force it to always return one row using
> COALESCE() and a '|' pattern (the latter suggested by Daniele Varrazzo).

I don't find this part to be something we should include in the
documentation.  If we want to include a workaround, how about defining
a non-SRF that just calls the SRF and returns the first row?
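
Something along these lines, perhaps (a sketch only; the function name
is made up):

    -- Wrap the set-returning regexp_matches() so a non-match yields
    -- a NULL text[] instead of removing the row from the result set.
    CREATE FUNCTION regexp_match_first(str text, pattern text)
    RETURNS text[] LANGUAGE sql IMMUTABLE AS $$
        SELECT (SELECT m FROM regexp_matches($1, $2) AS m LIMIT 1)
    $$;

    SELECT col1, regexp_match_first(col2, '(bar)(beque)') FROM tab;

The scalar subquery returns NULL when the SRF produces no rows, so every
input row survives, including those where col2 is NULL.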

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [BUGS] Re: BUG #5480: Autovacuum interferes with operations (e.g. truncate) on very large databases

2010-05-29 Thread Robert Haas
On Sat, May 29, 2010 at 5:44 PM, Sakari A. Maaranen  wrote:
> For now, I can work around this on the client side by splitting the
> updates into a million separate transactions instead of a single big
> one. Will be slow, but it should work.

In general, it's better to group things into larger transactions - the
case where the pending trigger queue exhausts system memory is an
unfortunate exception.  You might want to think about, say, a thousand
transactions of a thousand records, instead of a million transactions
with one record each.
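
A rough sketch of the shape of that, with a made-up table and batch
boundaries:

    BEGIN;
    UPDATE items SET price = price * 1.10
     WHERE id BETWEEN 1 AND 1000;
    COMMIT;  -- pending FK trigger events fire and are freed here

    BEGIN;
    UPDATE items SET price = price * 1.10
     WHERE id BETWEEN 1001 AND 2000;
    COMMIT;
    -- ... and so on; each commit keeps the trigger queue bounded

This keeps per-transaction overhead low while bounding the pending
trigger queue, which is what exhausted memory in the single huge
transaction.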

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
