Re: [HACKERS] bug in ts_rank_cd

2010-12-22 Thread Sushant Sinha
Sorry for sounding the false alarm. I was not running the vanilla
postgres and that is why I was seeing that problem. Should have checked
with the vanilla one.

-Sushant

On Tue, 2010-12-21 at 23:03 -0500, Tom Lane wrote:
> Sushant Sinha  writes:
> > There is a bug in ts_rank_cd. It does not correctly give rank when the
> > query lexeme is the first one in the tsvector.
> 
> Hmm ... I cannot reproduce the behavior you're complaining of.
> You say
> 
> > select ts_rank_cd(to_tsvector('english', 'abc sdd'),
> > plainto_tsquery('english', 'abc'));   
> >  ts_rank_cd 
> > 
> >   0
> 
> but I get
> 
> regression=# select ts_rank_cd(to_tsvector('english', 'abc sdd'),
> regression(# plainto_tsquery('english', 'abc'));   
>  ts_rank_cd 
> 
> 0.1
> (1 row)
> 
> > The problem is that the Cover finding algorithm ignores the lexeme at
> > the 0th position,
> 
> As far as I can tell, there is no "0th position" --- tsvector counts
> positions from one.  The only way to see pos == 0 in the input to
> Cover() is if the tsvector has been stripped of position information.
> ts_rank_cd is documented to return 0 in that situation.  Your patch
> would have the effect of causing it to return some nonzero, but quite
> bogus, ranking.
> 
>   regards, tom lane



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Simon Riggs
On Tue, 2010-12-21 at 17:42 -0500, Merlin Moncure wrote:

> *) is there community interest in a full patch that fills in the
> missing details not implemented here? 

You're thinking seems sound to me. We now have all-visible flags, fewer
xids, much better clog concurrency. Avoiding hint bits would also
noticeably reduce number of dirty writes, especially at checkpoint.

Hot Standby already ignores hint bits and I've not heard a single
complaint, so we are already doing this in the code.

I don't see any reason to believe that there is not an equally effective
optimisation that we can apply to bring performance back up, if it is
shown to drop in particular use cases.

I would vote to put this into 9.1 as a non-default option at restart,
opening the door to other features which hint bits are frustrating.
People can then choose between those features and the "power of hint
bits". I think many people would choose db block checksums.

If you need support, or direct help with the code, just ask. Am happy to
be your committer also.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] queriing the version of libpq

2010-12-22 Thread Magnus Hagander
On Mon, Dec 20, 2010 at 16:48, Magnus Hagander  wrote:
> On Thu, Dec 16, 2010 at 17:13, Tom Lane  wrote:
>> Magnus Hagander  writes:
>>> On Thu, Dec 16, 2010 at 17:07, Tom Lane  wrote:
 because if you're trying to link against an older libpq, the link will
 fail before you ever get to execute.  So let's have a less implausible
 use-case please.
>>
>>> Look back at the very start of the thread.
>>
>> OK, but let's document those examples instead.
>
> Something like this better?

I got some help from Bruce to look over the grammar in the docs (and
fix the broken ones), and have now applied and pushed this.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Heikki Linnakangas

On 22.12.2010 15:21, Simon Riggs wrote:

On Tue, 2010-12-21 at 17:42 -0500, Merlin Moncure wrote:


*) is there community interest in a full patch that fills in the
missing details not implemented here?


You're thinking seems sound to me. We now have all-visible flags, fewer
xids, much better clog concurrency. Avoiding hint bits would also
noticeably reduce number of dirty writes, especially at checkpoint.


Yep.


Hot Standby already ignores hint bits and I've not heard a single
complaint, so we are already doing this in the code.


No, the XMIN/XMAX committed/invalid hint bits on each heap tuple are 
used during hot sandby just like during normal operation. We ignore the 
index tuples marked as dead during hot standby, but that's a different 
issue.



I would vote to put this into 9.1 as a non-default option at restart,
opening the door to other features which hint bits are frustrating.
People can then choose between those features and the "power of hint
bits". I think many people would choose db block checksums.


Making it optional would add some ifs in the critical paths, possibly 
making it slower.


My gut feeling is that a reasonable compromise is to set hint bits like 
we do today, but don't mark the page as dirty when only hint bits are 
set. That way you get the benefit of hint bits for tuples that are 
frequently accessed and stay in buffer cache. But you don't spend any 
extra I/O to set them. I'd really like to see a worst-case scenario 
benchmark of a patch that does that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Simon Riggs
On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:

> > I would vote to put this into 9.1 as a non-default option at restart,
> > opening the door to other features which hint bits are frustrating.
> > People can then choose between those features and the "power of hint
> > bits". I think many people would choose db block checksums.
> 
> Making it optional would add some ifs in the critical paths, possibly 
> making it slower.

Hardly. A server-start parameter is going to be constant during
execution and branch prediction will just snuff that away to nothing.

> My gut feeling is that a reasonable compromise is to set hint bits like 
> we do today, but don't mark the page as dirty when only hint bits are 
> set. That way you get the benefit of hint bits for tuples that are 
> frequently accessed and stay in buffer cache. But you don't spend any 
> extra I/O to set them. I'd really like to see a worst-case scenario 
> benchmark of a patch that does that.

That sounds great, but still prevents block checksums and that is a very
valuable feature for robustness. This isn't a discussion about hint
bits, its a discussion about opening the way for other features.

ISTM there are other ways of optimising any clog issues that may remain,
so clutching to this ancient optimisation has no further benefit for me.

Merlin's idea seems to me to be original, useful *and* reasonable.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED - core functionality

2010-12-22 Thread Andrew Dunstan



On 12/21/2010 02:33 PM, Simon Riggs wrote:

On Wed, 2010-12-15 at 22:25 +0900, Shigeru HANADA wrote:


Attached are revised version of SQL/MED core functionality patches.

Looks very interesting new feature, well done.

Can I ask some questions about how this will work?
No particular order, just numbered for reference.


Answering a few of your questions as I understand the position, faute de 
mieux.

1. The docs don't actually say what a foreign table is. Is it a local
representation of foreign data? Or a local copy of foreign data? Or is
it a table created on a remote node?



It's an interface to data not managed by Postgres (or at least by this 
node). It might be a table on another Postgres node, it might be a file, 
it might be a table in another RDBMS, it might be a stream of some sort. 
I could imagine creating one over a SOAP call, or for an RSS feed. 
Someone has created one for a twitter feed, I believe. An LDAP FDW might 
also be useful (think: single sign on).



2. Will CREATE FOREIGN TABLE require a transactionid? It seems a good
replacement for temp tables on Hot Standby to be able to run a CREATE
FOREIGN TABLE using the file_fdw, then reuse the file again later.


How could it not require a txnid? It's going to write the definition to 
the catalog, isn't it?



3. Do we support CREATE TEMP FOREIGN TABLE? It seems desirable to be
able to move data around temporarily, as we do with normal tables.


That would definitely be a good thing to have.



7. Why does ANALYZE skip foreign tables? Surely its really important we
know things about a foreign table, otherwise we are going to optimize
things very badly.


Quite apart from other reasons, such as possible ephemerality of the 
data, the difficulty of taking a reasonable random sample from an 
arbitrary foreign data source seems substantial, and we surely don't 
want ANALYSE to have to run a full sequential scan of a foreign data source.




10. Can a foreign table be referenced by a FK?


I don't see how it can be. There would be no unique index to use.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Heikki Linnakangas

On 22.12.2010 15:59, Simon Riggs wrote:

On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:

My gut feeling is that a reasonable compromise is to set hint bits like
we do today, but don't mark the page as dirty when only hint bits are
set. That way you get the benefit of hint bits for tuples that are
frequently accessed and stay in buffer cache. But you don't spend any
extra I/O to set them. I'd really like to see a worst-case scenario
benchmark of a patch that does that.


That sounds great, but still prevents block checksums and that is a very
valuable feature for robustness.


It does? The problem with block checksums is that if you modify a page 
and don't have a corresponding WAL record for it, like a hint bit 
update, you can have a torn page so that the checksum doesn't match. 
Refraining from dirtying the page when a hint bit is updated avoids the 
problem. With that change, we only ever write pages to disk that have a 
WAL record associated with it, with full-page images as necessary to 
avoid torn pages.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Simon Riggs
On Wed, 2010-12-22 at 16:22 +0200, Heikki Linnakangas wrote:
> On 22.12.2010 15:59, Simon Riggs wrote:
> > On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:
> >> My gut feeling is that a reasonable compromise is to set hint bits like
> >> we do today, but don't mark the page as dirty when only hint bits are
> >> set. That way you get the benefit of hint bits for tuples that are
> >> frequently accessed and stay in buffer cache. But you don't spend any
> >> extra I/O to set them. I'd really like to see a worst-case scenario
> >> benchmark of a patch that does that.
> >
> > That sounds great, but still prevents block checksums and that is a very
> > valuable feature for robustness.
> 
> It does? The problem with block checksums is that if you modify a page 
> and don't have a corresponding WAL record for it, like a hint bit 
> update, you can have a torn page so that the checksum doesn't match. 
> Refraining from dirtying the page when a hint bit is updated avoids the 
> problem. With that change, we only ever write pages to disk that have a 
> WAL record associated with it, with full-page images as necessary to 
> avoid torn pages.

Which then leads to a block CRC not matching the block in memory. Sure,
we can avoid CRC checking the hint bits, but that requires a much more
expensive and complex CRC check.

So what you suggest works only if we restrict CRC checking to blocks
incoming to the buffer cache, but leaves us unable to do CRC checks on
blocks once in the buffer cache. Since many blocks stay in cache almost
constantly, we're left with the situation that the most heavily used
parts of the database seldom get CRC checked.

Postgres needs CRC checking more than it needs hint bits.

I think we should allow this as an option, and if it proves to be an
issue during beta then we can remove it before we go live, assuming we
cannot get a reasonable alternate optimisation.

I think its important for Postgres to implement this in the same release
as sync rep. They complement each other: confirmed robustness. Exactly
the features we need to prove to the rest of the world to trust us with
their data.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED - core functionality

2010-12-22 Thread Simon Riggs
On Wed, 2010-12-22 at 09:03 -0500, Andrew Dunstan wrote:

> Answering a few of your questions

Many thanks.

> > 7. Why does ANALYZE skip foreign tables? Surely its really important we
> > know things about a foreign table, otherwise we are going to optimize
> > things very badly.
> 
> Quite apart from other reasons, such as possible ephemerality of the 
> data, the difficulty of taking a reasonable random sample from an 
> arbitrary foreign data source seems substantial, and we surely don't 
> want ANALYSE to have to run a full sequential scan of a foreign data source.

I think we need something that estimates the size of a table, at least,
otherwise queries will be completely un-optimised.

> > 10. Can a foreign table be referenced by a FK?
> 
> I don't see how it can be. There would be no unique index to use.

That answers another question also.


Still many unanswered.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Robert Haas
On Wed, Dec 22, 2010 at 9:52 AM, Simon Riggs  wrote:
> I think its important for Postgres to implement this in the same release
> as sync rep.

i.e. never, at the rate sync rep has been progressing for the last few months?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Heikki Linnakangas

On 22.12.2010 16:52, Simon Riggs wrote:

On Wed, 2010-12-22 at 16:22 +0200, Heikki Linnakangas wrote:

On 22.12.2010 15:59, Simon Riggs wrote:

On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:

My gut feeling is that a reasonable compromise is to set hint bits like
we do today, but don't mark the page as dirty when only hint bits are
set. That way you get the benefit of hint bits for tuples that are
frequently accessed and stay in buffer cache. But you don't spend any
extra I/O to set them. I'd really like to see a worst-case scenario
benchmark of a patch that does that.


That sounds great, but still prevents block checksums and that is a very
valuable feature for robustness.


It does? The problem with block checksums is that if you modify a page
and don't have a corresponding WAL record for it, like a hint bit
update, you can have a torn page so that the checksum doesn't match.
Refraining from dirtying the page when a hint bit is updated avoids the
problem. With that change, we only ever write pages to disk that have a
WAL record associated with it, with full-page images as necessary to
avoid torn pages.


Which then leads to a block CRC not matching the block in memory.


What do you mean?

Do you envision that the CRC is calculated at every update, or only when 
a page is written out from the buffer cache? If the former, you could 
recalculate the CRC at a hint bit update too. If the latter, the hint 
bits are included in the page image that you checksum just like any 
other data.



So what you suggest works only if we restrict CRC checking to blocks
incoming to the buffer cache, but leaves us unable to do CRC checks on
blocks once in the buffer cache. Since many blocks stay in cache almost
constantly, we're left with the situation that the most heavily used
parts of the database seldom get CRC checked.


There's plenty of stuff in memory that's not covered by an 
application-level CRC. That's what ECC RAM is for. Updating the CRC at 
every update to a page seems really expensive, but it's an orthogonal 
issue to hint bits.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_ctl and port number detection

2010-12-22 Thread Bruce Momjian
Tom Lane wrote:
> Actually, if we're going to do this at all, we should do
> 
>   pid
>   datadir
>   port
>   socketdir
>   ... here be dragons ...
> 
> so that pg_ctl doesn't have to assume the server is running with a
> default value of unix_socket_dir.  Not sure what to put in the fourth
> line on Windows though ... maybe just leave it empty?

OK, here is a patch that adds the port number and optionally socket
directory location to postmaster.pid, and modifies pg_ctl to use that
information.  I throw an error on using Win32 with pre-9.1 servers
because we can't get the port number from that file.

This removes some crufty code from pg_ctl and removes dependency on
serveral user-configurable settings that we added as a work-around.

This will allow pg_ctl -w to work more reliabily than it did in the
past.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index 2c01e12..c526e8e 100644
*** /tmp/pgrevert.8079/5dqQCd_pg_ctl-ref.sgml	Wed Dec 22 10:10:23 2010
--- doc/src/sgml/ref/pg_ctl-ref.sgml	Wed Dec 22 10:06:33 2010
*** PostgreSQL documentation
*** 348,368 
 
  Wait for the startup or shutdown to complete.
  Waiting is the default option for shutdowns, but not startups.
  When waiting for shutdown, pg_ctl waits for
  the server to remove its PID file.
! When waiting for startup, pg_ctl repeatedly
! attempts to connect to the server via psql, and
! reports success when this is successful.
! pg_ctl will attempt to use the proper port for
! psql. If the environment variable
! PGPORT exists, that is used.  Otherwise,
! pg_ctl will see if a port has been set in the
! postgresql.conf file.  If not, it will use the
! default port that PostgreSQL was compiled
! with (5432 by default).
! When waiting, pg_ctl will
! return an exit code based on the success of the startup
! or shutdown.
 

   
--- 348,359 
 
  Wait for the startup or shutdown to complete.
  Waiting is the default option for shutdowns, but not startups.
+ When waiting for startup, pg_ctl repeatedly
+ attempts to connect to the server.
  When waiting for shutdown, pg_ctl waits for
  the server to remove its PID file.
! pg_ctl returns an exit code based on the
! success of the startup or shutdown.
 

   
*** PostgreSQL documentation
*** 442,469 
  
 
  
-
- PGHOST
- 
- 
-  
-   Default host name or Unix-domain socket location for  (used when waiting for startup).
-  
- 
-
- 
-
- PGPORT
- 
- 
-  
-   Default port number for 
-   (used when waiting for startup).
-  
- 
-
- 

  

--- 433,438 
*** PostgreSQL documentation
*** 506,523 
  
 
  
-
- postgresql.conf
- 
- 
-  
-   This file, located in the data directory, is parsed to find the
-   proper port to use with psql
-   when waiting for startup.
-  
- 
-
- 

   
  
--- 475,480 
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 14ed914..deb2d58 100644
*** /tmp/pgrevert.8079/PNev2b_miscinit.c	Wed Dec 22 10:10:23 2010
--- src/backend/utils/init/miscinit.c	Wed Dec 22 10:06:33 2010
***
*** 33,38 
--- 33,39 
  #include "mb/pg_wchar.h"
  #include "miscadmin.h"
  #include "postmaster/autovacuum.h"
+ #include "postmaster/postmaster.h"
  #include "storage/fd.h"
  #include "storage/ipc.h"
  #include "storage/pg_shmem.h"
*** CreateLockFile(const char *filename, boo
*** 658,664 
  			   bool isDDLock, const char *refName)
  {
  	int			fd;
! 	char		buffer[MAXPGPATH + 100];
  	int			ntries;
  	int			len;
  	int			encoded_pid;
--- 659,665 
  			   bool isDDLock, const char *refName)
  {
  	int			fd;
! 	char		buffer[MAXPGPATH * 2 + 256];
  	int			ntries;
  	int			len;
  	int			encoded_pid;
*** CreateLockFile(const char *filename, boo
*** 868,876 
  	/*
  	 * Successfully created the file, now fill it.
  	 */
! 	snprintf(buffer, sizeof(buffer), "%d\n%s\n",
  			 amPostmaster ? (int) my_pid : -((int) my_pid),
! 			 DataDir);
  	errno = 0;
  	if (write(fd, buffer, strlen(buffer)) != strlen(buffer))
  	{
--- 869,877 
  	/*
  	 * Successfully created the file, now fill it.
  	 */
! 	snprintf(buffer, sizeof(buffer), "%d\n%s\n%d\n%s\n",
  			 amPostmaster ? (int) my_pid : -((int) my_pid),
! 			 DataDir, PostPortNumber, UnixSocketDir);
  	errno = 0;
  	if (write(fd, buffer, strlen(buffer)) != strlen(buffer))
  	{
*** RecordSharedMemoryInLockFile(unsigned lo
*** 

Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Aidan Van Dyk
On Wed, Dec 22, 2010 at 9:52 AM, Simon Riggs  wrote:

> So what you suggest works only if we restrict CRC checking to blocks
> incoming to the buffer cache, but leaves us unable to do CRC checks on
> blocks once in the buffer cache. Since many blocks stay in cache almost
> constantly, we're left with the situation that the most heavily used
> parts of the database seldom get CRC checked.

With this statement, you just moved the goal posts on the checksumming
ideas.  In fact, you didn't just move the goal posts, you picked the
ball up and teleported it to another stadium.

I believe that most of the people talking about and wanting checksums
so far have been wanting them to verify I/O, not to verify that PG has
no bugs, that RAM is staying charged correctly, and that no stray bits
have been flipped, and that nobody else happens to be scribbling over
our shared buffers.

Being able to arbitrary (i.e at any point in time) prove that the
shared buffers contents are exactly what they should be may be a
worthy goal, but that's many orders of magnitude more difficult than
verifying that the bytes we read from disk are the ones we wrote to
disk.

a.



-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED - core functionality

2010-12-22 Thread Andrew Dunstan



On 12/22/2010 10:00 AM, Simon Riggs wrote:

7. Why does ANALYZE skip foreign tables? Surely its really important we

know things about a foreign table, otherwise we are going to optimize
things very badly.

Quite apart from other reasons, such as possible ephemerality of the
data, the difficulty of taking a reasonable random sample from an
arbitrary foreign data source seems substantial, and we surely don't
want ANALYSE to have to run a full sequential scan of a foreign data source.

I think we need something that estimates the size of a table, at least,
otherwise queries will be completely un-optimised.




Perhaps we could allow FDWs to register a size estimation function. Does 
this need to be done on the first go round? Time is running a bit short.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Simon Riggs
On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:
> On 22.12.2010 16:52, Simon Riggs wrote:
> > On Wed, 2010-12-22 at 16:22 +0200, Heikki Linnakangas wrote:
> >> On 22.12.2010 15:59, Simon Riggs wrote:
> >>> On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:
>  My gut feeling is that a reasonable compromise is to set hint bits like
>  we do today, but don't mark the page as dirty when only hint bits are
>  set. That way you get the benefit of hint bits for tuples that are
>  frequently accessed and stay in buffer cache. But you don't spend any
>  extra I/O to set them. I'd really like to see a worst-case scenario
>  benchmark of a patch that does that.
> >>>
> >>> That sounds great, but still prevents block checksums and that is a very
> >>> valuable feature for robustness.
> >>
> >> It does? The problem with block checksums is that if you modify a page
> >> and don't have a corresponding WAL record for it, like a hint bit
> >> update, you can have a torn page so that the checksum doesn't match.
> >> Refraining from dirtying the page when a hint bit is updated avoids the
> >> problem. With that change, we only ever write pages to disk that have a
> >> WAL record associated with it, with full-page images as necessary to
> >> avoid torn pages.
> >
> > Which then leads to a block CRC not matching the block in memory.

> Do you envision that the CRC is calculated at every update, or only when 
> a page is written out from the buffer cache? 

At every update, so there is a clear assertion that the CRC matches the
block.

> If the former, you could 
> recalculate the CRC at a hint bit update too. If the latter, the hint 
> bits are included in the page image that you checksum just like any 
> other data.

If we didn't have hint bits, we wouldn't need to recalculate the CRC
each time one was updated...

> > So what you suggest works only if we restrict CRC checking to blocks
> > incoming to the buffer cache, but leaves us unable to do CRC checks on
> > blocks once in the buffer cache. Since many blocks stay in cache almost
> > constantly, we're left with the situation that the most heavily used
> > parts of the database seldom get CRC checked.
> 
> There's plenty of stuff in memory that's not covered by an 
> application-level CRC. That's what ECC RAM is for. 

http://www.google.com/research/pubs/archive/35162.pdf

Google research shows that each DIMM has an 8% chance per annum of
uncorrectable memory errors, even on ECC.

If you have large RAM, like everybody now does, your incidence of this
type of error will be much higher than it was in previous years, so our
perception of what is necessary now to protect databases is out of date.

We have data under our care, and will be much more likely to receive
this kind of error because of the amount of RAM we use.

> Updating the CRC at 
> every update to a page seems really expensive, but it's an orthogonal 
> issue to hint bits.

Clearly, the frequency with which we set hint bits affects the frequency
we can sensibly update CRCs. It shouldn't be up to us to decide how much
protection a user wants to give their data.

There might be two or three settings that make sense, but clearly we
need to be able to limit hint-bit setting to allow us to have a usable
CRC check. So there is a very string connection between turning this
optimisation off and gaining CRC checking as a feature.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED - core functionality

2010-12-22 Thread David Fetter
On Wed, Dec 22, 2010 at 03:00:16PM +, Simon Riggs wrote:
> On Wed, 2010-12-22 at 09:03 -0500, Andrew Dunstan wrote:
> 
> > Answering a few of your questions
> 
> Many thanks.
> 
> > > 7. Why does ANALYZE skip foreign tables? Surely its really
> > > important we know things about a foreign table, otherwise we are
> > > going to optimize things very badly.
> > 
> > Quite apart from other reasons, such as possible ephemerality of
> > the data, the difficulty of taking a reasonable random sample from
> > an arbitrary foreign data source seems substantial, and we surely
> > don't want ANALYSE to have to run a full sequential scan of a
> > foreign data source.
> 
> I think we need something that estimates the size of a table, at
> least, otherwise queries will be completely un-optimised.

The estimated size for a lot of things--streams of data, for
example--is infinity.  I suppose that's a good default for some cases.

> > > 10. Can a foreign table be referenced by a FK?
> > 
> > I don't see how it can be. There would be no unique index to use.
> 
> That answers another question also.

There might be some cases where we can say an expression on a set of
columns is unique, but not in the general case.  In the general case,
we can't even guarantee 1NF.

> Still many unanswered.

Will try these later today :)

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Heikki Linnakangas

On 22.12.2010 17:31, Simon Riggs wrote:

On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:

Do you envision that the CRC is calculated at every update, or only when
a page is written out from the buffer cache?


At every update, so there is a clear assertion that the CRC matches the
block.


Umm, when do you check the CRC? Every time the page is locked? Every 
time it's updated? If don't verify the CRC, what is it good for?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Heikki Linnakangas

On 22.12.2010 17:31, Simon Riggs wrote:

On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:

There's plenty of stuff in memory that's not covered by an
application-level CRC. That's what ECC RAM is for.


http://www.google.com/research/pubs/archive/35162.pdf

Google research shows that each DIMM has an 8% chance per annum of
uncorrectable memory errors, even on ECC.


You misread that paper. From summary:


  About a third of machines and over 8% of DIMMs in
our fleet saw at least one *correctable* error per year.


Emphasis mine.


Our
per-DIMM rates of correctable errors translate to an aver-
age of 25,000–75,000 FIT (failures in time per billion hours
of operation) per Mbit and a median FIT range of 778 –
25,000 per Mbit (median for DIMMs with errors), while pre-
vious studies report 200-5,000 FIT per Mbit. The number of
correctable errors per DIMM is highly variable, with some
DIMMs experiencing a huge number of errors, compared to
others. The annual incidence of uncorrectable errors was
1.3% per machine and 0.22% per DIMM.


So the real figure of uncorrectable errors is 0.22% per DIMM.

Anyway, unreliable RAM calls for more ECC bits in DIMMs, not invasive 
architectural changes to every single application in the system.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Tom Lane
Aidan Van Dyk  writes:
> With this statement, you just moved the goal posts on the checksumming
> ideas.  In fact, you didn't just move the goal posts, you picked the
> ball up and teleported it to another stadium.

What he said.  I can't imagine that anyone will be interested in any
case other than "set the CRC immediately before writing, and check it
upon first reading the page in".  Maintaining it continuously while the
page is in shared memory is completely insane from a cost-versus-benefit
perspective.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Simon Riggs
On Wed, 2010-12-22 at 10:45 -0500, Tom Lane wrote:
> Aidan Van Dyk  writes:
> > With this statement, you just moved the goal posts on the checksumming
> > ideas.  In fact, you didn't just move the goal posts, you picked the
> > ball up and teleported it to another stadium.
> 
> What he said.  I can't imagine that anyone will be interested in any
> case other than "set the CRC immediately before writing, and check it
> upon first reading the page in".  Maintaining it continuously while the
> page is in shared memory is completely insane from a cost-versus-benefit
> perspective.

If you insist on setting hint-bits, then that is probably true.

Many people experience almost no I/O these days, and there's a strong
correlation between people caring about their data and also being
willing to spend big $s on cache. We need to protect our users, however
much money they spent on cache; I would argue the more money they spent
on cache the harder we should be trying to protect them.

I'm sure it will take a little while for everybody to understand why a
full CRC implementation is both necessary and now possible. Paradigm
shifts of thought do seem like teleports, but they can be beneficial.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED - core functionality

2010-12-22 Thread David Fetter
On Tue, Dec 21, 2010 at 07:33:04PM +, Simon Riggs wrote:
> On Wed, 2010-12-15 at 22:25 +0900, Shigeru HANADA wrote:
> 
> > Attached are revised version of SQL/MED core functionality patches.
> 
> Looks very interesting new feature, well done.

While, "read SQL:2008" is generally not super useful advice, in the
particular case of 6WD2_09_MED_2007-12, it's actually pretty clear.
Doubtless, as more competition arises, SQL/MED's specification will
get cloudier, but for now, it's a decent place to start.

> 4. In Hot Standby, we are creating many copies of the data tables on
> different servers.  That seems to break the concept that data is in
> only one place, when we assume that a foreign table is on only one
> foreign server.  How will we represent the concept that data is
> potentially available identically from more than one place?  Any
> other comments about how this will work with Hot Standby?

Your premise, that we're creating many copies, refers to a potential
optimization of SQL/MED which IMHO we should not even vaguely consider
until we've given SQL/MED a chance to shake out.  It's a basic
question about cache coherency, and we just don't need to go there on
the first round.  Possibly not even on the second.

> 5. In PL/Proxy, we have the concept that a table is sharded across
> multiple nodes.  Is that possible here?  Again, we seem to have the
> concept that a table is only ever in a single place.

It's not super relevant.

> 6. Can we do CREATE FOREIGN TABLE  AS SELECT ...
> I guess the answer depends on (1)

The answer to (1) is "reference to data not managed by the instance of
PostgreSQL doing the queries," so it should be possible, depending on
what capabilities the particular foreign data wrapper exposes.

> 7. Why does ANALYZE skip foreign tables? Surely its really important
> we know things about a foreign table, otherwise we are going to
> optimize things very badly.

The only thing known about a table may be the structure of its rows.
Streams would be one example of this.

> 8. Is the WHERE clause passed down into a ForeignScan?

Dunno.  I'd presume that predicate pushing would be a very important
part of how to get this working at a reasonable speed, but as I've
demonstrated with DBI-Link, it's not strictly necessary for base
functionality.

> 9. The docs for CHECK constraints imply that the CHECK is executed
> against any rows returned from FDW. Are we really going to execute that
> as an additional filter on each row retrieved?

Depends.  If there's some reliable way to push the CHECK to the
foreign data source, that is of course preferable.

> 10. Can a foreign table be referenced by a FK?

It's conceivable, at least in some cases.  The penalties could
potentially be quite high.

> 11. Can you create a DO INSTEAD trigger on a FOREIGN TABLE?

Don't see why not.  Again, bearing in mind that not all foreign data
sources are remotely similar in capability.

> 12. I think it would be useful for both review and afterwards to
> write the documentation section now, so we can begin to understand
> this.  Will there be a documentation section on writing a FDW also?
> There are enough open questions here that I think we need docs and a
> review guide, otherwise we'll end up with some weird missing
> feature, which would be a great shame.

Excellent idea.  Next on the agenda: carving out the needed resources
for this project.

> 13. How does this relate to dblink? Is that going to be replaced by
> this feature?

As I understand it, dblink is already using some of the
infrastructure.  For legacy reasons, I suspect dblink will be
maintained, even when we have a full-blown SQL/MED implementation.

> 14. How do we do scrollable cursors with foreign tables? Do we
> materialize them always? Or...

That will depend on the nature of the foreign data source.

> 15. In terms of planning queries, do we have a concept of additional
> cost per row on a foreign server?  How does the planner decide how costly
> retrieving data is from the FDW?

Dunno.  I return to my refrain of, "depends on the foreign data
source."  You could imagine that a future version of PostgreSQL would
have handy interfaces available to MED and others, where a CSV file
would be much less likely to.

> 16. If we cancel a query, is there an API call to send query cancel to
> the FDW and so on to the foreign server? Does that still work if we hot
> other kinds of ERROR, or FATAL?

Depends...

> 17. Can we request different types of transaction isolation on the
> foreign server, or do certain states get passed through from our
> session? e.g. if we are running a serializable transaction, does
> that state get passed through to the FDW, so it knows to request
> that on the foreign server?  That seems essential if we are going to
> make pg_dump work correctly.

Assuming that the foreign server even has a concept of transaction
isolation, which it may well not, we should be able to pass same.

> 18. Does pg_dump dump the data in the FDW 

Re: [HACKERS] SQL/MED - core functionality

2010-12-22 Thread Alvaro Herrera
Excerpts from David Fetter's message of mié dic 22 12:36:10 -0300 2010:
> On Wed, Dec 22, 2010 at 03:00:16PM +, Simon Riggs wrote:
> > On Wed, 2010-12-22 at 09:03 -0500, Andrew Dunstan wrote:

> > > Quite apart from other reasons, such as possible ephemerality of
> > > the data, the difficulty of taking a reasonable random sample from
> > > an arbitrary foreign data source seems substantial, and we surely
> > > don't want ANALYSE to have to run a full sequential scan of a
> > > foreign data source.
> > 
> > I think we need something that estimates the size of a table, at
> > least, otherwise queries will be completely un-optimised.
> 
> The estimated size for a lot of things--streams of data, for
> example--is infinity.  I suppose that's a good default for some cases.

Since we don't have streaming queries, this would seem rather pointless ...
Surely the FDW must be able to limit the resultset somehow.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Aidan Van Dyk
On Wed, Dec 22, 2010 at 10:52 AM, Simon Riggs  wrote:

> I'm sure it will take a little while for everybody to understand why a
> full CRC implementation is both necessary and now possible. Paradigm
> shifts of thought do seem like teleports, but they can be beneficial.

But please don't deny the rest of us airbags while you keep working on
teleportation ;-)

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED - core functionality

2010-12-22 Thread David Fetter
On Wed, Dec 22, 2010 at 12:54:06PM -0300, Alvaro Herrera wrote:
> Excerpts from David Fetter's message of mié dic 22 12:36:10 -0300 2010:
> > On Wed, Dec 22, 2010 at 03:00:16PM +, Simon Riggs wrote:
> > > On Wed, 2010-12-22 at 09:03 -0500, Andrew Dunstan wrote:
> 
> > > > Quite apart from other reasons, such as possible ephemerality of
> > > > the data, the difficulty of taking a reasonable random sample from
> > > > an arbitrary foreign data source seems substantial, and we surely
> > > > don't want ANALYSE to have to run a full sequential scan of a
> > > > foreign data source.
> > > 
> > > I think we need something that estimates the size of a table, at
> > > least, otherwise queries will be completely un-optimised.
> > 
> > The estimated size for a lot of things--streams of data, for
> > example--is infinity.  I suppose that's a good default for some cases.
> 
> Since we don't have streaming queries,

We don't, but other systems do.

> this would seem rather pointless ...  Surely the FDW must be able to
> limit the resultset somehow.

Using LIMIT. :)

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Tom Lane
Heikki Linnakangas  writes:
> My gut feeling is that a reasonable compromise is to set hint bits like 
> we do today, but don't mark the page as dirty when only hint bits are 
> set. That way you get the benefit of hint bits for tuples that are 
> frequently accessed and stay in buffer cache. But you don't spend any 
> extra I/O to set them.

I think it's far more likely that that could be acceptable than the
radical method of removing hint bits altogether.

I have not looked into what's wrong with Merlin's test case, but my
thinking about it goes like this: we know that contention for buffer
lookup is significant at high loads, despite the facts that the accesses
are distributed across a lot of independently-usable buffers and we've
done much work to partition the lookup locks.  If we remove hint bits
and thereby force an access to clog for every tuple touch, we can expect
that the contention for clog access will be comparable to the worst case
for buffer access contention ... except that in many cases, it will be
distributed across far fewer pages and so the actual interference rate
will be far higher.  This will make our past experiences with "context
swap storms" look like a day at the beach.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Simon Riggs
On Wed, 2010-12-22 at 17:42 +0200, Heikki Linnakangas wrote:
> On 22.12.2010 17:31, Simon Riggs wrote:
> > On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:
> >> There's plenty of stuff in memory that's not covered by an
> >> application-level CRC. That's what ECC RAM is for.
> >
> > http://www.google.com/research/pubs/archive/35162.pdf
> >
> > Google research shows that each DIMM has an 8% chance per annum of
> > uncorrectable memory errors, even on ECC.
> 
> You misread that paper. From summary:

I read the paper in detail before I posted. If you think that finding an
error in my quote disproves anything, you should read the whole paper. I
see this:

Conclusion 1
"... Nonetheless, the remaining incidence of 0.22% per DIMM
per year makes a crash-tolerant application layer indispens-
able for large-scale server farms."

What you are arguing for is a protection system that will reduce in
effectiveness as we add more cache.

What I am arguing in favour of is an option to allow people to protect
their data, whatever the size of their cache. I'm not forcing you or
anyone to use it, but I think its an important option to be offering to
our users. 

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Merlin Moncure
On Wed, Dec 22, 2010 at 10:55 AM, Aidan Van Dyk  wrote:
> On Wed, Dec 22, 2010 at 10:52 AM, Simon Riggs  wrote:
>
>> I'm sure it will take a little while for everybody to understand why a
>> full CRC implementation is both necessary and now possible. Paradigm
>> shifts of thought do seem like teleports, but they can be beneficial.
>
> But please don't deny the rest of us airbags while you keep working on
> teleportation ;-)

well, simon's point that hint bits complicate checksum may nor may not
be the case, but no hint bits = less i/o = less checksumming (unless
you checksum around the hint bits).  This lowers the expense of doing
it, which is nice.  Maybe that doesn't matter in the end, we'll see.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Tom Lane
Merlin Moncure  writes:
> well, simon's point that hint bits complicate checksum may nor may not
> be the case, but no hint bits = less i/o = less checksumming (unless
> you checksum around the hint bits).

I think you're optimistically assuming the extra clog accesses don't
cost any I/O.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Merlin Moncure
On Wed, Dec 22, 2010 at 10:59 AM, Tom Lane  wrote:
> Heikki Linnakangas  writes:
>> My gut feeling is that a reasonable compromise is to set hint bits like
>> we do today, but don't mark the page as dirty when only hint bits are
>> set. That way you get the benefit of hint bits for tuples that are
>> frequently accessed and stay in buffer cache. But you don't spend any
>> extra I/O to set them.
>
> I think it's far more likely that that could be acceptable than the
> radical method of removing hint bits altogether.
>
> I have not looked into what's wrong with Merlin's test case, but my
> thinking about it goes like this: we know that contention for buffer
> lookup is significant at high loads, despite the facts that the accesses
> are distributed across a lot of independently-usable buffers and we've
> done much work to partition the lookup locks.  If we remove hint bits
> and thereby force an access to clog for every tuple touch, we can expect
> that the contention for clog access will be comparable to the worst case
> for buffer access contention ... except that in many cases, it will be
> distributed across far fewer pages and so the actual interference rate
> will be far higher.  This will make our past experiences with "context
> swap storms" look like a day at the beach.

right.  note I'm not suggesting they they should actually be removed,
at least not yet.  I was just playing around and noticed that the cost
of not having them is not immediately obvious in highly synthetic
tests.  The cost of clog access in best case scenario appears to be
near zero, which I thought was interesting enough to point out.  What
I'm after here is the worst case scenario, how likely it is to happen,
and looking into possible remedies (if any).

I'm going to do lots more testing over the holidays.  I'm fishing for
ideas on good ways to flesh things out more.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Merlin Moncure
On Wed, Dec 22, 2010 at 11:06 AM, Tom Lane  wrote:
> Merlin Moncure  writes:
>> well, simon's point that hint bits complicate checksum may nor may not
>> be the case, but no hint bits = less i/o = less checksumming (unless
>> you checksum around the hint bits).
>
> I think you're optimistically assuming the extra clog accesses don't
> cost any I/O.

right, but clog is much more highly packed which is both a good and a
bad thing.  my conjecture here is that jamming the clog files is
actually good, because that keeps them 'hot' and more than compensates
the extra heap i/o.  the extra lock of course is scary.

here's the thing, compared to the 90's when they were put in, the
transaction space has shrunk by half and we put gigabytes, not
megabytes of memory into servers.  what does this mean for the clog?
that's what i'm after.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Merlin Moncure
On Wed, Dec 22, 2010 at 11:12 AM, Merlin Moncure  wrote:
> On Wed, Dec 22, 2010 at 11:06 AM, Tom Lane  wrote:
>> Merlin Moncure  writes:
>>> well, simon's point that hint bits complicate checksum may nor may not
>>> be the case, but no hint bits = less i/o = less checksumming (unless
>>> you checksum around the hint bits).
>>
>> I think you're optimistically assuming the extra clog accesses don't
>> cost any I/O.
>
> right, but clog is much more highly packed which is both a good and a
> bad thing.  my conjecture here is that jamming the clog files is
> actually good, because that keeps them 'hot' and more than compensates
> the extra heap i/o.  the extra lock of course is scary.

er, should have said, plus less heap i/o compensates the extra clog i/o.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED - core functionality

2010-12-22 Thread Simon Riggs
On Wed, 2010-12-22 at 07:36 -0800, David Fetter wrote:
> > 
> > > > 7. Why does ANALYZE skip foreign tables? Surely its really
> > > > important we know things about a foreign table, otherwise we are
> > > > going to optimize things very badly.
> > > 
> > > Quite apart from other reasons, such as possible ephemerality of
> > > the data, the difficulty of taking a reasonable random sample from
> > > an arbitrary foreign data source seems substantial, and we surely
> > > don't want ANALYSE to have to run a full sequential scan of a
> > > foreign data source.
> > 
> > I think we need something that estimates the size of a table, at
> > least, otherwise queries will be completely un-optimised.
> 
> The estimated size for a lot of things--streams of data, for
> example--is infinity.  I suppose that's a good default for some cases.

A fairly gross estimate will be all that is required.

Without it, we'd better be looking at some advanced query-cancel
support, cos the "17 hours and still going" events of previous SQL
gateway products seems to be looming in my rear view mirror.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Simon Riggs
On Wed, 2010-12-22 at 10:59 -0500, Tom Lane wrote:
> Heikki Linnakangas  writes:
> > My gut feeling is that a reasonable compromise is to set hint bits like 
> > we do today, but don't mark the page as dirty when only hint bits are 
> > set. That way you get the benefit of hint bits for tuples that are 
> > frequently accessed and stay in buffer cache. But you don't spend any 
> > extra I/O to set them.
> 
> I think it's far more likely that that could be acceptable than the
> radical method of removing hint bits altogether.

I haven't argued to remove them, just have an option to not set them.

> I have not looked into what's wrong with Merlin's test case, but my
> thinking about it goes like this: we know that contention for buffer
> lookup is significant at high loads, despite the facts that the accesses
> are distributed across a lot of independently-usable buffers and we've
> done much work to partition the lookup locks.  If we remove hint bits
> and thereby force an access to clog for every tuple touch, we can expect
> that the contention for clog access will be comparable to the worst case
> for buffer access contention ... except that in many cases, it will be
> distributed across far fewer pages and so the actual interference rate
> will be far higher.  This will make our past experiences with "context
> swap storms" look like a day at the beach.

I think you're right, but I also think there are other ways we could
optimise that other than hint bits. 

For example, the single item cache might be changed, or we might
buffer/batch clog updates, or we might use a hash table of known aborted
transactions etc.

As Merlin points out, we don't have much evidence for their value or
lack of value, so we need a parameter to allow wide scale testing.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Tom Lane
Merlin Moncure  writes:
> I'm going to do lots more testing over the holidays.  I'm fishing for
> ideas on good ways to flesh things out more.

Based on the analogy to past bufmgr contention problems, I'd suggest
going back through the archives to look for the test cases associated
with context swap storm discussions.  The cases themselves might not
be quite right for this, but they'd at least show a structure for
stressing things at the tuple-access level.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread David Fetter
On Wed, Dec 22, 2010 at 04:00:30PM +, Simon Riggs wrote:
> On Wed, 2010-12-22 at 17:42 +0200, Heikki Linnakangas wrote:
> > On 22.12.2010 17:31, Simon Riggs wrote:
> > > On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:
> > >> There's plenty of stuff in memory that's not covered by an
> > >> application-level CRC. That's what ECC RAM is for.
> > >
> > > http://www.google.com/research/pubs/archive/35162.pdf
> > >
> > > Google research shows that each DIMM has an 8% chance per annum of
> > > uncorrectable memory errors, even on ECC.
> > 
> > You misread that paper. From summary:
> 
> I read the paper in detail before I posted. If you think that finding an
> error in my quote disproves anything, you should read the whole paper. I
> see this:
> 
> Conclusion 1
> "... Nonetheless, the remaining incidence of 0.22% per DIMM
> per year makes a crash-tolerant application layer indispens-
> able for large-scale server farms."
> 
> What you are arguing for is a protection system that will reduce in
> effectiveness as we add more cache.
> 
> What I am arguing in favour of is an option to allow people to protect
> their data, whatever the size of their cache. I'm not forcing you or
> anyone to use it, but I think its an important option to be offering to
> our users. 

For what version of PostgreSQL are you proposing that we provide this
protection?  Let's assume that it's before 10.0 so we can get some
idea of how this will arise :)

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] strncmp->memcmp when we know the shorter length

2010-12-22 Thread Noah Misch
On Tue, Dec 21, 2010 at 10:15:41PM -0500, Robert Haas wrote:
> A little benchmarking reveals that on my system (MacOS X 10.6.5) it
> appears that strncmp() is faster for a 4 character string, but
> memcmp() is faster for a 5+ character string.

Good call; I hadn't considered that possibility.

> So I think most of
> these are pretty clear wins, but I have reverted the changes to
> src/backend/tsearch because I'm not entirely confident that lexemes
> and affixes will be long enough on average for this to be a win there.
>  Please feel free to resubmit that part with performance results
> showing that it works out to a win.  Some of the ltree changes
> produced compiler warnings, so I omitted those also.  Committed the
> rest.

Thanks for the quick review and commit.  I'm not acquainted with the performance
significance of the tsearch and ltree call sites.  Leaving those as-is makes
sense to me.

nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plperlu problem with utf8

2010-12-22 Thread David E. Wheeler
On Dec 21, 2010, at 8:19 PM, Alex Hunsaker wrote:

> And here is v3, fixes the above and also makes sure to properly
> encode/decode SPI arguments.  Tested on a latin1 database with latin1
> columns and utf8 with utf8 columns.  Also passes make installcheck (of
> course) and changes one or two things to make plperl.c warning free.

Awesome. Would you add it to 
https://commitfest.postgresql.org/action/commitfest_view?id=9 please?

Best,

David
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE behaviour

2010-12-22 Thread Peter Eisentraut
On sön, 2010-11-14 at 04:45 +0200, Marko Tiikkaja wrote:
> .. and a wild patch appears.
> 
> This is almost exactly the patch from 2010-02 without 
> CommandCounterIncrement()s.  It's still a bit rough around the edges
> and 
> needs some more comments, but I'm posting it here anyway.

Is this the patch of record?  There are no changes to the documentation
included.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE behaviour

2010-12-22 Thread Peter Eisentraut
On tis, 2010-12-21 at 13:20 -0800, David Fetter wrote:
> On Tue, Dec 21, 2010 at 11:14:31PM +0200, Peter Eisentraut wrote:
> > On sön, 2010-11-14 at 04:45 +0200, Marko Tiikkaja wrote:
> > > On 2010-11-12 8:25 PM +0200, I wrote:
> > > > I'm going to take some time off this weekend to get a patch with this
> > > > behaviour to the next commitfest.
> > > 
> > > .. and a wild patch appears.
> > > 
> > > This is almost exactly the patch from 2010-02 without 
> > > CommandCounterIncrement()s.  It's still a bit rough around the edges and 
> > > needs some more comments, but I'm posting it here anyway.
> > 
> > To pick up an earlier thread again, has any serious thought been given
> > to adapting the SQL2001/DB2 syntax instead of our own?
> 
> Yes, and it's a good deal more limited and less intuitive than ours.

Less intuitive, possibly, but how is it more limited?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] "writable CTEs"

2010-12-22 Thread Peter Eisentraut
As a side note, I think the term "writable CTE" is a misnomer.  The CTE
is not writable.  The CTE is the result of a write operation.

A writable CTE would look like this:

WITH foo AS (SELECT ...) UPDATE foo SET ...

a bit like an updatable view.

AFAICT, the current patch doesn't use the term, so there is no problem,
but just for those who are preparing propaganda and such.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE behaviour

2010-12-22 Thread Marko Tiikkaja

On 2010-12-22 8:24 PM, Peter Eisentraut wrote:

On sön, 2010-11-14 at 04:45 +0200, Marko Tiikkaja wrote:

.. and a wild patch appears.

This is almost exactly the patch from 2010-02 without
CommandCounterIncrement()s.  It's still a bit rough around the edges
and
needs some more comments, but I'm posting it here anyway.


Is this the patch of record?  There are no changes to the documentation
included.


I've kept the documentation as a separate patch, but I haven't touched 
it in a very long time.  I will work on the documentation if there's a 
chance of the patch getting accepted for 9.1.  This arrangement makes 
more sense to me and I'm sure others will agree.



Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] "writable CTEs"

2010-12-22 Thread Marko Tiikkaja

On 2010-12-22 8:28 PM, Peter Eisentraut wrote:

As a side note, I think the term "writable CTE" is a misnomer.  The CTE
is not writable.  The CTE is the result of a write operation.

A writable CTE would look like this:

WITH foo AS (SELECT ...) UPDATE foo SET ...

a bit like an updatable view.

AFAICT, the current patch doesn't use the term, so there is no problem,
but just for those who are preparing propaganda and such.


I think I've used "DML WITH" in the patch, but I don't like that either. 
 Naming this feature seems to be quite a challenge.


I'd prefer something short but easily understandable, but those two 
might be mutually exclusive.



Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] "writable CTEs"

2010-12-22 Thread Richard Broersma
On Wed, Dec 22, 2010 at 10:44 AM, Marko Tiikkaja
 wrote:
> I'd prefer something short but easily understandable, but those two might be
> mutually exclusive.

Volatile CTE's doesn't add any more clarity either. Maybe "Round Trip
Reduction" CTE's. :)

-- 
Regards,
Richard Broersma Jr.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] "writable CTEs"

2010-12-22 Thread Kevin Grittner
Marko Tiikkaja  wrote:
 
> I think I've used "DML WITH" in the patch, but I don't like that
> either.  Naming this feature seems to be quite a challenge.
> 
> I'd prefer something short but easily understandable, but those
> two might be mutually exclusive.
 
How about?:
 
DML CTEs
DML-based CTEs
RETURNING-based CTEs
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] "writable CTEs"

2010-12-22 Thread David Fetter
On Wed, Dec 22, 2010 at 12:54:39PM -0600, Kevin Grittner wrote:
> Marko Tiikkaja  wrote:
>  
> > I think I've used "DML WITH" in the patch, but I don't like that
> > either.  Naming this feature seems to be quite a challenge.
> > 
> > I'd prefer something short but easily understandable, but those
> > two might be mutually exclusive.
>  
> How about?:
>  
> DML CTEs
> DML-based CTEs
> RETURNING-based CTEs

When I designed the feature, I'd planned to be able to put most kinds
of statement inside or outside the CTE, not just DML writes.  You can
imagine cases for DCL (GRANT/REVOKE based on a catalog query) or DDL
(partition management), and I did.

We could call them, "Expanded CTEs," but that only freezes the prior
norm making them read-only, so I think "Writeable CTEs" captures it
pretty well.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plperlu problem with utf8

2010-12-22 Thread Alex Hunsaker
On Wed, Dec 22, 2010 at 11:24, David E. Wheeler  wrote:
> On Dec 21, 2010, at 8:19 PM, Alex Hunsaker wrote:
>
>> And here is v3, [ ...]

> Awesome. Would you add it to 
> https://commitfest.postgresql.org/action/commitfest_view?id=9 please?

Nah, I was willing to spend a couple of hours playing with different
encodings and thinking about ways to break it.  But adding it to a
commit feast? Puhlease.  In other news, Ive added it to the
commitfest.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Heikki Linnakangas

On 22.12.2010 18:12, Merlin Moncure wrote:

On Wed, Dec 22, 2010 at 11:06 AM, Tom Lane  wrote:

Merlin Moncure  writes:

well, simon's point that hint bits complicate checksum may nor may not
be the case, but no hint bits = less i/o = less checksumming (unless
you checksum around the hint bits).


I think you're optimistically assuming the extra clog accesses don't
cost any I/O.


right, but clog is much more highly packed which is both a good and a
bad thing.


As a sidenote: note that the clog is not currently CRC'd.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE behaviour

2010-12-22 Thread Tom Lane
Marko Tiikkaja  writes:
> On 2010-12-22 8:24 PM, Peter Eisentraut wrote:
>> Is this the patch of record?  There are no changes to the documentation
>> included.

> I've kept the documentation as a separate patch, but I haven't touched 
> it in a very long time.  I will work on the documentation if there's a 
> chance of the patch getting accepted for 9.1.  This arrangement makes 
> more sense to me and I'm sure others will agree.

Well, it's difficult to review a documentation-free patch.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Josh Berkus

> right -- see the attached clog_stress.sql above.  It creates a script
> that inserts records in blocks of 1, deletes half of them, and
> vacuums.  Neither the execution of the script nor a seq scan following
> its execution showed an interesting performance difference (which I am
> arbitrarily calling 5% in either direction).  Like I said though, I
> don't trust the patch or the results yet.

Given that DBT2 stressed the bufrmgr contention pretty well, it seems
like it'd be worth trying this for hint bits in the test servers.  We
should see if Mark Wong can do this in the new year.

I might be able to test on some client workloads.  We'll see; currently
I lack the harness to simulate a high level of client contention.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Mark Kirkwood

On 23/12/10 05:06, Merlin Moncure wrote:

On Wed, Dec 22, 2010 at 10:59 AM, Tom Lane  wrote:

Heikki Linnakangas  writes:

My gut feeling is that a reasonable compromise is to set hint bits like
we do today, but don't mark the page as dirty when only hint bits are
set. That way you get the benefit of hint bits for tuples that are
frequently accessed and stay in buffer cache. But you don't spend any
extra I/O to set them.

I think it's far more likely that that could be acceptable than the
radical method of removing hint bits altogether.

I have not looked into what's wrong with Merlin's test case, but my
thinking about it goes like this: we know that contention for buffer
lookup is significant at high loads, despite the facts that the accesses
are distributed across a lot of independently-usable buffers and we've
done much work to partition the lookup locks.  If we remove hint bits
and thereby force an access to clog for every tuple touch, we can expect
that the contention for clog access will be comparable to the worst case
for buffer access contention ... except that in many cases, it will be
distributed across far fewer pages and so the actual interference rate
will be far higher.  This will make our past experiences with "context
swap storms" look like a day at the beach.

right.  note I'm not suggesting they they should actually be removed,
at least not yet.  I was just playing around and noticed that the cost
of not having them is not immediately obvious in highly synthetic
tests.  The cost of clog access in best case scenario appears to be
near zero, which I thought was interesting enough to point out.  What
I'm after here is the worst case scenario, how likely it is to happen,
and looking into possible remedies (if any).

I'm going to do lots more testing over the holidays.  I'm fishing for
ideas on good ways to flesh things out more.




Certainly having a choice about configuring them would be a good 
addition in itself, e.g  for data warehousing use the hint bits can be a 
considerable impediment so the *ability* to not have them would be a 
huge advantage.


if I have time over the early new year I'll do some testing too.

Cheers

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CRC checks WAS: How much do the hint bits help?

2010-12-22 Thread Josh Berkus

> I believe that most of the people talking about and wanting checksums
> so far have been wanting them to verify I/O, not to verify that PG has
> no bugs, that RAM is staying charged correctly, and that no stray bits
> have been flipped, and that nobody else happens to be scribbling over
> our shared buffers.

I agree that this should be our first goal.  Yes, we want to protect
users against memory errors as well.  However, that's a much tougher
feature to implement; I've done some hashing this out with engineers on
other DBMSes and nobody has good answers right now.  The overhead of
what Simon proposes would be enormous, and few users would be interested
in paying that cost.

Doing a CRC check-on-write, as well as checking for format corruption
before write would catch a majority of real-world problems.  Please
don't hold that up in pursuit of the bit-flipping problem, which
*nobody* has solved.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Josh Berkus

> Certainly having a choice about configuring them would be a good
> addition in itself, e.g  for data warehousing use the hint bits can be a
> considerable impediment so the *ability* to not have them would be a
> huge advantage.

Would need to be a restart option, no?

Regarding the contention which Tom expects: the extra load on the CLOG
would be 100% reads, no?  If it's *all* reads, why would we have any
more contention than we have now?

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Tom Lane
Josh Berkus  writes:
> Regarding the contention which Tom expects: the extra load on the CLOG
> would be 100% reads, no?  If it's *all* reads, why would we have any
> more contention than we have now?

Read involves sharelock which still causes contention.  Those bufmgr
contention storms we saw before were completely independent of whether
the pages were accessed for read or for write.

Another thing to keep in mind is that the current clog access code is
designed on the assumption that there's considerable locality of access
to pg_clog, ie, you usually only need to consult it for recent XIDs
because older ones have been hinted.  Turn off hint bits, that behavior
goes out the window.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Dimitri Fontaine
Josh Berkus  writes: > I might be able to test 
on some client workloads.  We'll see; currently > I lack the 
harness to simulate a high level of client contention.   We're 
pretty successful in doing that with Tsung, even against large 
clusters of plproxy nodes.   http://tsung.erlang-projects.org/

 http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Mark Kirkwood

On 23/12/10 10:54, Tom Lane wrote:

Josh Berkus  writes:

Regarding the contention which Tom expects: the extra load on the CLOG
would be 100% reads, no?  If it's *all* reads, why would we have any
more contention than we have now?

Read involves sharelock which still causes contention.  Those bufmgr
contention storms we saw before were completely independent of whether
the pages were accessed for read or for write.

Another thing to keep in mind is that the current clog access code is
designed on the assumption that there's considerable locality of access
to pg_clog, ie, you usually only need to consult it for recent XIDs
because older ones have been hinted.  Turn off hint bits, that behavior
goes out the window.


Would a larger (or configurable) clog cache help with this tho?

Cheers

Mark

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Simon Riggs
On Wed, 2010-12-22 at 22:08 +0200, Heikki Linnakangas wrote:
> On 22.12.2010 18:12, Merlin Moncure wrote:
> > On Wed, Dec 22, 2010 at 11:06 AM, Tom Lane  wrote:
> >> Merlin Moncure  writes:
> >>> well, simon's point that hint bits complicate checksum may nor may not
> >>> be the case, but no hint bits = less i/o = less checksumming (unless
> >>> you checksum around the hint bits).
> >>
> >> I think you're optimistically assuming the extra clog accesses don't
> >> cost any I/O.
> >
> > right, but clog is much more highly packed which is both a good and a
> > bad thing.
> 
> As a sidenote: note that the clog is not currently CRC'd.

Good point, thanks for mentioning it.

With 64kB of clog buffers and potentially 8 GB of shared_buffers, which
is about 10^5 more RAM for shared_buffers. So a protection mechanism for
shared_buffers will trap about 99.999% of RAM errors.

We might say that an error in clog could have a serious effect, and I
would agree. I don't see a way around that though, except for a CRC
check when we write to disk.

My understanding is that the context switch storms were because of the
I/O involved with thrashing the clog buffers. (Well, actually, I think
it was subtrans, but sane difference). To solve that, we could just swap
them out to shared_buffers with usage = 5 rather than evict them.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-22 Thread Merlin Moncure
On Wed, Dec 22, 2010 at 4:54 PM, Tom Lane  wrote:
> Josh Berkus  writes:
>> Regarding the contention which Tom expects: the extra load on the CLOG
>> would be 100% reads, no?  If it's *all* reads, why would we have any
>> more contention than we have now?
>
> Read involves sharelock which still causes contention.  Those bufmgr
> contention storms we saw before were completely independent of whether
> the pages were accessed for read or for write.
>
> Another thing to keep in mind is that the current clog access code is
> designed on the assumption that there's considerable locality of access
> to pg_clog, ie, you usually only need to consult it for recent XIDs
> because older ones have been hinted.  Turn off hint bits, that behavior
> goes out the window.

That's not always going to be the case though.  In olap-ish
environments you will see cases of scans over many records that come
from a single transaction.  This is also the case where hint bits can
really drill you -- you insert a bunch of records, log the bits,
delete, log the bits, and vacuum eventually.  I started investigating
this on behalf of a friend who is experiencing basically the worst
case with regularity.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Database file copy

2010-12-22 Thread Srini Raghavan
Hello,

[Tried the general forum, didn't hear from anyone so far, trying this forum 
now, 
please review, thanks]


We are looking to distribute postgres databases to our customers along with our 
application. We are currently evaluating postgres version 8.4.4. The database 
can be of size 25 gb (compressed files fits in few dvds, the product is 
distributed on dvds). The pg_restore of this database takes several hours on 
the 
low end machines running windows os. The pg_restore is run during our product 
install, and the current install time projection is not acceptable. Our 
customers can purchase different databases over a period of time, and the 
application makes transactional updates to the databases after installation. 
Hence, copying the entire data folder instead of using the pg_restore is not an 
option, as the transactional updates will be lost.

I have read the documentation and the few posts available that discourages file 
copy based restore of individual databases, but, I have found a way to do this. 
I would appreciate if the experts can read and advise if the approach will 
work, 
given our environment and usage boundaries.

Master Postgres instance (this is where we create the data, we have complete 
control of this environment):
1. Create the database and populate data.
2. Set vacuum_freeze_table_age to 0 in the postgresql.conf
3. Run vacuum full - this will reset the row xid to the FrozenXid
4. Shutdown postgres and take a copy of the files for the given database.

In the deploy instance at the customer site:
1. Create the new database.
2. Shutdown postgres instance and copy the database files created in the master 
instance to the database specific folder.
3. Start postgres instance.

We don't use table row oids. If the cluster wide oid collides with the oid in 
the copied database files during subsequent ddl operations, postgres resolves 
this by skipping to the next available oid. There will be a delay to find the 
next available oid, which is acceptable in our case, as the ddl operations at 
the customer site are rare.  And, the vacuum full with vacuum_freeze_table_age 
set to 0 on the master instance takes care of the xmin, allowing transactions 
to 
be visible, and for further transactions at the customer site to continue 
without colliding. 


I have tested this and it works, and I am continuing to test it more. I would 
like for validation of this idea from the experts and the community to make 
sure 
I haven't overlooked something obvious that might cause issues.

Thank you,
Srini



  

Re: [HACKERS] knngist - 0.8

2010-12-22 Thread Bruce Momjian
Tom Lane wrote:
> Robert Haas  writes:
> > 2010/9/13 Teodor Sigaev :
> >> [updated patch]
> 
> > I realize I'm repeating myself, but...  this patch needs
> > documentation.  It's not optional.
> 
> I've applied all of this, and written documentation for all of it,
> except for the contrib/btree_gist additions which still need to be
> redone for the revised API (and then documented!).  My patience ran out
> somewhere around there, so I'm marking that part as returned with
> feedback.
> 
> What we have at this point (pending contrib/btree_gist fixes) is
> nearest-neighbor searching capability for point columns.  And
> trigram-based nearest-neighbor for text strings, if you install
> contrib/pg_trgm.  That doesn't seem like a lot of return for the
> amount of work that went into it.  Are there plans to add KNN support
> for any other standard types?

I was thinking /contrib/fuzzystrmatch could use it to find the words
that mostly closely match a string.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+

2010-12-22 Thread Bruce Momjian
Josh Berkus wrote:
> On 12/6/10 6:13 PM, Tom Lane wrote:
> > Josh Berkus  writes:
> >> OK, patch coming then.  Right now test_fsync aborts when O_DIRECT fails.
> >>  What should I have it do instead?
> > 
> > Report that it fails, and keep testing the other methods.
> 
> Patch attached.  Includes a fair amount of comment cleanup, since
> existing comments did not meet our current project standards.  Tests all
> 6 of the methods we support separately.
> 
> Some questions, though:
> 
> (1) Why are we doing the open_sync different-size write test?  AFAIK,
> this doesn't match any behavior which PostgreSQL has.

I did that so we could see the impact of doing 2 8k writes that were
both fsync'ed vs doing one 16k write and then fsync:

Compare open_sync with different sizes:
open_sync 16k write 201.323/second
2 open_sync 8k writes   332.466/second

We often write multiple 8k WAL pages and then fsync on commit.

> (2) In this patch, I'm stepping down the number of loops which
> fsync_writethrough does by 90%.  The reason for that was that on the
> platforms where I tested writethrough (desktop machines), doing 10,000
> loops took 15-20 *minutes*, which seems hard on the user.  Would be easy
> to revert if you think it's a bad idea.
>   Possibly auto-sizing the number of loops based on the first fsync test
> might be a good idea, but seems like going a bit too far.

Sure, I recently increased the number, probably too much.

> (3) Should the multi-descriptor test be using writethrough on platforms
> which support it?

Uh, I didn't think that would matter because the test is to test kernel
behavior of writing to one file descriptor and fsyncing using another.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+

2010-12-22 Thread Bruce Momjian
Tom Lane wrote:
> Josh Berkus  writes:
> > Making it support O_DIRECT would be possible but more complex; I don't
> > see the point unless we think we're going to have open_sync_with_odirect
> > as a seperate option.
> 
> Whether it's complex or not isn't really the issue.  The issue is that
> what test_fsync is testing had better match what the backend does, or
> people will be making choices based on not-comparable test results.
> I think we should have test_fsync just automatically fold in O_DIRECT
> the same way the backend does.

The problem is that O_DIRECT was not implemented in macros but rather
down in the code:

if (!XLogIsNeeded() && !am_walreceiver)
o_direct_flag = PG_O_DIRECT;

Which means if we just export the macros, we would still not have caught
this.  I would like to share all the defines --- I am just saying it
isn't trivial.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch BUG #5103: "pg_ctl -w (re)start" fails with custom unix_socket_directory

2010-12-22 Thread Bruce Momjian
Alvaro Herrera wrote:
> Excerpts from Quan Zongliang's message of mar dic 21 18:36:11 -0300 2010:
> > On Mon, 29 Nov 2010 10:29:17 -0300
> > Alvaro Herrera  wrote:
> > 
> 
> > > I think the way this should work is that you call postmaster with a new
> > > switch and it prints out its configuration, after reading the
> > > appropriate config file(s).  That way it handles all the little details
> > > such as figuring out the correct config file, hadle include files, etc.
> > > This output would be presumably easier to parse and more trustworthy.
> > 
> > Sorry for my late reply.
> > 
> > I will check the source of postmaster.
> 
> Actually Bruce Momjian is now working on a different fix:
> unix_socket_directory would be added to postmaster.pid, allowing pg_ctl
> to find it.

Yes, I will apply this patch tomorrow and it will be in 9.1.  Thanks for
the report.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/python improvements

2010-12-22 Thread Jan Urbański
On 08/12/10 22:41, Peter Eisentraut wrote:
> On tis, 2010-12-07 at 23:56 +0100, Jan Urbański wrote:
>> Peter suggested having a mail/patch per feature and the way I intend
>> to do that is instead of having a dozen branches, have one and after
>> I'm done rebase it interactively to produce incremental patches that
>> apply to master, each one implementing one feature.
> 
> Fair enough if you want to do it that way, but I'd encourage you to just
> send in any self-contained features/changes that you have finished.

I finished work on the PL/Python improvements. There 9 separate patches,
with some interdependencies.

The features implemented are:
 * general refactoring, dropping the global error state, using dynahash
instead of PyDicts for procedure caching, etc
 * a validator function
 * executing SPI calls in subtransactions
 * starting explicit subtransactions to have atomic behaviour of
multiple SPI calls
 * providing custom exceptions for SPI errors, so you can catch only
UniqueViolations and not have to muck around with SQLCODE
 * invalidate functions accepting composite types if the type changes
after the functions has been defined
 * traceback support
 * table functions (ability to return RECORD and SETOF RECORD)
 * using custom parsers for datatypes + an example application with
dict<->hstore parsing

The dependencies are:

refactor -> validator
 -> SPI in subxacts
  -> explicit subxacts
 -> custom exceptions for SPI
 -> invalidate composites
 -> tracebacks
 -> table functions
 -> custom parsers

so everything depends on the refactoring patch, and the SPI changes are
incremental.

I will generate these patches and send them in in separate threads/add
them to the commitfest app before Christmas (I hope). Meanwhile the code
is available at https://github.com/wulczer/postgres. You will find 10
branches there, 9 correspond to these features, and the "plpython"
branch is the sum of them all.

>From now on I will stop rebasing these branches, and instead just commit
fixes to the appropriate branch and merge upwards (from refactor to
everything, from spi-in-subxacts to explicit-subxacts and to
custom-spi-exceptions, etc). I'll also try to independently maintain the
plpython branch with the sum to make sure nothing has been left out as
the feature branches get merged into mainline.

Here's to hoping that this will not turn into a conflict hell.

Cheers,
Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] knngist - 0.8

2010-12-22 Thread Robert Haas
On Wed, Dec 22, 2010 at 8:04 PM, Bruce Momjian  wrote:
> Tom Lane wrote:
>> Robert Haas  writes:
>> > 2010/9/13 Teodor Sigaev :
>> >> [updated patch]
>>
>> > I realize I'm repeating myself, but...  this patch needs
>> > documentation.  It's not optional.
>>
>> I've applied all of this, and written documentation for all of it,
>> except for the contrib/btree_gist additions which still need to be
>> redone for the revised API (and then documented!).  My patience ran out
>> somewhere around there, so I'm marking that part as returned with
>> feedback.
>>
>> What we have at this point (pending contrib/btree_gist fixes) is
>> nearest-neighbor searching capability for point columns.  And
>> trigram-based nearest-neighbor for text strings, if you install
>> contrib/pg_trgm.  That doesn't seem like a lot of return for the
>> amount of work that went into it.  Are there plans to add KNN support
>> for any other standard types?
>
> I was thinking /contrib/fuzzystrmatch could use it to find the words
> that mostly closely match a string.

Seems unlikely to be feasible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-12-22 Thread Sushant Sinha
Just a reminder that this patch is discussing  how to break url, emails etc
into its components.

On Mon, Oct 4, 2010 at 3:54 AM, Tom Lane  wrote:

> [ sorry for not responding on this sooner, it's been hectic the last
>  couple weeks ]
>
> Sushant Sinha  writes:
> >> I looked at this patch a bit.  I'm fairly unhappy that it seems to be
> >> inventing a brand new mechanism to do something the ts parser can
> >> already do.  Why didn't you code the url-part mechanism using the
> >> existing support for compound words?
>
> > I am not familiar with compound word implementation and so I am not sure
> > how to split a url with compound word support. I looked into the
> > documentation for compound words and that does not say much about how to
> > identify components of a token.
>
> IIRC, the way that that works is associated with pushing a sub-state
> of the state machine in order to scan each compound-word part.  I don't
> have the details in my head anymore, though I recall having traced
> through it in the past.  Look at the state machine actions that are
> associated with producing the compound word tokens and sub-tokens.
>

I did look around for compound word support in postgres. In particular, I
read the documentation and code in tsearch/spell.c that seems to implement
the compound word support.

So in my understanding the way it works is:

1. Specify a dictionary of words in which each word will have applicable
prefix/suffix flags
2. Specify a flag file that provides prefix/suffix operations on those flags
3. flag z indicates that a word in the dictionary can participate in
compound word splitting
4. When a token matches words specified in the dictionary (after applying
affix/suffix operations), the matching words are emitted as sub-words of the
token (i.e., compound word)

If my above understanding is correct, then I think it will not be possible
to implement url/email splitting using the compound word support.

The main reason is that the compound word support requires the
"PRE-DETERMINED" dictionary of words. So to split a url/email we will need
to provide a list of *all possible* host names and user names. I do not
think that is a possibility.

Please correct me if I have mis-understood something.

-Sushant.