[HACKERS] some notes about implementation of SQL/PSM in PostgreSQL

2010-11-15 Thread Pavel Stehule
Hello

I am trying to implement SQL/PSM in PostgreSQL. I experimented with
rewriting PL/pgSQL - you can check this implementation:
http://postgres.cz/index.php/SQL/PSM_Manual.

I thought we could have some common PL stack, but I don't think so
any more. It would mean some further uglification of the current code,
or uglification of our SQL/PSM implementation.

Why? The PL/pgSQL language is extremely tailored to PostgreSQL - it is
a combination of a strongly static language built on top of a
relatively dynamic interpreter.

For example:
* variables are static (the left side of an assignment statement)
* assignment statements and expressions are "pseudo static" - the type
is fixed when the query is first evaluated, and cannot be changed later
* there is no strong relation between the left and right sides of an
assignment statement - PL/pgSQL uses a cast or an I/O cast when necessary

PL/pgSQL needs to know all objects before their first use - it cannot
use "reflection" at the SQL level - compare the different usage of
PL/pgSQL's FOR IN query and PL/SQL's FOR IN query.

The current implementation has some large advantages and disadvantages.
I like PL/pgSQL and I am sure it is impossible to change the current
behaviour now. This is just a recapitulation:

+ there is less dependency between functions - a function only has to
exist at the moment it is actually executed
+ there is no complication with recompilation of functions - all such
work is transparent
+ the current implementation is fast - it uses three access paths for
variables: scalar, row, record
+ there is zero coupling between tables and functions
+ functions are dynamic - they respect changes in search_path (sometimes)
+ the language is relatively rich and allows very comfortable work with dynamic SQL
+ there is no standard - we are free to enhance the language

- the current implementation is too complex - there are three different
access paths to variables
- there is a relatively big overhead when a function starts for the first time
- there is a lot of SQL parsing
- the current validation checks only syntax - it doesn't check
references to SQL objects, so some errors are found too late
- the zero coupling between expressions and variables means a lot of
casts, including some I/O casts
- older code can be quietly broken by an ALTER TABLE ADD COLUMN statement
- there can be collisions between PL and SQL identifiers
- functions are too dynamic (search_path can be a security hole)
- there are still some remaining dependency issues - sometimes a
logout/login is necessary
- it isn't possible to call a PL/pgSQL function like a coroutine, so
there is no way to effectively limit SRF functions
- there is an issue with blind SQL optimization - bad plans
- there is an issue with slow varlena data type modifications in large loops
- there is an issue with iteration over a record

I don't think all of these issues can be solved - some require
dependencies between functions, which means problems with updates on
production servers. Solving others could mean breaking
compatibility, etc.

Because SQL/PSM is a different language, and because it isn't implemented
yet, we can address some of these issues - though definitely not all:
* a simpler interpreter - there need not be different access paths for
scalar, row, and record variables
* the language can be more dynamic or more static - the current level of
PL/pgSQL is a little bit strange, with a large impact on interpreter
complexity
* more static behaviour can accelerate the first start - SQL is analyzed
only once
* we can use a simple interpreter whose state can be stored at any
moment, so a PSM function can be called like a coroutine (the stack can
be stored inside fn_extra).

I am thinking about some low-level PL language - like PLSPI or something
similar - that SQL/PSM functions would be compiled to. Such a language
could be parsed simply (the first start would be faster) and could be
simply serialized and deserialized - which can be important for people
who worry about their source code: they could compile to PLPSM and
distribute only this pcode.

There are two issues with the standard:

1. It doesn't know RETURN NEXT, and as far as I know no SQL/PSM
implementation supports a similar construct - so it should not be in
pgPSM either.

2. There is a very specific statement, "FOR IN query", where the PL
variables are created automatically based on the query result. This
isn't possible in PL/pgSQL, because we must not analyze the query at
validation time. This is a real problem - it can have an impact on
pg_dump, because it would need to dump functions with empty bodies
first, and then dump the full functions again.

Because PL/pgSQL has a long history and we cannot abandon it, I am
inclined to think that the better variant for implementing SQL/PSM is
to start with a different runtime. Probably the only interesting part
that can be shared is the expression evaluator. I am thinking that
moving this into SPI could be interesting for all PLs.

Regards

Pavel Stehule

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] MULTISET and additional functions for ARRAY

2010-11-15 Thread Itagaki Takahiro
On Mon, Nov 15, 2010 at 14:37, Pavel Stehule  wrote:
> Maybe can be useful to implement a searching on sorted array.
> You can hold a flag if multiset is sorted or not.

Are you suggesting to add an IS_SORTED bit for each ArrayType value?
It might be possible, but I'm not sure how useful it is.

Another issue for sorting is that we have 4 kinds of sorting: ASC/DESC
and NULLS FIRST/LAST. I always used the default sort order (ASC NULLS LAST)
in the patch, but we might need to support all of them when we export the
functionality. Then, we would need 2 bits for IS_SORTED flags.

-- 
Itagaki Takahiro



Re: [HACKERS] Fix for seg picksplit function

2010-11-15 Thread Alexander Korotkov
With Oleg's help I found that the line "*left = *right = FirstOffsetNumber;"
was needed only for 7.X compatibility, and it isn't needed any more.
Also, I've replaced "i - 1" with "i - FirstOffsetNumber" in the array filling.
I believe this is the more correct way, because it will keep working
correctly if FirstOffsetNumber ever changes.


With best regards,
Alexander Korotkov.
*** a/contrib/seg/seg.c
--- b/contrib/seg/seg.c
***
*** 292,329  gseg_penalty(GISTENTRY *origentry, GISTENTRY *newentry, float *result)
  	return (result);
  }
  
  
  
  /*
  ** The GiST PickSplit method for segments
! ** We use Guttman's poly time split algorithm
  */
  GIST_SPLITVEC *
  gseg_picksplit(GistEntryVector *entryvec,
  			   GIST_SPLITVEC *v)
  {
! 	OffsetNumber i,
! j;
! 	SEG		   *datum_alpha,
! 			   *datum_beta;
  	SEG		   *datum_l,
  			   *datum_r;
! 	SEG		   *union_d,
! 			   *union_dl,
! 			   *union_dr;
! 	SEG		   *inter_d;
! 	bool		firsttime;
! 	float		size_alpha,
! size_beta,
! size_union,
! size_inter;
! 	float		size_waste,
! waste;
! 	float		size_l,
! size_r;
  	int			nbytes;
! 	OffsetNumber seed_1 = 1,
! seed_2 = 2;
  	OffsetNumber *left,
  			   *right;
  	OffsetNumber maxoff;
--- 292,333 
  	return (result);
  }
  
+ /*
+  * Auxiliary structure for picksplit method.
+  */
+ typedef struct
+ {
+ 	int index;
+ 	SEG *data;
+ } PickSplitSortItem;
  
+ /*
+  * Compare function for PickSplitSortItem based on seg_cmp.
+  */
+ static int
+ sort_item_cmp(const void *a, const void *b)
+ {
+ 	PickSplitSortItem *i1 = (PickSplitSortItem *)a;
+ 	PickSplitSortItem *i2 = (PickSplitSortItem *)b;
+ 	return seg_cmp(i1->data, i2->data);
+ }
  
  /*
  ** The GiST PickSplit method for segments
! ** Algorithm based on sorting. The incoming array of segs is sorted using
! ** the seg_cmp function. After that, the first half of the segs goes to the
! ** left datum, and the second half goes to the right datum.
  */
  GIST_SPLITVEC *
  gseg_picksplit(GistEntryVector *entryvec,
  			   GIST_SPLITVEC *v)
  {
! 	OffsetNumber i;
  	SEG		   *datum_l,
  			   *datum_r;
! 	PickSplitSortItem	*sortItems;
  	int			nbytes;
! 	OffsetNumber seed_2;
  	OffsetNumber *left,
  			   *right;
  	OffsetNumber maxoff;
***
*** 332,443  gseg_picksplit(GistEntryVector *entryvec,
  	fprintf(stderr, "picksplit\n");
  #endif
  
! 	maxoff = entryvec->n - 2;
! 	nbytes = (maxoff + 2) * sizeof(OffsetNumber);
  	v->spl_left = (OffsetNumber *) palloc(nbytes);
  	v->spl_right = (OffsetNumber *) palloc(nbytes);
  
! 	firsttime = true;
! 	waste = 0.0;
! 
! 	for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i))
  	{
! 		datum_alpha = (SEG *) DatumGetPointer(entryvec->vector[i].key);
! 		for (j = OffsetNumberNext(i); j <= maxoff; j = OffsetNumberNext(j))
! 		{
! 			datum_beta = (SEG *) DatumGetPointer(entryvec->vector[j].key);
! 
! 			/* compute the wasted space by unioning these guys */
! 			/* size_waste = size_union - size_inter; */
! 			union_d = seg_union(datum_alpha, datum_beta);
! 			rt_seg_size(union_d, &size_union);
! 			inter_d = seg_inter(datum_alpha, datum_beta);
! 			rt_seg_size(inter_d, &size_inter);
! 			size_waste = size_union - size_inter;
! 
! 			/*
! 			 * are these a more promising split that what we've already seen?
! 			 */
! 			if (size_waste > waste || firsttime)
! 			{
! waste = size_waste;
! seed_1 = i;
! seed_2 = j;
! firsttime = false;
! 			}
! 		}
  	}
  
  	left = v->spl_left;
  	v->spl_nleft = 0;
  	right = v->spl_right;
  	v->spl_nright = 0;
  
! 	datum_alpha = (SEG *) DatumGetPointer(entryvec->vector[seed_1].key);
! 	datum_l = seg_union(datum_alpha, datum_alpha);
! 	rt_seg_size(datum_l, &size_l);
! 	datum_beta = (SEG *) DatumGetPointer(entryvec->vector[seed_2].key);
! 	datum_r = seg_union(datum_beta, datum_beta);
! 	rt_seg_size(datum_r, &size_r);
! 
! 	/*
! 	 * Now split up the regions between the two seeds.	An important property
! 	 * of this split algorithm is that the split vector v has the indices of
! 	 * items to be split in order in its left and right vectors.  We exploit
! 	 * this property by doing a merge in the code that actually splits the
! 	 * page.
! 	 *
! 	 * For efficiency, we also place the new index tuple in this loop. This is
! 	 * handled at the very end, when we have placed all the existing tuples
! 	 * and i == maxoff + 1.
! 	 */
! 
! 	maxoff = OffsetNumberNext(maxoff);
! 	for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
  	{
! 		/*
! 		 * If we've already decided where to place this item, just put it on
! 		 * the right list.	Otherwise, we need to figure out which page needs
! 		 * the least enlargement in order to store the item.
! 		 */
! 
! 		if (i == seed_1)
! 		{
! 			*left++ = i;
! 			v->spl_nleft++;
! 			continue;
! 		}
! 		else if (i == seed_2)
! 		{
! 			*right++ = i;
! 			v->spl_nright++;
! 			continue;
! 		}
! 
! 		/* okay, which page needs least enlargement? */
! 		datum_alpha = (SEG *) DatumGetPointer(entryvec->vect

Re: [HACKERS] MULTISET and additional functions for ARRAY

2010-11-15 Thread Pavel Stehule
2010/11/15 Itagaki Takahiro :
> On Mon, Nov 15, 2010 at 14:37, Pavel Stehule  wrote:
>> Maybe can be useful to implement a searching on sorted array.
>> You can hold a flag if multiset is sorted or not.
>
> Are you suggesting to add an IS_SORTED bit for each ArrayType value?
> It might be possible, but I'm not sure how useful it is.

It depends on the implementation. You can use this flag for large sets
and the MEMBER OF operation. When you know the array is sorted, you
can use a fast search.

>
> Another issue for sorting is that we have 4 kinds of sorting: ASC/DESC
> and NULLS FIRST/LAST. I always used the default sort order (ASC NULLS LAST)
> in the patch, but we might need to support all of them when we export the
> functionality. Then, we would need 2 bits for IS_SORTED flags.
>

yes, probably

Pavel

> --
> Itagaki Takahiro
>



Re: [HACKERS] MULTISET and additional functions for ARRAY

2010-11-15 Thread Andres Freund
On Monday 15 November 2010 09:47:15 Itagaki Takahiro wrote:
> On Mon, Nov 15, 2010 at 14:37, Pavel Stehule  
wrote:
> > Maybe can be useful to implement a searching on sorted array.
> > You can hold a flag if multiset is sorted or not.
> 
> Are you suggesting to add an IS_SORTED bit for each ArrayType value?
> It might be possible, but I'm not sure how useful it is.
I have actually wished for that several times and solved it by putting up
is_sorted() constraints everywhere, but it's a pain and restricts generality.

(One use case was e.g. implementing an efficient array_union() aggregate.)

Andres



Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

2010-11-15 Thread Bernd Helmle



--On 14. November 2010 11:08:13 -0500 Robert Haas  
wrote:



+1.  The current master branch fails to build on my (rather new) Mac with
make -j2.  I could upgrade my toolchain but it seems like more trouble
than it's worth, not to mention a possible obstacle to new users and
developers.


Same here. And it doesn't matter whether you use the shipped make
(3.81) or the one from MacPorts (currently 3.82); both fail with:


ld: file not found: ../../../../../../src/backend/postgres
collect2: ld returned 1 exit status
make[3]: *** [ascii_and_mic.so] Error 1
make[2]: *** [all-ascii_and_mic-recurse] Error 2
make[1]: *** [all-backend/utils/mb/conversion_procs-recurse] Error 2
make[1]: *** Waiting for unfinished jobs

--
Thanks

Bernd



[HACKERS] How to rename each field in ROW expression?

2010-11-15 Thread Itagaki Takahiro
We can rename each field in a VALUES clause:

=# SELECT * FROM (VALUES(123, 'ABC', NULL)) AS t(a,b,c);
  a  |  b  | c
-+-+---
 123 | ABC |
(1 row)

But I cannot find a way to do the same thing with a ROW expression:

=# SELECT ROW(123, 'ABC', NULL) AS (a,b,c);
ERROR:  syntax error at or near "("
LINE 1: SELECT ROW(123, 'ABC', NULL) AS (a,b,c);
^
=# SELECT (ROW(123, 'ABC', NULL)).*;
ERROR:  record type has not been registered

Is it possible to change the names of the fields in a ROW?
We can use CREATE TYPE ... AS beforehand, but I'd like to
change the field names of a ROW expression in ad-hoc queries.

-- 
Itagaki Takahiro



Re: [HACKERS] Per-column collation

2010-11-15 Thread Pavel Stehule
Hello

I am checking the patch. I found a problem with initdb:

[postg...@pavel-stehule postgresql]$ /usr/local/pgsql/bin/initdb -D
/usr/local/pgsql/data/
could not change directory to "/home/pavel/src/postgresql"
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale en_US.utf8.
The default database encoding has accordingly been set to UTF8.
The default text search configuration will be set to "english".

fixing permissions on existing directory /usr/local/pgsql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 24MB
creating configuration files ... ok
creating template1 database in /usr/local/pgsql/data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ...initdb: locale name has non-ASCII characters,
skipped: bokm�linitdb: locale name has non-ASCII characters, skipped:
fran�aiscould not determine encoding for locale "hy_AM.armscii8":
codeset is "ARMSCII-8"
could not determine encoding for locale "ka_GE": codeset is "GEORGIAN-PS"
could not determine encoding for locale "ka_GE.georgianps": codeset is
"GEORGIAN-PS"
could not determine encoding for locale "kk_KZ": codeset is "PT154"
could not determine encoding for locale "kk_KZ.pt154": codeset is "PT154"
could not determine encoding for locale "tg_TJ": codeset is "KOI8-T"
could not determine encoding for locale "tg_TJ.koi8t": codeset is "KOI8-T"
could not determine encoding for locale "thai": codeset is "TIS-620"
could not determine encoding for locale "th_TH": codeset is "TIS-620"
could not determine encoding for locale "th_TH.tis620": codeset is "TIS-620"
could not determine encoding for locale "vi_VN.tcvn": codeset is "TCVN5712-1"
FATAL:  invalid byte sequence for encoding "UTF8": 0xe56c27
child process exited with exit code 1
initdb: removing contents of data directory "/usr/local/pgsql/data

Tested on Fedora 13.

[postg...@pavel-stehule local]$ locale -a| wc -l
731

Regards

Pavel Stehule



2010/11/15 Peter Eisentraut :
> Here is the next patch in this epic series. [0]
>
> I have addressed most of the issues pointed out in previous reviews and
> removed all major outstanding problems that were marked in the code.  So
> it might just almost really work.
>
> The documentation now also covers everything that's interesting, so
> newcomers can start with that.
>
>
> For those who have previously reviewed this, two major changes:
>
> * The locales to be loaded are now computed by initdb, no longer during
> the build process.
>
> * The regression test file has been removed from the main test set.  To
> run it, use
>
> make check MULTIBYTE=UTF8 EXTRA_TESTS=collate
>
>
> Stuff that still cannot be expected to work:
>
> * no CREATE COLLATION yet, maybe later
>
> * no support for regular expression searches
>
> * no text search support
>
> These would not be release blockers, I think.
>
>
> [0] 
> http://archives.postgresql.org/message-id/1284583568.4696.20.ca...@vanquo.pezone.net
>
>



[HACKERS] Amazon now supporting GPU focused EC2 instances

2010-11-15 Thread Greg Stark
I keep wondering if there's a role for GPUs in Postgres, and I haven't
figured out how to integrate them yet, but the day when we'll be
expected to exploit them seems to be getting nearer:

http://aws.typepad.com/aws/2010/11/new-ec2-instance-type-the-cluster-gpu-instance.html

-- 
greg



Re: [HACKERS] Amazon now supporting GPU focused EC2 instances

2010-11-15 Thread Thom Brown
On 15 November 2010 11:26, Greg Stark  wrote:

> I keep wondering if there's a role for GPUs in Postgres and haven't
> figure out how to integrate them yet but the day when we'll be
> expected to exploit them seems to be getting nearer:
>
>
> http://aws.typepad.com/aws/2010/11/new-ec2-instance-type-the-cluster-gpu-instance.html
>


Is this somewhere that OpenCL is an option?

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935


Re: [HACKERS] Amazon now supporting GPU focused EC2 instances

2010-11-15 Thread Greg Stark
On Mon, Nov 15, 2010 at 11:37 AM, Thom Brown  wrote:
> Is this somewhere OpenCL is an option?
>

Sure.

Personally I wonder whether the context switching is fast enough to
handle a multi-user system. In the past graphics cards have always
been targeted at the idea of a single-user workstation playing games
where the game wants to talk directly to the graphics card. But
increasingly that's not true, with things like opengl based window
managers and so on. Perhaps things have changed enough that it would
be conceivable to have dozens of separate processes all downloading
snippets of code and time-sharing the GPU now.

The other problem is that Postgres is really not architected in a way
to make this easy. Since our data types are all flexible pluggable
sets of functions it's unclear whether any of them match the data
types that GPUs know about. The obvious algorithm to toss to the GPU
would be sorting -- but that would only work for floats and even then
it's not clear to me that the float semantics on the GPU necessarily
match those of the host processor.

I've seen papers on doing relational joins using GPUs, and I'm sure
there is other wonderful stuff we could do. But if it
comes at the cost of being able to handle arbitrary join clauses, it'll
be a tough sacrifice to make.

-- 
greg



Re: [HACKERS] Amazon now supporting GPU focused EC2 instances

2010-11-15 Thread Yeb Havinga

On 2010-11-15 12:37, Thom Brown wrote:
On 15 November 2010 11:26, Greg Stark wrote:


I keep wondering if there's a role for GPUs in Postgres and haven't
figure out how to integrate them yet but the day when we'll be
expected to exploit them seems to be getting nearer:


http://aws.typepad.com/aws/2010/11/new-ec2-instance-type-the-cluster-gpu-instance.html


Is this somewhere OpenCL is an option?

There is a talk about PgOpenCL upcoming -
http://www.postgresql.eu/events/schedule/pgday2010/session/56-introduction-to-pgopencl-a-new-procedural-language-for-postgresql-unlocking-the-power-of-the-gpgpu/

Maybe I've sent this link earlier, but the VLDB 2008 paper "Parallelizing
Query Optimization" (http://www.vldb.org/pvldb/1/1453882.pdf) might be
interesting: not much IO between CPU and GPU. (Hmm, how much catalog
access is necessary for cost estimation?) I suspect the biggest
challenge is rewriting essential parts into a SIMD algorithm.


regards,
Yeb Havinga



Re: [HACKERS] changing MyDatabaseId

2010-11-15 Thread Alvaro Herrera
Excerpts from Tom Lane's message of lun nov 15 02:41:40 -0300 2010:

> I believe also that there are probably race conditions in several of
> the steps you listed; in particular there is certainly a risk involved
> in changing the database-we-advertise-being-connected-to versus a
> concurrent DROP DATABASE.  Maybe that's easily soluble, but I'm not sure
> about it.

Perhaps it would help to first disconnect from the original database and
not reconnect to any one in particular, i.e. stay in a state like the
autovac launcher is in.  Then cleanup as per Robert's list, then switch
to a valid database ID.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] changing MyDatabaseId

2010-11-15 Thread Robert Haas
On Mon, Nov 15, 2010 at 12:41 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Looking through the code, it appears to me that we'd need to do the
>> following (not necessarily in this order):
>
> Don't forget
>
> 9. Unload loadable modules that do not exist according to the new
> database's catalogs; eg we don't want postgis trying to run when
> its supporting tables don't exist in the selected database.

I might be missing something here, but I wasn't aware that the
catalogs had any say in what loadable modules exist.  Obviously their
contents will determine whether such loadable modules ever get invoked
as a result of actions at the SQL level, but if they don't it may not
matter much that they're loaded but idle.

> 10. Somehow persuade remaining loadable modules to discard whatever
> state they have that might be database-local.

Ouch.

> We don't have workable APIs for either of those operations ATM.

Somewhat independently of this problem, it would be nice to be able to
unload a loadable module, and I think that we currently don't support
that at all.  One thing (the only thing?) that makes that a
show-stopper is that the way we manage hook function chaining makes it
impossible for a module to reliably unload itself.  If you load both
auto_explain and pg_stat_statements, for example, you end up with a
tangle of pointers that can't be easily unwound.  I don't have a real
clear idea how to fix it, but the status quo certainly doesn't seem
optimal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-15 Thread Robert Haas
On Mon, Nov 15, 2010 at 2:15 AM, Heikki Linnakangas
 wrote:
>>> Can you elaborate?
>>
>> Weak memory ordering means that stores into shared memory initiated by
>> one processor are not guaranteed to be observed to occur in the same
>> sequence by another processor.  This implies first that the latch code
>> could malfunction all by itself, if two processes manipulate a latch at
>> about the same time, and second (probably much less likely) that there
>> could be a malfunction involving a process that's waited on a latch not
>> seeing the shared-memory status updates that another process did "before"
>> setting the latch.
>>
>> This is not at all hypothetical --- my first attempt at rewriting the
>> sinval signaling code, a couple years back, failed on PPC machines in
>> the buildfarm because of exactly this type of issue.
>
> Hmm, SetLatch only sets one flag, so I don't see how it could malfunction
> all by itself. And I would've thought that declaring the Latch variable
> "volatile" prevents rearrangements.

It's not a question of code rearrangement.  Suppose at time zero, the
latch is unset, but owned.  At approximately the same time, SetLatch()
is called in one process and WaitLatch() in another process.
SetLatch() sees that the latch is not set and sends SIGUSR1 to the
other process.  The other process receives the signal but, since
waiting is not yet set, it ignores the signal.  It then drains the
self-pipe and examines latch->is_set.  But as it turns out, the update
by the process which called SetLatch() isn't yet visible to this
process, because this process has a copy of those bytes in some
internal cache that isn't guaranteed to be fully coherent.  So even
though SetLatch() already changed latch->is_set to true, it still
looks false here.  Therefore, we go to sleep on the latch.

At this point, we are very likely screwed.  If we're lucky, yet a
third process will come along, also see the latch as still unset (even
though it is), and set it again, waking up the owner.  But if we're
unlucky, by the time that third process comes along, the memory update
will have become visible everywhere and all future calls to SetLatch()
will exit quickly, leaving the poor shmuck who waited on the latch
sleeping for all eternity.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-15 Thread Heikki Linnakangas

On 15.11.2010 15:22, Robert Haas wrote:

On Mon, Nov 15, 2010 at 2:15 AM, Heikki Linnakangas
  wrote:

Can you elaborate?


Weak memory ordering means that stores into shared memory initiated by
one processor are not guaranteed to be observed to occur in the same
sequence by another processor.  This implies first that the latch code
could malfunction all by itself, if two processes manipulate a latch at
about the same time, and second (probably much less likely) that there
could be a malfunction involving a process that's waited on a latch not
seeing the shared-memory status updates that another process did "before"
setting the latch.

This is not at all hypothetical --- my first attempt at rewriting the
sinval signaling code, a couple years back, failed on PPC machines in
the buildfarm because of exactly this type of issue.


Hmm, SetLatch only sets one flag, so I don't see how it could malfunction
all by itself. And I would've thought that declaring the Latch variable
"volatile" prevents rearrangements.


It's not a question of code rearrangement.


Rearrangement of code, rearrangement of CPU instructions, or
rearrangement of the order in which changes in memory become visible to
other processes. The end result is the same.



 Suppose at time zero, the
latch is unset, but owned.  At approximately the same time, SetLatch()
is called in one process and WaitLatch() in another process.
SetLatch() sees that the latch is not set and sends SIGUSR1 to the
other process.  The other process receives the signal but, since
waiting is not yet set, it ignores the signal.  It then drains the
self-pipe and examines latch->is_set.  But as it turns out, the update
by the process which called SetLatch() isn't yet visible to this
process, because this process has a copy of those bytes in some
internal cache that isn't guaranteed to be fully coherent.  So even
though SetLatch() already changed latch->is_set to true, it still
looks false here.  Therefore, we go to sleep on the latch.


Surely marking the latch pointer volatile would force the store to 
is_set to be flushed, if not immediately, at least before the kill() 
system call. No?


Looking at Tom's patch in sinvaladt.c, the problem there was that the
store of the shared maxMsgNum variable could become visible to other
processes after the store of the message itself. Using a volatile
pointer for maxMsgNum would not have helped with that, because the
operations on other variables might still be rearranged with respect to
the store of maxMsgNum. SetLatch is simpler: there is only one variable
(ok, two, but your scenario didn't involve a change in owner_pid). It
seems safe to assume that the store becomes visible before the system call.


Tom's other scenario, where changing some other variable in shared 
memory might not have become visible to other processes when SetLatch() 
runs, seems more plausible (if harder to run into in practice). But if 
the variable is meant to be examined by other processes, then you should 
use a lock to protect it. Otherwise you'll have concurrency issues 
anyway. Or at least use a volatile pointer, I believe it's safe to 
assume that two operations using a volatile pointer will not be 
rearranged wrt. each other.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-15 Thread Robert Haas
On Mon, Nov 15, 2010 at 8:45 AM, Heikki Linnakangas
 wrote:
>> It's not a question of code rearrangement.
>
> Rearrangement of code, rearrangement of CPU instructions, or rearrangement
> of the order the changes in the memory become visible to other processes.
> The end result is the same.

I'll let Tom speak to this, because he understands it better than I
do, but I don't think this is really true.  Rearrangement of code is
something that the compiler has control over, and volatile addresses
that issue by preventing the compiler from making certain assumptions,
but the order in which memory operations become visible is a result of
the architecture of the memory bus, and the compiler doesn't directly
do anything about that.  It won't for example emit a cmpxchg
instruction rather than a simple store just because you declared the
variable volatile.

>>  Suppose at time zero, the
>> latch is unset, but owned.  At approximately the same time, SetLatch()
>> is called in one process and WaitLatch() in another process.
>> SetLatch() sees that the latch is not set and sends SIGUSR1 to the
>> other process.  The other process receives the signal but, since
>> waiting is not yet set, it ignores the signal.  It then drains the
>> self-pipe and examines latch->is_set.  But as it turns out, the update
>> by the process which called SetLatch() isn't yet visible to this
>> process, because this process has a copy of those bytes in some
>> internal cache that isn't guaranteed to be fully coherent.  So even
>> though SetLatch() already changed latch->is_set to true, it still
>> looks false here.  Therefore, we go to sleep on the latch.
>
> Surely marking the latch pointer volatile would force the store to is_set to
> be flushed, if not immediately, at least before the kill() system call. No?

I don't think so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 8.4-vintage problem in postmaster.c

2010-11-15 Thread Alvaro Herrera
Excerpts from Tom Lane's message of sáb nov 13 19:07:50 -0300 2010:
> Stefan Kaltenbrunner  writes:
> > On 11/13/2010 06:58 PM, Tom Lane wrote:
> >> Just looking at it, I think that the logic in canAcceptConnections got
> >> broken by somebody in 8.4, and then broken some more in 9.0: in some
> >> cases it will return an "okay to proceed" status without having checked
> >> for TOOMANY children.  Was this system possibly in PM_WAIT_BACKUP or
> >> PM_HOT_STANDBY state?  What version was actually running?
> 
> > I don't have too many details on the actual setup (working on that) but 
> > the box in question is running 8.4.2 and had no issues before the 
> > upgrade to 8.4 (ie 8.3 was reported to work fine - so a 8.4+ breakage 
> > looks plausible).
> 
> Well, this failure would certainly involve a flood of connection
> attempts, so it's possible it's a pre-existing bug that they just did
> not happen to trip over before.  But the sequence of events that I'm
> thinking about is a smart shutdown attempt (SIGTERM to postmaster)
> while an online backup is in progress, followed by a flood of
> near-simultaneous connection attempts while the backup is still active.

As far as I could gather from Stefan's description, I think this is
pretty unlikely.  It seems to me that the "too many children" error
message is very common in the 8.3 setup already, and the only reason
they have a problem on 8.4 is that it crashes instead.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] duplicate connection failure messages

2010-11-15 Thread Alvaro Herrera
Excerpts from Bruce Momjian's message of sáb nov 13 22:36:31 -0300 2010:

> OK, I found out how to get the IP address with the attached patch.  The
> problem is that only pghost is set, never pghostaddr.  I am not even
> sure how that would get set for this code because my tests show it is
> not:

This doesn't work for IPv6 addresses, though.

pghostaddr is specified by the user on the command line as an
optimization to avoid DNS lookups IIRC, which is why you don't see the
code setting it.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-15 Thread Tom Lane
Heikki Linnakangas  writes:
>>> Hmm, SetLatch only sets one flag, so I don't see how it could malfunction
>>> all by itself. And I would've thought that declaring the Latch variable
>>> "volatile" prevents rearrangements.
>> 
>> It's not a question of code rearrangement.

Precisely.  "volatile" prevents the compiler from rearranging the
instruction sequence in a way that would *issue* stores out-of-order.
However, that doesn't prevent the hardware from *executing* the stores
out-of-order from the point of view of a different processor.  As Robert
noted, the risk cases here come from caching; in particular, that a
dirty cache line might get flushed to main memory later than some other
dirty cache line.  There are some architectures that guarantee that this
won't happen (no doubt at significant expenditure of gates).  There are
others that don't, preferring to optimize the single-processor case.
On those, you need an "msync" type of instruction to force dirty cache
lines out to main memory between any two stores whose effects had better
become visible to another processor in a certain order.  I'm not sure if
it's universal, but on PPC there are actually two different sync
concepts involved, a write barrier that does the above and a read
barrier that ensures the reading processor is up-to-date.

> I believe it's safe to 
> assume that two operations using a volatile pointer will not be 
> rearranged wrt. each other.

This is entirely wrong, so far as cross-processor visibility of changes
is concerned.  Whether it should be true in some ideal reading of the C
spec is not relevant: there are common architectures in which it is not
true.

The window for trouble is normally pretty small, and in particular any
kernel call or context swap is usually going to force an msync.  So I'm
not sure that there are any risk spots in practice right now with
SetLatch.  But if we expand our use of it, we can be 100% certain we'll
get bit eventually.

> Tom's other scenario, where changing some other variable in shared 
> memory might not have become visible to other processes when SetLatch() 
> runs, seems more plausible (if harder to run into in practice). But if 
> the variable is meant to be examined by other processes, then you should 
> use a lock to protect it.

In that case, of what use is the latch stuff?  The whole point with that
(or at least a lot of the point) is to not have to take locks.

regards, tom lane



Re: [HACKERS] SSI update

2010-11-15 Thread Kevin Grittner
David Fetter  wrote:
 
> That it's not ready for commit this minute does not mean that it
> shouldn't be in the CF this month.  Delaying the first review of
> the patch until the next CF pretty much ensures that we'll miss
> 9.1 with it, so please add to the current CF :)
 
Well, I had a version which compiled without warning and passed the
regular regression tests before the deadline, but it would have been
rather disingenuous to post it.  I'm in the middle of reworking the
guts of it and there's a lot of wet paint at the moment.  Most of
the dcheck tests are failing, which isn't a surprise because I
haven't finished re-implementing everything which was working before
this weekend.  I'm not sure what a reviewer could say right now
other than to point out all the loose ends fluttering in the breeze.
 
To be clear -- before this weekend I had something which worked
correctly in all respects except, as pointed out by Heikki and Jeff,
it was vulnerable to filling its in-memory structures if there was a
long-running transaction concurrent with a lot of short-lived
transactions.  At that point the only options, as the patch stood,
were to refuse to start new serializable transactions or to start
killing off the oldest active serializable transactions.
 
This issue has been on the Wiki page as an R&D "What should we do
about this?" item since the 25th of January, but it took the
discussion around the review in the last CF to lead to a design for
a solution.  Then my father landed in the hospital after a cardiac
arrest, an in-law died, and I got a nasty toothache which ultimately
needed a root canal last week.  On top of that, I've been assigned
some high priority tasks at work which relegate work on this patch
to nights and weekends for the time being.  Dan has also been
unavailable, although for happier reasons -- he got married and took
a vacation.  Overall, though, a tough few weeks for making progress
on implementing the designed solution.
 
It's likely to be two to four weeks before I have something in a
condition in which it would make sense for anyone to spend time on it. 
Anyone who wants to watch the progress is welcome to track the git
repo.  I pushed this weekend's work in this commit, wet paint and
all:
 
http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=208ade2acb6c34177c4bfd49e1f240fb751b8be1
 
That went in an hour and a half before the CF deadline, but I just
didn't feel right putting it into the CF in that shape.  I'm most of
the way through replacing the conflict pointers with conflict lists,
which I decided was a prerequisite for solving the memory management
issues in an effective way.  I have yet to start work on the memory
management issues themselves.
 
-Kevin



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-15 Thread Robert Haas
On Mon, Nov 15, 2010 at 9:51 AM, Tom Lane  wrote:
> Heikki Linnakangas  writes:
>>>> Hmm, SetLatch only sets one flag, so I don't see how it could malfunction
>>>> all by itself. And I would've thought that declaring the Latch variable
>>>> "volatile" prevents rearrangements.
>>>
>>> It's not a question of code rearrangement.
>
> Precisely.  "volatile" prevents the compiler from rearranging the
> instruction sequence in a way that would *issue* stores out-of-order.
> However, that doesn't prevent the hardware from *executing* the stores
> out-of-order from the point of view of a different processor.  As Robert
> noted, the risk cases here come from caching; in particular, that a
> dirty cache line might get flushed to main memory later than some other
> dirty cache line.  There are some architectures that guarantee that this
> won't happen (no doubt at significant expenditure of gates).

And in fact if this (interesting!) video is any indication, that
problem is only going to get worse as core counts go up.  This guy
built a lock-free, wait-free hash table implementation that can run on
a system with hundreds of cores.  I'm just guessing here, but I
strongly suspect that keeping memory in full sync across that many
processors would just kill performance, so they shrug their shoulders
and don't.  The application programmer gets to pick up the pieces.

http://video.google.com/videoplay?docid=2139967204534450862#

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] MULTISET and additional functions for ARRAY

2010-11-15 Thread Tom Lane
Itagaki Takahiro  writes:
> Another issue for sorting is that we have 4 kinds of sorting: ASC/DESC
> and NULLS FIRST/LAST.

We have a lot more kinds than that.  See USING.

regards, tom lane



Re: [HACKERS] How to rename each field in ROW expression?

2010-11-15 Thread Tom Lane
Itagaki Takahiro  writes:
> Is it possible to change names fields in ROW?
> We can use CREATE TYPE AS on ahead, but I'd like to
> change names of ROW expression in ad-hoc queries.

Why?  It's an anonymous type, you shouldn't care about names.
If you do, make a real named rowtype.

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-15 Thread Heikki Linnakangas

On 15.11.2010 16:51, Tom Lane wrote:
> Heikki Linnakangas  writes:
>> I believe it's safe to
>> assume that two operations using a volatile pointer will not be
>> rearranged wrt. each other.
>
> This is entirely wrong, so far as cross-processor visibility of changes
> is concerned.

Ok.

In SetLatch, is it enough to add the SpinLockAcquire() call *after* 
checking that is_set is not already set? Ie. still do the quick exit 
without holding a lock. Or do we need a memory barrier operation before 
the fetch, to ensure that we see if the other process has just cleared 
the flag with ResetLatch()? Presumably ResetLatch() needs to call 
SpinLockAcquire() anyway to ensure that other processes see the clearing 
of the flag.



>> Tom's other scenario, where changing some other variable in shared
>> memory might not have become visible to other processes when SetLatch()
>> runs, seems more plausible (if harder to run into in practice). But if
>> the variable is meant to be examined by other processes, then you should
>> use a lock to protect it.
>
> In that case, of what use is the latch stuff?  The whole point with that
> (or at least a lot of the point) is to not have to take locks.


The use case for a latch is to wake up another process to examine another 
piece of shared memory (or a file or something else), and take action 
based on that other state if needed. Access to that other piece of 
shared memory needs locking or some other means of concurrency control, 
regardless of the mechanism used to wake up the other process.


Take the walsender latches for example. The "other piece of shared 
memory" is the current WAL flush location. The latch is used to wake up 
a walsender after flushing some WAL. The latch itself doesn't protect 
the access to the WAL flush pointer in any way, GetFlushRecPtr() uses a 
spinlock for that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] SSI update

2010-11-15 Thread Greg Smith

Kevin Grittner wrote:
> That went in an hour and a half before the CF deadline, but I just
> didn't feel right putting it into the CF in that shape.


Then keep on working on it and we can revisit its state when you're 
happy with it.  The purpose of the CommitFest cut-off is not to block 
work on long-term development just because a deadline passed; it's to 
make sure patches which might otherwise never get a review are looked at 
eventually.  Finding a reviewer for these larger and complicated patches 
is a very different sort of job than finding one for an average patch 
anyway.


I could use a brief reminder of how this bit fits into the "serializable 
lock consistency" patch that's already sitting in the CF queue as 
"Ready for Committer" though.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us





Re: [HACKERS] SSI update

2010-11-15 Thread Alvaro Herrera
Excerpts from Greg Smith's message of lun nov 15 12:32:25 -0300 2010:

> I could use a brief reminder of how this bit fits into the "serializable 
> lock consistency" patch that's already sitting into the CF queue as 
> "Ready for Committer" though.

I remind you, though, that the most painful point that caused the
commitfest process to come into existence was the HOT patch, which was
very large and intrusive and didn't get any review until very late in
the cycle.  So getting some review earlier than at the last minute is
important.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] unlogged tables

2010-11-15 Thread Marti Raudsepp
On Sun, Nov 14, 2010 at 02:16, Robert Haas  wrote:
> Here is a series of three patches related to unlogged tables.

Just wondering, have you thought of any mechanism by which application
code might detect that an unlogged table was truncated due to a restart?
While polling with something like "SELECT 1 FROM table LIMIT 1" might
work, it's an awful hack.

One obvious use case for these unlogged tables would be materialized
views. I think it would be useful to execute e.g. a TRUNCATE trigger
so that the view could be initialized. If an exclusive lock were
passed on to the trigger procedure, this could even be done in a
race-condition-free manner as far as I can tell.

Would there be a problem with invoking this trigger from the session
that first touches the table?

Regards,
Marti



Re: [HACKERS] SSI update

2010-11-15 Thread Kevin Grittner
Greg Smith  wrote:
 
> I could use a brief reminder of how this bit fits into the
> "serializable lock consistency" patch that's already sitting into
> the CF queue as "Ready for Committer" though.
 
Florian's patch prevents an "integrity inversion" that PostgreSQL
has had for ages, where attempting to prevent serialization
anomalies under snapshot isolation through SELECT FOR UPDATE/SHARED
gave you less protection than in the less strict levels.  This has
been a particular nuisance to those trying to convert from Oracle,
where using the more strict isolation level always gives *more*
integrity protection, not less.  That patch appears to strengthen
PostgreSQL integrity guarantees around SELECT FOR UPDATE/SHARED to
match Oracle.
 
One difference between that and the SSI patch is that with Florian's
patch you still need to recognize potential interactions which could
generate anomalies and explicitly code to prevent them, while in SSI
this is sorted out by the database engine automatically at run time.
Another difference is that Florian's patch introduces blocking to
protect integrity, while SSI does not.
 
There are definitely use cases for both.  In general, SSI is a "big
shop" oriented solution, while the explicit coding may be more
appropriate in many environments where you have just a few
programmers working with dozens or hundreds of database transaction
types rather than thousands.  The latter is also likely to be a
popular choice for those converting from other databases, even for
big shops, where they already have a working scheme using explicit
locking.
 
-Kevin



[HACKERS] Extensions

2010-11-15 Thread Dimitri Fontaine
Hi,

I've been trying to summarise the last month of work into a single wiki
page, to ease approaching the review of this now quite large patch. It
touches lots of things, but the design and goal are simple enough to fit
in my mind, so that should be no problem for you guys here :)

  http://wiki.postgresql.org/wiki/Extensions

Oh, don't forget about the git repository; that's the easiest way to get
at the code and try it. The branches, their content and dependencies are
detailed in the wiki page, and here they are too, as a kick start.

  http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=summary

  cfparser               \
                          } extension  \
  pg_execute_from_file() /              } alter_extension
                                       /
  set_schema  -------------------------

  The extension patch has been produced with only the cfparser and
  pg_execute_from_file() bits included, to ease testing. The set_schema
  patch has been produced separately and is independent; the
  alter_extension patch is an incremental patch atop both the extension
  and set_schema patches, to ease discussion of APIs.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-15 Thread Tom Lane
Heikki Linnakangas  writes:
> In SetLatch, is it enough to add the SpinLockAcquire() call *after* 
> checking that is_set is not already set? Ie. still do the quick exit 
> without holding a lock. Or do we need a memory barrier operation before 
> the fetch, to ensure that we see if the other process has just cleared 
> the flag with ResetLatch() ? Presumable ResetLatch() needs to call 
> SpinLockAcquire() anyway to ensure that other processes see the clearing 
> of the flag.

Hmm ... I just remembered the reason why we didn't use a spinlock in
these functions already.  Namely, that it's unsafe for a signal handler
to try to acquire a spinlock that the interrupted code might be holding.
So I think a bit more thought is needed here.  Maybe we need to bite the
bullet and do memory barriers ...

regards, tom lane



Re: [HACKERS] [PATCH] Custom code int(32|64) => text conversions out of performance reasons

2010-11-15 Thread Robert Haas
On Sun, Oct 31, 2010 at 5:41 PM, Andres Freund  wrote:
> While at it:

These words always make me a bit frightened when reviewing a patch,
since it's generally simpler if a single patch only does one thing.
However, in this case...

> * I remove the outdated
> -- NOTE: int[24] operators never check for over/underflow!
> -- Some of these answers are consequently numerically incorrect.
> warnings in the regressions tests.

...this part looks obviously OK, so I have committed it.

The rest is attached as a residual patch, except that I reverted this change:

> * I renamed pg_[il]toa to pg_s(16|32|64)toa - I found the names confusing. Not
> sure if its worth it.

I notice that int8out isn't terribly consistent with int2out and
int4out, in that it does an extra copy.   Maybe that's justified given
the greater potential memory wastage, but I'm not certain.  One
approach might be to pick some threshold value and allocate a buffer
in one of two sizes based on how large the value is relative to that
cutoff.  But that might also be a stupid idea, not sure.

It would speed things up for me if you or someone else could take a
quick pass over what remains here and fix the formatting and
whitespace to be consistent with our general project style, and make
the comment headers more consistent among the functions being
added/modified.

I think the new regression tests look good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


custom-int248-string-conversion-routines.patch
Description: Binary data



Re: [HACKERS] SSI update

2010-11-15 Thread Robert Haas
On Mon, Nov 15, 2010 at 10:56 AM, Kevin Grittner
 wrote:
> Greg Smith  wrote:
>
>> I could use a brief reminder of how this bit fits into the
>> "serializable lock consistency" patch that's already sitting into
>> the CF queue as "Ready for Committer" though.
>
> [explanation]

The short version is that they are independent.  The serializable lock
consistency patch is, I think, a very important one, although I am not
entirely sure that I have the skill to do it justice.  (I'm kinda
hoping Tom will commit it, but if he doesn't have time I may have to
take a crack at it.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Amazon now supporting GPU focused EC2 instances

2010-11-15 Thread Jeroen Vermeulen

On 2010-11-15 18:49, Greg Stark wrote:
> I've seen papers on doing relational joins using GPUs and I'm sure
> there are other algorithms and wonderful stuff we could do. But if it
> comes at the cost of being able to handle arbitrary join clauses it'll
> be a tough sacrifice to make.


Perhaps the coolest use of all is as an intermediate filtering stage for 
spatial joins, using collision detection.  Draw your data and your 
search region (slightly enlarged) as objects and ask the graphics card 
if the search region collides with anything.  Much like it might ask "is 
this player character bumping into any walls?"


IIRC it's been prototyped in Oracle with impressive results.  Using 
OpenGL, so quite portable between GPUs.



Jeroen



Re: [HACKERS] SSI update

2010-11-15 Thread Kevin Grittner
Alvaro Herrera  wrote:
 
> I remind you, though, that the most painful point that caused the
> commitfest process to come into existance was the HOT patch, which
> was very large and intrusive and didn't get any review until very
> late in the cycle.  So getting some review earlier than at the
> last minute is important.
 
I've tried to avoid that by keeping a Wiki up-to-date with design,
status, and issues publicly available.  I've been pushing the code
to a public git repo as it's been developed, since January.  I
avoided discussing issues like the one causing the current
refactoring on-list for several months because Tom complained that
such discussions were distracting from the effort to get 9.0 out the
door, but I submitted a WIP patch to the first 9.1 CF and a patch I
had hopes of seeing committed to the second 9.1 CF.
 
I've missed submitting to this CF, but it would be entirely wrong to
say that the patch has had no review.  Joe Conway did a round of
review, although much of the discussion was off-list.  This resulted
in a commit of a portion of the patch which could be usefully split
out.  Heikki Linnakangas didn't officially sign up as a reviewer,
but he posted a lot of questions and suggestions showing that he had
looked at the code.  Jeff Davis gave it an official review,
ultimately resulting in a disposition of "Returned with Feedback". 
There is the one big issue of degrading gracefully in the face of a
long-running transaction concurrent with many shorter ones, which I
am attempting to address now.
 
All of the above reviews have resulted in changes to the code which
I feel are improvements on the initial effort.
 
I've been doing everything I can think of to avoid having this be a
last minute submission while still "playing nice" with the
community.  Any suggestions on what else I can do to improve this
are welcome.
 
-Kevin



Re: [HACKERS] [PATCH] Custom code int(32|64) => text conversions out of performance reasons

2010-11-15 Thread Andres Freund
On Monday 15 November 2010 17:12:25 Robert Haas wrote:
> It would speed things up for me if you or someone else could take a
> quick pass over what remains here and fix the formatting and
> whitespace to be consistent with our general project style, and make
> the comment headers more consistent among the functions being
> added/modified.
will do.

Andres



Re: [HACKERS] unlogged tables

2010-11-15 Thread Robert Haas
On Mon, Nov 15, 2010 at 10:54 AM, Marti Raudsepp  wrote:
> On Sun, Nov 14, 2010 at 02:16, Robert Haas  wrote:
>> Here is a series of three patches related to unlogged tables.
>
> Just wondering, have you thought of any mechanisms how application
> code might detect that an unlogged table was truncated due to restart?
> While polling with something like "SELECT 1 FROM table LIMIT 1" might
> work, it's an awful hack.
>
>> One obvious use case for these unlogged tables would be materialized
>> views. I think it would be useful to execute e.g. a TRUNCATE trigger
>> so that the view could be initialized. If an exclusive lock were
> passed on to the trigger procedure, this could even be done in a
> race-condition-free manner as far as I can tell.
>
> Would there be a problem with invoking this trigger from the session
> that first touches the table?

Yeah, this infrastructure doesn't really allow that.  The truncate
happens way too early on in startup to execute any user-provided code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] unlogged tables

2010-11-15 Thread Tom Lane
Marti Raudsepp  writes:
> Would there be a problem with invoking this trigger from the session
> that first touches the table?

Other than security?

regards, tom lane



Re: [HACKERS] SQL/MED estimated time of arrival?

2010-11-15 Thread Eric Davies
With Informix VTI, indexing is the same for native tables as for 
virtual tables, except the interpretation of the 32 bit rowid is left 
up to the developer. When you define the VTI class, you optionally 
supply a method that can fetch data based on a 32 bit rowid, and it's 
the responsibility of your non-indexed scanning methods to provide 
rowids along with the row tuple.


Having local indexes can be very useful if you have a user that 
issues queries like:

   select count(*) from some_external_table where  ;
With VTI, the "count" aggregate doesn't get pushed down, meaning that 
without a local index, your scanning method has to return as many 
tuples as match the where clause, which can be very slow.


Local indexes also afford the opportunity of using specialized 
indexes built into the database. My guess is that without some form 
of rowids being passed back and forth, you couldn't define 
non-materialized views of virtual tables that could be indexed.


That said, we implemented our own btree-like index that used the 
pushed down predicates because fetching data one row at a time wasn't 
desirable with our design choices, and we wanted to support virtual 
tables with more than 4 billion rows.


Eric
At 07:41 PM 11/14/2010, Shigeru HANADA wrote:

On Fri, 12 Nov 2010 08:27:54 -0800
Eric Davies  wrote:
> Thank you for the time estimate and the interface discussion. It
> sounds like the PostgreSQL SQL/MED code will be very useful when it
> is done. Our product provides read-only access to files, so
> updates/inserts/deletes aren't an issue for us.
>
> One thing that is not clear to me is indexing support. Will it be
> possible to index a SQL/MED table as if it were a regular table?

No, SQL/MED would not support indexing foreign tables, at least in the
first version, because it would be difficult to use a common row id for
various FDWs.  Supporting indexes on foreign tables might require changing
the common structure of index tuples to hold a virtual row-id rather than
an ItemPointerData.

Instead, an FDW can handle expressions parsed from the WHERE clause
and JOIN conditions of the original SQL and use them to optimize
scanning.  For example, the FDW for PostgreSQL pushes some conditions
down to the remote side to reduce the number of result tuples
transferred.  I hope this idea helps you.

>  What
> would be the equivalent of Informix's row ids?

The answer to the second question would be "ItemPointerData".  It
consists of a block number and an offset within the block, and
consumes 6 bytes per tuple.  With this information, PostgreSQL can
access a data tuple directly.  The actual definition is:

typedef struct ItemPointerData
{
BlockIdData ip_blkid;
OffsetNumber ip_posid;
} ItemPointerData;

Does Informix use a common row-id (AFAIK it's a 4-byte integer) for
both virtual tables and normal tables?

Regards,
--
Shigeru Hanada


**
Eric Davies, M.Sc.
Senior Programmer Analyst
Barrodale Computing Services Ltd.
1095 McKenzie Ave., Suite 418
Victoria BC V8P 2L5
Canada

Tel: (250) 704-4428
Web: http://www.barrodale.com
Email: e...@barrodale.com
**




Re: [HACKERS] unlogged tables

2010-11-15 Thread Marti Raudsepp
On Mon, Nov 15, 2010 at 18:25, Tom Lane  wrote:
> Marti Raudsepp  writes:
>> Would there be a problem with invoking this trigger from the session
>> that first touches the table?
>
> Other than security?

Right, I guess that would only make sense with SECURITY DEFINER.

On Mon, Nov 15, 2010 at 18:22, Robert Haas  wrote:
> On Mon, Nov 15, 2010 at 10:54 AM, Marti Raudsepp  wrote:
>> Just wondering, have you thought of any mechanisms how application
>> code might detect that an unlogged table was truncated due to restart?

> Yeah, this infrastructure doesn't really allow that.  The truncate
> happens way too early on in startup to execute any user-provided code.

The truncate itself could be performed early and set a flag somewhere
that would invoke a trigger on the first access. I suppose it couldn't
be called a "truncate trigger" then.

Or maybe provide hooks for pgAgent instead?

Do you see any alternatives for being notified of unlogged table
truncations? Without notification, this feature would seem to have
limited usefulness.

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unlogged tables

2010-11-15 Thread Robert Haas
On Mon, Nov 15, 2010 at 12:02 PM, Marti Raudsepp  wrote:
> On Mon, Nov 15, 2010 at 18:25, Tom Lane  wrote:
>> Marti Raudsepp  writes:
>>> Would there be a problem with invoking this trigger from the session
>>> that first touches the table?
>>
>> Other than security?
>
> Right, I guess that would only make sense with SECURITY DEFINER.
>
> On Mon, Nov 15, 2010 at 18:22, Robert Haas  wrote:
>> On Mon, Nov 15, 2010 at 10:54 AM, Marti Raudsepp  wrote:
>>> Just wondering, have you thought of any mechanisms how application
>>> code might detect that an unlogged table was truncated due to restart?
>
>> Yeah, this infrastructure doesn't really allow that.  The truncate
>> happens way too early on in startup to execute any user-provided code.
>
> The truncate itself can be performed early and set a flag somewhere
> that would invoke a trigger on the first access. I suppose it cannot
> be called a "truncate trigger" then.
>
> Or maybe provide hooks for pgAgent instead?
>
> Do you see any alternatives to be notified of unlogged table
> truncates? Without notification, this feature would seem to have
> limited usefulness.

Well, you're only monitoring for a server restart.  That's probably
something you need a way to monitor for anyway.  I don't think we have
a function that exposes the time of the last server restart at the SQL
level, but maybe we should.  You can monitor for it by watching the
logs, of course.

This is really intended for things like caches of session information
where loss is annoying (because users have to log back into the
webapp, or whatever) but not so critical that we want to take a
performance penalty to prevent it.  It will also be helpful to people
who want to make PG run very very quickly even at the risk of data
loss, as in the recent discussion on -performance and some
conversations I had at PG West; it provides a more structured, and
hopefully also more performant, alternative to turning off fsync,
full_page_writes, and synchronous commit.  For some such apps, it may
be sufficient to check for truncating at each reconnect, which will be
a whole lot easier than what they have to do now (which is rebuild the
entire cluster every time PG restarts).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] unlogged tables

2010-11-15 Thread Aidan Van Dyk
On Mon, Nov 15, 2010 at 11:22 AM, Robert Haas  wrote:

> Yeah, this infrastructure doesn't really allow that.  The truncate
> happens way too early on in startup to execute any user-provided code.

But you could use the very feature of unlogged tables to know whether
you've "initialized" some unlogged table, by using another unlogged
table to note the initialization.

If the value you expect isn't in your "noted" table, you know that
it's been truncated...

Sure, it's "app side", but the whole point of unlogged tables is to
allow optimizations when the app side knows the data is dispensable
and rebuildable.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Count backend self-sync calls

2010-11-15 Thread Robert Haas
On Sun, Nov 14, 2010 at 8:31 PM, Robert Haas  wrote:
> On Sun, Nov 14, 2010 at 7:19 PM, Greg Smith  wrote:
>>> But if this is generating a lot of log data or adding a lot of
>>> overhead, then you have bigger problems anyway:
>>>
>>> +               elog(DEBUG1, "Unable to forward fsync request, executing
>>> directly");
>>>
>>
>> The argument against this log line even existing is that if the field is
>> added to pg_stat_bgwriter, that's probably how you want to monitor this data
>> anyway.
>
> I'll remove it if you really want it gone, but personally I think it's
> useful to have.  I've more than once had to debug a problem given a
> PostgreSQL log file with the debug level cranked up and not a whole
> lot else.  Rare events that cause performance to tank are worth
> logging, IMHO.
>
>> I started out touching code that called it just "sync", but then crossed to
>> other code that called it "fsync", and made the external UI use that name.
>>  No objections to sorting that out within my patch so it's consistent.
>
> OK, I'll do that before I commit it.

Committed with (I think) all the changes discussed, plus a catversion bump.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: Rethinking hint bits WAS: Protecting against unexpected zero-pages: proposal

2010-11-15 Thread Josh Berkus
Greg, Tom,

> We *already* have separate bitmap outside the table for transaction
> commit bits. It's the clog.

You didn't read my whole e-mail.  I talk about the CLOG further down.

> Josh is ignoring the proposal that is on the table and seems actually
> workable, which is to consult the visibility map during index-only
> scans.  For mostly-static tables this would save trips to the heap for
> very little extra I/O.  The hard part is to make the VM reliable, but
> that is not obviously harder than making separately-stored hint bits
> reliable.

No, I'm not.  I'm pointing out that it doesn't unblock the other 4
features/improvements I mentioned, *all* of which would be unblocked by
not storing the hint bits in the table, whatever means we use to do so.
 You, for your part, are consistently ignoring these other issues.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Re: Rethinking hint bits WAS: Protecting against unexpected zero-pages: proposal

2010-11-15 Thread Tom Lane
Josh Berkus  writes:
> No, I'm not.  I'm pointing out that it doesn't unblock the other 4
> features/improvements I mentioned, *all* of which would be unblocked by
> not storing the hint bits in the table, whatever means we use to do so.
>  You, for your part, are consistently ignoring these other issues.

I'm not ignoring them; I just choose to work on other issues, since
there is no viable proposal for fixing them.  I don't intend to put
my time into dead ends.

regards, tom lane



Re: [HACKERS] Re: Rethinking hint bits WAS: Protecting against unexpected zero-pages: proposal

2010-11-15 Thread Josh Berkus

> I'm not ignoring them; I just choose to work on other issues, since
> there is no viable proposal for fixing them.  I don't intend to put
> my time into dead ends.

So, that's a "show me a patch and we'll talk"?  Understood, then.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] MULTISET and additional functions for ARRAY

2010-11-15 Thread David Fetter
On Mon, Nov 15, 2010 at 10:13:40AM -0500, Tom Lane wrote:
> Itagaki Takahiro  writes:
> > Another issue for sorting is that we have 4 kinds of sorting: ASC/DESC
> > and NULLS FIRST/LAST.
> 
> We have a lot more kinds than that.  See USING.

USING pretty much gives us no chance of optimizing at all.  Could we
maybe see about optimizing the 99% case, which those two bits cover?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Re: Rethinking hint bits WAS: Protecting against unexpected zero-pages: proposal

2010-11-15 Thread Robert Haas
On Mon, Nov 15, 2010 at 2:06 PM, Josh Berkus  wrote:
>> I'm not ignoring them; I just choose to work on other issues, since
>> there is no viable proposal for fixing them.  I don't intend to put
>> my time into dead ends.
>
> So, that's a "show me a patch and we'll talk"?  Understood, then.

Or even just a proposal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Bug in plpython's Python Generators

2010-11-15 Thread Tom Lane
Jan Urbański  writes:
> On 24/10/10 00:32, Jan Urbański wrote:
>> On 21/10/10 20:48, Alvaro Herrera wrote:
>>> ... and presumably somebody can fix the real bug that Jean-Baptiste hit,
>>> too.

>> AFAICS the error comes from PLy_function_handler disconnecting from SPI
>> after calling into the Python code and then going ahead and reading the
>> result from the iterator.

> Here's a patch with a fix for that bug.

Applied back to 8.2.  Thanks for the patch.

regards, tom lane



Re: [HACKERS] Instrument checkpoint sync calls

2010-11-15 Thread Robert Haas
On Sun, Nov 14, 2010 at 7:04 PM, Greg Smith  wrote:
> It might.  One trade-off is that if you're looking at the sync write detail,
> the summary comes out in a similar form.  And it was easy to put in
> here--I'd have to return some new data out of the sync phase call in order
> for that to show up in the main log.  If there's general buy-in on the idea,
> I could do all of that.

+1 for that approach.

> What I was alluding to is that I know gettimeofday executes fast on my Linux
> system here, so even if I did measure the overhead and showed it's near zero
> that doesn't mean it will be so on every platform.  The "how long does it
> take to find out the current time on every supported PostgreSQL platform?"
> question is one I'd like to have an answer to, but it's hard to collect
> properly.  All I know is that I don't have any system where it's slow to
> properly test again here.

I would be very surprised if we can find a system where gettimeofday()
takes a significant amount of time compared with fsync().  It might be
(probably is) too expensive to stick into code paths that are heavily
CPU-bounded, but surely the cost here is going to be dwarfed by the
fsync(), no?  Unless maybe there's no I/O to be done anyway, but that
case doesn't seem important to optimize for.  I think making this code
dependent on ifdef DEBUG_FSYNC is going to make it an order of
magnitude less useful, so I'd really rather not do that unless we have
some hard
evidence that there is a real problem to worry about here.  Making it
conditional on log_checkpoints seems entirely sufficient to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] MULTISET and additional functions for ARRAY

2010-11-15 Thread Tom Lane
David Fetter  writes:
> On Mon, Nov 15, 2010 at 10:13:40AM -0500, Tom Lane wrote:
>> Itagaki Takahiro  writes:
>>> Another issue for sorting is that we have 4 kinds of sorting: ASC/DESC
>>> and NULLS FIRST/LAST.

>> We have a lot more kinds than that.  See USING.

> USING pretty much gives us no chance of optimizing at all.  Could we
> maybe see about optimizing the 99% case, which those two bits cover?

The question is why support more than *one* kind, if you're only
supporting a subset.  I don't see the value of messing with stuff like
NULLS FIRST if you're not going to go all the way.  What's more, the
alleged use for this is strictly as an internal optimization in multiset
representation, so there's no reason to support more than one sort
order.

regards, tom lane



Re: [HACKERS] Re: Rethinking hint bits WAS: Protecting against unexpected zero-pages: proposal

2010-11-15 Thread Tom Lane
Robert Haas  writes:
> On Mon, Nov 15, 2010 at 2:06 PM, Josh Berkus  wrote:
>>> I'm not ignoring them; I just choose to work on other issues, since
>>> there is no viable proposal for fixing them.  I don't intend to put
>>> my time into dead ends.

>> So, that's a "show me a patch and we'll talk"?  Understood, then.

> Or even just a proposal.

Well, he did have a proposal ... it just wasn't very credible.  Moving
the hint bits around is at best a zero-sum game; it seems likely to
degrade cases we now handle well more than it improves cases we don't.
I think what we need is a fundamentally new idea, and I've not seen one.

regards, tom lane



Re: [HACKERS] Instrument checkpoint sync calls

2010-11-15 Thread Tom Lane
Robert Haas  writes:
> I would be very surprised if we can find a system where gettimeofday()
> takes a significant amount of time compared with fsync().  It might be
> (probably is) too expensive to stick into code paths that are heavily
> CPU-bounded, but surely the cost here is going to be dwarfed by the
> fsync(), no?  Unless maybe there's no I/O to be done anyway, but that
> case doesn't seem important to optimize for.

I'm not sure I buy that --- the whole point of spread checkpoints is
that we hope the I/O happens before we actually call fsync.

> Making it
> conditional on log_checkpoints seems entirely sufficient to me.

But I'll agree with that.  If you're turning on log_checkpoints then
you've given the system permission to indulge in extra overhead for
monitoring.

regards, tom lane



Re: [HACKERS] Instrument checkpoint sync calls

2010-11-15 Thread Greg Smith

So my task list is:

0) Rebase against HEAD, since code related to this was touched there today

1) Assume that log_checkpoints is sufficient control over whether the 
timing overhead added is worth collecting, and therefore remove the 
half-baked idea of also wrapping with a compile-time option.


2) Have the sync summary returned upwards, so it can be put onto the 
same line as the rest of the log_checkpoint info.


All seems reasonable to me.  Will rev a new patch by tomorrow.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support   www.2ndQuadrant.us





Re: [HACKERS] Instrument checkpoint sync calls

2010-11-15 Thread Robert Haas
On Mon, Nov 15, 2010 at 2:48 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I would be very surprised if we can find a system where gettimeofday()
>> takes a significant amount of time compared with fsync().  It might be
>> (probably is) too expensive to stick into code paths that are heavily
>> CPU-bounded, but surely the cost here is going to be dwarfed by the
>> fsync(), no?  Unless maybe there's no I/O to be done anyway, but that
>> case doesn't seem important to optimize for.
>
> I'm not sure I buy that --- the whole point of spread checkpoints is
> that we hope the I/O happens before we actually call fsync.

Possibly.  Even then, it's not guaranteed, because it might well be
that fsync() has to do more grovelling through system data structures
to determine that there is in fact no work to be done than what
gettimeofday() has to do to read off a counter somewhere...

>> Making it
>> conditional on log_checkpoints seems entirely sufficient to me.
>
> But I'll agree with that.  If you're turning on log_checkpoints then
> you've given the system permission to indulge in extra overhead for
> monitoring.

...but if we agree on the way forward then the details don't matter.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Count backend self-sync calls

2010-11-15 Thread Jeff Janes
On Sun, Nov 14, 2010 at 2:07 PM, Greg Smith  wrote:
> The attached patch adds a new field to pg_stat_bgwriter, counting the number
> of times backends execute their own fsync calls.  Normally, when a backend
> needs to fsync data, it passes a request to the background writer, which
> then absorbs the call into its own queue of work to do.  However, under some
> types of heavy system load, the associated queue can fill.  When this
> happens, backends are forced to do their own fsync call.  This is
> potentially much worse than when they do a regular write.
>
> The really nasty situation is when the background writer is busy because
> it's executing a checkpoint.  In that case, it's possible for the backend
> fsync calls to start competing with the ones the background writer is trying
> to get done,


Do you know where this competition is happening?  Is it on the
platters, or is it in the hard drive write cache (I thought high-end
hardware had tagged writes to avoid that), or in the kernel?

...
>
> DEBUG:  Absorbing 4096 fsync requests
> DEBUG:  Absorbing 150 fsync requests
> DEBUG:  Unable to forward fsync request, executing directly
> CONTEXT:  writing block 158638 of relation base/16385/16398
>
> Here 4096 is the most entries the BGW will ever absorb at once, and all 90
> of the missed sync calls are logged so you can see what files they came
> from.

Looking in src/backend/postmaster/bgwriter.c line 1071:

 * Note: we presently make no attempt to eliminate duplicate requests
 * in the requests[] queue.  The bgwriter will have to eliminate dups
 * internally anyway, so we may as well avoid holding the lock longer
 * than we have to here.

This makes sense if we just need to append to a queue.  But once the
queue is full and we are about to do a backend fsync, might it make
sense to do a little more work to look for dups?

Cheers,

Jeff



Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

2010-11-15 Thread Peter Eisentraut
On mån, 2010-11-15 at 11:13 +0100, Bernd Helmle wrote:
> 
> --On 14. November 2010 11:08:13 -0500 Robert Haas  
> wrote:
> 
> > +1.  The current master branch fails to build on my (rather new) Mac with
> > make -j2.  I could upgrade my toolchain but it seems like more trouble
> > than it's worth, not to mention a possible obstacle to new users and
> > developers.
> 
> The same here, too. And it doesn't matter if you use the shipped make 
> (3.81) or the one from macports (currently 3.82), both are failing with:
> 
> ld: file not found: ../../../../../../src/backend/postgres
> collect2: ld returned 1 exit status
> make[3]: *** [ascii_and_mic.so] Error 1
> make[2]: *** [all-ascii_and_mic-recurse] Error 2
> make[1]: *** [all-backend/utils/mb/conversion_procs-recurse] Error 2
> make[1]: *** Waiting for unfinished jobs

Untested, but the following should help you, by partially restoring the
old build order on platforms that need it.

diff --git i/src/Makefile w/src/Makefile
index 0d4a6ee..2a5330a 100644
--- i/src/Makefile
+++ w/src/Makefile
@@ -28,6 +28,13 @@ SUBDIRS = \
 
 $(recurse)
 
+# On platforms that require the backend to be built before dynamically
+# loadable modules, partially constraint the build order for more
+# efficient and robust builds.
+ifdef BE_DLLLIBS
+all-backend/utils/mb/conversion_procs-recurse all-backend/replication/libpqwalreceiver-recurse all-pl-recurse: all-backend-recurse
+endif
+
 install: install-local
 
 install-local: installdirs-local



Re: [HACKERS] Per-column collation

2010-11-15 Thread Pavel Stehule
Hello

2010/11/15 Peter Eisentraut :
> On mån, 2010-11-15 at 11:34 +0100, Pavel Stehule wrote:
>> I am checking a patch. I found a problem with initdb
>
> Ah, late night brain farts, it appears.  Here is a corrected version.
>
>

yes, it's ok now.

I still see a few issues:

a) the default encoding for collations isn't the same as the database's
default encoding

At a minimum this is unfriendly - the most commonly used encoding is
UTF8, but in most cases users have to write locale.utf8.

b) there is a bug - the default collation (the database collation) is
ignored


postgres=# show lc_collate;
 lc_collate
────────────
 cs_CZ.UTF8
(1 row)

Time: 0.518 ms
postgres=# select * from jmena order by v;
 v
───
 Chromečka
 Crha
 Drobný
 Čečetka
(4 rows)

postgres=# select * from jmena order by v collate "cs_CZ.utf8";
 v
───
 Crha
 Čečetka
 Drobný
 Chromečka
(4 rows)

Both results should be the same.

Isn't there a problem with case-sensitive collation names? When I use
the lc_collate value, I get an error message:

postgres=# select * from jmena order by v collate "cs_CZ.UTF8";
ERROR:  collation "cs_CZ.UTF8" for current database encoding "UTF8"
does not exist
LINE 1: select * from jmena order by v collate "cs_CZ.UTF8";

The problem occurs when the table is created without an explicit collation.

Regards

Pavel Stehule



Re: [HACKERS] How to rename each field in ROW expression?

2010-11-15 Thread Itagaki Takahiro
On Tue, Nov 16, 2010 at 00:18, Tom Lane  wrote:
> Why?  It's an anonymous type, you shouldn't care about names.
> If you do, make a real named rowtype.

If so, we cannot extract any fields from an anonymous type, right?
We cannot look up fields with (an anonymous type).name because
the fields have no names.

=# SELECT (ROW(123, 'ABC', NULL)).*;
ERROR:  record type has not been registered

-- 
Itagaki Takahiro



Re: [HACKERS] libpq changes for synchronous replication

2010-11-15 Thread Robert Haas
On Mon, Sep 20, 2010 at 12:17 PM, Tom Lane  wrote:
> Personally I think this demonstrates that piggybacking replication
> data transfer on the COPY protocol was a bad design to start with.
> It's probably time to split them apart.

This appears to be the only obvious unresolved issue regarding this patch:

https://commitfest.postgresql.org/action/patch_view?id=412

I don't have a strong personal position on whether or not we should do
this, but it strikes me that Tom hasn't given much justification for
why he thinks we should do this, what benefit we'd get from it, or
what the design should look like.  So I guess the question is whether
Tom - or anyone - would like to make a case for a more serious
protocol overhaul, or whether we should just go with the approach
proposed here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Fix for seg picksplit function

2010-11-15 Thread Robert Haas
On Mon, Nov 15, 2010 at 4:06 AM, Alexander Korotkov
 wrote:
> With help of Oleg I found, that line "*left = *right = FirstOffsetNumber;"
> was needed only for 7.X compatibility, and it isn't needed any more.
> Also, I've replaced "i - 1" by "i - FirstOffsetNumber" in array filling.
> I believe it's more correct way, because it'll work correctly in the case
> when FirstOffsetNumber alters.

The loop that begins here:

for (i = 0; i < maxoff; i++)
{
/* First half of segs goes to the left datum. */
if (i < seed_2)

...looks like it should perhaps be broken into two separate loops.
That might also help tweak the logic in a way that eliminates this:

seg.c: In function ‘gseg_picksplit’:
seg.c:327: warning: ‘datum_r’ may be used uninitialized in this function
seg.c:326: warning: ‘datum_l’ may be used uninitialized in this function

But on a broader note, I'm not very certain the sorting algorithm is
sensible.  For example, suppose you have 10 segments that are exactly
'0' and 20 segments that are exactly '1'.  Maybe I'm misunderstanding,
but it seems like this will result in a 15/15 split when we almost
certainly want a 10/20 split.  I think there will be problems in more
complex cases as well.  The documentation says about the less-than and
greater-than operators that "These operators do not make a lot of
sense for any practical purpose but sorting."

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] libpq changes for synchronous replication

2010-11-15 Thread Tom Lane
Robert Haas  writes:
> On Mon, Sep 20, 2010 at 12:17 PM, Tom Lane  wrote:
>> Personally I think this demonstrates that piggybacking replication
>> data transfer on the COPY protocol was a bad design to start with.
>> It's probably time to split them apart.

> This appears to be the only obvious unresolved issue regarding this patch:

> https://commitfest.postgresql.org/action/patch_view?id=412

> I don't have a strong personal position on whether or not we should do
> this, but it strikes me that Tom hasn't given much justification for
> why he thinks we should do this, what benefit we'd get from it, or
> what the design should look like.  So I guess the question is whether
> Tom - or anyone - would like to make a case for a more serious
> protocol overhaul, or whether we should just go with the approach
> proposed here.

I was objecting to v1 of the patch.  v2 seems somewhat cleaner --- it at
least avoids changing the behavior of libpq for normal COPY operation.
I'm still a bit concerned by the prospect of having to shove further
warts into the COPY data path in future, but maybe its premature to
complain about that when it hasn't happened yet.

Just in a quick scan, I don't have any objection to v2 except that the
protocol documentation is lacking.

regards, tom lane



[HACKERS] Isn't HANDLE 64 bits on Win64?

2010-11-15 Thread Tom Lane
... and if so, isn't postmaster.c's code to transfer a HANDLE value to a
child process all wet?

sprintf(paramHandleStr, "%lu", (DWORD) paramHandle);

...

paramHandle = (HANDLE) atol(id);


BTW, it seems like it'd be a good thing if we had a Win64 machine in the
buildfarm.

regards, tom lane



Re: [HACKERS] libpq changes for synchronous replication

2010-11-15 Thread Robert Haas
On Mon, Nov 15, 2010 at 7:26 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Sep 20, 2010 at 12:17 PM, Tom Lane  wrote:
>>> Personally I think this demonstrates that piggybacking replication
>>> data transfer on the COPY protocol was a bad design to start with.
>>> It's probably time to split them apart.
>
>> This appears to be the only obvious unresolved issue regarding this patch:
>
>> https://commitfest.postgresql.org/action/patch_view?id=412
>
>> I don't have a strong personal position on whether or not we should do
>> this, but it strikes me that Tom hasn't given much justification for
>> why he thinks we should do this, what benefit we'd get from it, or
>> what the design should look like.  So I guess the question is whether
>> Tom - or anyone - would like to make a case for a more serious
>> protocol overhaul, or whether we should just go with the approach
>> proposed here.
>
> I was objecting to v1 of the patch.  v2 seems somewhat cleaner --- it at
> least avoids changing the behavior of libpq for normal COPY operation.
> I'm still a bit concerned by the prospect of having to shove further
> warts into the COPY data path in future, but maybe its premature to
> complain about that when it hasn't happened yet.

It's not an unreasonable complaint, but I don't have a very clear idea
what to do about it.

> Just in a quick scan, I don't have any objection to v2 except that the
> protocol documentation is lacking.

OK, I'll mark it Waiting on Author pending that issue.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] unlogged tables

2010-11-15 Thread Andy Colson

I am attempting to test this

https://commitfest.postgresql.org/action/patch_view?id=424

but I'm not sure which version of PG this should be applied to.  (It would be
really neat if, on here:
https://commitfest.postgresql.org/action/commitfest_view?id=8
there was a note that said: test this stuff against git tag X or branch
Y or whatever.)

I cloned the git repository:

git clone git://git.postgresql.org/git/postgresql.git

downloaded the patches, and applied them OK.  Then did ./configure and make.

after much spewage I got:

bufmgr.c: In function 'PrefetchBuffer':
bufmgr.c:126:10: error: 'struct RelationData' has no member named 'rd_istemp'
make[4]: *** [bufmgr.o] Error 1


Just to make sure everything was ok with the original, I reset:

git reset --hard HEAD^
./configure
make
and all was well.

so I tried again:
make clean
make maintainer-clean

patch -p1 < relpersistence-v1.patch
.. ok ..

but then...

$ patch -p1 < unlogged-tables-v1.patch
patching file doc/src/sgml/indexam.sgml
patching file doc/src/sgml/ref/create_table.sgml
patching file doc/src/sgml/ref/create_table_as.sgml
patching file src/backend/access/gin/gininsert.c
patching file src/backend/access/gist/gist.c
patching file src/backend/access/hash/hash.c
patching file src/backend/access/nbtree/nbtree.c
patching file src/backend/access/transam/xlog.c
patching file src/backend/catalog/catalog.c
patching file src/backend/catalog/heap.c
patching file src/backend/catalog/index.c
patching file src/backend/catalog/storage.c
patching file src/backend/parser/gram.y
patching file src/backend/storage/file/Makefile
patching file src/backend/storage/file/copydir.c
patching file src/backend/storage/file/fd.c
The next patch would create the file src/backend/storage/file/reinit.c,
which already exists!  Assume -R? [n]


That didn't happen the first time... I'm almost positive.

Not sure what I should do now.

-Andy



Re: [HACKERS] Spread checkpoint sync

2010-11-15 Thread Robert Haas
On Sun, Nov 14, 2010 at 6:48 PM, Greg Smith  wrote:
> The second issue is that the delay between sync calls is currently
> hard-coded, at 3 seconds.  I believe the right path here is to consider the
> current checkpoint_completion_target to still be valid, then work back from
> there.  That raises the question of what percentage of the time writes
> should now be compressed into relative to that, to leave some time to spread
> the sync calls.  If we're willing to say "writes finish in first 1/2 of
> target, syncs execute in second 1/2", then I could implement that here.
>  Maybe that ratio needs to be another tunable.  Still thinking about that
> part, and it's certainly open to community debate.  The thing to realize
> that complicates the design is that the actual sync execution may take a
> considerable period of time.  It's much more likely for that to happen than
> in the case of an individual write, as the current spread checkpoint does,
> because those are usually cached.  In the spread sync case, it's easy for
> one slow sync to make the rest turn into ones that fire in quick succession,
> to make up for lost time.

I think the behavior of file systems and operating systems is highly
relevant here.  We seem to have a theory that allowing a delay between
the write and the fsync should give the OS a chance to start writing
the data out, but do we have any evidence indicating whether and under
what circumstances that actually occurs?  For example, if we knew that
it's important to wait at least 30 s but waiting 60 s is no better,
that would be useful information.
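
If anyone wants to poke at that empirically, a toy harness along these lines (POSIX only; the file path is made up) would give a rough answer: write some dirty data, wait a configurable delay, and time the fsync.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Toy experiment: write `nbytes` of dirty data, wait `delay_sec`, then
 * time the fsync.  Comparing elapsed fsync times across different
 * delays hints at whether the kernel started writeback on its own
 * during the wait.  Returns elapsed seconds, or -1.0 on error. */
static double timed_fsync(const char *path, size_t nbytes, unsigned delay_sec)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    char buf[8192];
    struct timespec t0, t1;

    if (fd < 0)
        return -1.0;

    memset(buf, 'x', sizeof(buf));
    for (size_t written = 0; written < nbytes; written += sizeof(buf))
        if (write(fd, buf, sizeof(buf)) < 0) { close(fd); return -1.0; }

    sleep(delay_sec);           /* give the OS a chance to write back */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (fsync(fd) < 0) { close(fd); return -1.0; }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(fd);
    unlink(path);

    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Of course a realistic test would need data volumes big enough to exceed the kernel's dirty thresholds, which this sketch doesn't attempt.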

Another question I have is about how we're actually going to know when
any given fsync can be performed.  For any given segment, there are a
certain number of pages A that are already dirty at the start of the
checkpoint.  Then there are a certain number of additional pages B
that are going to be written out during the checkpoint.  If it so
happens that B = 0, we can call fsync() at the beginning of the
checkpoint without losing anything (in fact, we gain something: any
pages dirtied by cleaning scans or backend writes during the
checkpoint won't need to hit the disk; and if the filesystem dumps
more of its cache than necessary on fsync, we may as well take that
hit before dirtying a bunch more stuff).  But if B > 0, then we shouldn't
attempt the fsync() until we've written them all; otherwise we'll end
up having to fsync() that segment twice.

Doing all the writes and then all the fsyncs meets this requirement
trivially, but I'm not so sure that's a good idea.  For example, given
files F1 ... Fn with dirty pages needing checkpoint writes, we could
do the following: first, do any pending fsyncs for files not among F1
.. Fn; then, write all pages for F1 and fsync, write all pages for F2
and fsync, write all pages for F3 and fsync, etc.  This might seem
dumb because we're not really giving the OS a chance to write anything
out before we fsync, but think about the ext3 case where the whole
filesystem cache gets flushed anyway.  It's much better to dump the
cache at the beginning of the checkpoint and then again after every
file than it is to spew many GB of dirty stuff into the cache and then
drop the hammer.
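
In rough code, the per-file schedule described above would look something like this (a toy sketch over ordinary files, not checkpoint code; the paths are made up):

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Per-file schedule: instead of writing every file's dirty pages and
 * then issuing all the fsyncs in a batch at the end, write one file's
 * pages and fsync it before moving on to the next.  On filesystems
 * like ext3, where fsync flushes the whole filesystem cache, this
 * bounds how much dirty data each flush has to move.  Returns 0 on
 * success, -1 on error. */
static int checkpoint_files(const char *paths[], int nfiles, int pages_per_file)
{
    char page[8192];

    memset(page, 'x', sizeof(page));

    for (int i = 0; i < nfiles; i++)
    {
        int fd = open(paths[i], O_CREAT | O_TRUNC | O_WRONLY, 0600);

        if (fd < 0)
            return -1;
        for (int p = 0; p < pages_per_file; p++)
            if (write(fd, page, sizeof(page)) < 0) { close(fd); return -1; }
        if (fsync(fd) < 0) { close(fd); return -1; }  /* flush before next file */
        close(fd);
    }
    return 0;
}
```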

I'm just brainstorming here; feel free to tell me I'm all wet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Explain analyze getrusage tracking

2010-11-15 Thread Robert Haas
On Sun, Nov 14, 2010 at 9:33 PM, Greg Stark  wrote:
> This is an update to my earlier patch to add getrusage resource
> tracking to EXPLAIN ANALYZE.
>
> With this patch you get something like:
>
>                                                  QUERY PLAN
> --
>  Seq Scan on i  (cost=0.00..6919.44 rows=262144 width=101) (actual
> time=17.240..1123.751 rows=262144 loops=1)
>   Resources: sys=210.000ms user=430.000ms read=33.6MB
>   Buffers: shared read=4298
>  Total runtime: 1548.651 ms
> (4 rows)
>
> The main change is to make it work under Windows. At least I think the
> changes should make it work under Windows, I haven't been able to test
> it. Actually I'm not too happy with the way I did it; I would be more
> inclined to hack the getrusagestub.h definition of struct rusage to
> have an instr_time in it so that we can use the same macros directly.
> But that's more changes than I would be happy making without being
> able to compile them to test them.

I don't really think these changes to the INSTR macros make much
sense.  The macros don't really add much notational convenience;
they're mostly wrappers to make the WIN32 and non-WIN32 cases work
similarly for the instrumentation stuff, so hacking them up to use
them for this doesn't seem like it adds anything.  Just do whatever it
is you need to do, or define macros locally in explain.c.

It doesn't make much sense to me to normalize the memory for this
output to a variable unit when the other memory values we use in
explain.c are still going to be printed as kB.  I think we should just
print it in kB and call it good.  Alternatively, we could apply the
same normalization algorithm across the board, but I don't think
that's as good.

I continue to feel strongly that the choice of EXPLAIN format should
only affect the format, not the choice of information to be displayed.
Using the verbose option to control how much data the resource option
prints is, I think, not a good idea.  If you want to have two modes,
one for partial rusage data and one for full rusage data, you can just
as easily implement EXPLAIN (RESOURCE [PARTIAL|FULL]).  I believe that
the existing grammar is adequate to support that; you'd just need to
write the appropriate DefElem-parsing code.  But personally I'd just
print the whole kit and kaboodle regardless.
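
To sketch what that DefElem-parsing code would look like (using simplified stand-in structs rather than the real parsenodes.h types — this is the shape of the loop, not working backend code):

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for PostgreSQL's DefElem option node; the real
 * one carries a Node *arg rather than a plain string. */
typedef struct
{
    const char *defname;        /* option name, e.g. "resource" */
    const char *arg;            /* option argument, or NULL */
} MockDefElem;

typedef enum { RESOURCE_OFF, RESOURCE_PARTIAL, RESOURCE_FULL } ResourceMode;

/* Walk the EXPLAIN options and map RESOURCE / RESOURCE PARTIAL /
 * RESOURCE FULL onto a mode, independently of the output format.
 * A bare RESOURCE defaults to full output. */
static ResourceMode parse_resource_option(const MockDefElem *opts, int nopts)
{
    ResourceMode mode = RESOURCE_OFF;

    for (int i = 0; i < nopts; i++)
    {
        if (strcmp(opts[i].defname, "resource") != 0)
            continue;           /* some other option; not ours */
        if (opts[i].arg == NULL || strcmp(opts[i].arg, "full") == 0)
            mode = RESOURCE_FULL;
        else if (strcmp(opts[i].arg, "partial") == 0)
            mode = RESOURCE_PARTIAL;
    }
    return mode;
}
```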

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Count backend self-sync calls

2010-11-15 Thread Greg Smith

Jeff Janes wrote:

> Do you know where this competition is happening?  Is it on the
> platters, or is it in the hard drive write cache (I thought high-end
> hardware had tagged writes to avoid that), or in the kernel?

Kernel.  Linux systems with lots of memory will happily queue up 
gigabytes of memory in their write cache, only getting serious about 
writing it out to disk when demanded to by fsync.

> This makes sense if we just need to append to a queue.  But once the
> queue is full and we are about to do a backend fsync, might it make
> sense to do a little more work to look for dups?

One of the paths I'd like to follow is experimenting with both sorting 
writes by file and looking for duplication in the queues.  I think a 
basic, simple sync spreading approach needs to get finished first,
though; this sort of thing would then be an optimization on top of it.
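
The duplicate-hunting part could be something like this (sketched with toy structs rather than the real fsync request queue):

```c
#include <stdbool.h>

/* Simplified fsync request: the real queue entries identify a relation
 * file node and segment number. */
typedef struct
{
    int relfile;                /* which relation file */
    int segno;                  /* which 1GB segment of it */
} FsyncRequest;

/* Before falling back to a backend-local fsync because the queue is
 * full, scan it and drop duplicate requests for the same file segment;
 * one fsync covers them all.  O(n^2) here for clarity; a hash table
 * keyed on (relfile, segno) would do it in O(n).  Returns the new
 * queue length. */
static int compact_fsync_queue(FsyncRequest *q, int n)
{
    int out = 0;

    for (int i = 0; i < n; i++)
    {
        bool dup = false;

        for (int j = 0; j < out; j++)
            if (q[j].relfile == q[i].relfile && q[j].segno == q[i].segno)
            {
                dup = true;
                break;
            }
        if (!dup)
            q[out++] = q[i];    /* keep first occurrence, preserve order */
    }
    return out;
}
```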


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support   www.2ndQuadrant.us





Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

2010-11-15 Thread Robert Haas
On Mon, Nov 15, 2010 at 4:10 PM, Peter Eisentraut  wrote:
>> ld: file not found: ../../../../../../src/backend/postgres
>> collect2: ld returned 1 exit status
>> make[3]: *** [ascii_and_mic.so] Error 1
>> make[2]: *** [all-ascii_and_mic-recurse] Error 2
>> make[1]: *** [all-backend/utils/mb/conversion_procs-recurse] Error 2
>> make[1]: *** Waiting for unfinished jobs
>
> Untested, but the following should help you, by partially restoring the
> old build order on platforms that need it.

Very odd, but this completely blew up the first time I tried it.

In file included from path.c:34:
pg_config_paths.h:2:11: error: missing terminating " character
In file included from path.c:34:
pg_config_paths.h:2: error: missing terminating " character
path.c:49: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’
before ‘static’

That file had a line in it that looked like this:

postgresql"

On a subsequent retry, I got:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g -Werror  -bundle -multiply_defined suppress -o
dict_snowball.so dict_snowball.o api.o utilities.o
stem_ISO_8859_1_danish.o stem_ISO_8859_1_dutch.o
stem_ISO_8859_1_english.o stem_ISO_8859_1_finnish.o
stem_ISO_8859_1_french.o stem_ISO_8859_1_german.o
stem_ISO_8859_1_hungarian.o stem_ISO_8859_1_italian.o
stem_ISO_8859_1_norwegian.o stem_ISO_8859_1_porter.o
stem_ISO_8859_1_portuguese.o stem_ISO_8859_1_spanish.o
stem_ISO_8859_1_swedish.o stem_ISO_8859_2_romanian.o
stem_KOI8_R_russian.o stem_UTF_8_danish.o stem_UTF_8_dutch.o
stem_UTF_8_english.o stem_UTF_8_finnish.o stem_UTF_8_french.o
stem_UTF_8_german.o stem_UTF_8_hungarian.o stem_UTF_8_italian.o
stem_UTF_8_norwegian.o stem_UTF_8_porter.o stem_UTF_8_portuguese.o
stem_UTF_8_romanian.o stem_UTF_8_russian.o stem_UTF_8_spanish.o
stem_UTF_8_swedish.o stem_UTF_8_turkish.o -L../../../src/port
-L/opt/local/lib -Wl,-dead_strip_dylibs  -Werror  -bundle_loader
../../../src/backend/postgres
ld: file not found: ../../../src/backend/postgres
collect2: ld returned 1 exit status
make[2]: *** [dict_snowball.so] Error 1
make[1]: *** [all-backend/snowball-recurse] Error 2
make[1]: *** Waiting for unfinished jobs

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

2010-11-15 Thread Tom Lane
Robert Haas  writes:
> Very odd, but this completely blew up the first time I tried it.

> In file included from path.c:34:
> pg_config_paths.h:2:11: error: missing terminating " character

FWIW, I didn't replicate that, but I did get this during one attempt
with -j4:

/usr/bin/ranlib: archive member: libecpg.a(typename.o) size too large (archive \
member extends past the end of the file)
ar: internal ranlib command failed
make[5]: *** [libecpg.a] Error 1
make[5]: *** Deleting file `libecpg.a'
make[4]: *** [submake-ecpglib] Error 2
make[3]: *** [all-compatlib-recurse] Error 2
make[3]: *** Waiting for unfinished jobs
/usr/bin/ranlib: can't stat file output file: libecpg.a (No such file or direct\
ory)
ar: internal ranlib command failed
make[4]: *** [libecpg.a] Error 1
make[3]: *** [all-ecpglib-recurse] Error 2
make[2]: *** [all-ecpg-recurse] Error 2
make[1]: *** [all-interfaces-recurse] Error 2
make[1]: *** Waiting for unfinished jobs
In file included from gram.y:12101:
scan.c: In function 'yy_try_NUL_trans':
scan.c:16242: warning: unused variable 'yyg'
make: *** [all-src-recurse] Error 2

Examination of the stdout trace makes it appear that two independent
make runs were trying to build src/interfaces/ecpg/ecpglib/libecpg.a
concurrently.  I haven't dug into it but I suspect that there are
multiple dependency chains leading to ecpg/ecpglib/.  I wonder whether
what you saw was also the result of multiple recursion paths leading
to trying to build the same target at once.  If so, that's going to
put a rather serious crimp in the idea of constraining build order
by adding more dependencies.

> On a subsequent retry, I got:
> ld: file not found: ../../../src/backend/postgres
> collect2: ld returned 1 exit status
> make[2]: *** [dict_snowball.so] Error 1

Yeah, I got that too, but adding all-backend/snowball-recurse to the
set of dependencies proposed in Peter's patch made it go away.
A cursory search for other appearances of -bundle_loader in the
make output suggests that contrib/ and src/test/regress/ are also
at risk.  This leads me to the thought that concentrating knowledge
of this issue in src/Makefile is not the right way to go at it.
And, again, the more paths leading to a make attempt in the same
directory, the worse off we are as far as the first problem goes.
But surely the "make" guys recognized this risk and have a solution?
Otherwise parallel make would be pretty useless.

regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

2010-11-15 Thread Tom Lane
I tried another experiment, which was "make -j100 all" on my relatively
new Linux box (2 dual-core CPUs).  It blew up real good, as per attached
stderr output, which shows evidence of more missing dependencies as well
as some additional cases of concurrent attempts to build the same
target.

It's clear to me that we are very far from having a handle on what it'll
really take to run parallel builds safely, and I am therefore now of the
opinion that we ought to revert the patch.  Hypothetical gains in
parallelism are useless if we can't actually use parallel building
reliably.  We are currently worse off than before in terms of time to
build the system.

regards, tom lane

/usr/bin/ld: cannot find -lpgport
collect2: ld returned 1 exit status
make[3]: *** [refint.so] Error 1
make[2]: *** [../../../contrib/spi/refint.so] Error 2
make[2]: *** Waiting for unfinished jobs
path.c: In function 'get_html_path':
path.c:615: error: 'HTMLDIR' undeclared (first use in this function)
path.c:615: error: (Each undeclared identifier is reported only once
path.c:615: error: for each function it appears in.)
path.c: In function 'get_man_path':
path.c:624: error: 'MANDIR' undeclared (first use in this function)
make[3]: *** [path.o] Error 1
make[3]: *** Deleting file `path.o'
make[3]: *** Waiting for unfinished jobs
/usr/bin/ld: cannot find -lpgport
collect2: ld returned 1 exit status
make[3]: *** [autoinc.so] Error 1
make[2]: *** [../../../contrib/spi/autoinc.so] Error 2
make[2]: *** [submake-libpgport] Error 2
make[2]: *** Waiting for unfinished jobs
ln: creating symbolic link `libpgtypes.so.3': File exists
make[4]: *** [libpgtypes.so.3.2] Error 1
make[4]: *** Deleting file `libpgtypes.so.3.2'
make[3]: *** [all-pgtypeslib-recurse] Error 2
make[3]: *** Waiting for unfinished jobs
make[1]: *** [all-test/regress-recurse] Error 2
make[1]: *** Waiting for unfinished jobs
In file included from gram.y:12102:
scan.c: In function 'yy_try_NUL_trans':
scan.c:16246: warning: unused variable 'yyg'
ln: creating symbolic link `libpq.so.5': File exists
make[4]: *** [libpq.so.5.4] Error 1
make[4]: *** Deleting file `libpq.so.5.4'
make[3]: *** [submake-libpq] Error 2
make[2]: *** [all-pg_dump-recurse] Error 2
make[2]: *** Waiting for unfinished jobs
ln: creating symbolic link `libpq.so.5': File exists
make[6]: *** [libpq.so.5.4] Error 1
make[6]: *** Deleting file `libpq.so.5.4'
make[5]: *** [submake-libpq] Error 2
make[4]: *** [submake-ecpglib] Error 2
make[3]: *** [all-compatlib-recurse] Error 2
/usr/bin/ld: cannot open linker script file 
../../../src/interfaces/libpq/libpq.so: No such file or directory
collect2: ld returned 1 exit status
make[3]: *** [psql] Error 1
make[2]: *** [all-psql-recurse] Error 2
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function 
`pq_reset_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1426: undefined reference to 
`pthread_sigmask'
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function 
`pq_block_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1363: undefined reference to 
`pthread_sigmask'
collect2: ld returned 1 exit status
make[3]: *** [createdb] Error 1
make[3]: *** Waiting for unfinished jobs
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function 
`pq_reset_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1426: undefined reference to 
`pthread_sigmask'
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function 
`pq_block_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1363: undefined reference to 
`pthread_sigmask'
collect2: ld returned 1 exit status
make[3]: *** [createuser] Error 1
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function 
`pq_reset_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1426: undefined reference to 
`pthread_sigmask'
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function 
`pq_block_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1363: undefined reference to 
`pthread_sigmask'
collect2: ld returned 1 exit status
make[3]: *** [dropuser] Error 1
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function 
`pq_reset_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1426: undefined reference to 
`pthread_sigmask'
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function 
`pq_block_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1363: undefined reference to 
`pthread_sigmask'
collect2: ld returned 1 exit status
make[3]: *** [vacuumdb] Error 1
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function 
`pq_reset_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1426: undefined reference to 
`pthread_sigmask'
../../../src/interfaces/libpq/libpq.a(fe-secure.o): In function 
`pq_block_sigpipe':
/home/tgl/pgsql/src/interfaces/libpq/fe-secure.c:1363: undefined reference to 
`pthread_sigmask'
collect2: ld returned 1 exit status
make[3]: *** [dropdb] Error 1
../../../src/interfaces/li