Re: [HACKERS] Overhauling GUCS

2008-06-10 Thread Heikki Linnakangas
Tom Lane wrote: Josh Berkus <[EMAIL PROTECTED]> writes: Oh, and wal_buffers, the default for which we should just change if it weren't for SHMMAX. Uh, why? On a workload of mostly small transactions, what value is there in lots of wal_buffers? None. But there's also little to no harm in hav

Re: [HACKERS] why copy tuple in the end of trigger when nothing changed in NEW OLD record variable

2008-06-10 Thread billy
Tom Lane, er, your explanation is reasonable. But at least the comment if (newtuple != tuple) /* modified by Trigger(s) */ { . is likely to misdirect us. It took me a few hours to figure it out. :-( === 2008-06-

Re: [HACKERS] why copy tuple in the end of trigger when nothing changed in NEW OLD record variable

2008-06-10 Thread Tom Lane
"billy" <[EMAIL PROTECTED]> writes: > I think we can add some judgment conditions in function > plpgsql_exec_trigger() to avoid this problem. I don't especially see the point of adding extra complexity here. AFAICS you are talking about avoiding one or two palloc/pfree cycles, which is surely

[HACKERS] why copy tuple in the end of trigger when nothing changed in NEW OLD record variable

2008-06-10 Thread billy
Hi pghackers: version 8.3.0 in function plpgsql_exec_trigger. in a trigger, if NEW is returned as the result and we do nothing to NEW. for example, we have a table like this: create table test (a int); insert into test values(1); and a trigger like:

Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-10 Thread Andrew Dunstan
Tom Lane wrote: This gets back to the discussions at PGCon about needing to have a more explicit representation of partitioning. Right now, for a many-partition table we spend huge amounts of time deriving the expected behavior from first principles, each time we make a plan. And even then we

Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-10 Thread ITAGAKI Takahiro
Josh Berkus <[EMAIL PROTECTED]> wrote: > I still see having 2 different settings: > > Synchronous: XID visibility is pushed to the master. Maintains synchronous > failover, and users are expected to run *1* master to *1* slave for most > installations. > > Asynchronous: replication stops on

Re: [HACKERS] RFD: ALTER COLUMN .. SET STORAGE COMPRESSED;

2008-06-10 Thread Dawid Kuroczko
On Tue, Jun 10, 2008 at 5:25 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > "Dawid Kuroczko" <[EMAIL PROTECTED]> writes: >> As we already have four types of ALTER COLUMN .. SET STORAGE >> { PLAIN | EXTERNAL | EXTENDED | MAIN } I would like to add >> "COMPRESSED" which would force column compression (if

Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-10 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes: > It's possible that the second option I described -- teaching Append when to > use something other than sum() -- would only work in the cases where > constraint exclusion could be fixed though. In which case having fractional > row counts might actually be

Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-10 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes: > Gregory Stark <[EMAIL PROTECTED]> writes: >> The screw case I've seen is when you have a large partitioned table where >> constraint_exclusion fails to exclude the irrelevant partitions. You're going >> to get 0 rows from all but the one partition which con

Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-10 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes: > The screw case I've seen is when you have a large partitioned table where > constraint_exclusion fails to exclude the irrelevant partitions. You're going > to get 0 rows from all but the one partition which contains the 1 row you're > looking for. But sin

Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-10 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes: > (In fact, I don't think the plan would change, in this case. The reason > for the clamp to 1 row is to avoid foolish results for join situations.) The screw case I've seen is when you have a large partitioned table where constraint_exclusion fails to excl

Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-10 Thread Tom Lane
"Nathan Boley" <[EMAIL PROTECTED]> writes: >> (1) On what grounds do you assert the above? > For a table with 100 non-mcv rows, the planner estimates a result > set of cardinality 100/20 = 5, not 1. The real problem in that situation is that you need another twenty slots in the MCV li

Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-10 Thread Nathan Boley
>> Why? What if ( somehow ) we knew that there was a 90% chance that >> query would return an empty result set on a big table with 20 non-mcv >> distinct values. Currently the planner would always choose a seq scan, >> where an index scan might be better. > > (1) On what grounds do you assert the

Re: [HACKERS] Proposal: GiST constraints

2008-06-10 Thread Teodor Sigaev
In theory, any indexed value in index (for GiST, after compression) should fit into page at least. So are you saying we should dedicate one page multiplied by max_connections in shared memory? It's possible to do it that way, but Yes, we could. Storing index keys in shared memory allows minimiz

Re: [HACKERS] Automating our version-stamping a bit better

2008-06-10 Thread Tom Lane
Peter Eisentraut <[EMAIL PROTECTED]> writes: > Am Monday, 9. June 2008 schrieb Tom Lane: >> So while tagging the upcoming releases, I got annoyed once again about >> what a tedious, error-prone bit of donkeywork it is. > Could you explain what the problem is? Your script sounds like an ad hoc >

Re: [HACKERS] Automating our version-stamping a bit better

2008-06-10 Thread Peter Eisentraut
Am Monday, 9. June 2008 schrieb Tom Lane: > So while tagging the upcoming releases, I got annoyed once again about > what a tedious, error-prone bit of donkeywork it is. Could you explain what the problem is? Your script sounds like an ad hoc workaround for some problem, but I haven't seen the p

Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-10 Thread Tom Lane
"Nathan Boley" <[EMAIL PROTECTED]> writes: >>> If we query on values that aren't in the table, the planner will >>> always overestimate the expected number of returned rows because it ( >>> implicitly ) assumes that every query will return at least 1 record. >> >> That's intentional and should not

Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-10 Thread Nathan Boley
>> If we query on values that aren't in the table, the planner will >> always overestimate the expected number of returned rows because it ( >> implicitly ) assumes that every query will return at least 1 record. > > That's intentional and should not be changed. Why? What if ( somehow ) we knew t

Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-10 Thread Tom Lane
"Nathan Boley" <[EMAIL PROTECTED]> writes: > If we query on values that aren't in the table, the planner will > always overestimate the expected number of returned rows because it ( > implicitly ) assumes that every query will return at least 1 record. That's intentional and should not be changed.

Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-10 Thread Nathan Boley
>> >> > One more problem with low ndistinct values is that the condition might >> >> > very well >> >> > hit no rows at all. But Idea 1 will largely overestimate the number of >> >> > hits. >> >> Thats a good point, but I don't see a clear solution. Maybe we could > > I think that MCVs are the so

Re: [HACKERS] Overhauling GUCS

2008-06-10 Thread Ron Mayer
Gregory Stark wrote: "Joshua D. Drake" <[EMAIL PROTECTED]> writes: ...default_statistics_target?"..."Uhh 10." Ah, but we only ever hear about the cases where it's wrong of course. In other words even if we raised it to some optimal value we would still have precisely the same experience of see

Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-10 Thread Jeff Davis
On Tue, 2008-06-10 at 08:51 -0700, Nathan Boley wrote: > >> > One more problem with low ndistinct values is that the condition might > >> > very well > >> > hit no rows at all. But Idea 1 will largely overestimate the number of > >> > hits. > > Thats a good point, but I don't see a clear solutio

Re: [HACKERS] Overhauling GUCS

2008-06-10 Thread Tom Lane
Josh Berkus <[EMAIL PROTECTED]> writes: > Oh, and wal_buffers, the default for which we should just change if it > weren't for SHMMAX. Uh, why? On a workload of mostly small transactions, what value is there in lots of wal_buffers? regards, tom lane -- Sent via pgsql-ha

Re: [CORE] [HACKERS] Automating our version-stamping a bit better

2008-06-10 Thread Tom Lane
"Marc G. Fournier" <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> What I was thinking was just to have the script >> print out something like >> >> Tagged tree as 8.3.4 >> Don't forget to run autoconf 2.59 before committing > I like that one ... I've checked in a script to do this --- executin

Re: [HACKERS] Overhauling GUCS

2008-06-10 Thread Josh Berkus
On Tuesday 10 June 2008 09:37, Josh Berkus wrote: > Robert, > > > shared_buffers > > effective_cache_size > > default_stats_target > > work_mem > > maintainance_work_mem > > listen_address > > max_connections > > the fsm parameters > > checkpoint_segements > > random_page_cost > > My list is very s

Re: [HACKERS] Proposal: GiST constraints

2008-06-10 Thread Jeff Davis
On Tue, 2008-06-10 at 16:59 +0400, Teodor Sigaev wrote: > > This can be solved by my proposal, but I just don't know how it would > > apply to something like GIN, for instance. It could replace the unique > Hmm, your proposal isn't applicable to GIN, because GIN stores a lot of keys > for > only

Re: [HACKERS] Overhauling GUCS

2008-06-10 Thread Josh Berkus
Robert, > shared_buffers > effective_cache_size > default_stats_target > work_mem > maintainance_work_mem > listen_address > max_connections > the fsm parameters > checkpoint_segements > random_page_cost My list is very similar, execept that I drop random_page_cost and add synchronous_commit, au

Re: [HACKERS] Overhauling GUCS

2008-06-10 Thread Josh Berkus
Ron, > I wonder if the fastest way to generate the configurator > would be to simply ask everyone to post their tuned > postgresql.conf files along with a brief description of > the use case for that file. The we could group the > use-cases into various classes; and average the values > of the su

Re: [HACKERS] Timezone abbreviations - out but not in?

2008-06-10 Thread Tom Lane
"Dave Page" <[EMAIL PROTECTED]> writes: > Or just output offsets in every case :-p Which is what the default ISO datestyle does ... I believe pg_dump is careful to force ISO style for exactly this reason. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-ha

Re: [HACKERS] Timezone abbreviations - out but not in?

2008-06-10 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes: > ... A more useful restriction would be to only output > those that are in the set of input-acceptable abbreviations, but perhaps > this is not easy to implement. I think that is actually what Dave is suggesting, but I don't really agree with it. To me

Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-10 Thread Josh Berkus
All, > > For the slave to not interfere with the master at all, we would need to > > delay application of WAL files on each slave until visibility on that > > slave allows the WAL to be applied, but in that case we would have > > long-running transactions delay data visibility of all slave session

Re: [HACKERS] Timezone abbreviations - out but not in?

2008-06-10 Thread Dave Page
On Tue, Jun 10, 2008 at 4:51 PM, Alvaro Herrera <[EMAIL PROTECTED]> wrote: > Dave Page wrote: >> Right, but shouldn't we always output something we know we can read >> back in (unambiguously), assuming a server with no user defined >> abbreviations? > > That makes no sense because it amounts to say

Re: [HACKERS] Timezone abbreviations - out but not in?

2008-06-10 Thread Alvaro Herrera
Dave Page wrote: > On Tue, Jun 10, 2008 at 4:37 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > > "Dave Page" <[EMAIL PROTECTED]> writes: > >> It seems like a bug that we happily output PKST as a timezone (in a > >> 'timestamp with time zone'), but won't accept it back in. > > > > [ shrug... ] The set o

Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-10 Thread Nathan Boley
>> > One more problem with low ndistinct values is that the condition might >> > very well >> > hit no rows at all. But Idea 1 will largely overestimate the number of >> > hits. Thats a good point, but I don't see a clear solution. Maybe we could look at past queries and keep track of how often

Re: [HACKERS] Timezone abbreviations - out but not in?

2008-06-10 Thread Tom Lane
"Dave Page" <[EMAIL PROTECTED]> writes: > On Tue, Jun 10, 2008 at 4:37 PM, Tom Lane <[EMAIL PROTECTED]> wrote: >> [ shrug... ] The set of timezone abbrevs recognized on input is >> user-configurable, so that situation will always be possible. > Right, but shouldn't we always output something we k

Re: [HACKERS] Timezone abbreviations - out but not in?

2008-06-10 Thread Dave Page
On Tue, Jun 10, 2008 at 4:37 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > "Dave Page" <[EMAIL PROTECTED]> writes: >> It seems like a bug that we happily output PKST as a timezone (in a >> 'timestamp with time zone'), but won't accept it back in. > > [ shrug... ] The set of timezone abbrevs recognized

Re: [HACKERS] Timezone abbreviations - out but not in?

2008-06-10 Thread Tom Lane
"Dave Page" <[EMAIL PROTECTED]> writes: > It seems like a bug that we happily output PKST as a timezone (in a > 'timestamp with time zone'), but won't accept it back in. [ shrug... ] The set of timezone abbrevs recognized on input is user-configurable, so that situation will always be possible.

Re: [HACKERS] pg_statistics and sample size WAS: Overhauling GUCS

2008-06-10 Thread Josh Berkus
Greg, > The analogous case in our situation is not having 300 million distinct > values, since we're not gathering info on specific values, only the > buckets. We need, for example, 600 samples *for each bucket*. Each bucket > is chosen to have the same number of samples in it. So that means that

Re: [HACKERS] libpq support for arrays and composites

2008-06-10 Thread Merlin Moncure
On 6/10/08, Tom Lane <[EMAIL PROTECTED]> wrote: > "Merlin Moncure" <[EMAIL PROTECTED]> writes: > > IMO, support for binary is critical. Because of the interplay of the > > array and composite out formats, the number of backslashes grows > > exponentially (!) with nesting levels. This makes text f

Re: [HACKERS] RFD: ALTER COLUMN .. SET STORAGE COMPRESSED;

2008-06-10 Thread Alvaro Herrera
Dawid Kuroczko escribió: > Currently the TOASTing code does its magic when whole tuple is > larger than TOAST_TUPLE_TARGET which happens to be around 2KB. > > There are times though when one is willing to trade using (fast) CPU to > reduce amount of (slow) I/O. A data warehousing types of worklo

Re: [HACKERS] RFD: ALTER COLUMN .. SET STORAGE COMPRESSED;

2008-06-10 Thread Tom Lane
"Dawid Kuroczko" <[EMAIL PROTECTED]> writes: > As we already have four types of ALTER COLUMN .. SET STORAGE > { PLAIN | EXTERNAL | EXTENDED | MAIN } I would like to add > "COMPRESSED" which would force column compression (if column is > smaller than some minimun, I guess somwehwere between 16 and 3

Re: [HACKERS] libpq support for arrays and composites

2008-06-10 Thread Tom Lane
"Merlin Moncure" <[EMAIL PROTECTED]> writes: > On 6/8/08, Andrew Dunstan <[EMAIL PROTECTED]> wrote: >> Tom Lane wrote: >>> Are you intending that these operations support both text and binary >>> results? >> >> I'm a bit open on that. > IMO, support for binary is critical. Because of the interpl

Re: [HACKERS] libpq support for arrays and composites

2008-06-10 Thread Merlin Moncure
On 6/8/08, Andrew Dunstan <[EMAIL PROTECTED]> wrote: > Tom Lane wrote: > > Andrew Dunstan <[EMAIL PROTECTED]> writes: > > > One complicating factor I see is that there is no protocol level support > for anything other than simple objects - each data value is simply a stream > of bytes of a known le

[HACKERS] why copy tuple in the end of trigger when nothing changed in NEW, OLD record variable

2008-06-10 Thread 汪琦
hello, everyone: version 8.3.0 in function plpgsql_exec_trigger. in a trigger, if NEW is returned as the result and we do nothing to NEW. for example, we have a table like this: create table test (a int); insert into test values(1); and a trigge

[HACKERS] Timezone abbreviations - out but not in?

2008-06-10 Thread Dave Page
One of our guys in Pakistan noticed a problem with Slony that seems to have manifested itself since the last zic update. Slony uses timeofday() as the default value for a timestamp column: -- Executing query: SET timezone='Asia/Karachi'; SELECT timeofday()::timestamp with time zone; ERROR: inval

Re: [HACKERS] RFD: ALTER COLUMN .. SET STORAGE COMPRESSED;

2008-06-10 Thread Zdenek Kotala
Dawid Kuroczko napsal(a): Hello! Currently the TOASTing code does its magic when whole tuple is larger than TOAST_TUPLE_TARGET which happens to be around 2KB. There are times though when one is willing to trade using (fast) CPU to reduce amount of (slow) I/O. A data warehousing types of worklo

Re: [HACKERS] Proposal: GiST constraints

2008-06-10 Thread Teodor Sigaev
This can be solved by my proposal, but I just don't know how it would apply to something like GIN, for instance. It could replace the unique Hmm, your proposal isn't applicable to GIN, because GIN stores a lot of keys for only one value to be indexed. being inserted by other concurrent transac

Re: [HACKERS] Proposal: GiST constraints

2008-06-10 Thread Teodor Sigaev
I would like to consider adding constraints to GiST indexes. I think it is possible to add constraints that are more sophisticated than just UNIQUE. My use case is a non-overlapping constraint, but I think it's possible for this to be more general. Sounds good The idea is to make an array in sh

[HACKERS] RFD: ALTER COLUMN .. SET STORAGE COMPRESSED;

2008-06-10 Thread Dawid Kuroczko
Hello! Currently the TOASTing code does its magic when whole tuple is larger than TOAST_TUPLE_TARGET which happens to be around 2KB. There are times though when one is willing to trade using (fast) CPU to reduce amount of (slow) I/O. A data warehousing types of workload most notably. Rarely use

[HACKERS] Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-10 Thread Zeugswetter Andreas OSB sIT
> > I think for low ndistinct values we will want to know the exact > > value + counts and not a bin. So I think we will want > additional stats rows > > that represent "value 'a1' stats". > > Isn't that what our most frequent values list does? Maybe ? Do we have the relevant stats for each ? But

Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-10 Thread Gregory Stark
"Zeugswetter Andreas OSB sIT" <[EMAIL PROTECTED]> writes: > I think for low ndistinct values we will want to know the exact > value + counts and not a bin. So I think we will want additional stats rows > that represent "value 'a1' stats". Isn't that what our most frequent values list does? --

Re: [HACKERS] Overhauling GUCS

2008-06-10 Thread Gregory Stark
"Josh Berkus" <[EMAIL PROTECTED]> writes: > Greg, > >> Speak to the statisticians. Our sample size is calculated using the same >> theory behind polls which sample 600 people to learn what 250 million >> people are going to do on election day. You do NOT need (significantly) >> larger samples for

Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-10 Thread Heikki Linnakangas
Gregory Stark wrote: Instead of zeroing bytes and depending on compression why not just pass an extra parameter to the archive command with the offset to the logical end of data. Because the archiver process doesn't have that information. -- Heikki Linnakangas EnterpriseDB http://www.ent

Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-10 Thread Gregory Stark
"Greg Smith" <[EMAIL PROTECTED]> writes: > On Mon, 9 Jun 2008, Tom Lane wrote: > >> It should also be pointed out that the whole thing becomes uninteresting >> if we get real-time log shipping implemented. So I see absolutely no >> point in spending time integrating pg_clearxlogtail now. > > Ther

Re: [HACKERS] Strange issue with GiST index scan taking far too long

2008-06-10 Thread Mark Cave-Ayland
Tom Lane wrote: So you are saying it is de-toasted 32880 times, in this case? If not, where are the repeated de-toastings happening? Inside the index support functions. I'm thinking we could fix this by forcibly detoasting values passed as index scan keys, but it's not quite clear where's the

Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-10 Thread Zeugswetter Andreas OSB sIT
> Obviously we run into problems when > a) we have a poor estimate for ndistinct - but then we have > worse problems > b) our length measure doesn't correspond well with ndistinct > in an interval One more problem with low ndistinct values is that the condition might very well hit no rows at all.

Re: [HACKERS] Strange issue with GiST index scan taking far too long

2008-06-10 Thread Mark Cave-Ayland
Tom Lane wrote: Well, yeah, because the first thing it does is pg_detoast_datum. Just as a cross-check, try changing it to copy the value without forcibly detoasting --- I'll bet it's still slow then. Yeah, that appears to be exactly the case. After some grepping through various header files