Re: [HACKERS] bgwriter reference to HOT standby

2013-01-24 Thread Simon Riggs
On 24 January 2013 07:25, Pavan Deolasee  wrote:
> On Thu, Jan 24, 2013 at 12:36 PM, Jeff Janes  wrote:
>> The docs on bgworker twice refer to "HOT standby".  I don't think the
>> "hot" needs emphasis in either case, and if it does, making it look like
>> an acronym (one already used for something else) is probably not the way
>> to do it.
>
> I think it should be "Hot Standby" for consistency. +1 for changing
> it from HOT to hot/Hot anyway.

Patch applied. Thanks

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BUG #6510: A simple prompt is displayed using wrong charset

2013-01-24 Thread Craig Ringer
On 01/24/2013 01:06 AM, Alexander Law wrote:
> Hello,
> Please let me know if I can do something to get the bug fix
> (https://commitfest.postgresql.org/action/patch_view?id=902) committed.
> I would like to fix other bugs related to postgres localization, but I
> am not sure yet how to do it.

For anyone looking for the history, the 1st post on this topic is here:

http://www.postgresql.org/message-id/e1s3twb-0004oy...@wrigleys.postgresql.org

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state

2013-01-24 Thread Simon Riggs
On 24 January 2013 01:17, Robert Haas  wrote:

> I agree.  The thing that scares me about the logical replication stuff
> is not that it might be slow (and if your numbers are to be believed,
> it isn't), but that I suspect it's riddled with bugs and possibly some
> questionable design decisions.  If we commit it and release it, then
> we're going to be stuck maintaining it for a very, very long time.  If
> it turns out to have serious bugs that can't be fixed without a new
> major release, it's going to be a serious black eye for the project.
>
> Of course, I have no evidence that that will happen.


This is a generic argument against applying any invasive patch. I
agree 9.2 had major bugs on release, though that was because of the
invasive nature of some of the changes, even in seemingly minor
patches.

The most invasive and therefore risky changes in this release are
already committed - changes to the way WAL reading and timelines work.
If we don't apply a single additional patch in this CF, we will still
in my opinion have a major requirement for beta testing prior to
release.

The code executed here is isolated to users of the new feature and is
therefore low risk to non-users. Of course there will be bugs.
Everybody understands what a new feature means, and we as a project
aren't exposed to risks from this. This new feature also brings
groundbreaking new capabilities, so the balance of high reward and low
risk means this gets my vote to apply. I'm just about to spend some
days giving it a final review to confirm/refute that opinion in
technical detail.

Code using these features is available and clearly marked as a full
copyright transfer to PGDG, TPL licensed. That code is external not by
the author's choice, but at the specific request of the project to make
it that way. I personally will be looking to add code to core over time.
It was useful for everybody that replication solutions started out of
core, but replication is now a core requirement for databases and we
must fully deliver on that thought.

I agree with your concern re: checksums and foreign key locks. FK
locks have had considerable review and support, so I expect that to be
a manageable issue.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Setting visibility map in VACUUM's second phase

2013-01-24 Thread Jeff Janes
On Friday, December 7, 2012, Pavan Deolasee wrote:

>

> Revised patch attached. This time much less invasive. I added a new
> function heap_page_is_all_visible() to check if the given page is
> all-visible and also return the visibility cutoff xid.


Hi Pavan,

lazy_vacuum_page now pins and unpins the vmbuffer for each page it marks
all-visible, which seems like a lot of needless traffic since the next
vmbuffer is likely to be the same as the previous one.

Could it instead get vmbuffer passed down to it from the two calling sites?
  (One call site would have to have it introduced, just for this purpose,
the other should be able to use its existing one.)

Cheers,

Jeff


Re: [HACKERS] [sepgsql 1/3] add name qualified creation label

2013-01-24 Thread Kohei KaiGai
2013/1/24 Tom Lane :
> John R Pierce  writes:
>> On 1/23/2013 8:32 PM, Tom Lane wrote:
>>> FWIW, in Fedora-land I see: ...
>
>> I'd be far more interested in what is in RHEL and CentOS.Fedora,
>> with its 6 month obsolescence cycle, is of zero interest to me for
>> deploying database servers.
>
> But of course Fedora is also the upstream that will become RHEL7
> and beyond.
>
>> EL6 has libselinux 2.0.94
>> EL5 has libselinux 1.33.4
>
> sepgsql already requires libselinux 2.0.99, so it doesn't appear to me
> that moving that goalpost is going to change things one way or the other
> for the existing RHEL branches.  I couldn't ship contrib/sepgsql today
> in those branches.
>
> It might be that the update timing makes a bigger difference in some
> other distros, though.  To return to Heikki's original point about
> Debian, what are they shipping today?
>
Although I'm not familiar with Debian's release cycle, I checked the
shipped versions of postgresql and libselinux for the stable, testing,
unstable and experimental releases.
I'm not certain why they haven't pushed postgresql-9.2 into the
experimental release yet. However, it seems optimistic to expect
libselinux-2.1.10 to be bundled on the timeline of postgresql-9.3.

If someone is familiar with Debian's release cycle, I'd like to hear
their suggestions.

* Debian (stable) ... postgresql-8.4 + libselinux-2.0.96
http://packages.debian.org/en/squeeze/postgresql
http://packages.debian.org/en/source/squeeze/libselinux

* Debian (testing) ... postgresql-9.1 + libselinux-2.1.9
http://packages.debian.org/en/wheezy/postgresql
http://packages.debian.org/en/source/wheezy/libselinux

* Debian (unstable) ... postgresql-9.1 + libselinux-2.1.9
http://packages.debian.org/en/sid/postgresql
http://packages.debian.org/en/source/sid/libselinux

* Debian (experimental) ... postgresql-9.1 + libselinux-2.1.12
http://packages.debian.org/en/experimental/postgresql
http://packages.debian.org/en/source/experimental/libselinux

Also, Ubuntu's releases largely track Debian's.

* Ubuntu 11.10 ... postgresql-9.1 + libselinux-2.0.98
https://launchpad.net/ubuntu/oneiric/+package/postgresql
https://launchpad.net/ubuntu/oneiric/+source/libselinux

* Ubuntu 12.04 ... postgresql-9.1 + libselinux-2.1.0
https://launchpad.net/ubuntu/precise/+package/postgresql
https://launchpad.net/ubuntu/precise/+source/libselinux

* Ubuntu 12.10 ... postgresql-9.1 + libselinux-2.1.9
https://launchpad.net/ubuntu/quantal/+package/postgresql
https://launchpad.net/ubuntu/quantal/+source/libselinux

Thanks,
-- 
KaiGai Kohei 




Re: [HACKERS] CF3+4 (was Re: Parallel query execution)

2013-01-24 Thread Pavan Deolasee
On Wed, Jan 23, 2013 at 10:14 PM, Robert Haas  wrote:

> I think there's been something of a professionalization of PostgreSQL
> development over the last few years.   More and more people are able
> to get paid to work on PostgreSQL as part or in a few cases all of
> their job.  This trend includes me, but also a lot of other people.
> There are certainly good things about this, but I think it increases
> the pressure to get patches committed.

Also, many of the newer developers who previously worked at
proprietary companies (myself included) may not have seen such long
cycles for feature development, from design to acceptance/rejection.
The problem is aggravated because the same developer was previously
developing at a much, much faster pace and will find our style very
conservative. Of course, the quality of work might be a notch lower
when coding features at that pace, and you may introduce regressions
and bugs along the way, but that is probably compensated for by the
more structured QA and testing that proprietary companies do. And many
of these developers might have been working on equally important and
mission-critical products before. I know we are very conscious of our
code quality and product stability, and for the right reasons, but I
wonder whether overall productivity would go up if we did the same,
i.e. had quick feature-acceptance cycles compensated by more QA. The
problem is that, being a community-driven project, we attract many
developers but very few QA people.

Would it help to significantly enhance our regression coverage so that
bugs are caught early (assuming that's a problem)? If so, maybe we
should have a month-long QA fest where every developer only writes
test cases.

Would it help to have a formal product manager (or a team) that gets
to decide whether we want a particular feature or not? This might be
similar to our current model of discussion on -hackers, but more
time-bound, with the team having the authority to accept or reject
ideas at the end of the time period rather than leaving the decision
vague until after people have already put in a lot of effort.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee




Re: [HACKERS] Setting visibility map in VACUUM's second phase

2013-01-24 Thread Pavan Deolasee
Hi Jeff,

On Thu, Jan 24, 2013 at 2:41 PM, Jeff Janes  wrote:
>
> lazy_vacuum_page now pins and unpins the vmbuffer for each page it marks
> all-visible, which seems like a lot of needless traffic since the next
> vmbuffer is likely to be the same as the previous one.
>

Good idea. Even though the cost of pinning/unpinning may not be high
relative to the cost of vacuum itself, it seems worth doing because we
already do that elsewhere. Do you have any other review comments on
the patch, or shall I fix this and send an updated patch soon?

Thanks,
Pavan
-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee




[HACKERS] [PATCH 0/3] Work around icc miscompilation

2013-01-24 Thread Xi Wang
I'm sending three smaller patches for review, which try to fix icc
and pathscale (mis)compilation problems described in my last email.

Not sure if we need to fix the overflow check x + 1 <= 0 (x > 0) at
src/backend/executor/execQual.c:3088, which icc optimizes away, too.

  Fix x + y < x overflow checks
  Fix x + 1 < x overflow checks
  Fix overflow checking in repeat()

 src/backend/utils/adt/float.c |8 
 src/backend/utils/adt/oracle_compat.c |   18 +++---
 src/backend/utils/adt/varbit.c|7 +--
 src/backend/utils/adt/varlena.c   |   10 ++
 src/pl/plpgsql/src/pl_exec.c  |4 ++--
 5 files changed, 24 insertions(+), 23 deletions(-)

On 1/20/13 2:58 AM, Xi Wang wrote:
> Intel's icc and PathScale's pathcc compilers optimize away several
> overflow checks, since they consider signed integer overflow as
> undefined behavior.  This leads to a vulnerable binary.
> 
> Currently we use -fwrapv to disable such (mis)optimizations in gcc,
> but not in other compilers.
> 
> 
> Examples
> 
> 
> 1) x + 1 <= 0 (assuming x > 0).
> 
> src/backend/executor/execQual.c:3088
> 
> Below is the simplified code.
> 
> -
> void bar(void);
> void foo(int this_ndims)
> {
>  if (this_ndims <= 0)
>  return;
>  int elem_ndims = this_ndims;
>  int ndims = elem_ndims + 1;
>  if (ndims <= 0)
>  bar();
> }
> -
> 
> $ icc -S -o - sadd1.c
> ...
> foo:
> # parameter 1: %edi
> ..B1.1:
> ..___tag_value_foo.1:
> ..B1.2:
>  ret
> 
> 2) x + 1 < x
> 
> src/backend/utils/adt/float.c:2769
> src/backend/utils/adt/float.c:2785
> src/backend/utils/adt/oracle_compat.c:1045 (x + C < x)
> 
> Below is the simplified code.
> 
> -
> void bar(void);
> void foo(int count)
> {
>  int result = count + 1;
>  if (result < count)
>  bar();
> }
> -
> 
> $ icc -S -o - sadd2.c
> ...
> foo:
> # parameter 1: %edi
> ..B1.1:
> ..___tag_value_foo.1:
>  ret
> 3) x + y <= x (assuming y > 0)
> 
> src/backend/utils/adt/varbit.c:1142
> src/backend/utils/adt/varlena.c:1001
> src/backend/utils/adt/varlena.c:2024
> src/pl/plpgsql/src/pl_exec.c:1975
> src/pl/plpgsql/src/pl_exec.c:1981
> 
> Below is the simplified code.
> 
> -
> void bar(void);
> void foo(int sp, int sl)
> {
>  if (sp <= 0)
>  return;
>  int sp_pl_sl = sp + sl;
>  if (sp_pl_sl <= sl)
>  bar();
> }
> -
> 
> $ icc -S -o - sadd3.c
> foo:
> # parameter 1: %edi
> # parameter 2: %esi
> ..B1.1:
> ..___tag_value_foo.1:
> ..B1.2:
>  ret
> 
> Possible fixes
> ==
> 
> * Recent versions of icc and pathcc support gcc's workaround option,
> -fno-strict-overflow, to disable some optimizations based on signed
> integer overflow.  It's better to add this option to configure.
> They don't support gcc's -fwrapv yet.
> 
> * This -fno-strict-overflow option cannot help in all cases: it cannot
> prevent the latest icc from (mis)compiling the 1st case.  We could also
> fix the source code by avoiding signed integer overflows, as follows.
> 
> x + y <= 0 (assuming x > 0, y > 0)
> --> x > INT_MAX - y
> 
> x + y <= x (assuming y > 0)
> --> x > INT_MAX - y
> 
> I'd suggest to fix the code rather than trying to work around the
> compilers since the fix seems simple and portable.
> 
> See two recent compiler bugs of -fwrapv/-fno-strict-overflow as well.
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55883
> http://software.intel.com/en-us/forums/topic/358200
> 
> * I don't have access to IBM's xlc compiler.  Not sure how it works for
> the above cases.
> 
> - xi
> 




Re: [HACKERS] Event Triggers: adding information

2013-01-24 Thread Dimitri Fontaine
Robert Haas  writes:
> Or maybe we should just silently ignore failures to look up the event
> trigger.  That might be better, because the DBA could always do:
>
> DROP FUNCTION myeventtrgfn() CASCADE;
>
> ...and it would be undesirable for other sessions to error out in that
> case due to SnapshotNow effects.

What about taking a lock on the functions we decide we will need to be
running, maybe a ShareUpdateExclusiveLock, so that the function can not
disappear under us from concurrent activity?

Note to self, most probably using:
LockRelationOid(fnoid, ShareUpdateExclusiveLock);

After all, we might be right not to optimize for DDL concurrency…

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




[HACKERS] [PATCH 1/3] Fix x + y < x overflow checks

2013-01-24 Thread Xi Wang
icc optimizes away the overflow check x + y < x (y > 0), because
signed integer overflow is undefined behavior in C.  Instead, use
a safe precondition test x > INT_MAX - y.
---
 src/backend/utils/adt/varbit.c  |7 +--
 src/backend/utils/adt/varlena.c |   10 ++
 src/pl/plpgsql/src/pl_exec.c|4 ++--
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/src/backend/utils/adt/varbit.c b/src/backend/utils/adt/varbit.c
index 1712c12..af69200 100644
--- a/src/backend/utils/adt/varbit.c
+++ b/src/backend/utils/adt/varbit.c
@@ -16,6 +16,8 @@
 
 #include "postgres.h"
 
+#include 
+
 #include "access/htup_details.h"
 #include "libpq/pqformat.h"
 #include "nodes/nodeFuncs.h"
@@ -1138,12 +1140,13 @@ bit_overlay(VarBit *t1, VarBit *t2, int sp, int sl)
ereport(ERROR,
(errcode(ERRCODE_SUBSTRING_ERROR),
 errmsg("negative substring length not 
allowed")));
-   sp_pl_sl = sp + sl;
-   if (sp_pl_sl <= sl)
+   if (sl > INT_MAX - sp)
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 errmsg("integer out of range")));
 
+   sp_pl_sl = sp + sl;
+
s1 = bitsubstring(t1, 1, sp - 1, false);
s2 = bitsubstring(t1, sp_pl_sl, -1, true);
result = bit_catenate(s1, t2);
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 95e41bf..c907f44 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -997,12 +997,13 @@ text_overlay(text *t1, text *t2, int sp, int sl)
ereport(ERROR,
(errcode(ERRCODE_SUBSTRING_ERROR),
 errmsg("negative substring length not 
allowed")));
-   sp_pl_sl = sp + sl;
-   if (sp_pl_sl <= sl)
+   if (sl > INT_MAX - sp)
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 errmsg("integer out of range")));
 
+   sp_pl_sl = sp + sl;
+
s1 = text_substring(PointerGetDatum(t1), 1, sp - 1, false);
s2 = text_substring(PointerGetDatum(t1), sp_pl_sl, -1, true);
result = text_catenate(s1, t2);
@@ -2020,12 +2021,13 @@ bytea_overlay(bytea *t1, bytea *t2, int sp, int sl)
ereport(ERROR,
(errcode(ERRCODE_SUBSTRING_ERROR),
 errmsg("negative substring length not 
allowed")));
-   sp_pl_sl = sp + sl;
-   if (sp_pl_sl <= sl)
+   if (sl > INT_MAX - sp)
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 errmsg("integer out of range")));
 
+   sp_pl_sl = sp + sl;
+
s1 = bytea_substring(PointerGetDatum(t1), 1, sp - 1, false);
s2 = bytea_substring(PointerGetDatum(t1), sp_pl_sl, -1, true);
result = bytea_catenate(s1, t2);
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 0ecf651..a9cf6df 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -1972,13 +1972,13 @@ exec_stmt_fori(PLpgSQL_execstate *estate, 
PLpgSQL_stmt_fori *stmt)
 */
if (stmt->reverse)
{
-   if ((int32) (loop_value - step_value) > loop_value)
+   if (loop_value < INT_MIN + step_value)
break;
loop_value -= step_value;
}
else
{
-   if ((int32) (loop_value + step_value) < loop_value)
+   if (loop_value > INT_MAX - step_value)
break;
loop_value += step_value;
}
-- 
1.7.10.4





[HACKERS] [PATCH 2/3] Fix x + 1 < x overflow checks

2013-01-24 Thread Xi Wang
icc optimizes away x + 1 < x because signed integer overflow is
undefined behavior in C.  Instead, simply check if x is INT_MAX.
---
 src/backend/utils/adt/float.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index b73e0d5..344b092 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -2764,12 +2764,12 @@ width_bucket_float8(PG_FUNCTION_ARGS)
result = 0;
else if (operand >= bound2)
{
-   result = count + 1;
/* check for overflow */
-   if (result < count)
+   if (count == INT_MAX)
ereport(ERROR,

(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 errmsg("integer out of 
range")));
+   result = count + 1;
}
else
result = ((float8) count * (operand - bound1) / (bound2 
- bound1)) + 1;
@@ -2780,12 +2780,12 @@ width_bucket_float8(PG_FUNCTION_ARGS)
result = 0;
else if (operand <= bound2)
{
-   result = count + 1;
/* check for overflow */
-   if (result < count)
+   if (count == INT_MAX)
ereport(ERROR,

(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 errmsg("integer out of 
range")));
+   result = count + 1;
}
else
result = ((float8) count * (bound1 - operand) / (bound1 
- bound2)) + 1;
-- 
1.7.10.4





[HACKERS] [PATCH 3/3] Fix overflow checking in repeat()

2013-01-24 Thread Xi Wang
icc optimizes away `check + VARHDRSZ <= check' since signed integer
overflow is undefined behavior.  Simplify the overflow check for
`VARHDRSZ + count * slen' as `count > (INT_MAX - VARHDRSZ) / slen'.
---
 src/backend/utils/adt/oracle_compat.c |   18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/src/backend/utils/adt/oracle_compat.c 
b/src/backend/utils/adt/oracle_compat.c
index d448088..a85a672 100644
--- a/src/backend/utils/adt/oracle_compat.c
+++ b/src/backend/utils/adt/oracle_compat.c
@@ -15,6 +15,8 @@
  */
 #include "postgres.h"
 
+#include 
+
 #include "utils/builtins.h"
 #include "utils/formatting.h"
 #include "mb/pg_wchar.h"
@@ -1034,20 +1036,14 @@ repeat(PG_FUNCTION_ARGS)
count = 0;
 
slen = VARSIZE_ANY_EXHDR(string);
-   tlen = VARHDRSZ + (count * slen);
 
/* Check for integer overflow */
-   if (slen != 0 && count != 0)
-   {
-   int check = count * slen;
-   int check2 = check + VARHDRSZ;
-
-   if ((check / slen) != count || check2 <= check)
-   ereport(ERROR,
-   
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-errmsg("requested length too large")));
-   }
+   if (slen != 0 && count > (INT_MAX - VARHDRSZ) / slen)
+   ereport(ERROR,
+   (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+errmsg("requested length too large")));
 
+   tlen = VARHDRSZ + (count * slen);
result = (text *) palloc(tlen);
 
SET_VARSIZE(result, tlen);
-- 
1.7.10.4





Re: [HACKERS] [PATCH 0/3] Work around icc miscompilation

2013-01-24 Thread Heikki Linnakangas

On 24.01.2013 11:33, Xi Wang wrote:

> I'm sending three smaller patches for review, which try to fix icc
> and pathscale (mis)compilation problems described in my last email.


These patches look OK at a quick glance, but how do we ensure this kind
of problem doesn't crop up again in the future? Does icc give a warning
about these? Do we have a buildfarm animal that produces the warnings?

If we fix these, can we stop using -fwrapv on gcc? Is there any way to
get gcc to warn about these?


- Heikki




Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-01-24 Thread Amit Kapila
On Thursday, January 24, 2013 7:42 AM Noah Misch wrote:
> On Wed, Jan 09, 2013 at 11:22:06AM +, Simon Riggs wrote:
> > On 24 December 2012 16:57, Amit Kapila 
> wrote:
> > > Performance: Average of 3 runs of pgbench in tps
> > > 9.3devel  |  with trailing null patch
> > > --+--
> > > 578.9872  |   573.4980
> >
> > On balance, it would seem optimizing for this special case would
> > affect everybody negatively; not much, but enough. Which means we
> > should reject this patch.
> >
> > Do you have a reason why a different view might be taken?
> 
> We've mostly ignored performance changes of a few percentage points,
> because the global consequences of adding or removing code to the
> binary image can easily change performance that much.  It would be
> great to get to the point where we can reason about 1% performance
> changes, but we're not there.  If a measured +1% performance change
> would have yielded a different decision, we should rethink the
> decision based on more-robust criteria.

Today I took one more set of readings of the original pgbench, shown
below in this mail. The difference is that this set of readings is on
SUSE 11 with shared_buffers = 4GB.
IMO, the changes in this patch are not causing any regression; rather,
they give a performance/size reduction in some of the cases, as shown
by the data in previous mails.
So the point to decide is whether the use cases in which it gives a
benefit make this patch worth having.
As Tom has already raised some concerns about the code, I think the
patch can only have merit if the use case makes sense to people.

System Configuration: 
Hardware : 4 core (Intel(R) Xeon(R) CPU L5408 @ 2.13GHz) &  RAM : 24GB 
Operating System: Suse-Linux 11.2 x86_64   

Sever Configuration: 
The database cluster will be initialized with locales 
  COLLATE:  C 
  CTYPE:C 
  MESSAGES: en_US.UTF-8 
  MONETARY: en_US.UTF-8 
  NUMERIC:  en_US.UTF-8 
  TIME: en_US.UTF-8 

shared_buffers = 4GB 
checkpoint_segments = 255   
checkpoint_timeout = 10min   

pgbench: 
transaction type: TPC-B (sort of) 
scaling factor: 75 
query mode: simple 
number of clients: 4 
number of threads: 4 
duration: 300 s 

           original   patch
 Run-1:       311       312
 Run-2:       307       302
 Run-3:       300       312
 Run-4:       285       295
 Run-5:       308       305

 Avg  :      302.2     305.2


With Regards,
Amit Kapila.





Re: [HACKERS] [PATCH 0/3] Work around icc miscompilation

2013-01-24 Thread Xi Wang
On 1/24/13 5:02 AM, Heikki Linnakangas wrote:
> These patches look ok at a quick glance, but how do we ensure this kind 
> of problems don't crop back again in the future? Does icc give a warning 
> about these? Do we have a buildfarm animal that produces the warnings?
> 
> If we fix these, can we stop using -fwrapv on gcc? Is there any way to 
> get gcc to warn about these?

Thanks for reviewing.

gcc has a -Wstrict-overflow option to warn about overflow checks that
may be optimized away.  The result is inaccurate: it may produce a
large number of false warnings, and it may also miss many cases (esp.
when gcc's value-range propagation fails to compute variables' ranges).

Not sure if other compilers have similar options.

I found these broken checks using a static checker I'm developing, and
only reported cases that existing compilers do miscompile.  If you are
interested, I'll post a complete list of overflow checks in pgsql that
invoke undefined behavior and thus may be killed by future compilers.

I believe we can get rid of -fwrapv once we fix all such checks.

- xi




Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state

2013-01-24 Thread Heikki Linnakangas

On 24.01.2013 00:30, Andres Freund wrote:

Hi,

I decided to reply on the patches thread to be able to find this later.

On 2013-01-23 22:48:50 +0200, Heikki Linnakangas wrote:

"logical changeset generation v4"
This is a boatload of infrastructure for supporting logical replication, yet
we have no code actually implementing logical replication that would go with
this. The premise of logical replication over trigger-based was that it'd be
faster, yet we cannot asses that without a working implementation. I don't
think this can be committed in this state.


It's a fair point that this is a huge amount of code without an
in-core user.
But the reason no user is included is that several people explicitly
didn't want a user in-core for now, and said the first part of this
would be to implement the changeset generation as a separate piece.
Didn't you actually prefer not to have any users of this in-core
yourself?


Yes, I certainly did. But we still need to see the other piece of the 
puzzle to see how this fits with it.


BTW, why does all the transaction reordering stuff have to be in core?

How much of this infrastructure is to support replicating DDL changes? 
IOW, if we drop that requirement, how much code can we slash? Any other 
features or requirements that could be dropped? I think it's clear at 
this stage that this patch is not going to be committed as it is. If you 
can reduce it to a fraction of what it is now, that fraction might have 
a chance. Otherwise, it's just going to be pushed to the next commitfest 
as a whole, and we're going to be having the same doubts and discussions then.


- Heikki




Re: [HACKERS] Event Triggers: adding information

2013-01-24 Thread Dimitri Fontaine
Robert Haas  writes:
> On Mon, Jan 21, 2013 at 12:27 PM, Dimitri Fontaine
>  wrote:
>>   - Context exposing and filtering
>
> I'm not feeling very sanguine about any of this.  I feel strongly that
> we should try to do something that's more principled than just
> throwing random junk into ProcessUtility.

The goal is to allow advanced users of the feature to get at the
sequence, constraint and index names that are picked automatically by
PostgreSQL when a CREATE TABLE statement involves them:

   CREATE TABLE foo (id serial PRIMARY KEY, f1 text);

Andres and Steve, as you're the next ones to benefit directly from that
whole feature and at different levels, do you have a strong opinion on
whether you really need that to operate at the logical level?

The trick is not implementing the feature (that's quite easy, really);
it's being able to provide it without any API promises, because it's
an opening into the internals.

Maybe the answer is to provide the same level of information as you get
in an Event Trigger to new hooks.

>>   - Additional Information to PL/pgSQL
>>
>> We're talking about :
>>
>>   - operation   (CREATE, ALTER, DROP)
>>   - object type (TABLE, FUNCTION, TEXT SEARCH DICTIONARY, …)
>>   - object OID(when there's a single object target)
>>   - object shema name (when there's a single object target)
>>   - object name   (when there's a single object target)

I realise I omitted something important here. The current patch effort
only adds that information if the target object exists in the catalogs
at the time of the information lookup.

> In ddl_command_start, I believe you can only expose the information
> that is available in the parse tree.  That is, if the user typed
> CREATE TABLE foo, you can tell the event trigger that the user didn't
> specify a schema and that they mentioned the name foo.  You cannot

I used to do that, and that's the reason I had the Command String
Normalisation in the same patch: both walk the parse tree to some
extent, and I didn't want different routines doing different walks to
get at the information.

> predict what schema foo will actually end up in at that point, and we
> shouldn't try.  I don't have a problem with exposing the information
> we have at this point with the documented caveat that if you rely on
> it to mean something it doesn't you get to keep both pieces.

Fair enough. I retracted from doing that to be able to separate away the
parse tree walking code for later review, and I'm now wondering if
that's really what we want to do.

If auditing, you can log the information after the command has been run
(in case of CREATE and ALTER command) or before the object has been
deleted (in case of a DROP command).

If you want to cancel the whole operation depending on the name of the
object, or the schema it ends up in, then you can do it at the end of
the command's execution with a RAISE EXCEPTION.

> But I wonder: wouldn't it be better to just expose the raw string the
> user typed?  I mean, in all the cases we really care about, the
> information we can *reliably* expose at this point is going to be
> pretty nearly the same as extracting the third space-delimited word
> out of the command text and splitting it on a period.  And you can do
> that easily even in PL/pgsql.  You could also extract a whole lot

Totally agreed. That's another reason why I want to provide users with
the Normalized Command String; it will then be even easier for them
to do what you just said.

After having been working on that for some time, though, I'm leaning
toward Tom's view that you shouldn't try to expose information about
objects that don't exist in the catalogs, because "that's not code,
that's magic".

> of OTHER potentially useful information from the command text that you
> couldn't get if we just exposed the given schema name and object name.
>  For example, this would allow you to write an event trigger that lets
> you enforce a policy that all column names must use Hungarian
> notation.  Note that this is a terrible idea and you shouldn't do it,
> but you could if you wanted to.  And, the trigger would be relatively
> long and complex and you might have bugs, but that's still better than
> the status quo where you are simply out of luck.

You seem to be talking about System Hungarian, not Apps Hungarian, but I
don't think that's the point here.

  http://www.joelonsoftware.com/articles/Wrong.html

> Now, in a ddl_command_end trigger, there's a lot more information that
> you can usefully expose.  In theory, if the command records "what it
> did" somewhere, you can extract that information with as much
> precision as it cared to record.  However, I'm pretty skeptical of the
> idea of exposing a single OID.  I mean, if the "main" use case here is

There's precedent in PostgreSQL: how do you get information about each
row that was in the target from a FOR EACH STATEMENT t

Re: [HACKERS] Event Triggers: adding information

2013-01-24 Thread Dimitri Fontaine
Robert Haas  writes:
> Can you point me specifically at what you have in mind so I can make
> sure I'm considering the right thing?

Something like this part:

+ -- now try something crazy to ensure we don't crash the backend
+ create function test_event_trigger_drop_function()
+  returns event_trigger
+  as $$
+ BEGIN
+ drop function test_event_trigger2() cascade;
+ END
+ $$ language plpgsql;
+ 
+ create function test_event_trigger2() returns event_trigger as $$
+ BEGIN
+ RAISE NOTICE 'test_event_trigger2: % %', tg_event, tg_tag;
+ END
+ $$ language plpgsql;
+ 
+ create event trigger drop_test_a on "ddl_command_start"
+ when tag in ('create table')
+execute procedure test_event_trigger_drop_function();
+ 
+ create event trigger drop_test_b on "ddl_command_start"
+execute procedure test_event_trigger2();
+
+ create table event_trigger_fire1 (a int);
+ 
+ -- then cleanup the crazy test
+ drop event trigger drop_test_a;
+ drop event trigger drop_test_b;
+ drop function test_event_trigger_drop_function();

The only problem for the regression tests is that there's an OID in
the output, but with your proposed error message that would go away.

-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state

2013-01-24 Thread Andres Freund
Hi Robert, Hi all,

On 2013-01-23 20:17:04 -0500, Robert Haas wrote:
> On Wed, Jan 23, 2013 at 5:30 PM, Andres Freund  wrote:
> > The only reason the submitted version of logical decoding is
> > comparatively slow is that its xmin update policy is braindamaged,
> > working on that right now.
> 
> I agree.  The thing that scares me about the logical replication stuff
> is not that it might be slow (and if your numbers are to be believed,
> it isn't), but that I suspect it's riddled with bugs and possibly some
> questionable design decisions.  If we commit it and release it, then
> we're going to be stuck maintaining it for a very, very long time.  If
> it turns out to have serious bugs that can't be fixed without a new
> major release, it's going to be a serious black eye for the project.

That's much more along the lines of what I am afraid of than the
performance stuff - but Heikki cited the performance numbers, so I
replied to that.

Note that I didn't say this must, must go in - I just don't think
Heikki's reasoning about why not hit the nail on the head.

> Of course, I have no evidence that that will happen.  But it is a
> really big piece of code, and therefore unless you are superman, it's
> probably got a really large number of bugs.  The scary thing is that
> it is not as if we can say, well, this is a big hunk of code, but it
> doesn't really touch the core of the system, so if it's broken, it'll
> be broken itself, but it won't break anything else.  Rather, this code
> is deeply in bed with WAL, with MVCC, and with the on-disk format of
> tuples, and makes fundamental changes to the first two of those.  You
> agreed with Tom that 9.2 is the buggiest release in recent memory, but
> I think logical replication could easily be an order of magnitude
> worse.

I tried very, very hard to get the basics of the design & interface
solid. Which obviously doesn't mean I am succeeding - luckily not being
superhuman after all ;). And I think that's very much where input is
desperately needed and where I failed to raise enough attention. The
"output plugin" interface followed by the walsender interface is what
needs to be most closely vetted.
Those are the permanent, user/developer-exposed interfaces and the ones
we should try to keep as stable as possible.

The output plugin callbacks are defined here:
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/include/replication/output_plugin.h;hb=xlog-decoding-rebasing-cf4
To make it more agnostic of the technology implementing changeset
extraction, we should possibly replace the ReorderBuffer(TXN|Change)
structs being passed with something more implementation-agnostic.

walsender interface:
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/backend/replication/repl_gram.y;hb=xlog-decoding-rebasing-cf4
The interesting new commands are:
1) K_INIT_LOGICAL_REPLICATION NAME NAME
2) K_START_LOGICAL_REPLICATION NAME RECPTR plugin_options
3) K_FREE_LOGICAL_REPLICATION NAME

1 & 3 allocate (respectively free) the permanent state associated with
one changeset consumer whereas START_LOGICAL_REPLICATION streams out
changes starting at RECPTR.

Btw, there are currently *no* changes to the WAL format at all if
wal_level < logical, except that xl_running_xacts are logged more
frequently, which obviously could easily be made conditional. Barring bugs
of course.
The changes with wal_level>=logical aren't that big either imo:
* heap_insert, heap_update prevent full page writes from removing their
  normal record by using a separate XLogRecData block for the buffer and
  the record
* heap_delete adds more data (the pkey of the tuple) after the unchanged
  xl_heap_delete struct
* On changes to catalog tables, (relfilenode, tid, cmin, cmax) are logged.

No changes to mvcc for normal backends at all, unless you count the very
slightly changed *Satisfies interface (getting passed a HeapTuple
instead of HeapTupleHeader).

I am not sure what you're concerned about WRT the on-disk format of the
tuples? We are pretty much nailed down on that due to pg_upgrade'ability
anyway, and it could be changed from this patch's POV without a problem;
the output plugin just sees normal HeapTuples. Or are you concerned
about the code extracting them from the xlog records?

So I think the "won't break anything else" argument can be made rather
fairly if the heapam.c changes, which aren't that complex, are vetted
closely.

Now, the discussion about all the code that's active *during* decoding is
something else entirely :/

> You
> agreed with Tom that 9.2 is the buggiest release in recent memory, but
> I think logical replication could easily be an order of magnitude
> worse.

I unfortunately think that not providing more builtin capabilities in
this area also has significant dangers. Imo this is one of the weakest
areas of postgres, or even the weakest.

I personally have the impression that just about nobody did actual beta
testing of the latest releases, especially 9.2, and t

Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-01-24 Thread Amit Kapila
On Thursday, January 24, 2013 1:56 AM Andres Freund wrote:
> On 2013-01-22 12:32:07 +, Amit kapila wrote:
> > This closes all comments raised till now for this patch.
> > Kindly let me know if you feel something is missing?
> 
> I am coming late to this patch, so bear with me if I repeat something
> said elsewhere.

Thanks for looking at patch.
 
> Review comments of cursory pass through the patch:
> * most comments are hard to understand. I know the problem of that
>   being hard for a non-native speaker by heart, but I think another
> pass
>   over them would be good thing.

I shall give another pass to improve the comments in code.

> * The gram.y changes around set_rest_(more|common) seem pretty
>   confused to me.

>   E.g. it's not possible anymore to set the timezone for a function.

What exactly do you mean by this part of the comment?

> And why is it possible to persistently set the search path,
>   but not client encoding? Why is FROM CURRENT in set_rest_more?

I think the main reason was some of the commands can work only inside a
transaction, and as SET PERSISTENT cannot work inside a transaction block,
I had kept some segregation.
I shall reply with the exact reason in a separate mail.

> * set_config_file should elog(ERROR), not return on an unhandled
>   setstmt->kind

Shall fix it.

> * why are you creating AutoConfFileName if its not stat'able? It seems
>   better to simply skip parsing the old file in that case

Sure, it's better to handle as you are suggesting.

> * Writing the temporary file to .$pid seems like a bad idea, better use
>   one file for that, SET PERSISTENT is protected by an exclusive lock
>   anyway.

  I think we can use one temporary file; in fact that was one of the ways I
had asked about in one of the previous mails.
  However, Tom and Zoltan felt this is the better way to do it.
  I can think of one reason why it would be better to have .$pid: if
someone has opened the file manually, then all other sessions can fail
(on Windows). In fact this is one of the tests Zoltan performed.


> * the write sequence should be:
>   * fsync(tempfile)
>   * fsync(directory)
>   * rename(tempfile, configfile)
>   * fsync(configfile)
>   * fsync(directory)

Why do we need fsync(directory) and fsync(configfile)?

> * write_auto_conf_file should probably escape quoted values?
  Can you please elaborate more, I am not able to understand your point?

> * coding style should be adhered to more closely, there are many
>   if (pointer) which should be if (pointer != NULL), 

Are you pointing at the usage below in
validate_conf_option(const char *name, char *value):
  + if (value)

> single-line blocks
>   enclosed in curlies which shouldn't, etc.
Shall fix.

> * replace_auto_config_file and surrounding functions need more comments
>   in the header

  Sure, shall add more comments in the function headers of the following functions: 
  validate_conf_option() 
  RemoveAutoConfTmpFiles() 
  write_auto_conf_file() 
  replace_auto_config_value()

> * the check that prevents persistent SETs in a transaction should
> rather
>   be in utility.c and use PreventTransactionChain like most of the
>   others that need to do that (c.f. T_CreatedbStmt).

  We can move the check to utility.c, but if we use PreventTransactionChain,
then it will be disallowed
  in functions as well. But the original idea was only to disallow it in
transaction blocks.
  So I feel using the current IsTransactionBlock() check should be okay.

With Regards,
Amit Kapila.





Re: [HACKERS] CF3+4 (was Re: Parallel query execution)

2013-01-24 Thread Amit Kapila
On Thursday, January 24, 2013 2:58 AM Stephen Frost wrote:
> Heikki,
> 
> * Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
> > FWIW, here's how I feel about some the patches. It's not an
> exhaustive list.
> 
> Thanks for going through them and commenting on them.
> 
 
> > "built-in/SQL Command to edit the server configuration file
> > (postgresql.conf)"
> > I think this should be a pgfoundry project, not in core. At least for
> now.
> 
> I don't think it would ever get deployed or used then..

  I think for this feature a lot of testing has already been done, and it
has been reviewed by quite a few people.
  About the feature's value and high-level design, quite a lot of
discussion has already happened.
  At this point, there is no design-level concern for this patch; there are
a few issues which can be dealt
  with via minor changes.
  Do you see any lack of importance for this feature?

With Regards,
Amit Kapila.






Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state

2013-01-24 Thread Heikki Linnakangas
One random thing that caught my eye in the patch, I thought I'd mention 
it while I still remember: In heap_delete, you call heap_form_tuple() in 
a critical section. That's a bad idea, because if it runs out of memory 
-> PANIC.


- Heikki




Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state

2013-01-24 Thread Andres Freund
On 2013-01-24 12:38:25 +0200, Heikki Linnakangas wrote:
> On 24.01.2013 00:30, Andres Freund wrote:
> >Hi,
> >
> >I decided to reply on the patches thread to be able to find this later.
> >
> >On 2013-01-23 22:48:50 +0200, Heikki Linnakangas wrote:
> >>"logical changeset generation v4"
> >>This is a boatload of infrastructure for supporting logical replication, yet
> >>we have no code actually implementing logical replication that would go with
> >>this. The premise of logical replication over trigger-based was that it'd be
> >>faster, yet we cannot asses that without a working implementation. I don't
> >>think this can be committed in this state.
> >
> >Its a fair point that this is a huge amount of code without a user in
> >itself in-core.
> >But the reason it got no user included is because several people
> >explicitly didn't want a user in-core for now but said the first part of
> >this would be to implement the changeset generation as a separate
> >piece. Didn't you actually prefer not to have any users of this in-core
> >yourself?
>
> Yes, I certainly did. But we still need to see the other piece of the puzzle
> to see how this fits with it.

Fair enough. I am also working on a user of this infrastructure but that
doesn't help you very much. Steve Singer seemed to make some stabs at
writing an output plugin as well. Steve, how far did you get there?

> BTW, why does all the transaction reordering stuff have to be in core?

It didn't use to, but people argued pretty damned hard that no undecoded
data should ever be allowed to leave the postgres cluster. And to be fair
it makes writing an output plugin *way* easier. Check
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/test_decoding/test_decoding.c;hb=xlog-decoding-rebasing-cf4
If you skip over tuple_to_stringinfo(), which is just pretty generic
scaffolding for converting a whole tuple to a string, writing out the
changes in some format by now is pretty damn simple.

> How much of this infrastructure is to support replicating DDL changes? IOW,
> if we drop that requirement, how much code can we slash?

Unfortunately I don't think too much unless we add in other code that
allows us to check whether the current definition of a table is still
the same as it was back when the tuple was logged.

> Any other features or requirements that could be dropped? I think it's clear 
> at this stage that
> this patch is not going to be committed as it is. If you can reduce it to a
> fraction of what it is now, that fraction might have a chance. Otherwise,
> it's just going to be pushed to the next commitfest as whole, and we're
> going to be having the same doubts and discussions then.

One thing that reduces complexity is to declare the following as
unsupported:
- CREATE TABLE foo(data text);
- DECODE UP TO HERE;
- INSERT INTO foo(data) VALUES(very-long-to-be-externally-toasted-tuple);
- DROP TABLE foo;
- DECODE UP TO HERE;

but thats just a minor thing.

I think what we can do more realistically than to chop off required parts
of changeset extraction is to start applying some of the preliminary
patches independently:
- the relmapper/relfilenode changes + pg_relation_by_filenode(spc,
  relnode) should be independently committable if a bit boring
- allowing walsenders to connect to a database possibly needs an interface 
change
  but otherwise it should be fine to go in independently. It also has
  other potential use-cases, so I think thats fair.
- logging xl_running_xact's more frequently could also be committed
  independently and makes sense independently as it allows a standby to
  enter HS faster if the master is busy
- Introducing InvalidCommandId should be relatively uncontroversial. The
  fact that no invalid value for command ids exists is imo an oversight
- the *Satisfies changes could be applied and they are imo ready, but
  there's no use-case for them without the rest, so I am not sure whether
  there's a point
- currently not separately available, but we could add wal_level=logical
  independently. There would be no user of it, but it would be partial
  work. That includes the relcache support for keeping track of the
  primary key which already is available separately.


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] My first patch! (to \df output)

2013-01-24 Thread Phil Sorber
On Thu, Jan 24, 2013 at 2:27 AM, Craig Ringer  wrote:
> On 01/24/2013 01:50 AM, Phil Sorber wrote:
>> This looks good to me now. Compiles and works as described.
> Ready to go?
>
> https://commitfest.postgresql.org/action/patch_view?id=1008
>

I guess I wasn't ready to be so bold, but sure. :) I changed it to
'ready for committer'.

>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>




[HACKERS] Re: logical changeset generation v4 - Heikki's thoughts about the patch state

2013-01-24 Thread Andres Freund
On 2013-01-24 13:28:18 +0200, Heikki Linnakangas wrote:
> One random thing that caught my eye in the patch, I thought I'd mention it
> while I still remember: In heap_delete, you call heap_form_tuple() in a
> critical section. That's a bad idea, because if it runs out of memory ->
> PANIC.

Good point, will fix.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-01-24 Thread Andres Freund
On 2013-01-24 16:45:42 +0530, Amit Kapila wrote:
> > * The gram.y changes around set_rest_(more|common) seem pretty
> >   confused to me.
>
> >   E.g. it's not possible anymore to set the timezone for a function.
>
> What exactly do you mean by this part of the comment?

The set_rest_more production is used in FunctionSetResetClause and you've
removed capabilities from it.

> > And why is it possible to persistently set the search path,
> >   but not client encoding? Why is FROM CURRENT in set_rest_more?
>
> I think the main reason was some of the commands can work only inside a
> transaction, and as SET PERSISTENT cannot work inside a transaction
> block, I had kept some segregation.

Yes, I can see reasons for doing this, I just think the split isn't
correct as you've done it.

> > * Writing the temporary file to .$pid seems like a bad idea, better use
> >   one file for that, SET PERSISTENT is protected by an exclusive lock
> >   anyway.
>
>   I think we can use one temporary file; in fact that was one of the ways
> I had asked about in one of the previous mails.
>   However, Tom and Zoltan felt this is the better way to do it.

They have? I didn't read it like that. The file can only ever be written
by a running postmaster and we already have code that ensures that. There's
absolutely no need for the tempfile to have a nondeterministic
name. That makes cleanup way easier as well.

>   I can think of one reason why it would be better to have .$pid: if
> someone has opened the file manually, then all other sessions can fail
> (on Windows). In fact this is one of the tests Zoltan performed.

Why on earth should somebody have the tempfile open? There's an
exclusive lock around writing the file in your code and if anybody
interferes like that with postgres' temporary data you have far bigger
problems than SET PERSISTENT erroring out.

> > * the write sequence should be:
> >   * fsync(tempfile)
> >   * fsync(directory)
> >   * rename(tempfile, configfile)
> >   * fsync(configfile)
> >   * fsync(directory)
>
> Why do we need fsync(directory) and fsync(configfile)?

So they don't vanish / get corrupted after a crash? The above is the
canonical way to safely write a file without an invalid file ever
being visible.
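The sequence above can be sketched as follows. This is an illustrative
model in Python of the same ordering (fsync temp file, fsync directory,
rename, fsync target, fsync directory), not the patch's actual C code;
the deterministic `.tmp` suffix is the naming suggested in the review:

```python
import os

def replace_config_durably(config_path, contents):
    """Replace config_path atomically and durably: a crash at any point
    leaves either the complete old file or the complete new one visible,
    never a partially written file."""
    directory = os.path.dirname(config_path) or "."
    tmp_path = config_path + ".tmp"  # deterministic temp name

    def fsync_dir():
        # Persist directory metadata (the temp file entry / the rename).
        dfd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)

    # 1. Write the new contents to the temp file and flush them to disk.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, contents.encode())
        os.fsync(fd)
    finally:
        os.close(fd)

    fsync_dir()                       # 2. temp file's directory entry
    os.rename(tmp_path, config_path)  # 3. atomic replacement on POSIX
    fd = os.open(config_path, os.O_RDONLY)
    try:
        os.fsync(fd)                  # 4. flush the renamed target
    finally:
        os.close(fd)
    fsync_dir()                       # 5. make the rename itself durable
```

The rename in step 3 is what makes the replacement atomic; the directory
fsyncs are what make it survive a crash.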

> > * write_auto_conf_file should probably escape quoted values?
>   Can you please elaborate more, I am not able to understand your
>   point?

You do something like (don't have the code right here):
    if (quoted)
        appendStringInfo("= \"%s\"", value);
What happens if value contains a "?
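A minimal sketch of the escaping being asked for: double any embedded
quote character before wrapping the value, so a value containing quotes
round-trips instead of terminating the string early. The helper names are
illustrative, not the patch's actual write_auto_conf_file code; doubling
is the convention postgresql.conf itself accepts for single-quoted values:

```python
def quote_conf_value(value, quote="'"):
    # Escape embedded quote characters by doubling them, then wrap the
    # whole value in quotes.
    return quote + value.replace(quote, quote * 2) + quote

def unquote_conf_value(quoted, quote="'"):
    # Inverse operation, to demonstrate the round trip.
    assert quoted.startswith(quote) and quoted.endswith(quote)
    return quoted[1:-1].replace(quote * 2, quote)
```

Without the doubling step, a value like O'Reilly would be written as
'O'Reilly' and the file would no longer parse.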

> > * coding style should be adhered to more closesly, there are many
> >   if (pointer) which should be if (pointer != NULL),
>
> Are you pointing at the usage below in
> validate_conf_option(const char *name, char *value):
>   + if (value)

For example, yes.


> > * the check that prevents persistent SETs in a transaction should
> > rather
> >   be in utility.c and use PreventTransactionChain like most of the
> >   others that need to do that (c.f. T_CreatedbStmt).
>
>   We can move the check to utility.c, but if we use PreventTransactionChain,
>   then it will be disallowed in functions as well. But the original idea
>   was only to disallow it in transaction blocks.
>   So I feel using the current IsTransactionBlock() check should be okay.

I don't think it's even remotely safe to allow doing this from a
function. Doing it there explicitly allows doing it in a transaction.

Should we find out it's safe, fine; relaxing restrictions later on is no
problem, imposing them after introducing a feature is.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [sepgsql 1/3] add name qualified creation label

2013-01-24 Thread Magnus Hagander
On Thu, Jan 24, 2013 at 10:11 AM, Kohei KaiGai  wrote:
> 2013/1/24 Tom Lane :
>> John R Pierce  writes:
>>> On 1/23/2013 8:32 PM, Tom Lane wrote:
 FWIW, in Fedora-land I see: ...
>>
>>> I'd be far more interested in what is in RHEL and CentOS.Fedora,
>>> with its 6 month obsolescence cycle, is of zero interest to me for
>>> deploying database servers.
>>
>> But of course Fedora is also the upstream that will become RHEL7
>> and beyond.

Do we know which version of Fedora will become RHEL7, and thus, which
version of libselinux will go in RHEL7? (And do we know which version
of postgres will go in RHEL7, assuming release schedules hold)

>> It might be that the update timing makes a bigger difference in some
>> other distros, though.  To return to Heikki's original point about
>> Debian, what are they shipping today?
>>
> Even though I'm not familiar with Debian's release cycle, I tried to check
> the shipped versions of postgresql and libselinux for the stable, testing,
> unstable and experimental releases.
> I'm not certain why they haven't pushed postgresql-9.2 into the
> experimental release yet. However, it seems optimistic to expect
> libselinux-2.1.10 to be bundled on the timeline of postgresql-9.3.
>
> If someone is familiar with Debian's release cycle, I'd like to hear
> their suggestion.
>
> * Debian (stable) ... postgresql-8.4 + libselinux-2.0.96
> http://packages.debian.org/en/squeeze/postgresql
> http://packages.debian.org/en/source/squeeze/libselinux
>
> * Debian (testing) ... postgresql-9.1 + libselinux-2.1.9
> http://packages.debian.org/en/wheezy/postgresql
> http://packages.debian.org/en/source/wheezy/libselinux

Just as a note, wheezy is the version that will be the next Debian
stable, and it has been in freeze for quite a while. So we can safely
expect it will be 2.1.9 that's included in the next Debian stable.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] BUG #6510: A simple prompt is displayed using wrong charset

2013-01-24 Thread Noah Misch
On Thu, Jan 24, 2013 at 03:45:36PM +0800, Craig Ringer wrote:
> On 01/24/2013 01:34 AM, Tom Lane wrote:
> > Alexander Law  writes:
> >> Please let me know if I can do something to get the bug fix 
> >> (https://commitfest.postgresql.org/action/patch_view?id=902) committed.
> > It's waiting on some Windows-savvy committer to pick it up, IMO.
> I'm no committer, but I can work with Windows and know text encoding
> issues fairly well. I'll take a look, though I can't do it immediately
> as I have some other priority work. Should be able to in the next 24h.

Feel free, but it's already Ready for Committer.  Also, Andrew has said he
plans to pick it up.  If in doubt, I think this patch is relatively solid in
terms of non-committer review.




Re: [HACKERS] BUG #6510: A simple prompt is displayed using wrong charset

2013-01-24 Thread Craig Ringer
On 01/24/2013 07:57 PM, Noah Misch wrote:
> On Thu, Jan 24, 2013 at 03:45:36PM +0800, Craig Ringer wrote:
>> On 01/24/2013 01:34 AM, Tom Lane wrote:
>>> Alexander Law  writes:
 Please let me know if I can do something to get the bug fix 
 (https://commitfest.postgresql.org/action/patch_view?id=902) committed.
>>> It's waiting on some Windows-savvy committer to pick it up, IMO.
>> I'm no committer, but I can work with Windows and know text encoding
>> issues fairly well. I'll take a look, though I can't do it immediately
>> as I have some other priority work. Should be able to in the next 24h.
> Feel free, but it's already Ready for Committer.  Also, Andrew has said he
> plans to pick it up.  If in doubt, I think this patch is relatively solid in
> terms of non-committer review.
I'm happy with that; I'll look elsewhere. Thanks.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





[HACKERS] \d failing to find uppercased object names

2013-01-24 Thread Nikolay Samokhvalov
Hello,

some app created tables using uppercase letters in table names (this app
was migrated from mysql). One tries to use \d to look up table info:

ru=# \d pnct_Board
Did not find any relation named "pnct_Board".

Of course, it works with quotes:
ru=# \d "pnct_Board"
 Table "public.pnct_Board"
 Column |  Type  |                         Modifiers
--------+--------+------------------------------------------------------------
 id     | bigint | not null default nextval('"pnct_Board_id_seq"'::regclass)
<...>


Could this be considered as gotcha?

The thing is that Postgres' reply is pretty clear:
 > Did not find any relation named "pnct_Board".
but the table "pnct_Board" does exist in the system, so Postgres' reply is
obviously incorrect.

Version: 9.2.1


Re: [HACKERS] \d failing to find uppercased object names

2013-01-24 Thread Виктор Егоров
2013/1/24 Nikolay Samokhvalov :
> some app created tables using uppercase letters in table names (this app
> was migrated from mysql). One tries to use \d to look up table info:
>
> ...
>
> Could this be considered as gotcha?
>
> The thing is that Postgres reply is pretty much clear:
>  > Did not find any relation named "pnct_Board".
> , but table "pnct_Board" does exist in the system, so Postgres' reply is
> obviously incorrect.

Take a look at this section in the docs:
http://www.postgresql.org/docs/current/interactive/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS

This is expected behavior.
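The folding rule can be modelled in a few lines (an illustrative sketch,
not psql's actual C implementation): an identifier written without double
quotes is folded to lower case, while a double-quoted identifier is taken
literally, which is exactly why \d pnct_Board misses the table but
\d "pnct_Board" finds it.

```python
def normalize_identifier(name):
    # Double-quoted identifiers are taken literally: strip the outer
    # quotes and undouble any embedded double quotes.
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1].replace('""', '"')
    # Unquoted identifiers are case-folded to lower case by PostgreSQL.
    return name.lower()
```

So the unquoted form searches the catalogs for pnct_board, which does not
exist, while the quoted form searches for pnct_Board, which does.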


-- 
Victor Y. Yegorov




Re: [HACKERS] pg_ctl idempotent option

2013-01-24 Thread Bruce Momjian
On Thu, Jan 24, 2013 at 09:05:59AM +0530, Ashutosh Bapat wrote:
> I agree, answering the question, whether the particular attempt of
> starting a server succeeded or not, will need the current behaviour.
> Now, the question is which of these behaviours should be the default?

That would work.  pg_upgrade knows the cluster version at that point and
can use the proper flag.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-01-24 Thread Amit Kapila
On Thursday, January 24, 2013 5:25 PM Andres Freund wrote:
> On 2013-01-24 16:45:42 +0530, Amit Kapila wrote:
> > > * The gram.y changes around set_rest_(more|common) seem pretty
> > >   confused to me.
> >
> > >   E.g. it's not possible anymore to set the timezone for a function.
> >
> > What exactly do you mean by this part of the comment?
> 
> The set_rest_more production is used in FunctionSetResetClause and
> youve
> removed capabilities from it.

set_rest_more includes set_rest_common, so there should be no problem.

set_rest_more:  /* Generic SET syntaxes: */
            set_rest_common
            | var_name FROM CURRENT_P
                {
                    VariableSetStmt *n = makeNode(VariableSetStmt);
                    n->kind = VAR_SET_CURRENT;
                    n->name = $1;
                    $$ = n;
                }

> > > And why is it possible to persistently set the search path,
> > >   but not client encoding? Why is FROM CURRENT in set_rest_more?
> >
> > I think the main reason was some of the commands can work only inside
> > a transaction, and as SET PERSISTENT cannot work inside a transaction
> > block, I had kept some segregation.
> 
> Yes, I can see reasons for doing this, I just think the split isn't
> correct as youve done it.
> 
> > > * Writing the temporary file to .$pid seems like a bad idea, better
> use
> > >   one file for that, SET PERSISTENT is protected by an exclusive
> lock
> > >   anyway.
> >
> >   I think we can use one temporary file; in fact that was one of the
> > ways I had asked about in one of the previous mails.
> >   However, Tom and Zoltan felt this is the better way to do it.

> They have? I didn't read it like that. The file can only ever be written by
> a running postmaster and we already have code that ensures that.
> There's
> absolutely no need for the tempfile to have a nondeterministic
> name. That makes cleanup way easier as well.

Sure, I understand that cleanup will be easier. So IIUC you are suggesting
we should just keep the name as postgresql.auto.conf.tmp

In general, please read the below mail where it has been suggested to use
.$pid
http://www.postgresql.org/message-id/28379.1358541...@sss.pgh.pa.us


> >   I can think of one reason where it will be better to have .$pid,
> and that
> > is if some one has opened the
> >   file manually, then all other sessions can fail (in WINDOWS).
> Infact this
> > is one of test Zoltan had performed.
> 
> Why on earth should somebody have the tempfile open? There's an
> exclusive lock around writing the file in your code and if anybody
> interferes like that with postgres' temporary data you have far bigger
> problems than SET PERSISTENT erroring out.

I am also not sure, but maybe some people open it for test purposes. 


> > > * the write sequence should be:
> > >   * fsync(tempfile)
> > >   * fsync(directory)
> > >   * rename(tempfile, configfile)
> > >   * fsync(configfile)
> > >   * fsync(directory)
> >
> > Why do we need fsync(directory) and fsync(configfile)?
> 
> So they don't vanish /get corrupted after a crash? The above is the
> canonically safe way to safely write a file without an invalid file
> ever
> being visible.

Do you think there will be a problem if we just use the sequence below: 
* fsync(tempfile)
* rename(tempfile, configfile)

I have seen this kind of usage elsewhere in the code as well (writeTimeLineHistory())

> > > * write_auto_conf_file should probably escape quoted values?
> >   Can you please elaborate more, I am not able to understand your
> >   point?
> 
> You do something like (don't have the code right here)
> if (quoted)
>appendStringInfo("= \"%s\"", value).
> what happens if value contains a "?

I think this case is missed. I shall take care to handle the case where the
value contains a quote character.


> > > * coding style should be adhered to more closesly, there are many
> > >   if (pointer) which should be if (pointer != NULL),
> >
> > Are you pointing in function  validate_conf_option(const char *name,
> char  *value)
> >   for below usage:
> >   + if (value)
> 
> For example, yes.

Okay, I shall fix these, but I have seen such code in other places
(set_config_option); that's why I have kept it the way it is currently.

> 
> > > * the check that prevents persistent SETs in a transaction should
> > > rather
> > >   be in utility.c and use PreventTransactionChain like most of the
> > >   others that need to do that (c.f. T_CreatedbStmt).
> >
> >   We can move the check in utility.c, but if we use
> PreventTransactionChain,
> >   then it will be disallowed in functions as well. But the original
> idea was to not support in
> >   transaction blocks only.
> >   So I feel use of current function IsTransactionBlock() should be
> okay.
> 
> I don't think its even remotely safe to allow doing this from a
> function. Doing it there explicitly allows doing it in a transaction.

[HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-01-24 Thread Andres Freund
On 2013-01-24 18:37:29 +0530, Amit Kapila wrote:
> On Thursday, January 24, 2013 5:25 PM Andres Freund wrote:
> > On 2013-01-24 16:45:42 +0530, Amit Kapila wrote:
> > > > * The gram.y changes arround set_rest_(more|common) seem pretty
> > > > confused
> > > >   to me.
> > >
> > > >E.g. its not possible anymore to set the timezone for a function.
> > >
> > > What do you exactly mean by this part of comment.
> > 
> > The set_rest_more production is used in FunctionSetResetClause and
> > youve
> > removed capabilities from it.
> 
> set_rest_more has set_reset_common, so there should be no problem.
> 
> set_rest_more:/* Generic SET syntaxes: */ 
> set_rest_common 
> | var_name FROM CURRENT_P 
> { 
> VariableSetStmt *n = 
> makeNode(VariableSetStmt); 
> n->kind = VAR_SET_CURRENT; 
> n->name = $1; 
> $$ = n; 
> } 

True. I still think that the split you've made isn't right as-is.

> > > > * Writing the temporary file to .$pid seems like a bad idea, better use
> > > >   one file for that, SET PERSISTENT is protected by an exclusive lock
> > > >   anyway.
> > >
> > >   I think we can use one temporary file, infact that was one of the ways I
> > > have asked in one of the previous mails.
> > >   However Tom and Zoltan felt this is better way to do it.

> > They have? I didn't read it like that. The file can only ever be written by
> > a running postmaster and we already have code that ensures that.
> > There's
> > absolutely no need for the tempfile to have a nondeterministic
> > name. That makes cleanup way easier as well.
> 
> Sure, I understand that cleaning will be easier. So IIUC you are suggesting,
> we should just keep
> name as postgresql.auto.conf.tmp
> 
> In general, please read the below mail where it has been suggested to use
> .$pid
> http://www.postgresql.org/message-id/28379.1358541...@sss.pgh.pa.us

That was in the context of your use of mkstemp() as far as I read
it. We use fixed-name temp files in other locations as well.


> > Why on earth should somebody have the tempfile open? There's an
> > exclusive lock around writing the file in your code and if anybody
> > interferes like that with postgres' temporary data you have far bigger
> > problems than SET PERSISTENT erroring out.
> 
> I am also not sure, but may be some people do for test purpose. 

That seems like completely pointless reasoning.

> > > > * the write sequence should be:
> > > >   * fsync(tempfile)
> > > >   * fsync(directory)
> > > >   * rename(tempfile, configfile)
> > > >   * fsync(configfile)
> > > >   * fsync(directory)
> > >
> > > Why do we need fsync(directory) and fsync(configfile)?
> > 
> > So they don't vanish /get corrupted after a crash? The above is the
> > canonically safe way to safely write a file without an invalid file
> > ever
> > being visible.
> 
> Do you think there will be problem if we just use as below: 
> * fsync(tempfile)
> * rename(tempfile, configfile)
> 
> I have seen such kind of use elsewhere also in code (writeTimeLineHistory())

Yea, there are some places where the code isn't entirely safe. No reason
to introduce more of those.
> > > > * the check that prevents persistent SETs in a transaction should
> > > > rather
> > > >   be in utility.c and use PreventTransactionChain like most of the
> > > >   others that need to do that (c.f. T_CreatedbStmt).
> > >
> > >   We can move the check in utility.c, but if we use
> > PreventTransactionChain,
> > >   then it will be disallowed in functions as well. But the original
> > idea was to not support in
> > >   transaction blocks only.
> > >   So I feel use of current function IsTransactionBlock() should be
> > okay.
> > 
> > I don't think its even remotely safe to allow doing this from a
> > function. Doing it there explicitly allows doing it in a transaction.
> 
> As SET command is allowed inside a function, so I don't think without any
> reason we should disallow SET PERSISTENT in function. 
> The reason why it's not allowed in transaction is that we have to delete the
> temporary file in transaction end or at rollback which could have made the
> logic much more complex.

Yea, and allowing use in functions doesn't help you with that at all:

Consider the following plpgsql function:

$$
BEGIN
SET PERSISTENT foo = 'bar';
RAISE EXCEPTION 'blub';
END;
$$

Now you have a SET PERSISTENT that succeeded although the transaction
failed.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-01-24 Thread Alvaro Herrera
Amit Kapila wrote:
> On Wednesday, January 23, 2013 11:51 PM Alvaro Herrera wrote:

> > Yes it is -- the /etc/postgresql// directory (where
> > postgresql.conf resides) is owned by user postgres.
> 
> So in that case we can consider postgresql.auto.conf to be in path w.r.t
> postgresql.conf instead of data directory.

That seems sensible to me.

-- 
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BUG #6510: A simple prompt is displayed using wrong charset

2013-01-24 Thread Andrew Dunstan


On 01/24/2013 03:42 AM, Craig Ringer wrote:

On 01/24/2013 01:06 AM, Alexander Law wrote:

Hello,
Please let me know if I can do something to get the bug fix 
(https://commitfest.postgresql.org/action/patch_view?id=902) committed.
I would like to fix other bugs related to postgres localization, but 
I am not sure yet how to do it.


For anyone looking for the history, the 1st post on this topic is here:

http://www.postgresql.org/message-id/e1s3twb-0004oy...@wrigleys.postgresql.org



Yeah.

I'm happy enough with this patch. ISTM it's really a bug and should be 
backpatched, no?


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Back-branch update releases coming in a couple weeks

2013-01-24 Thread MauMau

From: "Fujii Masao" 

On Thu, Jan 24, 2013 at 7:42 AM, MauMau  wrote:
I searched through the PostgreSQL mailing lists for "WAL contains references
to invalid pages", and I found 19 messages.  Some people encountered similar
problems.  There were some discussions regarding those problems (Tom and
Simon Riggs commented), but those discussions did not reach a solution.

I also found a discussion which might relate to this problem.  Does this fix
the problem?

[BUG] lag of minRecoveryPont in archive recovery
http://www.postgresql.org/message-id/20121206.130458.170549097.horiguchi.kyot...@lab.ntt.co.jp


Yes. Could you check whether you can reproduce the problem on the
latest REL9_2_STABLE?


I tried to reproduce the problem by doing "pg_ctl stop -mi" against the 
primary more than ten times on REL9_2_STABLE, but the problem did not 
appear.  However, I encountered the crash only once out of dozens of 
failovers, possibly more than a hundred, on PostgreSQL 9.1.6.  So, I'm 
not sure the problem is fixed in REL9_2_STABLE.


I'm wondering if the fix discussed in the above thread solves my problem.  I 
found the following differences between Horiguchi-san's case and my case:


(1)
Horiguchi-san says the bug outputs the message:

WARNING:  page 0 of relation base/16384/16385 does not exist

On the other hand, I got the message:

WARNING:  page 506747 of relation base/482272/482304 was uninitialized


(2)
Horiguchi-san produced the problem when he shut down the standby immediately 
and restarted it.  However, I saw the problem during failover.



(3)
Horiguchi-san did not use any index, but in my case the WARNING message 
refers to an index.



But there's a similar point.  Horiguchi-san says the problem occurs after 
DELETE+VACUUM.  In my case, I shut the primary down while the application 
was doing INSERT/UPDATE.  As the below messages show, some vacuuming was 
running just before the immediate shutdown:


...
LOG:  automatic vacuum of table "GOLD.scm1.tbl1": index scans: 0
pages: 0 removed, 36743 remain
tuples: 0 removed, 73764 remain
system usage: CPU 0.09s/0.11u sec elapsed 0.66 sec
LOG:  automatic analyze of table "GOLD.scm1.tbl1" system usage: CPU 
0.00s/0.14u sec elapsed 0.32 sec

LOG:  automatic vacuum of table "GOLD.scm1.tbl2": index scans: 0
pages: 0 removed, 12101 remain
tuples: 40657 removed, 44142 remain system usage: CPU 0.06s/0.06u sec 
elapsed 0.30 sec
LOG:  automatic analyze of table "GOLD.scm1.tbl2" system usage: CPU 
0.00s/0.06u sec elapsed 0.14 sec

LOG:  received immediate shutdown request
...


Could you tell me the details of the problem discussed and fixed in the 
upcoming minor release?  I would like to know the phenomenon and its 
conditions, and whether it applies to my case.


Regards
MauMau




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 1/3] Fix x + y < x overflow checks

2013-01-24 Thread Claudio Freire
On Thu, Jan 24, 2013 at 6:36 AM, Xi Wang  wrote:
> icc optimizes away the overflow check x + y < x (y > 0), because
> signed integer overflow is undefined behavior in C.  Instead, use
> a safe precondition test x > INT_MAX - y.


I should mention gcc 4.7 does the same, and it emits a warning.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Passing connection string to pg_basebackup

2013-01-24 Thread Hari Babu
On Tue, Jan 22, 2013 3:27 PM Hari Babu wrote:
>On Saturday, January 19, 2013 5:49 PM Magnus Hagander wrote:
>>On Fri, Jan 18, 2013 at 1:05 PM, Heikki Linnakangas
>> wrote:
>>> On 18.01.2013 13:41, Amit Kapila wrote:

 On Friday, January 18, 2013 3:46 PM Heikki Linnakangas wrote:
>
> On 18.01.2013 08:50, Amit Kapila wrote:
 So to solve this problem below can be done:
 1. Support connection string in pg_basebackup and mention keepalives or
 connection_timeout
 2. Support recv_timeout separately to provide a way to users who are
not
 comfortable tcp keepalives

 a. 1 can be done alone
 b. 2 can be done alone
 c. both 1 and 2.
>>>
>>>
>>> Right. Let's do just 1 for now. An general application level, non-TCP,
>>> keepalive message at the libpq level might be a good idea, but that's a
much
>>> larger patch, definitely not 9.3 material.
>>
>>+1 for doing 1 now. But actually, I think we can just keep it that way
>>in the future as well. If you need to specify these fairly advanced
>>options, using a connection string really isn't a problem.
>>
>>I think it would be more worthwhile to go through the rest of the
>>tools in bin/ and make sure they *all* support connection strings.
>>And, an important point,  do it the same way.
>
>Presently I am trying to implement the option-1 by adding an extra command
line
>Option -C "connection_string" to pg_basebackup and pg_receivexlog.
>This option can be used with all the tools in bin folder.
>
>The existing command line options to the tools are not planned to remove as
of now.
>
>To handle both options, we can follow these approaches.
>
>1. To make the code simpler, the connection string is formed inside with
the existing
>command line options, if the user is not provided the "connection_string"
option.
>which is used for further processing.
>
>2. The connection_string and existing command line options are handled
separately.
>
>I feel approach-1 is better. Please provide your suggestions on the same.

Here is the patch which adds support for taking a connection string as an
argument to pg_basebackup and pg_receivexlog. 

Description of changes: 

1. A new command line option, "-C connection-string", is added for passing the
connection string. 
2. The "PQconnectdb" function is used for connecting to the server instead of
the existing function "PQconnectdbParams". 
3. The existing command line parameters are formed into a string and passed to
the "PQconnectdb" function. 
4. If the user provides additional options in the connection string along with
the existing command line options, higher priority is given to the additional
options. 
5. The "conninfo_parse" function is modified to handle single quotes in the
password provided as input. 


Please provide your suggestions.


Regards, 
Hari babu.


pg_basebkup_recvxlog_conn_string_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-01-24 Thread Amit Kapila
On Thursday, January 24, 2013 6:51 PM Andres Freund wrote:
> On 2013-01-24 18:37:29 +0530, Amit Kapila wrote:
> > On Thursday, January 24, 2013 5:25 PM Andres Freund wrote:
> > > On 2013-01-24 16:45:42 +0530, Amit Kapila wrote:
> > > > > * The gram.y changes arround set_rest_(more|common) seem pretty
> > > > > confused
> > > > >   to me.
> > > >
> > > > >E.g. its not possible anymore to set the timezone for a
> function.
> > > >
> > > > What do you exactly mean by this part of comment.
> > >
> > > The set_rest_more production is used in FunctionSetResetClause and
> > > youve
> > > removed capabilities from it.
> >
> > set_rest_more has set_reset_common, so there should be no problem.
> >
> > set_rest_more:/* Generic SET syntaxes: */
> > set_rest_common
> > | var_name FROM CURRENT_P
> > {
> > VariableSetStmt *n =
> makeNode(VariableSetStmt);
> > n->kind = VAR_SET_CURRENT;
> > n->name = $1;
> > $$ = n;
> > }
> 
> True. I still think that the split youve made isn't right as-is.

I am working on providing the exact reasoning for the split, and if some
variable is not placed appropriately, I shall move it accordingly.

> > > > > * Writing the temporary file to .$pid seems like a bad idea,
> better use
> > > > >   one file for that, SET PERSISTENT is protected by an
> exclusive lock
> > > > >   anyway.
> > > >
> > > >   I think we can use one temporary file, infact that was one of
> the ways I
> > > > have asked in one of the previous mails.
> > > >   However Tom and Zoltan felt this is better way to do it.
> 
> > > They have? I didn't read it like that. The file can only ever be
> > > written by
> > > a running postmaster and we already have code that ensures that.
> > > There's
> > > absolutely no need for the tempfile to have a nondeterministic
> > > name. That makes cleanup way easier as well.
> >
> > Sure, I understand that cleaning will be easier. So IIUC you are
> suggesting,
> > we should just keep
> > name as postgresql.auto.conf.tmp
> >
> > In general, please read the below mail where it has been suggested to
> use
> > .$pid
> > http://www.postgresql.org/message-id/28379.1358541...@sss.pgh.pa.us
> 
> That was in the context of your use of mkstemp() as far as I read
> it. We use constantly named temp files in other locations as well.

Okay, as I told you, I had also proposed this initially. Now that you have
pointed it out specifically, I shall do it that way unless somebody has
another strong reason for not doing it this way. 
 
> > > Why on earth should somebody have the tempfile open? There's an
> > > exclusive lock around writing the file in your code and if anybody
> > > interferes like that with postgres' temporary data you have far
> bigger
> > > problems than SET PERSISTENT erroring out.
> >
> > I am also not sure, but may be some people do for test purpose.
> 
> That seems like an completely pointless reasoning.

I have no specific reason, so I shall do it the way you have suggested.

> > > > > * the write sequence should be:
> > > > >   * fsync(tempfile)
> > > > >   * fsync(directory)
> > > > >   * rename(tempfile, configfile)
> > > > >   * fsync(configfile)
> > > > >   * fsync(directory)
> > > >
> > > > Why do we need fsync(directory) and fsync(configfile)?
> > >
> > > So they don't vanish /get corrupted after a crash? The above is the
> > > canonically safe way to safely write a file without an invalid file
> > > ever
> > > being visible.
> >
> > Do you think there will be problem if we just use as below:
> > * fsync(tempfile)
> > * rename(tempfile, configfile)
> >
> > I have seen such kind of use elsewhere also in code
> (writeTimeLineHistory())
> 
> Yea, there are some places where the code isn't entirely safe. No
> reason
> to introduce more of those.

Okay, I will check this and let you know if I have any doubts.

> > > > > * the check that prevents persistent SETs in a transaction
> should
> > > > > rather
> > > > >   be in utility.c and use PreventTransactionChain like most of
> the
> > > > >   others that need to do that (c.f. T_CreatedbStmt).
> > > >
> > > >   We can move the check in utility.c, but if we use
> > > PreventTransactionChain,
> > > >   then it will be disallowed in functions as well. But the
> original
> > > idea was to not support in
> > > >   transaction blocks only.
> > > >   So I feel use of current function IsTransactionBlock() should
> be
> > > okay.
> > >
> > > I don't think its even remotely safe to allow doing this from a
> > > function. Doing it there explicitly allows doing it in a
> transaction.
> >
> > As SET command is allowed inside a function, so I don't think without
> any
> > reason we should disallow SET PERSISTENT in function.
> > The reason why it's not allowed in transaction is that we have to delete
> > the temporary file in transaction end or at rollback which could have
> > made the logic much more complex.

Re: [HACKERS] [PATCH 0/3] Work around icc miscompilation

2013-01-24 Thread Tom Lane
Xi Wang  writes:
> On 1/24/13 5:02 AM, Heikki Linnakangas wrote:
>> If we fix these, can we stop using -fwrapv on gcc? Is there any way to 
>> get gcc to warn about these?

> I believe we can get rid of -fwrapv once we fix all such checks.

TBH, I find that statement to be nonsense, and these patches to be
completely the wrong way to go about it.

The fundamental problem here is that the compiler, unless told otherwise
by a compilation switch, believes it is entitled to assume that no
integer overflow will happen anywhere in the program.  Therefore, any
error check that is looking for overflow *should* get optimized away.
The only reason the compiler would fail to do that is if its optimizer
isn't quite smart enough to prove that the code is testing for an
overflow condition.  So what you are proposing here is merely the next
move in an arms race with the compiler writers, and it will surely
break again in a future generation of compilers.  Or even if these
particular trouble spots don't break, something else will.  The only
reliable solution is to not let the compiler make that type of
assumption.

So I think we should just reject all of these, and instead fix configure
to make sure it turns on icc's equivalent of -fwrapv.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Setting visibility map in VACUUM's second phase

2013-01-24 Thread Jeff Janes
On Thu, Jan 24, 2013 at 1:28 AM, Pavan Deolasee
 wrote:
> Hi Jeff,
>
> On Thu, Jan 24, 2013 at 2:41 PM, Jeff Janes  wrote:
>>
>> lazy_vacuum_page now pins and unpins the vmbuffer for each page it marks
>> all-visible, which seems like a lot of needless traffic since the next
>> vmbuffer is likely to be the same as the previous one.
>>
>
> Good idea. Even though the cost of pinning/unpinning may not be high
> with respect to the vacuum cost itself, but it seems to be a good idea
> because we already do that at other places. Do you have any other
> review comments on the patch or I'll fix this and send an updated
> patch soon.

That was the only thing that stood out to me.  You had some doubts
about visibility_cutoff_xid, but I don't know enough about that to
address them.  I agree that it would be nice to avoid the code
duplication, but I don't see a reasonable way to do that.

It applies cleanly (some offsets), builds without warnings, passes
make check under cassert.  No documentation or regression tests are
needed.  We want this, and it does it.

I have verified the primary objective (setting vm sooner) but haven't
experimentally verified that this actually increases throughput for
some workload.  For pgbench when all data fits in shared_buffers, at
least it doesn't cause a readily noticeable slow down.

I haven't devised any patch-specific test cases, either for
correctness or performance.  I just used make check, pgbench, and the
"vacuum verbose" verification you told us about in the original
submission.

If we are going to scan a block twice, I wonder if it should be done
the other way around.  If there are any dead tuples that need to be
removed from indexes, there is no point in dirtying the page with a
HOT prune on the first pass when it will just be dirtied again after
the indexes are vacuumed.  I don't see this idea holding up your patch
though, as surely it would be more invasive than what you are doing.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state

2013-01-24 Thread Steve Singer

On 13-01-24 06:40 AM, Andres Freund wrote:


Fair enough. I am also working on a user of this infrastructure but that
doesn't help you very much. Steve Singer seemed to make some stabs at
writing an output plugin as well. Steve, how far did you get there?


I was able to get something that generated output for INSERT statements 
in a format similar to what a modified slony apply trigger would want.  
This was with the list of tables to replicate hard-coded in the plugin.  
This was with the patchset from the last commitfest.  I had gotten a bit 
hung up on the UPDATE and DELETE support because slony allows you to use 
an arbitrary user specified unique index as your key.  It looks like 
better support for tables with a unique non-primary key is in the most 
recent patch set.  I am hoping to have time this weekend to update my 
plugin to use parameters passed in on the init and other updates in the 
most recent version.  If I make some progress I will post a link to my 
progress at the end of the weekend.  My big issue is that I have limited 
time to spend on this.




BTW, why does all the transaction reordering stuff has to be in core?

It didn't use to, but people argued pretty damned hard that no undecoded
data should ever be allowed to leave the postgres cluster. And to be fair
it makes writing an output plugin *way* much easier. Check
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/test_decoding/test_decoding.c;hb=xlog-decoding-rebasing-cf4
If you skip over tuple_to_stringinfo(), which is just pretty generic
scaffolding for converting a whole tuple to a string, writing out the
changes in some format by now is pretty damn simple.



I think we will find that the  replication systems won't be the only 
users of this feature.  I have often seen systems that have a logging 
requirement for auditing purposes or to log then reconstruct the 
sequence of changes made to a set of tables in order to feed a 
downstream application.  Triggers and a journaling table are the 
traditional way of doing this, but it should be pretty easy to write a 
plugin that accomplishes the same thing with better 
performance.  If the reordering stuff wasn't in core this would be much 
harder.




How much of this infrastructure is to support replicating DDL changes? IOW,
if we drop that requirement, how much code can we slash?

Unfortunately I don't think too much unless we add in other code that
allows us to check whether the current definition of a table is still
the same as it was back when the tuple was logged.


Any other features or requirements that could be dropped? I think it's clear at 
this stage that
this patch is not going to be committed as it is. If you can reduce it to a
fraction of what it is now, that fraction might have a chance. Otherwise,
it's just going to be pushed to the next commitfest as whole, and we're
going to be having the same doubts and discussions then.

One thing that reduces complexity is to declare the following as
unsupported:
- CREATE TABLE foo(data text);
- DECODE UP TO HERE;
- INSERT INTO foo(data) VALUES(very-long-to-be-externally-toasted-tuple);
- DROP TABLE foo;
- DECODE UP TO HERE;

but thats just a minor thing.

I think what we can do more realistically than to chop of required parts
of changeset extraction is to start applying some of the preliminary
patches independently:
- the relmapper/relfilenode changes + pg_relation_by_filenode(spc,
   relnode) should be independently committable if a bit boring
- allowing walsenders to connect to a database possibly needs an interface 
change
   but otherwise it should be fine to go in independently. It also has
   other potential use-cases, so I think thats fair.
- logging xl_running_xact's more frequently could also be committed
   independently and makes sense independently as it allows a standby to
   enter HS faster if the master is busy
- Introducing InvalidCommandId should be relatively uncontroversial. The
   fact that no invalid value for command ids exists is imo an oversight
- the *Satisfies change could be applied and they are imo ready but
   there's no use-case for it without the rest, so I am not sure whether
   theres a point
- currently not separately available, but we could add wal_level=logical
   independently. There would be no user of it, but it would be partial
   work. That includes the relcache support for keeping track of the
   primary key which already is available separately.


Greetings,

Andres Freund





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BUG #6510: A simple prompt is displayed using wrong charset

2013-01-24 Thread Noah Misch
On Thu, Jan 24, 2013 at 08:50:36AM -0500, Andrew Dunstan wrote:
> On 01/24/2013 03:42 AM, Craig Ringer wrote:
>> On 01/24/2013 01:06 AM, Alexander Law wrote:
>>> Hello,
>>> Please let me know if I can do something to get the bug fix  
>>> (https://commitfest.postgresql.org/action/patch_view?id=902) 
>>> committed.
>>> I would like to fix other bugs related to postgres localization, but  
>>> I am not sure yet how to do it.
>>
>> For anyone looking for the history, the 1st post on this topic is here:
>>
>> http://www.postgresql.org/message-id/e1s3twb-0004oy...@wrigleys.postgresql.org
>
>
> Yeah.
>
> I'm happy enough with this patch. ISTM it's really a bug and should be  
> backpatched, no?

It is a bug, yes.  I'm neutral on whether to backpatch.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-01-24 Thread Tom Lane
Andres Freund  writes:
> On 2013-01-24 16:45:42 +0530, Amit Kapila wrote:
>>> * Writing the temporary file to .$pid seems like a bad idea, better use
>>> one file for that, SET PERSISTENT is protected by an exclusive lock
>>> anyway.

>> I think we can use one temporary file, infact that was one of the ways I
>> have asked in one of the previous mails.
>> However Tom and Zoltan felt this is better way to do it.

> They have? I didn't read it like that. The file can only ever be written by
> a running postmaster and we already have code that ensures that. There's
> absolutely no need for the tempfile to have a nondeterministic
> name. That makes cleanup way easier as well.

Say again?  Surely the temp file is being written by whichever backend
is executing SET PERSISTENT, and there could be more than one.

Or, if it isn't and you really do mean that this patch is getting the
postmaster process involved in executing SET PERSISTENT, that's going to
be sufficient grounds for rejection right there.  The only way that we
have to keep the postmaster reliable is to not have it doing more than
it absolutely must.

While I have not been paying very much attention to this patch, I have
to say that every time I've looked into the thread I've gotten an
overwhelming impression of overdesign and overcomplication.  This should
not be a very large or subtle patch.  Write a setting into a temp file,
rename it into place, done.

regards, tom lane




Re: [HACKERS] Skip checkpoint on promoting from streaming replication

2013-01-24 Thread Simon Riggs
On 6 January 2013 21:58, Simon Riggs  wrote:
> On 9 August 2012 10:45, Simon Riggs  wrote:
>> On 22 June 2012 05:03, Kyotaro HORIGUCHI
>>  wrote:
>>
>>> I hope this is promising.
>>
>> I've reviewed this and thought about it over some time.
>
> I've been torn between the need to remove the checkpoint for speed and
> being worried about the implications of doing so.
>
> We promote in multiple use cases. When we end a PITR, or are
> performing a switchover, it doesn't really matter how long the
> shutdown checkpoint takes, so I'm inclined to leave it there in those
> cases. For failover, we need fast promotion.
>
> So my thinking is to make   pg_ctl promote -m fast
> be the way to initiate a fast failover that skips the shutdown checkpoint.
>
> That way all existing applications work the same as before, while new
> users that explicitly choose to do so will gain from the new option.


Here's a patch to skip checkpoint when we do

  pg_ctl promote -m fast

We keep the end of recovery checkpoint in all other cases.

The only thing left from Kyotaro's patch is a single line of code -
the call to ReadCheckpointRecord() that checks to see if the WAL
records for the last two restartpoints are on disk, which was an
important line of code.

Patch implements a new record type XLOG_END_OF_RECOVERY that behaves
on replay like a shutdown checkpoint record. I put this back in from
my patch because I believe it's important that we have a clear place
where the WAL history changes timelineId. WAL format change bump
required.

So far this is only barely tested, but considering time goes on, I
thought people might want to pass comment on this.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


fast_promote.v3.patch
Description: Binary data



Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-01-24 Thread Andres Freund
On 2013-01-24 11:22:52 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2013-01-24 16:45:42 +0530, Amit Kapila wrote:
> >>> * Writing the temporary file to .$pid seems like a bad idea, better use
> >>> one file for that, SET PERSISTENT is protected by an exclusive lock
> >>> anyway.
> 
> >> I think we can use one temporary file, infact that was one of the ways I
> >> have asked in one of the previous mails.
> >> However Tom and Zoltan felt this is better way to do it.
> 
> > They have? I didn't read it like that. The file can only ever be written by
> > a running postmaster and we already have code that ensures that. There's
> > absolutely no need for the tempfile to have a nondeterministic
> > name. That makes cleanup way easier as well.
> 
> Say again?  Surely the temp file is being written by whichever backend
> is executing SET PERSISTENT, and there could be more than one.

Sure, but the patch acquires SetPersistentLock exclusively beforehand
which seems fine to me.


Any opinion whether it's acceptable to allow SET PERSISTENT in functions?
It seems absurd to me to allow it, but Amit seems to be of another
opinion.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH 0/3] Work around icc miscompilation

2013-01-24 Thread Xi Wang
On 1/24/13 10:48 AM, Tom Lane wrote:
> The fundamental problem here is that the compiler, unless told otherwise
> by a compilation switch, believes it is entitled to assume that no
> integer overflow will happen anywhere in the program.  Therefore, any
> error check that is looking for overflow *should* get optimized away.
> The only reason the compiler would fail to do that is if its optimizer
> isn't quite smart enough to prove that the code is testing for an
> overflow condition.  So what you are proposing here is merely the next
> move in an arms race with the compiler writers, and it will surely
> break again in a future generation of compilers.  Or even if these
> particular trouble spots don't break, something else will.  The only
> reliable solution is to not let the compiler make that type of
> assumption.

What I am proposing here is the opposite: _not_ to enter an arms race
with the compiler writers.  Instead, make the code conform to the C
standard, something both sides can agree on.

Particularly, avoid using (signed) overflowed results to detect
overflows, which the C standard clearly specifies as undefined
behavior and many compilers are actively exploiting.

We could use either unsigned overflows (which is well defined) or
precondition testing (like `x > INT_MAX - y' in the patches).

> So I think we should just reject all of these, and instead fix configure
> to make sure it turns on icc's equivalent of -fwrapv.

While I agree it's better to turn on icc's -fno-strict-overflow as a
workaround, the fundamental problem here is that we are _not_
programming in the C language.  Rather, we are programming in some
C-with-signed-wraparound dialect.

The worst part of this C dialect is that it has never been specified
clearly what it means by "signed wraparound":

gcc's -fwrapv assumes signed wraparound for add/sub/mul, but not for
div (e.g., INT_MIN/-1 traps on x86) nor shift (e.g., 1<<32 produces
undef with clang).

clang's -fwrapv also assumes wraparound for pointer arithmetic, while
gcc's does not.

I have no idea what icc and pathscale's -fno-strict-overflow option
does (probably the closest thing to -fwrapv).  Sometimes it prevents
such checks from being optimized away, sometimes it doesn't.

Anyway, it would not be surprising if future C compilers break this
dialect again.  But they shouldn't break standard-conforming code.

- xi




Re: [HACKERS] [PATCH] pg_isready (was: [WIP] pg_ping utility)

2013-01-24 Thread Phil Sorber
On Wed, Jan 23, 2013 at 8:12 PM, Fujii Masao  wrote:
> On Thu, Jan 24, 2013 at 8:47 AM, Phil Sorber  wrote:
>> On Wed, Jan 23, 2013 at 6:07 PM, Tom Lane  wrote:
>>> Phil Sorber  writes:
 On Wed, Jan 23, 2013 at 1:58 PM, Bruce Momjian  wrote:
> On Wed, Jan 23, 2013 at 12:27:45PM -0500, Tom Lane wrote:
>> +1 for default timeout --- if this isn't like "ping" where you are
>> expecting to run indefinitely, I can't see that it's a good idea for it
>> to sit very long by default, in any circumstance.
>>>
> FYI, the pg_ctl -w (wait) default is 60 seconds:
>>>
 Great. That is what I came to on my own as well. Figured that might be
 a sticking point, but as there is precedent, I'm happy with it.
>>>
>>> I'm not sure that's a relevant precedent at all.  What that number is
>>> is the time that pg_ctl will wait around for the postmaster to start or
>>> stop before reporting a problem --- and in either case, a significant
>>> delay (multiple seconds) is not surprising, because of crash-recovery
>>> work, shutdown checkpointing, etc.  For pg_isready, you'd expect to get
>>> a response more or less instantly, wouldn't you?  Personally, I'd decide
>>> that pg_isready is broken if it didn't give me an answer in a couple of
>>> seconds, much less a minute.
>>>
>>> What I had in mind was a default timeout of maybe 3 or 4 seconds...
>>
>> I was thinking that if it was in a script you wouldn't care if it was
>> 60 seconds. If it was at the command line you would ^C it much
>> earlier. I think the default linux TCP connection timeout is around 20
>> seconds. My feeling is everyone is going to have a differing opinion
>> on this, which is why I was hoping that some good precedent existed.
>> I'm fine with 3 or 4, whatever can be agreed upon.
>
> +1 with 3 or 4 secounds.
>
> Aside from this issue, I have one minor comment. ISTM you need to
> add the following line to the end of the help message. This line has
> been included in the help message of all the other client commands.
>
> Report bugs to .

Ok, I set the default timeout to 3 seconds, added the bugs email to
the help, and also added docs that I forgot last time.

Also, still hoping to get some feedback on my other issues.

Thanks.

>
> Regards,
>
> --
> Fujii Masao


pg_isready_timeout_v2.diff
Description: Binary data



Re: [HACKERS] Skip checkpoint on promoting from streaming replication

2013-01-24 Thread Heikki Linnakangas

On 24.01.2013 18:24, Simon Riggs wrote:

On 6 January 2013 21:58, Simon Riggs  wrote:

I've been torn between the need to remove the checkpoint for speed and
being worried about the implications of doing so.

We promote in multiple use cases. When we end a PITR, or are
performing a switchover, it doesn't really matter how long the
shutdown checkpoint takes, so I'm inclined to leave it there in those
cases. For failover, we need fast promotion.

So my thinking is to make   pg_ctl promote -m fast
be the way to initiate a fast failover that skips the shutdown checkpoint.

That way all existing applications work the same as before, while new
users that explicitly choose to do so will gain from the new option.


Here's a patch to skip checkpoint when we do

   pg_ctl promote -m fast

We keep the end of recovery checkpoint in all other cases.


Hmm, there seems to be no way to do a "fast" promotion with a trigger file.

I'm a bit confused why there needs to be a special mode for this. Can't we 
just always do the "fast" promotion? I agree that there's no urgency 
when you're doing PITR, but it shouldn't do any harm either. Or perhaps 
always do "fast" promotion when starting up from standby mode, and 
"slow" otherwise.


Are we comfortable enough with this to skip the checkpoint after crash 
recovery?


I may be missing something, but it looks like after a "fast" promotion, 
you don't request a new checkpoint. So it can take quite a while for the 
next checkpoint to be triggered by checkpoint_timeout/segments. That 
shouldn't be a problem, but I feel that it'd be prudent to request a new 
checkpoint immediately (not necessarily an "immediate" checkpoint, though).



The only thing left from Kyotaro's patch is a single line of code -
the call to ReadCheckpointRecord() that checks to see if the WAL
records for the last two restartpoints are on disk, which was an
important line of code.


Why's that important, just for paranoia? If the last two restartpoints 
have disappeared, something's seriously wrong, and you will be in 
trouble, e.g. if you crash at that point. Do we need to be extra paranoid 
when doing a "fast" promotion?



Patch implements a new record type XLOG_END_OF_RECOVERY that behaves
on replay like a shutdown checkpoint record. I put this back in from
my patch because I believe it's important that we have a clear place
where the WAL history changes timelineId. WAL format change bump
required.


Agreed, such a WAL record is essential.

At replay, an end-of-recovery record should be a signal to the hot 
standby mechanism that there are no transactions running in the master 
at that point, same as a shutdown checkpoint.


- Heikki




Re: [HACKERS] Back-branch update releases coming in a couple weeks

2013-01-24 Thread Fujii Masao
On Thu, Jan 24, 2013 at 11:53 PM, MauMau  wrote:
> From: "Fujii Masao" 
>>
>> On Thu, Jan 24, 2013 at 7:42 AM, MauMau  wrote:
>>>
>>> I searched through PostgreSQL mailing lists with "WAL contains references
>>> to
>>> invalid pages", and I found 19 messages.  Some people encountered similar
>>> problem.  There were some discussions regarding those problems (Tom and
>>> Simon Riggs commented), but those discussions did not reach a solution.
>>>
>>> I also found a discussion which might relate to this problem.  Does this
>>> fix
>>> the problem?
>>>
>>> [BUG] lag of minRecoveryPont in archive recovery
>>>
>>> http://www.postgresql.org/message-id/20121206.130458.170549097.horiguchi.kyot...@lab.ntt.co.jp
>>
>>
>> Yes. Could you check whether you can reproduce the problem on the
>> latest REL9_2_STABLE?
>
>
> I tried to produce the problem by doing "pg_ctl stop -mi" against the
> primary more than ten times on REL9_2_STABLE, but the problem did not
> appear.  However, I encountered the crash only once out of dozens of
> failovers, possibly more than a hundred times, on PostgreSQL 9.1.6.  So, I'm
> not sure the problem is fixed in REL9_2_STABLE.

Can you reproduce the problem in REL9_1_STABLE?

Sorry, I pointed to the wrong version, i.e., REL9_2_STABLE. Since you encountered
the problem in 9.1.6, you need to retry the test in REL9_1_STABLE.

>
> I'm wondering if the fix discussed in the above thread solves my problem.  I
> found the following differences between Horiguchi-san's case and my case:
>
> (1)
> Horiguchi-san says the bug outputs the message:
>
> WARNING:  page 0 of relation base/16384/16385 does not exist
>
> On the other hand, I got the message:
>
>
> WARNING:  page 506747 of relation base/482272/482304 was uninitialized
>
>
> (2)
> Horiguchi-san produced the problem when he shut the standby immediately and
> restarted it.  However, I saw the problem during failover.
>
>
> (3)
> Horiguchi-san did not use any index, but in my case the WARNING message
> refers to an index.
>
>
> But there's a similar point.  Horiguchi-san says the problem occurs after
> DELETE+VACUUM.  In my case, I shut the primary down while the application
> was doing INSERT/UPDATE.  As the below messages show, some vacuuming was
> running just before the immediate shutdown:
>
> ...
> LOG:  automatic vacuum of table "GOLD.scm1.tbl1": index scans: 0
> pages: 0 removed, 36743 remain
> tuples: 0 removed, 73764 remain
> system usage: CPU 0.09s/0.11u sec elapsed 0.66 sec
> LOG:  automatic analyze of table "GOLD.scm1.tbl1" system usage: CPU
> 0.00s/0.14u sec elapsed 0.32 sec
> LOG:  automatic vacuum of table "GOLD.scm1.tbl2": index scans: 0
> pages: 0 removed, 12101 remain
> tuples: 40657 removed, 44142 remain system usage: CPU 0.06s/0.06u sec
> elapsed 0.30 sec
> LOG:  automatic analyze of table "GOLD.scm1.tbl2" system usage: CPU
> 0.00s/0.06u sec elapsed 0.14 sec
> LOG:  received immediate shutdown request
> ...
>
>
> Could you tell me the details of the problem discussed and fixed in the
> upcoming minor release?  I would like to know the phenomenon and its
> conditions, and whether it applies to my case.

http://www.postgresql.org/message-id/20121206.130458.170549097.horiguchi.kyot...@lab.ntt.co.jp

Could you read the discussion in the above thread?

Regards,

-- 
Fujii Masao




Re: [HACKERS] [COMMITTERS] pgsql: Improve concurrency of foreign key locking

2013-01-24 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Improve concurrency of foreign key locking

I noticed a bug in visibility routines when pg_upgrade is involved:
tuples marked FOR UPDATE in the old cluster (i.e. HEAP_XMAX_INVALID not
set, HEAP_XMAX_EXCL_LOCK set, HEAP_XMAX_IS_MULTI not set) are invisible
(dead) on the new cluster.  I had defined the HEAP_XMAX_IS_LOCKED_ONLY
thusly:

#define HEAP_XMAX_IS_LOCKED_ONLY(infomask) \
((infomask) & HEAP_XMAX_LOCK_ONLY)

but this doesn't work for the reason stated above.  The fix is to
redefine the macro like this:

/*
 * A tuple is only locked (i.e. not updated by its Xmax) if the
 * HEAP_XMAX_LOCK_ONLY bit is set; or, for pg_upgrade's sake, if the Xmax is
 * not a multi and the EXCL_LOCK bit is set.
 *
 * See also HeapTupleHeaderIsOnlyLocked, which also checks for a possible
 * aborted updater transaction.
 *
 * Beware of multiple evaluations of the argument.
 */
#define HEAP_XMAX_IS_LOCKED_ONLY(infomask) \
(((infomask) & HEAP_XMAX_LOCK_ONLY) || \
 (((infomask) & (HEAP_XMAX_IS_MULTI | HEAP_LOCK_MASK)) == HEAP_XMAX_EXCL_LOCK))



-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-01-24 Thread Tom Lane
Andres Freund  writes:
> On 2013-01-24 11:22:52 -0500, Tom Lane wrote:
>> Say again?  Surely the temp file is being written by whichever backend
>> is executing SET PERSISTENT, and there could be more than one.

> Sure, but the patch acquires SetPersistentLock exclusively beforehand
> which seems fine to me.

Why should we have such a lock?  Seems like that will probably introduce
as many problems as it fixes.  Deadlock risk, blockages, etc.  It is not
necessary for atomicity, since rename() would be atomic already.

> Any opinion whether it's acceptable to allow SET PERSISTENT in functions?
> It seems absurd to me to allow it, but Amit seems to be of another
> opinion.

Well, it's really a definitional question I think: do you expect that
subsequent failure of the transaction should cause such a SET to roll
back?

I think it would be entirely self-consistent to define SET PERSISTENT as
a nontransactional operation.  Then the implementation would just be to
write the file immediately when the command is executed, and there's no
particular reason why it can't be allowed inside a transaction block.

If you want it to be transactional, then the point of disallowing it in
transaction blocks (or functions) would be to not have a very large
window between writing the file and committing.  But it's still possible
that the transaction would fail somewhere in there, leading to the
inconsistent outcome that the transaction reports failing but we applied
the SET anyway.  I do agree that it would be nonsensical to allow SET
PERSISTENT in functions but not transaction blocks.

Another approach is to remember the requested setting until somewhere in
the pre-commit sequence, and then try to do the file write at that time.
I'm not terribly thrilled with that approach, though, because (a) it
only narrows the window for an inconsistent outcome, it doesn't remove
it entirely; (b) there are already too many things that want to be the
"last thing done before commit"; and (c) it adds complexity and overhead
that I'd just as soon this patch not add.  But if you want S.P. to be
transactional and allowed inside functions, I think this would be the
only acceptable implementation.

regards, tom lane




Re: [HACKERS] pg_retainxlog for inclusion in 9.3?

2013-01-24 Thread Peter Eisentraut
After reviewing this, it appears to me that this is really just a very
verbose version of

archive_command = 'sleep $initialsleep; while test $(psql -AtX -c "select 
pg_xlogfile_name(something) < $$%f$$ collate \"C\";") = t; do sleep $sleep; done'

I think it might be better to just document this as an example.  I don't
quite see the overhead of maintaining another tool justified.





Re: [HACKERS] pg_retainxlog for inclusion in 9.3?

2013-01-24 Thread Magnus Hagander
On Thu, Jan 24, 2013 at 6:04 PM, Peter Eisentraut  wrote:
> After reviewing this, it appears to me that this is really just a very
> verbose version of
>
> archive_command = 'sleep $initialsleep; while test $(psql -AtX -c "select 
> pg_xlogfile_name(something) < $$%f$$ collate \"C\";") = t; do sleep $sleep; done'
>
> I think it might be better to just document this as an example.  I don't
> quite see the overhead of maintaining another tool justified.

Well, obviously I don't entirely agree ;)

Yes, it's a convenience command. Like pg_standby was. And like many
other commands that we maintain as part of *core*, such as createuser,
vacuumdb, etc. Those can all be done with an even *simpler* command
than the one you suggest above. So I don't see that as an argument why
it wouldn't be useful.

Also, the command you suggest above does not work on Windows. You can
probably write a .BAT file to do it for you, but I'm pretty sure it's
impossible to do it as an archive_command there.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




[HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-01-24 Thread Andres Freund
On 2013-01-24 12:02:59 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2013-01-24 11:22:52 -0500, Tom Lane wrote:
> >> Say again?  Surely the temp file is being written by whichever backend
> >> is executing SET PERSISTENT, and there could be more than one.
> 
> > Sure, but the patch acquires SetPersistentLock exclusively beforehand
> > which seems fine to me.
> 
> Why should we have such a lock?  Seems like that will probably introduce
> as many problems as it fixes.  Deadlock risk, blockages, etc.  It is not
> necessary for atomicity, since rename() would be atomic already.

Well, the patch acquires it as-is, so I made the judgement based on
that.
But I think that lock isn't necessarily a bad idea. While the
replacement of the values clearly is atomic due to the rename I think
there's another confusing behaviour lurking:

Backend A: starts
Backend B: starts
Backend A: reads the config
Backend B: reads the config
Backend A: does SET PERSISTENT foobar =..;
Backend B: does SET PERSISTENT foobar =..;

Now B overwrites the config change A has made as they are all stored in
the same file.

So the only safe way to do this seems to be:

LWLockAcquire(SetPersistentLock, LW_EXCLUSIVE);
ProcessConfigFile();
validate_option();
write_rename();
LWLockRelease();

We can decide not to care about the above case but the window isn't that
small if no reload is implied and so this seems to be overly grotty.

> > Any opinion whether it's acceptable to allow SET PERSISTENT in functions?
> > It seems absurd to me to allow it, but Amit seems to be of another
> > opinion.
> 
> Well, it's really a definitional question I think: do you expect that
> subsequent failure of the transaction should cause such a SET to roll
> back?
> 
> I think it would be entirely self-consistent to define SET PERSISTENT as
> a nontransactional operation.  Then the implementation would just be to
> write the file immediately when the command is executed, and there's no
> particular reason why it can't be allowed inside a transaction block.

That's how it's implemented at the moment, except for not allowing it in transactions.

I think the reason I am wary of allowing it inside transactions is that
I think the config file needs to be reloaded before writing the new one
and it seems dangerous to me to reload the config in all the possible
situations a function can be called.

Greetings,

Andres Freund




Re: [HACKERS] pg_retainxlog for inclusion in 9.3?

2013-01-24 Thread Tom Lane
Magnus Hagander  writes:
> On Thu, Jan 24, 2013 at 6:04 PM, Peter Eisentraut  wrote:
>> I think it might be better to just document this as an example.  I don't
>> quite see the overhead of maintaining another tool justified.

> Well, obviously I don't entirely agree ;)

> Yes, it's a convenience command. Like pg_standby was. And like many
> other commands that we maintain as part of *core*, such as createuser,
> vacuumdb, etc. Those can all be done with an even *simpler* command
> than the one you suggest above. So I don't see that as an argument why
> it wouldn't be useful.

We've discussed removing a lot of those tools, too.  Not breaking
backwards compatibility is probably the only reason they're still there.

In the case at hand, I seem to recall from upthread that we expect
this'd be obsolete in a release or two.  If that's true then I think
a para or two of documentation is a better idea than a tool we'll be
essentially condemned to keep maintaining forever.

> Also, the command you suggest above does not work on Windows. You can
> probably write a .BAT file to do it for you, but I'm pretty sure it's
> impossible to do it as an archive_command there.

Perhaps we could whip up such a .BAT file and put it in the docs?

regards, tom lane




Re: [HACKERS] pg_retainxlog for inclusion in 9.3?

2013-01-24 Thread Magnus Hagander
On Thu, Jan 24, 2013 at 6:25 PM, Tom Lane  wrote:
> Magnus Hagander  writes:
>> On Thu, Jan 24, 2013 at 6:04 PM, Peter Eisentraut  wrote:
>>> I think it might be better to just document this as an example.  I don't
>>> quite see the overhead of maintaining another tool justified.
>
>> Well, obviously I don't entirely agree ;)
>
>> Yes, it's a convenience command. Like pg_standby was. And like many
>> other commands that we maintain as part of *core*, such as createuser,
>> vacuumdb, etc. Those can all be done with an even *simpler* command
>> than the one you suggest above. So I don't see that as an argument why
>> it wouldn't be useful.
>
> We've discussed removing a lot of those tools, too.  Not breaking
> backwards compatibility is probably the only reason they're still there.
>
> In the case at hand, I seem to recall from upthread that we expect
> this'd be obsolete in a release or two.  If that's true then I think
> a para or two of documentation is a better idea than a tool we'll be
> essentially condemned to keep maintaining forever.

Not really sure there is such an expectation - any more than there was
such an expectation when we initially put pg_standby in there. It
would be *possible* to do it, certainly. But it's not like we have an
actual plan. And AFAIK the stuff that was discussed upthread was a
simplified version of it - not the full flexibility.

That said, it's certainly a point that we'd have to maintain it. But I
don't see why we'd have to maintain it beyond the point where we
included the same functionality in core, if we did.


>> Also, the command you suggest above does not work on Windows. You can
>> probably write a .BAT file to do it for you, but I'm pretty sure it's
>> impossible to do it as an archive_command there.
>
> Perhaps we could whip up such a .BAT file and put it in the docs?

That would probably work, yes.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-01-24 Thread Tom Lane
Andres Freund  writes:
> Backend A: does SET PERSISTENT foobar =..;
> Backend B: does SET PERSISTENT foobar =..;

> Now B overwrites the config change A has made as they are all stored in
> the same file.

Say what?  I thought the plan was one setting per file, so that we don't
get involved in having to parse-and-edit the file contents.  What was
all that argument about a new directory, if we're only using one file?

If we are using just one file, then I agree a lock would be needed to
synchronize updates.  But that seems to add a lot of complication
elsewhere.

regards, tom lane




Re: [HACKERS] proposal: fix corner use case of variadic fuctions usage

2013-01-24 Thread Pavel Stehule
Hello

so there is updated version

+ some lines of doc
+ new regress tests

there was no reply to my previous mail - so I chose

concat(variadic null) ---> NULL
concat(variadic '{}') ---> empty string

this behavior should not set a precedent for any other variadic "any"
function, because the concat() and concat_ws() functions have a specific
behavior - when called with one parameter they return a string or an
empty string

Regards

Pavel


2013/1/23 Pavel Stehule :
> 2013/1/23 Tom Lane :
>> Pavel Stehule  writes:
>>> next related example
>>
>>> CREATE OR REPLACE FUNCTION public.myleast(VARIADIC integer[])
>>>  RETURNS integer
>>>  LANGUAGE sql
>>> AS $function$
>>> select min(v) from unnest($1) g(v)
>>> $function$
>>
>> The reason you get a null from that is that (1) unnest() produces zero
>> rows out for either a null or empty-array input, and (2) min() over
>> zero rows produces NULL.
>>
>> In a lot of cases, it's not very sane for aggregates over no rows to
>> produce NULL; the best-known example is that SUM() produces NULL, when
>> anyone who'd not suffered brain-damage from sitting on the SQL committee
>> would have made it return zero.  So I'm not very comfortable with
>> generalizing from this specific case to decide that NULL is the
>> universally right result.
>>
>
> I am testing some situations and there is no consistent idea - can we
> talk just only about functions concat and concat_ws?
>
> these functions are really specific - now we are talking about a corner use
> case and a strongly PostgreSQL-specific solution. So any solution
> should not be too bad.
>
> The difference between the solutions will be at most +/- 4 simple rows
> per function.
>
> Possibilities
>
> 1) A. concat(variadic NULL) => empty string, B. concat(variadic '{}')
> => empty string -- if we accept @A, then B is ok
> 2) A. concat(variadic NULL) => NULL, B. concat(variadic '{}') => NULL
> -- question - is @B valid ?
> 3) A. concat(variadic NULL) => NULL, B. concat(variadic '{}) => empty string
>
> There is no other possibility.
>
> I can live with any variant - we can probably find a precedent for any
> variant in our code or in ANSI SQL.
>
> I like @2 as general concept for PostgreSQL variadic functions, but
> when we talk about concat() and concat_ws() @1 is valid too.
>
> Please, can somebody give their opinion early?
>
> Regards
>
> Pavel
>
>
>
>> regards, tom lane


variadic_any_fix_20130124.patch
Description: Binary data



[HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-01-24 Thread Andres Freund
On 2013-01-24 12:30:02 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > Backend A: does SET PERSISTENT foobar =..;
> > Backend B: does SET PERSISTENT foobar =..;
>
> > Now B overwrites the config change A has made as they are all stored in
> > the same file.
>
> Say what?  I thought the plan was one setting per file, so that we don't
> get involved in having to parse-and-edit the file contents.  What was
> all that argument about a new directory, if we're only using one file?

I initially thought that as well (and voted for it) but after reskimming
the thread and reading the patch that doesn't seem to be the case unless
it's implemented in a way I don't understand:

+#define PG_AUTOCONF_DIR        "auto.conf.d"
+#define PG_AUTOCONF_FILENAME   "postgresql.auto.conf"

+   /* Frame auto conf name and auto conf sample temp file name */
+   snprintf(AutoConfFileName, sizeof(AutoConfFileName), "%s/%s/%s",
+   ConfigFileDir,
+   PG_AUTOCONF_DIR,
+   PG_AUTOCONF_FILENAME);
+   snprintf(AutoConfTmpFileName, sizeof(AutoConfTmpFileName),"%s/%s/%s.%d",
+   ConfigFileDir,
+   PG_AUTOCONF_DIR,
+   PG_AUTOCONF_FILENAME,
+   (int) getpid());
+

I don't understand what the conf.d is all about either if only one file
is going to be used.

> If we are using just one file, then I agree a lock would be needed to
> synchronize updates.  But that seems to add a lot of complication
> elsewhere.

More people seem to have voted for the single file approach but I still
haven't understood why...

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Skip checkpoint on promoting from streaming replication

2013-01-24 Thread Simon Riggs
On 24 January 2013 16:52, Heikki Linnakangas  wrote:
> On 24.01.2013 18:24, Simon Riggs wrote:
>>
>> On 6 January 2013 21:58, Simon Riggs  wrote:
>>>
>>> I've been torn between the need to remove the checkpoint for speed
>>> and being worried about the implications of doing so.
>>>
>>> We promote in multiple use cases. When we end a PITR, or are
>>> performing a switchover, it doesn't really matter how long the
>>> shutdown checkpoint takes, so I'm inclined to leave it there in those
>>> cases. For failover, we need fast promotion.
>>>
>>> So my thinking is to make   pg_ctl promote -m fast
>>> be the way to initiate a fast failover that skips the shutdown
>>> checkpoint.
>>>
>>> That way all existing applications work the same as before, while new
>>> users that explicitly choose to do so will gain from the new option.
>>
>>
>> Here's a patch to skip checkpoint when we do
>>
>>pg_ctl promote -m fast
>>
>> We keep the end of recovery checkpoint in all other cases.
>
>
> Hmm, there seems to be no way to do a "fast" promotion with a trigger file.

True. I thought we were moving away from trigger files to use of "promote".

> I'm a bit confused why there needs to be special mode for this. Can't we
> just always do the "fast" promotion? I agree that there's no urgency when
> you're doing PITR, but shouldn't do any harm either. Or perhaps always do
> "fast" promotion when starting up from standby mode, and "slow" otherwise.
>
> Are we comfortable enough with this to skip the checkpoint after crash
> recovery?

I'm not. Maybe if we get no bugs we can make it do this always, in the
next release.

It's fast when it needs to be and safe otherwise.


> I may be missing something, but it looks like after a "fast" promotion, you
> don't request a new checkpoint. So it can take quite a while for the next
> checkpoint to be triggered by checkpoint_timeout/segments. That shouldn't be
> a problem, but I feel that it'd be prudent to request a new checkpoint
> immediately (not necessarily an "immediate" checkpoint, though).

I thought of that and there is a long comment to explain why I didn't.

Two problems:

1) an immediate checkpoint can cause a disk/resource usage spike,
which is definitely not what you need just when a spike of connections
and new SQL hits the system.

2) If we did that, we would have an EndOfRecovery record, some other
records and then a Shutdown checkpoint.
As I write this, (2) is wrong, because we shouldn't do a Shutdown
checkpoint anyway.

But I still think (1) is a valid concern.

>> The only thing left from Kyotaro's patch is a single line of code -
>> the call to ReadCheckpointRecord() that checks to see if the WAL
>> records for the last two restartpoints is on disk, which was an
>> important line of code.
>
>
> Why's that important, just for paranoia? If the last two restartpoints have
> disappeared, something's seriously wrong, and you will be in trouble e.g if
> you crash at that point. Do we need to be extra paranoid when doing a "fast"
> promotion?

The check is cheap, so what do we gain by skipping the check?

>> Patch implements a new record type XLOG_END_OF_RECOVERY that behaves
>> on replay like a shutdown checkpoint record. I put this back in from
>> my patch because I believe its important that we have a clear place
>> where the WAL history changes timelineId. WAL format change bump
>> required.
>
>
> Agreed, such a WAL record is essential.
>
> At replay, an end-of-recovery record should be a signal to the hot standby
> mechanism that there are no transactions running in the master at that
> point, same as a shutdown checkpoint.

I had a reason why I didn't do that, but it seems to have slipped my mind.

If I can't remember, I'll add it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] LATERAL, UNNEST and spec compliance

2013-01-24 Thread David Fetter
Folks,

Andrew Gierth asked me to send this out as his email is in a parlous
state at the moment.  My comments will follow in replies.  Without
further ado:


SQL2008 says, for 7.6 <table reference>:

6)
  a) If TR is contained in a <from clause> FC with no intervening <query
expression>, then the scope clause SC of TR is the <select statement:
single row> or innermost <query specification> that contains FC.  The
scope of a range variable of TR is the <select list>, <where clause>,
<group by clause>, <having clause>, and <window clause> of SC, together
with every <lateral derived table> that is simply contained in FC and
is preceded by TR, and every <collection derived table> that is simply
contained in FC and is preceded by TR, and the <join condition> of all
<joined table>s contained in SC that contain TR. If SC is the <query
specification> that is the <query expression body> of a simple table
query STQ, then the scope of a range variable of TR also includes the
<order by clause> of STQ.

This is the clause that defines the scope effect of LATERAL, and as can be
seen, it defines <collection derived table>, i.e. UNNEST(), as having the
same behaviour as <lateral derived table>.

It is also worth noting at this point that pg's "FROM func()" syntax is not
in the spec (the nearest is "FROM TABLE()").

Our implementation of UNNEST currently deviates from the spec by not being
implicitly LATERAL; given the (sub)query

  SELECT * FROM sometable, UNNEST(somearray);

then "somearray" is required to be a parameter or outer reference rather
than a column of "sometable". To get the spec's behaviour for this, we
currently have to do:

  SELECT * FROM sometable, LATERAL UNNEST(somearray);

which is non-standard syntax. (In the spec, only a <table subquery> can
follow LATERAL.)

(We also don't accept the (optional) syntax of S301, allowing multiple
parameters to UNNEST().)

As I see it, the current options are:

1. Do nothing, and insist on non-standard use of the LATERAL keyword.

2. Add UNNEST to the grammar (or parse analysis) as a special case, making
   it implicitly LATERAL.

   (This would make implementing S301 easier, but special cases are ugly.)

3. Make all cases of SRFs in the FROM-clause implicitly LATERAL.

   (As far as I can tell, those cases whose behaviour would be changed by
   this actually produce errors in versions prior to 9.3, so no working
   code should be affected.)

Since LATERAL is new in 9.3, I think the pros and cons of these choices
should be considered now, rather than being allowed to slide by unexamined.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Materialized views WIP patch

2013-01-24 Thread Noah Misch
On Thu, Jan 17, 2013 at 07:54:55AM -0500, Robert Haas wrote:
> On Wed, Jan 16, 2013 at 1:38 PM, Tom Lane  wrote:
> >> Where I really need someone to hit me upside the head with a
> >> clue-stick is the code I added to the bottom of RelationBuildDesc()
> >> in relcache.c. The idea is that on first access to an unlogged MV,
> >> to detect that the heap has been replaced by the init fork, set
> >> relisvalid to false, and make the heap look normal again.
> >
> > Hmm.  I agree that relcache.c has absolutely no business doing that,
> > but not sure what else to do instead.  Seems like it might be better
> > done at first touch of the MV in the parser, rewriter, or planner ---
> > but the fact that I can't immediately decide which of those is right
> > makes me feel that it's still too squishy.
> 
> I think we shouldn't be doing that at all.  The whole business of
> transferring the relation-is-invalid information from the relation to
> a pg_class flag seems like a Rube Goldberg device to me.  I'm still
> not convinced that we should have a relation-is-invalid flag at all,
> but can we at least not have two?
> 
> It seems perfectly adequate to detect that the MV is invalid when we
> actually try to execute a plan - that is, when we first access the
> heap or one of its indexes.  So the bit can just live in the
> file-on-disk, and there's no need to have a second copy of it in
> pg_class.

Like Kevin, I want a way to distinguish unpopulated MVs from MVs that
genuinely yielded the empty set at last refresh.  I agree that there's no
particular need to store that fact in pg_class, and I would much prefer only
storing it one way in any case.  A user-visible disadvantage of the current
implementation is that relisvalid remains stale until something opens the rel.
That's fine for the system itself, but it can deceive user-initiated catalog
queries.  Imagine a check_postgres action that looks for invalid MVs to
complain about.  It couldn't just scan pg_class; it would need to first do
something that opens every MV.

I suggest the following:

1. Let an invalid MV have a zero-length heap.  Distinguish a valid, empty MV
   by giving it a page with no tuples.  This entails VACUUM[1] not truncating
   MVs below one page and the refresh operation, where necessary, extending
   the relation from zero pages to one.
2. Remove pg_class.relisvalid.
3. Add a bool field to RelationData.  The word "valid" is used in that context
   to refer to the validity of the structure itself, so perhaps call the new
   field rd_scannable.  RelationIsFlaggedAsValid() can become a macro;
   consider changing its name for consistency with the field name.
4. During relcache build, set the field to "RelationGetNumberBlocks(rel) != 0"
   for MVs, fixed "true" for everyone else.  Any operation that changes the
   field must, and probably would anyway, instigate a relcache invalidation.
5. Expose a database function, say pg_relation_scannable(), for querying the
   current state of a relation.  This supports user-level monitoring.

Does that seem reasonable?  One semantic difference to keep in mind is that
unlogged MVs will be considered invalid on the standby while valid on the
master.  That's essentially an accurate report, so I won't mind it.

For the benefit of the archives, I note that we almost need not truncate an
unlogged materialized view during crash recovery.  MVs are refreshed in a
VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's
pg_class to that relfilenode.  When a crash occurs with no refresh in flight,
the latest refresh had been safely synced.  When a crash cuts short a refresh,
the pg_class update will not stick, and the durability of the old heap is not
in doubt.  However, non-btree index builds don't have the same property; we
would need to force an immediate sync of the indexes to be safe here.  It
would remain necessary to truncate unlogged MVs when recovering a base backup,
which may contain a partially-written refresh that did eventually commit.
Future MV variants that modify the MV in place would also need the usual
truncate on crash.

I'm going to follow this with a review covering a broader range of topics.

Thanks,
nm

[1] For the time being, it's unfortunate to VACUUM materialized views at all;
they only ever bear frozen tuples.




Re: [HACKERS] Materialized views WIP patch

2013-01-24 Thread Noah Misch
Hi Kevin,

The patch conflicts with git master; I tested against master@{2013-01-20}.

On Wed, Jan 16, 2013 at 12:40:55AM -0500, Kevin Grittner wrote:
> I've been struggling with two areas:
> 
>  - pg_dump sorting for MVs which depend on other MVs

From your later messages, I understand that you have a way forward on this.

>  - proper handling of the relisvalid flag for unlogged MVs after recovery

I have discussed this in a separate email.  While reading the patch to assess
that topic, I found a few more things:

> *** a/contrib/pg_upgrade/version_old_8_3.c
> --- b/contrib/pg_upgrade/version_old_8_3.c
> ***
> *** 145,151  old_8_3_check_for_tsquery_usage(ClusterInfo *cluster)
>   "FROM   pg_catalog.pg_class c, "
>   "   pg_catalog.pg_namespace n, "
>   "   pg_catalog.pg_attribute a "
> ! "WHERE  c.relkind = 'r' AND "
>   "   c.oid = a.attrelid AND "
>   "   NOT a.attisdropped AND "
>   "   a.atttypid = 'pg_catalog.tsquery'::pg_catalog.regtype AND "
> --- 145,151 ----
>   "FROM   pg_catalog.pg_class c, "
>   "   pg_catalog.pg_namespace n, "
>   "   pg_catalog.pg_attribute a "
> ! "WHERE  c.relkind in ('r', 'm') AND "
>   "   c.oid = a.attrelid AND "
>   "   NOT a.attisdropped AND "
>   "   a.atttypid = 'pg_catalog.tsquery'::pg_catalog.regtype AND "

PostgreSQL 8.3 clusters won't contain materialized views, so it doesn't really
matter whether this change happens or not.  I suggest adding a comment,
whether or not you keep the code change.

> *** a/contrib/sepgsql/sepgsql.h
> --- b/contrib/sepgsql/sepgsql.h
> ***
> *** 32,37 
> --- 32,39 
>   
>   /*
>* Internally used code of object classes
> +  *
> +  * NOTE: Materialized views are treated as tables for now.

This smells like a bypass of mandatory access control.  Unless you've
determined that this is correct within the sepgsql security model, I suggest
starting with a draconian policy, like simply crippling MVs.  Even if you have
determined that, separating out the nontrivial sepgsql support might be good.
The set of ideal reviewers is quite different.

>*/
>   #define SEPG_CLASS_PROCESS  0
>   #define SEPG_CLASS_FILE 1
> *** a/contrib/vacuumlo/vacuumlo.c
> --- b/contrib/vacuumlo/vacuumlo.c
> ***
> *** 209,215  vacuumlo(const char *database, const struct _param * param)
>   strcat(buf, "  AND a.atttypid = t.oid ");
>   strcat(buf, "  AND c.relnamespace = s.oid ");
>   strcat(buf, "  AND t.typname in ('oid', 'lo') ");
> ! strcat(buf, "  AND c.relkind = 'r'");
>   strcat(buf, "  AND s.nspname !~ '^pg_'");
>   res = PQexec(conn, buf);
>   if (PQresultStatus(res) != PGRES_TUPLES_OK)
> --- 209,215 
>   strcat(buf, "  AND a.atttypid = t.oid ");
>   strcat(buf, "  AND c.relnamespace = s.oid ");
>   strcat(buf, "  AND t.typname in ('oid', 'lo') ");
> ! strcat(buf, "  AND c.relkind in ('r', 'm')");

It concerns me slightly that older vacuumlo could silently remove large
objects still referenced by MVs.  Only slightly, though, because the next MV
refresh would remove those references anyway.  Is there anything we should do
to help that situation?  If nothing else, perhaps backpatch this patch hunk.

> +
> + WITH OIDS
> + WITHOUT OIDS
> + 
> +  
> +   These are obsolescent syntaxes equivalent to WITH (OIDS)
> +   and WITH (OIDS=FALSE), respectively.  If you wish to give
> +   both an OIDS setting and storage parameters, you must use
> +   the WITH ( ... ) syntax; see above.
> +  
> + 
> +

Let's not support OIDs on MVs.  They'll be regenerated on every refresh.

> ***
> *** 336,342  ExplainOneQuery(Query *query, IntoClause *into, ExplainState *es,
>*/
>   void
>   ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
> !   const char *queryString, ParamListInfo params)
>   {
>  

[HACKERS] NOT VALID FKs and Constraints

2013-01-24 Thread Cody Cutrer
9.1 introduced delayed validation on FKs, and 9.2 on table constraints;
however, neither one has been as useful as hoped, due to the lesser locks
on ALTER TABLE
being reverted (see
http://www.postgresql.org/message-id/1330350691-su...@alvh.no-ip.org),
preventing the lock benefits during the VALIDATE stage.

I don't see any commits since then about either one. Is fixing this still
on the radar for 9.3? Also, is a similar feature planned (for 9.3 or
further out) for NOT NULL column constraints?

Thanks,

Cody Cutrer


Re: [HACKERS] [PATCH] pg_isready (was: [WIP] pg_ping utility)

2013-01-24 Thread Fujii Masao
On Fri, Jan 25, 2013 at 1:45 AM, Phil Sorber  wrote:
> On Wed, Jan 23, 2013 at 8:12 PM, Fujii Masao  wrote:
>> On Thu, Jan 24, 2013 at 8:47 AM, Phil Sorber  wrote:
>>> On Wed, Jan 23, 2013 at 6:07 PM, Tom Lane  wrote:
 Phil Sorber  writes:
> On Wed, Jan 23, 2013 at 1:58 PM, Bruce Momjian  wrote:
>> On Wed, Jan 23, 2013 at 12:27:45PM -0500, Tom Lane wrote:
>>> +1 for default timeout --- if this isn't like "ping" where you are
>>> expecting to run indefinitely, I can't see that it's a good idea for it
>>> to sit very long by default, in any circumstance.

>> FYI, the pg_ctl -w (wait) default is 60 seconds:

> Great. That is what I came to on my own as well. Figured that might be
> a sticking point, but as there is precedent, I'm happy with it.

 I'm not sure that's a relevant precedent at all.  What that number is
 is the time that pg_ctl will wait around for the postmaster to start or
 stop before reporting a problem --- and in either case, a significant
 delay (multiple seconds) is not surprising, because of crash-recovery
 work, shutdown checkpointing, etc.  For pg_isready, you'd expect to get
 a response more or less instantly, wouldn't you?  Personally, I'd decide
 that pg_isready is broken if it didn't give me an answer in a couple of
 seconds, much less a minute.

 What I had in mind was a default timeout of maybe 3 or 4 seconds...
>>>
>>> I was thinking that if it was in a script you wouldn't care if it was
>>> 60 seconds. If it was at the command line you would ^C it much
>>> earlier. I think the default linux TCP connection timeout is around 20
>>> seconds. My feeling is everyone is going to have a differing opinion
>>> on this, which is why I was hoping that some good precedent existed.
>>> I'm fine with 3 or 4, whatever can be agreed upon.
>>
>> +1 with 3 or 4 secounds.
>>
>> Aside from this issue, I have one minor comment. ISTM you need to
>> add the following line to the end of the help message. This line has
>> been included in the help message of all the other client commands.
>>
>> Report bugs to .
>
> Ok, I set the default timeout to 3 seconds, added the bugs email to
> the help, and also added docs that I forgot last time.

Thanks!

> Also, still hoping to get some feedback on my other issues.

Shouldn't set_pglocale_pgservice() be called?

I think that the command name (i.e., pg_isready) should be given to
PQpingParams() as fallback_application_name. Otherwise, the server
by default uses "unknown" as the application name of pg_isready.
It's undesirable.

Why isn't the following message output only when an invalid option is
specified?

Try \"%s --help\" for more information.

When the conninfo string including the hostname or port number is
specified in -d option, pg_isready displays the wrong information
as follows.

$ pg_isready -d "port="
/tmp:5432 - no response

Regards,

-- 
Fujii Masao




Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state

2013-01-24 Thread Robert Haas
On Thu, Jan 24, 2013 at 6:14 AM, Andres Freund  wrote:
> That's much more along the lines of what I am afraid of than the
> performance stuff - but Heikki cited those, so I replied to that.
>
> Note that I didn't say this must, must go in - I just don't think
> Heikki's reasoning about why not hit the nail on the head.

Fair enough, no argument.

Before getting bogged down in technical commentary, let me say this
very clearly: I am enormously grateful for your work on this project.
Logical replication based on WAL decoding is a feature of enormous
value that PostgreSQL has needed for a long time, and your work has
made that look like an achievable goal.  Furthermore, it seems to me
that you have pursued the community process with all the vigor and
sincerity for which anyone could ask.  Serious design concerns were
raised early in the process and you made radical changes to the design
which I believe have improved it tremendously, and you've continued to
display an outstanding attitude at every phase of this process about
which I can't say enough good things.  There is no question in my mind
that this work is going to be the beginning of a process that
revolutionizes the way people think about replication and PostgreSQL,
and you deserve our sincere thanks for that.

Now, the bad news is, I don't think it's very reasonable to try to
commit this to 9.3.  I think it is just too much stuff too late in the
cycle.  I've reviewed some of the patches from time to time but there
is a lot more stuff and it's big and complicated and it's not really
clear that we have the interface quite right yet, even though I think
it's also clear that we are a lot of closer than we were.  I don't
want to be fixing that during beta, much less after release.

> I tried very, very hard to get the basics of the design & interface
> solid. Which obviously doesn't mean I am succeeding - luckily not being
> superhuman after all ;). And I think that's very much where input is
> desperately needed and where I failed to raise enough attention. The
> "output plugin" interface followed by the walsender interface is what
> needs to be most closely vetted.
> Those are the permanent, user/developer exposed UI and the one we should
> try to keep as stable as possible.
>
> The output plugin callbacks are defined here:
> http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/include/replication/output_plugin.h;hb=xlog-decoding-rebasing-cf4
> To make it more agnostic of the technology to implement changeset
> extraction we possibly should replace the ReorderBuffer(TXN|Change)
> structs being passed by something more implementation agnostic.
>
> walsender interface:
> http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/backend/replication/repl_gram.y;hb=xlog-decoding-rebasing-cf4
> The interesting new commands are:
> 1) K_INIT_LOGICAL_REPLICATION NAME NAME
> 2) K_START_LOGICAL_REPLICATION NAME RECPTR plugin_options
> 3) K_FREE_LOGICAL_REPLICATION NAME
>
> 1 & 3 allocate (respectively free) the permanent state associated with
> one changeset consumer whereas START_LOGICAL_REPLICATION streams out
> changes starting at RECPTR.

Forgive me for not having looked at the patch, but to what extent is
all this, ah, documented?

> Btw, there are currently *no* changes to the wal format at all if
> wal_format < logical except that xl_running_xacts are logged more
> frequently, which obviously could easily be made conditional. Barring bugs
> of course.
> The changes with wal_level>=logical aren't that big either imo:
> * heap_insert, heap_update prevent full page writes from removing their
>   normal record by using a separate XLogRecData block for the buffer and
>   the record
> * heap_delete adds more data (the pkey of the tuple) after the unchanged
>   xl_heap_delete struct
> * On changes to catalog tables, (relfilenode, tid, cmin, cmax) are logged.
>
> No changes to mvcc for normal backends at all, unless you count the very
> slightly changed *Satisfies interface (getting passed a HeapTuple
> instead of HeapTupleHeader).
>
> I am not sure what you're concerned about WRT the on-disk format of the
> tuples? We are pretty much nailed down on that due to pg_upgrade'ability
> anyway and it could be changed from this patches POV without a problem,
> the output plugin just sees normal HeapTuples? Or are you concerned
> about the code extracting them from the xlog records?

Mostly, my concern is that you've accidentally broken something, or
that your code will turn out to be flaky in ways we can't now predict.
My only really specific concern at this point is about the special
treatment of catalog tables.  We've never done anything like that
before, and it feels like a bad idea.  In particular, the fact that
you have to WAL-log new information about cmin/cmax really suggests
that we're committing ourselves to the MVCC infrastructure in a way
that we weren't previously.  There's some category of stuff that our
MVCC 

Re: [HACKERS] Materialized views WIP patch

2013-01-24 Thread Kevin Grittner
Thanks for looking at this!

Noah Misch wrote:

> For the benefit of the archives, I note that we almost need not truncate an
> unlogged materialized view during crash recovery. MVs are refreshed in a
> VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's
> pg_class to that relfilenode. When a crash occurs with no refresh in flight,
> the latest refresh had been safely synced. When a crash cuts short a refresh,
> the pg_class update will not stick, and the durability of the old heap is not
> in doubt. However, non-btree index builds don't have the same property; we
> would need to force an immediate sync of the indexes to be safe here. It
> would remain necessary to truncate unlogged MVs when recovering a base backup,
> which may contain a partially-written refresh that did eventually commit.
> Future MV variants that modify the MV in place would also need the usual
> truncate on crash.

Hmm. That's a very good observation. Perhaps the issue can be
punted to a future release where we start adding more incremental
updates to them. I'll think on that, but on the face of it, it
sounds like the best choice.

> I'm going to follow this with a review covering a broader range
> of topics.

I'll need time to digest the rest of it. As you note, recent
commits conflict with the last patch. Please look at the github
repo where I've been working on this. I'll post an updated patch
later today.

https://github.com/kgrittn/postgres/tree/matview

You might want to ignore the interim work on detecting the new
pg_dump dependencies through walking the internal structures. I
decided that was heading in a direction which might be
unnecessarily fragile and slow; so I tried writing it as a query
against the system tables. I'm pretty happy with the results.
Here's the query:

with recursive w as
(
select
    d1.objid,
    d1.objid as wrkid,
    d2.refobjid,
    c2.relkind as refrelkind
  from pg_depend d1
  join pg_class c1 on c1.oid = d1.objid
                  and c1.relkind = 'm'
                  and c1.relisvalid
  join pg_rewrite r1 on r1.ev_class = d1.objid
  join pg_depend d2 on d2.classid = 'pg_rewrite'::regclass
                   and d2.objid = r1.oid
                   and d2.refobjid <> d1.objid
  join pg_class c2 on c2.oid = d2.refobjid
                  and c2.relkind in ('m','v')
                  and c2.relisvalid
  where d1.classid = 'pg_class'::regclass
union
select
    w.objid,
    w.refobjid as wrkid,
    d3.refobjid,
    c3.relkind as refrelkind
  from w
  join pg_rewrite r3 on r3.ev_class = w.refobjid
  join pg_depend d3 on d3.classid = 'pg_rewrite'::regclass
                   and d3.objid = r3.oid
                   and d3.refobjid <> w.refobjid
  join pg_class c3 on c3.oid = d3.refobjid
                  and c3.relkind in ('m','v')
                  and c3.relisvalid
  where w.refrelkind <> 'm'
),
x as
(
select objid::regclass, refobjid::regclass from w
  where refrelkind = 'm'
)
select 'm'::text as type, x.objid, x.refobjid from x
union all
select
    'i'::text as type,
    x.objid,
    i.indexrelid as refobjid
  from x
  join pg_index i on i.indrelid = x.refobjid
                 and i.indisvalid
;

If we bail on having pg_class.relisvalid, then it will obviously
need adjustment.

-Kevin




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-01-24 Thread Robert Haas
On Wed, Jan 23, 2013 at 1:45 PM, Alvaro Herrera
 wrote:
> Andres Freund escribió:
>> I somewhat dislike the fact that CONCURRENTLY isn't really concurrent
>> here (for the listeners: swapping the indexes acquires exclusive locks),
>> but I don't see any other naming being better.
>
> REINDEX ALMOST CONCURRENTLY?

I'm kind of unconvinced of the value proposition of this patch.  I
mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY
today, so ... how is this better?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-01-24 Thread Bruce Momjian
On Thu, Jan 24, 2013 at 01:29:56PM -0500, Robert Haas wrote:
> On Wed, Jan 23, 2013 at 1:45 PM, Alvaro Herrera
>  wrote:
> > Andres Freund escribió:
> >> I somewhat dislike the fact that CONCURRENTLY isn't really concurrent
> >> here (for the listeners: swapping the indexes acquires exclusive locks),
> >> but I don't see any other naming being better.
> >
> > REINDEX ALMOST CONCURRENTLY?
> 
> I'm kind of unconvinced of the value proposition of this patch.  I
> mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY
> today, so ... how is this better?

This has been on the TODO list for a while, and I don't think the
rename-in-a-transaction work needed to use drop/create is really
something we want to force on users.  In addition, doing that for all
tables in a database is even more work, so I would be disappointed _not_
to get this feature in 9.3.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state

2013-01-24 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> Now, the bad news is, I don't think it's very reasonable to try to
> commit this to 9.3.  I think it is just too much stuff too late in the
> cycle.  I've reviewed some of the patches from time to time but there
> is a lot more stuff and it's big and complicated and it's not really
> clear that we have the interface quite right yet, even though I think
> it's also clear that we are a lot of closer than we were.  I don't
> want to be fixing that during beta, much less after release.

The only way to avoid this happening again and again, imv, is to get it
committed early in whatever cycle it's slated to release for.  We've got
some serious challenges there though because we want to encourage
everyone to focus on beta testing and going through the release process,
plus we don't want to tag/branch too early or we create more work for
ourselves.

It would have been nice to get this into 9.3, and I can certainly
understand needing to move it back, but can we get a slightly more
specific plan for getting it in then?

Thanks,

Stephen




Re: [HACKERS] Strange Windows problem, lock_timeout test request

2013-01-24 Thread Jeff Janes
On Sat, Jan 19, 2013 at 12:15 PM, Andrew Dunstan  wrote:
> On 01/19/2013 02:36 AM, Boszormenyi Zoltan wrote:
>>
>
> A long time ago I had a lot of sympathy with this answer, but these days not
> so much. Getting a working mingw/msys environment sufficient to build a bare
> bones PostgreSQL from scratch is both cheap and fairly easy. The
> improvements that mingw has made in its install process, and the presence of
> cheap or free windows instances in the cloud combine to make this pretty
> simple.  But since it's still slightly involved here is how I constructed
> one such  this morning:

I've used this description, skipping the Amazon part and putting it
directly on my Windows computer, and it worked.

Except bin/pg_ctl does not work.  It just silently exits without doing
anything, so I have to use bin/postgres to start the database (which
is what "make check" uses anyway, so not a problem if you just want
make check).  Is that just me or is that a known problem?  I've seen
some discussion from 2004, but didn't find a conclusion.

What advantages does mingw have over MSVC?  Is it just that it is
cost-free, or is it easier to use mingw than MSVC for someone used to
building on Linux?  (mingw certainly does not seem to have the
advantage of being fast!)

Would you like to put this somewhere on wiki.postgresql.org, or would
you mind if I do so?

Thanks for the primer,

Jeff




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-01-24 Thread Tom Lane
Bruce Momjian  writes:
> On Thu, Jan 24, 2013 at 01:29:56PM -0500, Robert Haas wrote:
>> I'm kind of unconvinced of the value proposition of this patch.  I
>> mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY
>> today, so ... how is this better?

> This has been on the TODO list for a while, and I don't think the
> renaming in a transaction work needed to use drop/create is really
> something we want to force on users.  In addition, doing that for all
> tables in a database is even more work, so I would be disappointed _not_
> to get this feature in 9.3.

I haven't given the current patch a look, but based on previous
discussions, this isn't going to be more than a macro for things that
users can do already --- that is, it's going to be basically DROP
CONCURRENTLY plus CREATE CONCURRENTLY plus ALTER INDEX RENAME, including
the fact that the RENAME step will transiently need an exclusive lock.
(If that's not what it's doing, it's broken.)  So there's some
convenience argument for it, but it's hardly amounting to a stellar
improvement.

I'm kind of inclined to put it off till after we fix the SnapshotNow
race condition problems; at that point it should be possible to do
REINDEX CONCURRENTLY more simply and without any exclusive lock
anywhere.

regards, tom lane




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-01-24 Thread Andres Freund
On 2013-01-24 13:29:56 -0500, Robert Haas wrote:
> On Wed, Jan 23, 2013 at 1:45 PM, Alvaro Herrera
>  wrote:
> > Andres Freund escribió:
> >> I somewhat dislike the fact that CONCURRENTLY isn't really concurrent
> >> here (for the listeners: swapping the indexes acquires exclusive locks),
> >> but I don't see any other naming being better.
> >
> > REINDEX ALMOST CONCURRENTLY?
> 
> I'm kind of unconvinced of the value proposition of this patch.  I
> mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY
> today, so ... how is this better?

In the wake of beb850e1d873f8920a78b9b9ee27e9f87c95592f I wrote a script
to do this and it really is harder than one might think:
* you cannot do it in the database as CONCURRENTLY cannot be used in a
  TX
* you cannot do it to toast tables (this is currently broken in the
  patch but should be fixable)
* you cannot legally do it when foreign keys reference your unique key
* you cannot do it to exclusion constraints or non-immediate indexes

All of those are fixable (and most are) within REINDEX CONCURRENTLY, so I
find that to be a major feature even if it's not as good as it could be.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state

2013-01-24 Thread Heikki Linnakangas

On 24.01.2013 20:27, Robert Haas wrote:

Before getting bogged down in technical commentary, let me say this
very clearly: I am enormously grateful for your work on this project.
Logical replication based on WAL decoding is a feature of enormous
value that PostgreSQL has needed for a long time, and your work has
made that look like an achievable goal.  Furthermore, it seems to me
that you have pursued the community process with all the vigor and
sincerity for which anyone could ask.  Serious design concerns were
raised early in the process and you made radical changes to the design
which I believe have improved it tremendously, and you've continued to
display an outstanding attitude at every phase of this process about
which I can't say enough good things.


+1. I really appreciate all the work you, Andres, have put into this. I've 
argued in the past myself that there should be a little tool that 
scrapes the WAL to do logical replication. Essentially, just what you've 
implemented.


That said (hah, you knew there would be a "but" ;-)), now that I see 
what that looks like, I'm feeling that maybe it wasn't such a good idea 
after all. It sounded like a fairly small patch that greatly reduces the 
overhead in the master with existing replication systems like slony, but 
it turned out to be a huge patch with a lot of new concepts and interfaces.


- Heikki




Re: [HACKERS] Skip checkpoint on promoting from streaming replication

2013-01-24 Thread Simon Riggs
On 24 January 2013 17:44, Simon Riggs  wrote:

>> At replay, an end-of-recovery record should be a signal to the hot standby
>> mechanism that there are no transactions running in the master at that
>> point, same as a shutdown checkpoint.
>
> I had a reason why I didn't do that, but it seems to have slipped my mind.
>
> If I can't remember, I'll add it.

I think it was simply to keep things simple and avoid bugs in this release.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] gistchoose vs. bloat

2013-01-24 Thread Heikki Linnakangas

On 21.01.2013 15:19, Heikki Linnakangas wrote:

On 21.01.2013 15:06, Tom Lane wrote:

Jeff Davis writes:

On Mon, 2013-01-21 at 00:48 -0500, Tom Lane wrote:

I looked at this patch. ISTM we should not have the option at all but
just do it always. I cannot believe that always-go-left is ever a
preferable strategy in the long run; the resulting imbalance in the
index will surely kill any possible benefit. Even if there are some
cases where it miraculously fails to lose, how many users are going to
realize that applies to their case and make use of the option?



Sounds good to me.



If I remember correctly, there was also an argument that it may be
useful for repeatable test results. That's a little questionable for
performance (except in those cases where few penalties are identical
anyway), but could plausibly be useful for a crash report or something.


Meh. There's already a random decision, in the equivalent place and for
a comparable reason, in btree (cf _bt_findinsertloc). Nobody's ever
complained about that being indeterminate, so I'm unconvinced that
there's a market for it with gist.


I wonder if it would work for gist to do something similar to
_bt_findinsertloc, and have a bias towards the left page, but sometimes
descend to one of the pages to the right. You would get the cache
locality of usually descending down the same subtree, but also fill the
pages to the right. Jeff / Alexander, want to give that a shot?


I did some experimenting with that. I used the same test case Alexander 
did, with geonames data, and compared unpatched version, the original 
patch, and the attached patch that biases the first "best" tuple found, 
but still sometimes chooses the other equally good ones.


testname| initsize | finalsize | idx_blks_read | idx_blks_hit
+--+---+---+--
 patched-10-4mb | 75497472 |  90202112 |   5853604 | 10178331
 unpatched-4mb  | 75145216 |  94863360 |   5880676 | 10185647
 unpatched-4mb  | 75587584 |  97165312 |   5903107 | 10183759
 patched-2-4mb  | 74768384 |  81403904 |   5768124 | 10193738
 origpatch-4mb  | 74883072 |  82182144 |   5783412 | 10185373

All these tests were performed with shared_buffers=4MB, so that the 
index won't fit completely in shared buffers, and I could use the 
idx_blk_read/hit ratio as a measure of cache-friendliness. The 
"origpatch" test was with a simplified version of Alexander's patch, see 
attached. It's the same as the original, but with the 
randomization-option and gist-build related code removed. The patched-10 
and patched-2 tests are two variants with my patch, with 1/10 and 1/2 
chance of moving to the next equal tuple, respectively. The differences 
in cache hit ratio might be just a result of different index sizes. I 
included two unpatched runs above to show that there's a fair amount of 
noise in these tests. That's because of the random nature of the test 
case; it picks rows to delete and insert at random.


I think the conclusion is that all of these patches are effective. The 
1/10 variant is less effective, as expected, as it's closer in behavior 
to the unpatched behavior than the others. The 1/2 variant seems as good 
as the original patch.



I performed another test, by inserting 100 duplicate values on an 
empty table.


  testname  | finalsize | idx_blks_read | idx_blks_hit
+---+---+--
 unpatched  |  89030656 | 21350 |  2972033
 patched-10 |  88973312 | 21450 |  2971920
 patched-2  |  88481792 | 22947 |  2970260
 origpatch  |  61186048 |761817 |  2221500

The original patch produces a much smaller index in this test, but since 
the inserts are distributed randomly, the pages are accessed randomly 
which is bad for caching.


A table full of duplicates isn't very realistic, but overall, I'm 
leaning towards my version of this patch (gistchoose-2.patch). It has 
less potential for causing a regression in existing applications, but is 
just as effective in the original scenario of repeated delete+insert.
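[Editorial aside: the tie-breaking rule under discussion — flip a coin the first time an equally good tuple is seen, then stick with that decision for the rest of the scan — can be sketched standalone as below. This is an illustration only, not PostgreSQL source: the function name is invented, penalties are simplified to ints (the real code uses floats per column), and rand() stands in for the backend's RNG.]

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative sketch: pick the index of a minimum-penalty slot,
 * breaking ties with a single coin flip the first time an equally
 * good slot is seen, as in the gistchoose patch discussed above. */
static int
choose_min_with_random_ties(const int *penalty, int n)
{
    int result = 0;
    int best = penalty[0];
    int keep_current_best = -1; /* -1 = undecided, 1 = keep, 0 = replace */

    for (int i = 1; i < n; i++)
    {
        if (penalty[i] < best)
        {
            best = penalty[i];
            result = i;
            keep_current_best = -1; /* new best: reset the decision */
        }
        else if (penalty[i] == best)
        {
            if (keep_current_best == -1)
                keep_current_best = (rand() & 1);   /* decide once */
            if (keep_current_best == 0)
            {
                result = i;             /* move to the equally good slot */
                keep_current_best = -1; /* reset after the replacement */
            }
        }
    }
    return result;
}
```

With a 1/2 chance of switching, this is the "patched-2" behaviour above; biasing the coin toward keeping gives the "patched-10" variant.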


- Heikki


duplicatetest.sh
Description: Bourne shell script


gistbloat.sh
Description: Bourne shell script
diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index 8b60774..75778f6 100644
--- a/src/backend/access/gist/gistutil.c
+++ b/src/backend/access/gist/gistutil.c
@@ -379,6 +379,7 @@ gistchoose(Relation r, Page p, IndexTuple it,	/* it has compressed entry */
 	GISTENTRY	entry,
 identry[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
+	int			look_further_on_equal = -1;
 
 	Assert(!GistPageIsLeaf(p));
 
@@ -446,6 +447,8 @@ gistchoose(Relation r, Page p, IndexTuple it,	/* it has compressed entry */
 
 if (j < r->rd_att->natts - 1)
 	best_penalty[j + 1] = -1;
+
+look_further_on_equal = -1;
 			}
 			else if (best_penalty[j] == usize)
 			{
@@ -468,12 +471,46 @@ gistchoose(Relation

Re: [HACKERS] [PATCH] pg_isready (was: [WIP] pg_ping utility)

2013-01-24 Thread Phil Sorber
On Thu, Jan 24, 2013 at 1:12 PM, Fujii Masao  wrote:
> set_pglocale_pgservice() should be called?
>
> I think that the command name (i.e., pg_isready) should be given to
> PQpingParams() as fallback_application_name. Otherwise, the server
> by default uses "unknown" as the application name of pg_isready.
> It's undesirable.
>
> Why isn't the following message output only when invalid option is
> specified?
>
> Try \"%s --help\" for more information.

I've updated the patch to address these three issues. Attached.

>
> When the conninfo string including the hostname or port number is
> specified in -d option, pg_isready displays the wrong information
> as follows.
>
> $ pg_isready -d "port="
> /tmp:5432 - no response
>

This is what I asked about in my previous email about precedence of
the parameters. I can parse that with PQconninfoParse, but what are
the rules for merging the individual and conninfo params together?

For example if someone did: pg_isready -h foo -d "host=bar port=4321" -p 1234

What should the connection parameters be?
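[Editorial aside: one plausible resolution — an assumption for illustration, not libpq's documented rule — is strict last-setting-wins over an ordered option list: expand the -d conninfo string at the point it appears, then append the explicitly given -h/-p flags so they override it. Sketched standalone, with invented names:]

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of "last setting wins" merging for pg_isready's
 * parameters.  This is NOT libpq code; it only illustrates one possible
 * precedence rule for the question above. */
struct kv
{
    const char *key;
    const char *val;
};

/* Return the value of the LAST entry matching key, or NULL. */
static const char *
lookup(const struct kv *opts, int n, const char *key)
{
    const char *found = NULL;

    for (int i = 0; i < n; i++)         /* later entries override earlier */
        if (strcmp(opts[i].key, key) == 0)
            found = opts[i].val;
    return found;
}
```

Under that rule, "pg_isready -h foo -d \"host=bar port=4321\" -p 1234" would connect to foo:1234 if the flags are appended after the expanded conninfo string.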

> Regards,
>
> --
> Fujii Masao


pg_isready_timeout_v3.diff
Description: Binary data



Re: [HACKERS] Strange Windows problem, lock_timeout test request

2013-01-24 Thread Andrew Dunstan


On 01/24/2013 01:44 PM, Jeff Janes wrote:

On Sat, Jan 19, 2013 at 12:15 PM, Andrew Dunstan  wrote:

On 01/19/2013 02:36 AM, Boszormenyi Zoltan wrote:
A long time ago I had a lot of sympathy with this answer, but these days not
so much. Getting a working mingw/msys environment sufficient to build a bare
bones PostgreSQL from scratch is both cheap and fairly easy. The
improvements that mingw has made in its install process, and the presence of
cheap or free windows instances in the cloud combine to make this pretty
simple.  But since it's still slightly involved here is how I constructed
one such  this morning:

I've used this description, skipping the Amazon part and putting it
directly on my Windows computer, and it worked.

Except bin/pg_ctl does not work.  It just silently exits without doing
anything, so I have to use bin/postgres to start the database (which
is what "make check" uses anyway, so not a problem if you just want
make check).  Is that just me or is that a known problem?  I've seen
some discussion from 2004, but didn't find a conclusion.


Did you copy libpq.dll from the lib directory to the bin directory? If 
not, try that and see if it fixes the problem.




What advantages does mingw have over MSVC?  Is it just that it is
cost-free, or is it easier to use mingw than MSVC for someone used to
building on Linux?  (mingw certainly does not seem to have the
advantage of being fast!)


See Craig's email today about problems with SDKs and availability of



Would you like to put this somewhere on wiki.postgresql.org, or would
you mind if I do so?

Thanks for the primer,

Jeff







Re: [HACKERS] gistchoose vs. bloat

2013-01-24 Thread Tom Lane
Heikki Linnakangas  writes:
> I did some experimenting with that. I used the same test case Alexander 
> did, with geonames data, and compared unpatched version, the original 
> patch, and the attached patch that biases the first "best" tuple found, 
> but still sometimes chooses the other equally good ones.

>  testname| initsize | finalsize | idx_blks_read | idx_blks_hit
> +--+---+---+--
>   patched-10-4mb | 75497472 |  90202112 |   5853604 | 10178331
>   unpatched-4mb  | 75145216 |  94863360 |   5880676 | 10185647
>   unpatched-4mb  | 75587584 |  97165312 |   5903107 | 10183759
>   patched-2-4mb  | 74768384 |  81403904 |   5768124 | 10193738
>   origpatch-4mb  | 74883072 |  82182144 |   5783412 | 10185373

> I think the conclusion is that all of these patches are effective. The 
> 1/10 variant is less effective, as expected, as it's closer in behavior 
> to the unpatched behavior than the others. The 1/2 variant seems as good 
> as the original patch.

At least on this example, it seems a tad better, if you look at index
size.

> A table full of duplicates isn't very realistic, but overall, I'm 
> leaning towards my version of this patch (gistchoose-2.patch). It has 
> less potential for causing a regression in existing applications, but is 
> just as effective in the original scenario of repeated delete+insert.

+1 for this patch, but I think the comments could use more work.  I was
convinced it was wrong on first examination, mainly because it's hard to
follow the underdocumented look_further_on_equal logic.  I propose the
attached, which is the same logic with better comments (I also chose to
rename and invert the sense of the state variable, because it seemed
easier to follow this way ... YMMV on that though).

regards, tom lane

diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index 8b60774f50349222ee04fef13613e3592120973b..a127627334e80704edeed38d3522e4000a695abb 100644
*** a/src/backend/access/gist/gistutil.c
--- b/src/backend/access/gist/gistutil.c
*** gistchoose(Relation r, Page p, IndexTupl
*** 379,384 
--- 379,385 
  	GISTENTRY	entry,
  identry[INDEX_MAX_KEYS];
  	bool		isnull[INDEX_MAX_KEYS];
+ 	int			keep_current_best;
  
  	Assert(!GistPageIsLeaf(p));
  
*** gistchoose(Relation r, Page p, IndexTupl
*** 402,407 
--- 403,433 
  	best_penalty[0] = -1;
  
  	/*
+ 	 * If we find a tuple that's exactly as good as the currently best one, we
+ 	 * could use either one.  When inserting a lot of tuples with the same or
+ 	 * similar keys, it's preferable to descend down the same path when
+ 	 * possible, as that's more cache-friendly.  On the other hand, if all
+ 	 * inserts land on the same leaf page after a split, we're never going to
+ 	 * insert anything to the other half of the split, and will end up using
+ 	 * only 50% of the available space.  Distributing the inserts evenly would
+ 	 * lead to better space usage, but that hurts cache-locality during
+ 	 * insertion.  To get the best of both worlds, when we find a tuple that's
+ 	 * exactly as good as the previous best, choose randomly whether to stick
+ 	 * to the old best, or use the new one.  Once we decide to stick to the
+ 	 * old best, we keep sticking to it for any subsequent equally good tuples
+ 	 * we might find.  This favors tuples with low offsets, but still allows
+ 	 * some inserts to go to other equally-good subtrees.
+ 	 *
+ 	 * keep_current_best is -1 if we haven't yet had to make a random choice
+ 	 * whether to keep the current best tuple.  If we have done so, and
+ 	 * decided to keep it, keep_current_best is 1; if we've decided to
+ 	 * replace, keep_current_best is 0.  (This state will be reset to -1 as
+ 	 * soon as we've made the replacement, but sometimes we make the choice in
+ 	 * advance of actually finding a replacement best tuple.)
+ 	 */
+ 	keep_current_best = -1;
+ 
+ 	/*
  	 * Loop over tuples on page.
  	 */
  	maxoff = PageGetMaxOffsetNumber(p);
*** gistchoose(Relation r, Page p, IndexTupl
*** 446,451 
--- 472,480 
  
  if (j < r->rd_att->natts - 1)
  	best_penalty[j + 1] = -1;
+ 
+ /* we have new best, so reset keep-it decision */
+ keep_current_best = -1;
  			}
  			else if (best_penalty[j] == usize)
  			{
*** gistchoose(Relation r, Page p, IndexTupl
*** 468,479 
  		}
  
  		/*
! 		 * If we find a tuple with zero penalty for all columns, there's no
! 		 * need to examine remaining tuples; just break out of the loop and
! 		 * return it.
  		 */
  		if (zero_penalty)
! 			break;
  	}
  
  	return result;
--- 497,537 
  		}
  
  		/*
! 		 * If we looped past the last column, and did not update "result",
! 		 * then this tuple is exactly as good as the prior best tuple.
! 		 */
! 		if (j == r->rd_att->natts && result != i)
! 		{
! 			if (keep_curr

Re: [HACKERS] Materialized views WIP patch

2013-01-24 Thread Noah Misch
On Thu, Jan 24, 2013 at 01:29:10PM -0500, Kevin Grittner wrote:
> Noah Misch wrote:
> > For the benefit of the archives, I note that we almost need not truncate an
> > unlogged materialized view during crash recovery. MVs are refreshed in a
> > VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the 
> > MV's
> > pg_class to that relfilenode. When a crash occurs with no refresh in flight,
> > the latest refresh had been safely synced. When a crash cuts short a 
> > refresh,
> > the pg_class update will not stick, and the durability of the old heap is 
> > not
> > in doubt. However, non-btree index builds don't have the same property; we
> > would need to force an immediate sync of the indexes to be safe here. It
> > would remain necessary to truncate unlogged MVs when recovering a base 
> > backup,
> > which may contain a partially-written refresh that did eventually commit.
> > Future MV variants that modify the MV in place would also need the usual
> > truncate on crash.
> 
> Hmm. That's a very good observation. Perhaps the issue can be
> punted to a future release where we start adding more incremental
> updates to them. I'll think on that, but on the face of it, it
> sounds like the best choice.

That situation is challenging for the same reason pg_class.relisvalid was hard
to implement for unlogged relations.  The startup process doesn't know the
relkind of the unlogged-relation relfilenodes it cleans.  If you can work
through all that, it's certainly a nice endpoint to not lose unlogged snapshot
MVs on crash.  But I intended the first half of my message as the
recommendation and the above as a wish for the future.
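[Editorial aside: the crash-safety argument in the quoted paragraph — fill a new relfilenode, fsync it, then atomically repoint pg_class — follows the same shape as the classic write-new-then-rename idiom, sketched here with plain POSIX calls. This is an illustration only; the paths and function name are invented and this is not PostgreSQL's actual relfilenode machinery.]

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Illustration: write the new contents to a scratch file, make them
 * durable, then atomically swap them into place.  After a crash, either
 * the old contents or the fully synced new contents survive -- never a
 * half-written mix, which mirrors why a completed refresh needs no
 * truncate-on-crash. */
static int
durable_replace(const char *path, const char *tmppath,
                const char *data, size_t len)
{
    int fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t) len ||    /* fill the new file */
        fsync(fd) != 0 ||                           /* make it durable first */
        close(fd) != 0)
        return -1;
    /* the atomic "pg_class update": repoint to the new contents */
    return rename(tmppath, path);
}
```

A base backup, by contrast, can capture the new file mid-write, which is why the truncate remains necessary there.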

> You might want to ignore the interim work on detecting the new
> pg_dump dependencies through walking the internal structures. I
> decided that was heading in a direction which might be
> unnecessarily fragile and slow; so I tried writing it as a query
> against the system tables. I'm pretty happy with the results.
> Here's the query:
> 
> with recursive w as
[snip]

Why is the dependency problem of ordering MV refreshes and MV index builds so
different from existing pg_dump dependency problems?

> If we bail on having pg_class.relisvalid, then it will obviously
> need adjustment.

Even if we don't have the column, we can have the fact of an MV's validity
SQL-visible in some other way.




Re: [HACKERS] Strange Windows problem, lock_timeout test request

2013-01-24 Thread Andrew Dunstan


On 01/24/2013 02:41 PM, Andrew Dunstan wrote:




What advantages does mingw have over MSVC?  Is it just that it is
cost-free, or is it easier to use mingw than MSVC for someone used to
building on Linux?  (mingw certainly does not seem to have the
advantage of being fast!)


See Craig's email today about problems with SDKs and availability of



Not sure what happened there.

... availability of free 64 bit MSVC compilers.

Also, some third party libraries are built with the Mingw compilers and 
it's often best not to switch if you can help it.






Would you like to put this somewhere on wiki.postgresql.org, or would
you mind if I do so?


Please go for it.

cheers

andrew





Re: [HACKERS] gistchoose vs. bloat

2013-01-24 Thread Alexander Korotkov
On Thu, Jan 24, 2013 at 11:26 PM, Heikki Linnakangas <
hlinnakan...@vmware.com> wrote:

> On 21.01.2013 15:19, Heikki Linnakangas wrote:
>
>> On 21.01.2013 15:06, Tom Lane wrote:
>>
>>> Jeff Davis writes:
>>>
 On Mon, 2013-01-21 at 00:48 -0500, Tom Lane wrote:

> I looked at this patch. ISTM we should not have the option at all but
> just do it always. I cannot believe that always-go-left is ever a
> preferable strategy in the long run; the resulting imbalance in the
> index will surely kill any possible benefit. Even if there are some
> cases where it miraculously fails to lose, how many users are going to
> realize that applies to their case and make use of the option?
>

>>>  Sounds good to me.

>>>
>>>  If I remember correctly, there was also an argument that it may be
 useful for repeatable test results. That's a little questionable for
 performance (except in those cases where few penalties are identical
 anyway), but could plausibly be useful for a crash report or something.

>>>
>>> Meh. There's already a random decision, in the equivalent place and for
>>> a comparable reason, in btree (cf _bt_findinsertloc). Nobody's ever
>>> complained about that being indeterminate, so I'm unconvinced that
>>> there's a market for it with gist.
>>>
>>
>> I wonder if it would work for gist to do something similar to
>> _bt_findinsertloc, and have a bias towards the left page, but sometimes
>> descend to one of the pages to the right. You would get the cache
>> locality of usually descending down the same subtree, but also fill the
>> pages to the right. Jeff / Alexander, want to give that a shot?
>>
>
> I did some experimenting with that. I used the same test case Alexander
> did, with geonames data, and compared unpatched version, the original
> patch, and the attached patch that biases the first "best" tuple found, but
> still sometimes chooses the other equally good ones.
>
> testname| initsize | finalsize | idx_blks_read | idx_blks_hit
> +--+---+---+--
>  patched-10-4mb | 75497472 |  90202112 |   5853604 | 10178331
>  unpatched-4mb  | 75145216 |  94863360 |   5880676 | 10185647
>  unpatched-4mb  | 75587584 |  97165312 |   5903107 | 10183759
>  patched-2-4mb  | 74768384 |  81403904 |   5768124 | 10193738
>  origpatch-4mb  | 74883072 |  82182144 |   5783412 | 10185373
>
> All these tests were performed with shared_buffers=4MB, so that the index
> won't fit completely in shared buffers, and I could use the
> idx_blk_read/hit ratio as a measure of cache-friendliness. The "origpatch"
> test was with a simplified version of Alexander's patch, see attached. It's
> the same as the original, but with the randomization-option and gist-build
> related code removed. The patched-10 and patched-2 tests are two variants
> with my patch, with 1/10 and 1/2 chance of moving to the next equal tuple,
> respectively. The differences in cache hit ratio might be just a result of
> different index sizes. I included two unpatched runs above to show that
> there's a fair amount of noise in these tests. That's because of the random
> nature of the test case; it picks rows to delete and insert at random.
>
> I think the conclusion is that all of these patches are effective. The
> 1/10 variant is less effective, as expected, as it's closer in behavior to
> the unpatched behavior than the others. The 1/2 variant seems as good as
> the original patch.
>

Are the two unpatched-4mb lines different only by coincidence? If so, the
variance is significant and we need more experiments to actually compare the
patched-2-4mb and origpatch-4mb lines.
There is another source of overhead when using randomization in gistchoose:
extra penalty calls. It can be significant when the index fits in cache. In
order to avoid it I deliberately changed the behaviour of my patch from "look
sequentially and choose at random" to "look in random order". I think we need
to include a comparison of CPU time.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Materialized views WIP patch

2013-01-24 Thread Kevin Grittner
Noah Misch wrote:
> On Thu, Jan 24, 2013 at 01:29:10PM -0500, Kevin Grittner wrote:
>> Noah Misch wrote:
>>> For the benefit of the archives, I note that we almost need not truncate an
>>> unlogged materialized view during crash recovery. MVs are refreshed in a
>>> VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the 
>>> MV's
>>> pg_class to that relfilenode. When a crash occurs with no refresh in flight,
>>> the latest refresh had been safely synced. When a crash cuts short a 
>>> refresh,
>>> the pg_class update will not stick, and the durability of the old heap is 
>>> not
>>> in doubt. However, non-btree index builds don't have the same property; we
>>> would need to force an immediate sync of the indexes to be safe here. It
>>> would remain necessary to truncate unlogged MVs when recovering a base 
>>> backup,
>>> which may contain a partially-written refresh that did eventually commit.
>>> Future MV variants that modify the MV in place would also need the usual
>>> truncate on crash.
>> 
>> Hmm. That's a very good observation. Perhaps the issue can be
>> punted to a future release where we start adding more incremental
>> updates to them. I'll think on that, but on the face of it, it
>> sounds like the best choice.
> 
> That situation is challenging for the same reason pg_class.relisvalid was hard
> to implement for unlogged relations. The startup process doesn't know the
> relkind of the unlogged-relation relfilenodes it cleans. If you can work
> through all that, it's certainly a nice endpoint to not lose unlogged snapshot
> MVs on crash. But I intended the first half of my message as the
> recommendation and the above as a wish for the future.

Well, if I just don't create an init fork for MVs, they are left as
they were on recovery, aren't they? So for 9.3, that solves that
issue, I think. pg_class.relisvalid is a separate issue.

>> You might want to ignore the interim work on detecting the new
>> pg_dump dependencies through walking the internal structures. I
>> decided that was heading in a direction which might be
>> unnecessarily fragile and slow; so I tried writing it as a query
>> against the system tables. I'm pretty happy with the results.
>> Here's the query:
>> 
>> with recursive w as
> [snip]
> 
> Why is the dependency problem of ordering MV refreshes and MV index builds so
> different from existing pg_dump dependency problems?

If mva has indexes and is referenced by mvb, the CREATE statements
are all properly ordered, but you want mva populated and indexed
before you attempt to populate mvb. (Populated to get correct
results, indexed to get them quickly.) We don't have anything else
like that.

>> If we bail on having pg_class.relisvalid, then it will obviously
>> need adjustment.
> 
> Even if we don't have the column, we can have the fact of an MV's validity
> SQL-visible in some other way.

Sure, I didn't say we had to abandon the query -- probably just
replace the relisvalid tests with a function call using the oid of
the MV.

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-01-24 Thread Tom Lane
Jameison Martin  writes:
> There have been a lot of different threads on this patch. I'm going to take a 
> stab at teasing them out, summarizing them, and addressing them individually.

Thanks for reviewing the bidding so carefully.

> 4.3) does it violate a principle in the code (Tom's latest email)

> From my perspective, the code already has to deal with the fact that natts on 
> the persisted row differs from the tupdesc; this is the foundation upon which 
> ALTER TABLE ADD NULL and DROP COLUMN as metadata-only operations are built. 
> So I believe this patch strengthens the code's ability to handle the ALTER 
> TABLE ADD NULL case by generalizing it: now the code will more frequently 
> need to deal with the disparity between natts and the tupdesc, rather than 
> only after someone did an ALTER TABLE. I'm an advocate of making corner 
> cases go through generalized code where possible.

I think we're already pretty convinced that that case works, since ALTER
TABLE ADD COLUMN has been around for a very long time.

> However, in all honesty I hadn't considered Tom's point that the patch 
> creates a situation where natts on a row can deviate from the tupdesc that it 
> was built with (as opposed to the current tupdesc, from which it could always 
> deviate due to a subsequent ALTER TABLE). This is a subtle point and I don't 
> think I have enough experience in the Postgres code to argue one way or 
> another about it, so I'll have to leave that determination up to people who 
> are more familiar with the code.

To be a bit more concrete, the thing that is worrying me is that the
executor frequently transforms tuples between "virtual" and HeapTuple
formats (look at the TupleTableSlot infrastructure, junk filters, and
tuplesort/tuplestore, for example).  Up to now such transformation could
be counted on not to affect the apparent number of columns in the tuple;
but with this patch, a tuple materialized as a HeapTuple might well show
an natts smaller than what you'd conclude from looking at it in virtual
slot format.  Now it's surely true that any code that's broken by that
would be broken anyway if it were fed a tuple direct-from-disk from a
relation that had suffered a later ADD COLUMN, but there are lots of
code paths where raw disk tuples would never appear.  So IMO there's a
pretty fair chance of exposing latent bugs with this.
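To illustrate the convention being relied on here: attributes beyond a stored
tuple's natts read back as NULL. A minimal standalone model of that rule (the
names are hypothetical; this is not PostgreSQL's actual heap_getattr):

```c
#include <stdbool.h>

/* Simplified model of a stored heap tuple: only "natts" attribute
 * values were physically written, but the table's current descriptor
 * may know about more columns than that. */
typedef struct DemoTuple
{
    int         natts;          /* attributes present in the stored tuple */
    const int  *values;         /* the stored attribute values */
} DemoTuple;

/* Fetch attribute attnum (1-based).  Any attribute beyond what the
 * tuple physically stores reads back as NULL -- the convention that
 * lets ALTER TABLE ADD COLUMN (and trailing-NULL truncation) avoid
 * rewriting old rows. */
static int
demo_getattr(const DemoTuple *tup, int attnum, bool *isnull)
{
    if (attnum > tup->natts)
    {
        *isnull = true;
        return 0;
    }
    *isnull = false;
    return tup->values[attnum - 1];
}
```

A tuple stored with two attributes, read through a wider descriptor, yields
NULL for every attribute past the second.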

As an example that this type of concern isn't hypothetical, and that
identifying such bugs can be very difficult, see commit 039680aff.
That bug had been latent for years before it was exposed by an unrelated
change, and it took several more years to track it down.

As I said, I'd be willing to take this risk if the patch showed more
attractive performance benefits ... but it still seems mighty marginal
from here.

regards, tom lane




Re: [HACKERS] gistchoose vs. bloat

2013-01-24 Thread Tom Lane
Alexander Korotkov  writes:
> There is another source of overhead when using randomization in gistchoose:
> extra penalty calls. It can be significant when the index fits in cache. In
> order to avoid it I deliberately changed the behaviour of my patch from "look
> sequentially and choose at random" to "look in random order". I think we need
> to include a comparison of CPU time.

Hmm ... actually, isn't that an argument in favor of Heikki's method?
The way he's doing it, we can exit without making additional penalty
calls once we've found a zero-penalty tuple and decided not to look
further (which is something with a pretty high probability).

regards, tom lane




Re: [HACKERS] Re: [BUGS] BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve

2013-01-24 Thread Bruce Momjian
On Wed, Jan 23, 2013 at 04:58:57PM -0500, Bruce Momjian wrote:
> Attached is a ready-to-apply version of the patch.  I continued to use
> postmaster.pid to determine if I need to try the start/stop test ---
> that allows me to know which servers _might_ be running, because a
> server start failure might be for many reasons and I would prefer not to
> suggest the server is running if I can avoid it, and the pid file gives
> me that.
> 
> The patch is longer than I expected because the code needed to be
> reordered.  The starting banner indicates if it is a live check, but to
> check for a live server we need to start/stop the servers and we needed
> to know where the binaries are, and we didn't do that until after the
> start banner.  I removed the 'check' line for binary checks, and moved
> that before the banner printing.  postmaster_start also now needs an
> option to suppress an error.

Applied.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Materialized views WIP patch

2013-01-24 Thread Noah Misch
On Thu, Jan 24, 2013 at 03:14:15PM -0500, Kevin Grittner wrote:
> Noah Misch wrote:
> > On Thu, Jan 24, 2013 at 01:29:10PM -0500, Kevin Grittner wrote:
> >> Noah Misch wrote:
> >>> For the benefit of the archives, I note that we almost need not truncate 
> >>> an
> >>> unlogged materialized view during crash recovery. MVs are refreshed in a
> >>> VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the 
> >>> MV's
> >>> pg_class to that relfilenode. When a crash occurs with no refresh in 
> >>> flight,
> >>> the latest refresh had been safely synced. When a crash cuts short a 
> >>> refresh,
> >>> the pg_class update will not stick, and the durability of the old heap is 
> >>> not
> >>> in doubt. However, non-btree index builds don't have the same property; we
> >>> would need to force an immediate sync of the indexes to be safe here. It
> >>> would remain necessary to truncate unlogged MVs when recovering a base 
> >>> backup,
> >>> which may contain a partially-written refresh that did eventually commit.
> >>> Future MV variants that modify the MV in place would also need the usual
> >>> truncate on crash.
> >> 
> >> Hmm. That's a very good observation. Perhaps the issue can be
> >> punted to a future release where we start adding more incremental
> >> updates to them. I'll think on that, but on the face of it, it
> >> sounds like the best choice.
> > 
> > That situation is challenging for the same reason pg_class.relisvalid was 
> > hard
> > to implement for unlogged relations. The startup process doesn't know the
> > relkind of the unlogged-relation relfilenodes it cleans. If you can work
> > through all that, it's certainly a nice endpoint to not lose unlogged 
> > snapshot
> > MVs on crash. But I intended the first half of my message as the
> > recommendation and the above as a wish for the future.
> 
> Well, if I just don't create an init fork for MVs, they are left as
> they were on recovery, aren't they? So for 9.3, that solves that
> issue, I think. pg_class.relisvalid is a separate issue.

The startup process just looks for init forks, yes.  But it's acceptable to
leave the unlogged MV materials alone during *crash* recovery only.  When
recovering from a base backup, we once again need an init fork to refresh the
unlogged-MV relations.  In turn, we would still need a relisvalid
implementation that copes.  This is all solvable, sure, but it looks like a
trip off into the weeds relative to the core aim of this patch.

> > Why is the dependency problem of ordering MV refreshes and MV index builds 
> > so
> > different from existing pg_dump dependency problems?
> 
> If mva has indexes and is referenced by mvb, the CREATE statements
> are all properly ordered, but you want mva populated and indexed
> before you attempt to populate mvb. (Populated to get correct
> results, indexed to get them quickly.) We don't have anything else
> like that.

Is the REFRESH order just a replay of the CREATE order (with index builds
interspersed), or can it differ?

nm




Re: [HACKERS] gistchoose vs. bloat

2013-01-24 Thread Heikki Linnakangas

On 24.01.2013 22:35, Tom Lane wrote:

Alexander Korotkov  writes:

There is another source of overhead when using randomization in gistchoose:
extra penalty calls. It can be significant when the index fits in cache. In
order to avoid it I deliberately changed the behaviour of my patch from "look
sequentially and choose at random" to "look in random order". I think we need
to include a comparison of CPU time.


Hmm ... actually, isn't that an argument in favor of Heikki's method?
The way he's doing it, we can exit without making additional penalty
calls once we've found a zero-penalty tuple and decided not to look
further (which is something with a pretty high probability).


No, I think Alexander is right, although I believe the difference is 
minimal in practice.


If we assume that there are no zero-penalty tuples on the page, with 
both approaches you have to call penalty on every tuple to know which is 
best. If there are zero-penalty tuples, then there is a small 
difference. With Alexander's method, you can stop looking as soon as you 
find a zero-penalty tuple (the order you check the tuples is random). 
With my method, you can stop looking as soon as you find a zero-penalty 
tuple, *and* you decide to not look further. With the 1/2 probability to 
stop looking further, you give up on average after 2 tuples.


So if I'm doing my math right, my patch does on average between 1x - 2x 
as many penalty calls as Alexander's, depending on how many zero-penalty 
tuples there are. The 2x case is when the page is full of zero-penalty 
tuples, in which case the difference isn't big in absolute terms; 2 
penalty calls per page versus 1.
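The tie-breaking rule being compared here can be sketched as a small
standalone function (the coin is injected so the behaviour is easy to pin
down; names are hypothetical and this is not the actual gistchoose() code):

```c
/* Pick the index of a lowest-penalty entry, breaking ties by coin
 * flip.  The flip is made at most once per "current best": once we
 * decide to keep it, later equal entries are skipped without further
 * flips; if we decide to move on, the next equal entry becomes the
 * current best and the state resets. */
typedef int (*coin_fn)(void);   /* returns 1 = keep the current best */

static int
choose_min_randomized(const double *penalty, int n, coin_fn coin)
{
    int     best = 0;
    int     keep_current_best = -1;     /* -1 = coin not yet flipped */

    for (int i = 1; i < n; i++)
    {
        if (penalty[i] < penalty[best])
        {
            best = i;
            keep_current_best = -1;     /* strictly better: reset */
        }
        else if (penalty[i] == penalty[best])
        {
            if (keep_current_best == -1)
                keep_current_best = coin();
            if (keep_current_best == 0)
            {
                best = i;               /* move on to this equal entry */
                keep_current_best = -1;
            }
        }
    }
    return best;
}
```

With a fair coin, the expected number of equal entries inspected before the
scan settles is the 2 tuples mentioned above.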


BTW, one thing that I wondered about this: How expensive is random()? 
I'm assuming not very, but I don't really know. Alexander's patch called 
random() for every tuple on the page, while I call it only once for each 
equal-penalty tuple. If it's really cheap, I think my/Tom's patch could 
be slightly simplified by always initializing keep_current_best with 
random() at top of the function, and eliminating the -1 "haven't decided 
yet" state. OTOH, if random() is expensive, I note that we only need one 
bit of randomness, so we could avoid calling random() so often if we 
utilized all the bits from each random() call.


- Heikki




Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2013-01-24 Thread Bruce Momjian
On Wed, Aug 15, 2012 at 11:09:18PM +0530, Sushant Sinha wrote:
> I will do the profiling and present the results.

Sushant, do you have any profiling results on this issue from August?

---


> 
> On Wed, 2012-08-15 at 12:45 -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > Is this a TODO?
> > 
> > AFAIR nothing's been done about the speed issue, so yes.  I didn't
> > like the idea of creating a user-visible knob when the speed issue
> > might be fixable with internal algorithm improvements, but we never
> > followed up on this in either fashion.
> > 
> > regards, tom lane
> > 
> > > ---
> > 
> > > On Tue, Aug 23, 2011 at 10:31:42PM -0400, Tom Lane wrote:
> > >> Sushant Sinha  writes:
> > >>> Doesn't this force the headline to be taken from the first N words of
> > >>> the document, independent of where the match was?  That seems rather
> > >>> unworkable, or at least unhelpful.
> > >> 
> > >>> In headline generation function, we don't have any index or knowledge of
> > >>> where the match is. We discover the matches by first tokenizing and then
> > >>> comparing the matches with the query tokens. So it is hard to do
> > >>> anything better than first N words.
> > >> 
> > >> After looking at the code in wparser_def.c a bit more, I wonder whether
> > >> this patch is doing what you think it is.  Did you do any profiling to
> > >> confirm that tokenization is where the cost is?  Because it looks to me
> > >> like the match searching in hlCover() is at least O(N^2) in the number
> > >> of tokens in the document, which means it's probably the dominant cost
> > >> for any long document.  I suspect that your patch helps not so much
> > >> because it saves tokenization costs as because it bounds the amount of
> > >> effort spent in hlCover().
> > >> 
> > >> I haven't tried to do anything about this, but I wonder whether it
> > >> wouldn't be possible to eliminate the quadratic blowup by saving more
> > >> state across the repeated calls to hlCover().  At the very least, it
> > >> shouldn't be necessary to find the last query-token occurrence in the
> > >> document from scratch on each and every call.
> > >> 
> > >> Actually, this code seems probably flat-out wrong: won't every
> > >> successful call of hlCover() on a given document return exactly the same
> > >> q value (end position), namely the last token occurrence in the
> > >> document?  How is that helpful?
> > >> 
> > >> regards, tom lane
> > >> 
> > 
> > > -- 
> > >   Bruce Momjian  http://momjian.us
> > >   EnterpriseDB http://enterprisedb.com
> > 
> > >   + It's impossible for everything to be true. +
> > 
> > 
> 
> 

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] [BUGS] BUG #6572: The example of SPI_execute is bogus

2013-01-24 Thread Bruce Momjian
On Wed, Aug 29, 2012 at 01:13:51PM +, Rajeev rastogi wrote:
> 
> From: pgsql-bugs-ow...@postgresql.org [pgsql-bugs-ow...@postgresql.org] on 
> behalf of Bruce Momjian [br...@momjian.us]
> Sent: Wednesday, August 29, 2012 8:46 AM
> To: Tom Lane
> Cc: Robert Haas; Hitoshi Harada; pgsql-b...@postgresql.org; 
> pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] [BUGS] BUG #6572: The example of SPI_execute is bogus
> 
> On Sun, Apr 15, 2012 at 12:29:39PM -0400, Tom Lane wrote:
> > Robert Haas  writes:
> > > On Thu, Apr 5, 2012 at 2:39 AM, Hitoshi Harada  
> > > wrote:
> > >> On Wed, Apr 4, 2012 at 8:00 AM, Tom Lane  wrote:
> > >>> Given the lack of complaints since 9.0, maybe we should not fix this
> > >>> but just redefine the new behavior as being correct?  But it seems
> > >>> mighty inconsistent that the tuple limit would apply if you have
> > >>> RETURNING but not when you don't.  In any case, the ramifications
> > >>> are wider than one example in the SPI docs.
> >
> > >> To be honest, I was surprised when I found tcount parameter is said to
> > >> be applied to even INSERT.  I believe people think that parameter is
> > >> to limit memory consumption when returning tuples thus it'd be applied
> > >> for only SELECT or DML with RETURNING.  So I'm +1 for non-fix but
> > >> redefine the behavior.  Who wants to limit the number of rows
> > >> processed inside the backend, from SPI?
> >
> > > Yeah.
> >
> > Okay, apparently nobody cares about RETURNING behaving differently from
> > non-RETURNING, so the consensus is to redefine the current behavior as
> > correct.  That means what we need is to go through the docs and see what
> > places need to be updated (and, I guess, back-patch the changes to 9.0).
> > I will get to this if nobody else does, but not right away.
> 
> > Would someone make the doc change outlined above?  Thanks.
> 
> 
> I would like to work on this documentation bug.
> Per my analysis I am planning to update the following SPI functions:
> 1. SPI_execute: Here we will mention that the count argument is used only for 
> commands that return results, i.e. all kinds of SELECT and DML with a 
> RETURNING clause; count is ignored for any other kind of command. I will also 
> add an example to illustrate the difference.
> 2. SPI_execute_plan_with_paramlist: Here we can just refer to SPI_execute, 
> i.e. I will mention that count has the same interpretation as in SPI_execute.
> 3. SPI_execp: Here we can just refer to SPI_execute, i.e. I will mention that 
> count has the same interpretation as in SPI_execute.

Would someone please provide answers to these questions, or write a
patch?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Enabling Checksums

2013-01-24 Thread Jeff Davis
On Wed, 2013-01-16 at 17:38 -0800, Jeff Davis wrote:
> New version of checksums patch.

And another new version of both patches.

Changes:
 * Rebased.
 * Rename SetBufferCommitInfoNeedsSave to MarkBufferDirtyHint. Now that
it's being used more places, it makes sense to give it a more generic
name.
 * My colleague, Yingjie He, noticed that the FSM doesn't write any WAL,
and therefore we must protect those operations against torn pages. That
seems simple enough: just use MarkBufferDirtyHint (formerly
SetBufferCommitInfoNeedsSave) instead of MarkBufferDirty. The FSM
changes are not critical, so the fact that we may lose the dirty bit is
OK.

Regards,
Jeff Davis


replace-tli-with-checksums-20130124.patch.gz
Description: GNU Zip compressed data


checksums-20130124.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] BUG #6510: A simple prompt is displayed using wrong charset

2013-01-24 Thread Andrew Dunstan


On 01/24/2013 11:19 AM, Noah Misch wrote:

On Thu, Jan 24, 2013 at 08:50:36AM -0500, Andrew Dunstan wrote:

On 01/24/2013 03:42 AM, Craig Ringer wrote:

On 01/24/2013 01:06 AM, Alexander Law wrote:

Hello,
Please let me know if I can do something to get the bug fix
(https://commitfest.postgresql.org/action/patch_view?id=902)
committed.
I would like to fix other bugs related to postgres localization, but
I am not sure yet how to do it.

For anyone looking for the history, the 1st post on this topic is here:

http://www.postgresql.org/message-id/e1s3twb-0004oy...@wrigleys.postgresql.org


Yeah.

I'm happy enough with this patch. ISTM it's really a bug and should be
backpatched, no?

It is a bug, yes.  I'm neutral on whether to backpatch.



Well, that's what I did. :-)

cheers

andrew




Re: [HACKERS] gistchoose vs. bloat

2013-01-24 Thread Tom Lane
Heikki Linnakangas  writes:
> BTW, one thing that I wondered about this: How expensive is random()? 
> I'm assuming not very, but I don't really know. Alexander's patch called 
> random() for every tuple on the page, while I call it only once for each 
> equal-penalty tuple. If it's really cheap, I think my/Tom's patch could 
> be slightly simplified by always initializing keep_current_best with 
> random() at top of the function, and eliminating the -1 "haven't decided 
> yet" state.

I thought about that too, and concluded that random() is probably
expensive enough that we don't want to call it unnecessarily.

> OTOH, if random() is expensive, I note that we only need one 
> bit of randomness, so we could avoid calling random() so often if we 
> utilized all the bits from each random() call.

Meh.  That would hard-wire the decision that the probability of keeping
a best tuple is exactly 0.5.  I'd rather keep the flexibility to tune it
later.  The way your patch is set up, it seems unlikely that it will
call random() very many times per page, so I'm not that concerned about
minimizing the number of calls further.  (Also, in the probably-common
case that there are no exactly equally good alternatives, this would
actually be a net loss since it would add one unnecessary call.)
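For what it's worth, the bit-caching idea could look roughly like this (a
sketch only, with the generator injected for testability; in practice the
generator would be random(), and the names here are hypothetical):

```c
/* Draw one random bit at a time, refilling the cache from a 31-bit
 * generator only when it runs dry -- the "use all the bits from each
 * random() call" idea.  Note this hard-wires a 0.5 keep probability,
 * which is the flexibility concern raised above. */
typedef long (*rand_fn)(void);

static int
random_bit(rand_fn gen)
{
    static long cached = 0;
    static int  bits_left = 0;
    int         bit;

    if (bits_left == 0)
    {
        cached = gen();     /* one 31-bit draw serves 31 calls */
        bits_left = 31;
    }
    bit = (int) (cached & 1);
    cached >>= 1;
    bits_left--;
    return bit;
}
```

Thirty-one bit draws then cost a single underlying generator call.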

So basically, Alexander's method requires more random() calls and fewer
penalty() calls than yours, at least in typical cases.  It's hard to say
which is faster without benchmarking, and it would matter which opclass
you were testing on.

regards, tom lane




Re: [HACKERS] [BUGS] BUG #6572: The example of SPI_execute is bogus

2013-01-24 Thread Tom Lane
Bruce Momjian  writes:
> Would someone make the doc change outlined above?  Thanks.

Sorry, I'd nearly forgotten about this issue.  Will see about fixing the
docs.  (It looks like some of the comments in execMain.c could use work
too.)

regards, tom lane




[HACKERS] autovacuum not prioritising for-wraparound tables

2013-01-24 Thread Alvaro Herrera
Hi,

I have a bug pending that autovacuum fails to give priority to
for-wraparound tables.  When xid consumption rate is high and dead tuple
creation is also high, it is possible that some tables are waiting for
for-wraparound vacuums that don't complete in time because the workers
are busy processing other tables that have accumulated dead tuples; the
system is then down because it's too near the Xid wraparound horizon.
Apparently this is particularly notorious in connection with TOAST
tables, because those are always put in the tables-to-process list after
regular tables.

(As far as I recall, this was already reported elsewhere, but so far I
have been unable to find the discussion in the archives.  Pointers
appreciated.)

So here's a small, backpatchable patch that sorts the list of tables to
process (not all that much tested yet).  Tables which have the
wraparound flag set are processed before those that are not.  Other
than this criterion, the order is not defined.

Now we could implement this differently, and maybe more simply (say by
keeping two lists of tables to process, one with for-wraparound tables
and one with the rest) but this way it is simpler to add additional
sorting criteria later: say within each category we could first process
smaller tables that have more dead tuples.

My intention is to clean this up and backpatch to all live branches.
Comments?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
***
*** 1890,1895  get_database_list(void)
--- 1890,1919 
  	return dblist;
  }
  
+ typedef struct vacuum_table
+ {
+ 	Oid		reloid;
+ 	bool	wraparound;
+ } vacuum_table;
+ 
+ /*
+  * qsort comparator.  Tables marked for wraparound sort first.
+  */
+ static int
+ vacpriority_comparator(const void *a, const void *b)
+ {
+ 	const vacuum_table *taba = *(vacuum_table *const *) a;
+ 	const vacuum_table *tabb = *(vacuum_table *const *) b;
+ 
+ 	if (taba->wraparound == tabb->wraparound)
+ 		return 0;
+ 
+ 	if (taba->wraparound)
+ 		return -1;
+ 	else
+ 		return 1;
+ }
+ 
  /*
   * Process a database table-by-table
   *
***
*** 1903,1917  do_autovacuum(void)
  	HeapTuple	tuple;
  	HeapScanDesc relScan;
  	Form_pg_database dbForm;
- 	List	   *table_oids = NIL;
  	HASHCTL		ctl;
  	HTAB	   *table_toast_map;
- 	ListCell   *volatile cell;
  	PgStat_StatDBEntry *shared;
  	PgStat_StatDBEntry *dbentry;
  	BufferAccessStrategy bstrategy;
  	ScanKeyData key;
  	TupleDesc	pg_class_desc;
  
  	/*
  	 * StartTransactionCommand and CommitTransactionCommand will automatically
--- 1927,1943 
  	HeapTuple	tuple;
  	HeapScanDesc relScan;
  	Form_pg_database dbForm;
  	HASHCTL		ctl;
  	HTAB	   *table_toast_map;
  	PgStat_StatDBEntry *shared;
  	PgStat_StatDBEntry *dbentry;
  	BufferAccessStrategy bstrategy;
  	ScanKeyData key;
  	TupleDesc	pg_class_desc;
+ 	vacuum_table **tables;
+ 	int			ntables;
+ 	int			maxtables;
+ 	int			i;
  
  	/*
  	 * StartTransactionCommand and CommitTransactionCommand will automatically
***
*** 1986,1991  do_autovacuum(void)
--- 2012,2022 
    &ctl,
    HASH_ELEM | HASH_FUNCTION);
  
+ 	/* initialize our tasklist */
+ 	ntables = 0;
+ 	maxtables = 32;
+ 	tables = palloc(maxtables * sizeof(vacuum_table *));
+ 
  	/*
  	 * Scan pg_class to determine which tables to vacuum.
  	 *
***
*** 2077,2085  do_autovacuum(void)
  		}
  		else
  		{
! 			/* relations that need work are added to table_oids */
  			if (dovacuum || doanalyze)
! table_oids = lappend_oid(table_oids, relid);
  
  			/*
  			 * Remember the association for the second pass.  Note: we must do
--- 2108,2133 
  		}
  		else
  		{
! 			/* relations that need work are added to our tasklist */
  			if (dovacuum || doanalyze)
! 			{
! vacuum_table *tab;
! 
! /* enlarge the array if necessary */
! if (ntables >= maxtables)
! {
! 	maxtables *= 2;
! 	tables = repalloc(tables,
! 	  maxtables * sizeof(vacuum_table *));
! }
! 
! tab = palloc(sizeof(vacuum_table));
! 
! tab->reloid = relid;
! tab->wraparound = wraparound;
! 
! tables[ntables++] = tab;
! 			}
  
  			/*
  			 * Remember the association for the second pass.  Note: we must do
***
*** 2162,2168  do_autovacuum(void)
  
  		/* ignore analyze for toast tables */
  		if (dovacuum)
! 			table_oids = lappend_oid(table_oids, relid);
  	}
  
  	heap_endscan(relScan);
--- 2210,2233 
  
  		/* ignore analyze for toast tables */
  		if (dovacuum)
! 		{
! 			vacuum_table *tab;
! 
! 			/* enlarge the array if necessary */
! 			if (ntables >= maxtables)
! 			{
! maxtables *= 2;
! tables = repalloc(tables,
!   maxtables * sizeof(vacuum_table *));
! 			}
! 
! 			tab = palloc(sizeof(vacuum_table));
! 
! 			tab->reloid = relid;
! 			tab->wraparound = wraparound;

Re: [HACKERS] COPY FREEZE has no warning

2013-01-24 Thread Bruce Momjian
On Wed, Jan 23, 2013 at 02:02:46PM -0500, Bruce Momjian wrote:
> As a reminder, COPY FREEZE still does not issue any warning/notice if
> the freezing does not happen:
> 
>   Requests copying the data with rows already frozen, just as they
>   would be after running the VACUUM FREEZE command.
>   This is intended as a performance option for initial data loading.
>   Rows will be frozen only if the table being loaded has been created
>   in the current subtransaction, there are no cursors open and there
>   are no older snapshots held by this transaction. If those conditions
>   are not met the command will continue without error though will not
>   freeze rows. It is also possible in rare cases that the request
>   cannot be honoured for internal reasons, hence FREEZE
>   is more of a guideline than a hard rule.
> 
>   Note that all other sessions will immediately be able to see the data
>   once it has been successfully loaded. This violates the normal rules
>   of MVCC visibility and by specifying this option the user acknowledges
>   explicitly that this is understood.
> 
> Didn't we want to issue the user some kind of feedback?

As no one wanted to write this patch, I have developed the attached
version.
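
The conditions quoted from the docs map onto a single boolean test in CopyFrom(): freezing applies only when the table (or its relfilenode) was created in the current subtransaction, there are no prior registered snapshots, and there are no ready portals; otherwise the attached patch emits a NOTICE. A sketch of that decision in Python, with illustrative names rather than the actual backend API:

```python
def copy_freeze_applies(freeze_requested,
                        no_prior_snapshots,
                        no_ready_portals,
                        created_in_current_subxact):
    """Mirror of the eligibility test in CopyFrom().

    Returns (use_frozen_inserts, emit_notice).  Names are
    hypothetical; the real check lives in src/backend/commands/copy.c.
    """
    if not freeze_requested:
        return (False, False)
    if no_prior_snapshots and no_ready_portals and created_in_current_subxact:
        return (True, False)    # HEAP_INSERT_FROZEN can be used
    return (False, True)        # patch adds: NOTICE "unable to honor ..."

# Freeze is honoured only when every condition holds:
print(copy_freeze_applies(True, True, True, True))   # (True, False)
print(copy_freeze_applies(True, True, True, False))  # (False, True)
```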

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
new file mode 100644
index 8778e8b..c8c9821
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*** CopyFrom(CopyState cstate)
*** 2011,2023 
  		 * where we can apply the optimization, so in those rare cases
  		 * where we cannot honour the request we do so silently.
  		 */
! 		if (cstate->freeze &&
! 			ThereAreNoPriorRegisteredSnapshots() &&
! 			ThereAreNoReadyPortals() &&
! 			(cstate->rel->rd_newRelfilenodeSubid == GetCurrentSubTransactionId() ||
! 			 cstate->rel->rd_createSubid == GetCurrentSubTransactionId()))
! 			hi_options |= HEAP_INSERT_FROZEN;
  	}
  
  	/*
  	 * We need a ResultRelInfo so we can use the regular executor's
--- 2011,2029 
  		 * where we can apply the optimization, so in those rare cases
  		 * where we cannot honour the request we do so silently.
  		 */
! 		if (cstate->freeze)
! 		{
! 			if (ThereAreNoPriorRegisteredSnapshots() &&
! ThereAreNoReadyPortals() &&
! (cstate->rel->rd_newRelfilenodeSubid == GetCurrentSubTransactionId() ||
!  cstate->rel->rd_createSubid == GetCurrentSubTransactionId()))
! hi_options |= HEAP_INSERT_FROZEN;
! 			else
! ereport(NOTICE, (errmsg("unable to honor \"freeze\" option")));
! 		}
  	}
+ 	else if (cstate->freeze)
+ 		ereport(NOTICE, (errmsg("unable to honor \"freeze\" option")));
  
  	/*
  	 * We need a ResultRelInfo so we can use the regular executor's
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
new file mode 100644
index 78c601f..126b3c7
*** a/src/test/regress/expected/copy2.out
--- b/src/test/regress/expected/copy2.out
*** SELECT * FROM vistest;
*** 334,345 
--- 334,347 
  COMMIT;
  TRUNCATE vistest;
  COPY vistest FROM stdin CSV FREEZE;
+ NOTICE:  unable to honor "freeze" option
  BEGIN;
  INSERT INTO vistest VALUES ('z');
  SAVEPOINT s1;
  TRUNCATE vistest;
  ROLLBACK TO SAVEPOINT s1;
  COPY vistest FROM stdin CSV FREEZE;
+ NOTICE:  unable to honor "freeze" option
  SELECT * FROM vistest;
   a  
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #6572: The example of SPI_execute is bogus

2013-01-24 Thread Bruce Momjian
On Thu, Jan 24, 2013 at 04:51:04PM -0500, Tom Lane wrote:
> Bruce Momjian  writes:
> > Would someone make the doc change outlined above?  Thanks.
> 
> Sorry, I'd nearly forgotten about this issue.  Will see about fixing the
> docs.  (It looks like some of the comments in execMain.c could use work
> too.)

Thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


