Re: [HACKERS] REVIEW proposal: a validator for configuration files

2011-09-29 Thread Alexander

On 09/10/2011 11:39 AM, Alexey Klyukin wrote:
Hi Andy,

On Sep 7, 2011, at 6:40 AM, Andy Colson wrote:

Hi Alexey, I was taking a quick look at this patch, and have a question for ya.

...

Where did the other warnings go?  It's right, though: line 570 is bad.  It also
seems to have killed the server.  I have not gotten through the history of
messages regarding this patch, but is it supposed to kill the server if there
is a syntax error in the config file?


Thank you for looking at this patch. v4 was more of a "what if" concept that
took a long time for somebody to look at. There were a lot of problems with
it, but I think I've nailed down most of them.

Attached is v5. It should fix both problems you've experienced with v4. As
with the current code, the startup process gets interrupted on any error
detected in the configuration file. Unlike the current code, the patch tries
to report all of them before bailing out. The behavior during configuration
reload has changed significantly. Instead of ignoring all changes after the
first error, the code reports the problematic value and continues. It only
skips applying new values completely on syntax errors and invalid
configuration option names. In no case should it bring the server down during
reload.

One problem I'm not sure how to address is the fact that we require 2 calls of
set_config_option for each option: one to verify the new value and another to
actually apply it. Normally, this function returns true for a valid value and
false if it is unable to set the new value for whatever reason (either the
value is invalid, or the value cannot be changed in the context of the caller).
However, calling it once per value in the 'apply' mode during reload produces
false for every unchanged value that can only be changed during startup (i.e.
shared_buffers, or max_connections). If we ignore its return value, we'll lose
the ability to detect whether invalid values were encountered during the reload
and report this fact at the end of the function. I think the function should be
changed, as described in my previous email
(http://archives.postgresql.org/message-id/97A66029-9D3E-43AF-95AA-46FE1B447447(at)commandprompt(dot)com),
and I'd like to hear other opinions on that. Meanwhile, due to the 2 calls to
set_config_option, it currently reports all invalid values twice. If others are
opposed to changing set_config_option, I'll fix this by removing the first,
verification call and the final 'errors were detected' warning, to avoid
'false positives' (i.e. the WARNING you saw with the previous version for a
valid .conf).
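
To illustrate the control flow in question, here is a toy model (not the real
GUC code; the actual set_config_option() takes more arguments than shown) of
why a single 'apply' call during reload reports failure for an unchanged
startup-only option:

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for set_config_option(): a startup-only option such as
 * shared_buffers cannot be changed at reload time, so "applying" it
 * returns false even though its value is merely unchanged. */
static bool
toy_set_config_option(const char *name, bool startup_only, bool apply)
{
	(void) name;
	if (apply && startup_only)
		return false;			/* refused in this context */
	return true;				/* the value itself is valid */
}

int
main(void)
{
	/* Two-call pattern: verify first, then apply. */
	bool		ok = toy_set_config_option("shared_buffers", true, false);

	printf("verify: %s\n", ok ? "ok" : "invalid");	/* ok */

	ok = toy_set_config_option("shared_buffers", true, true);
	printf("apply:  %s\n", ok ? "ok" : "refused");	/* refused */
	return 0;
}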

I'd appreciate your further comments and suggestions.

Thank you.

--
Alexey Klyukin
http://www.commandprompt.com
The PostgreSQL Company – Command Prompt, Inc.


After a quick two-minute test, this patch seems to work well.  It does just
what I think it should.  I'll add it to the commitfest page for ya.

-Andy




Alexey,

I've included my review procedure outline and output below. Your patch 
behaves as you described in the email containing 'v5' and judging by 
Andy's earlier success, I think this is ready to go. I will mark this 
'Ready for Committer'.


Alexander

---

TESTING METHODOLOGY
===================

* Start postgres with stock, valid postgresql.conf
* Record output
* Stop postgres
* Add invalid configuration option to postgresql.conf
* Start postgres
* Record output
* Stop postgres
* Add configuration option with bad syntax to postgresql.conf
* Start postgres
* Record output

SETUP
-----

./configure --prefix=~/builds/pgsql/orig; make install
./configure --prefix=~/builds/pgsql/patched/guc-param-validator; make install


cd ~/builds/pgsql/orig; bin/initdb -D data; bin/postgres -D data
cd ~/builds/pgsql/patched/guc-param-validator; bin/initdb -D data; bin/postgres -D data


OUTPUT
------

orig:

LOG:  database system was shut down at 2011-09-28 20:26:17 PDT
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

# alter postgresql.conf, add bogus param
# unknown_config_param = on
$ kill -9 `pgrep postgres | head -n1`; bin/postgres -D data

FATAL:  unrecognized configuration parameter "unknown_config_param"

# alter postgresql.conf, add param with bad syntax
# unknown_config_param @@= on
$ kill -9 `pgrep postgres | head -n1`; bin/postgres -D data

FATAL:  syntax error in file
"/home/av/builds/pgsql/orig/data/postgresql.conf" line 156, near token
"@"

patched:

LOG:  database system was shut down at 2011-09-28 20:12:39 PDT
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

# alter postgresql.conf, add bogus param
# unknown_config_param = on
$ kill -9 `pgrep postgres | head -n1`; bin/postgres -D data

LOG:  unrecognized configuration parameter "unknown_config_param"
LOG

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-02-29 Thread Alexander Korotkov
On Sat, Feb 27, 2016 at 2:44 AM, Andres Freund  wrote:

> On 2016-02-02 13:12:50 +0300, Alexander Korotkov wrote:
> > > On Tue, Feb 2, 2016 at 12:43 AM, Andres Freund wrote:
> >
> > > On 2016-02-01 13:06:57 +0300, Alexander Korotkov wrote:
> > > > On Mon, Feb 1, 2016 at 11:34 AM, Alexander Korotkov <
> > > > a.korot...@postgrespro.ru> wrote:
> > > > >> Client   Base     Patch
> > > > >> 1        19744    19382
> > > > >> 8        125923   126395
> > > > >> 32       3139     3151
> > > > >> 64       387339   496830
> > > > >> 128      306412   350610
> > > > >>
> > > > >> Shared Buffer= 512MB
> > > > >> max_connections=150
> > > > >> Scale Factor=300
> > > > >>
> > > > >> ./pgbench  -j$ -c$ -T300 -M prepared -S postgres
> > > > >>
> > > > >> Client   Base     Patch
> > > > >> 1        17169    16454
> > > > >> 8        108547   105559
> > > > >> 32       241619   262818
> > > > >> 64       206868   233606
> > > > >> 128      137084   217013
> > >
> > > So, there's a small regression on low client counts. That's worth
> > > addressing.
> > >
> >
> > Interesting. I'll try to reproduce it.
>
> Any progress here?
>

I couldn't reproduce the regression. I had access to a multicore machine, but
saw neither the regression at low client counts nor the improvements at high
client counts.
In the attached patch, the spinlock delay logic was exposed in s_lock.h and
used in LockBufHdr().
Dilip, could you try this version of the patch? Could you also run perf or
another profiler in the case where the regression appears? It would be nice
to compare profiles with and without the patch; we could probably find the
cause of the regression.
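
Schematically, the technique under discussion is a loop that sets a lock bit
in the buffer-state word with an atomic operation and backs off while another
backend holds it. Below is a standalone sketch using C11 atomics (illustrative
only, not the patch's actual code; the state layout is invented):

#include <stdatomic.h>
#include <sched.h>

#define BM_LOCKED (1u << 31)	/* hypothetical lock bit in the state word */

typedef struct
{
	atomic_uint state;			/* flags, refcount, usage count, lock bit */
} BufferDescSketch;

/* Acquire the header lock: set BM_LOCKED, spinning while another backend
 * holds it.  The patch escalates the delay via the spinlock delay logic
 * mentioned above; sched_yield() is a crude stand-in here. */
static unsigned
lock_buf_hdr(BufferDescSketch *buf)
{
	for (;;)
	{
		unsigned	old_state = atomic_fetch_or(&buf->state, BM_LOCKED);

		if (!(old_state & BM_LOCKED))
			return old_state | BM_LOCKED;	/* we now hold the lock */
		sched_yield();						/* stand-in for spin delay */
	}
}

/* Release: publish the new state with the lock bit cleared. */
static void
unlock_buf_hdr(BufferDescSketch *buf, unsigned new_state)
{
	atomic_store(&buf->state, new_state & ~BM_LOCKED);
}

int
main(void)
{
	BufferDescSketch buf;

	atomic_init(&buf.state, 0);
	unlock_buf_hdr(&buf, lock_buf_hdr(&buf));
	return 0;
}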

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pinunpin-cas-3.patch
Description: Binary data



Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Alexander Korotkov
On Tue, Mar 1, 2016 at 7:03 PM, Robert Haas  wrote:

> On Tue, Mar 1, 2016 at 10:37 AM, Bruce Momjian  wrote:
> > On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:
> >> > Two reasons:
> >> > 1. There is no ideal implementation of DTM which will fit all
> possible needs
> >> > and be  efficient for all clusters.
> >>
> >> Hmm, what is the reasoning behind that statement?  I mean, it is
> >> certainly true that there are some places where we have decided that
> >> one-size-fits-all is not the right approach.  Indexing, for example.
> >
> > Uh, is that even true of indexing?  While the plug-in nature of indexing
> > allows for easier development and testing, does anyone create plug-in
> > indexing that isn't shipped by us?  I thought WAL support was something
> > that prevented external indexing solutions from working.
>
> True.  There is an API, though, and having pluggable WAL support seems
> desirable too.  At the same time, I don't think we know of anyone
> maintaining a non-core index AM ... and there are probably good
> reasons for that.


That's because we haven't offered a legitimate mechanism for pluggable AMs.


> We end up revising the index AM API pretty
> regularly every time somebody wants to do something new, so it's not
> really a stable API that extensions can just tap into.


I can't buy this argument. One could say the same about any API. Thinking
that way leads to rejecting any extensibility, which would be in direct
contradiction to the original Postgres concept.
During the last 5 years we added 2 new AMs: SP-GiST and BRIN. And BRIN is
very different from any AM we had before. Yet I wouldn't say that the AM API
has changed dramatically during that time. There were some changes, but it
would be normal work for extension maintainers to adopt them, just as they do
for other API changes.

There is a simple example where we suffer from the lack of extensible AMs:
fast full-text search. We can't provide it with the current GIN, because GIN
lacks positional information. And we can't push these advances into core
because the current implementation doesn't have a perfect design. The ideal
design would be to push all the required functionality into btree, then make
GIN a wrapper over btree, and then add the required functionality on top. But
that is a roadmap for 5-10 years, and for those 5-10 years users will suffer
from having third-party solutions for fast FTS instead of an in-core one. Our
design questions are not actually something users care about; they are not
reliability questions. Having pluggable AMs would be a real chance in this
situation: users could use an extension right now, and when, after many
years, we finally implement the right design, they could migrate to the
in-core solution. But 5-10 years of fast FTS does matter.


> I suspect that
> a transaction manager API would end up similarly situated.
>

I disagree with you about the AM API. But I agree that a TM API would be in a
similar situation to the AM API.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Alexander Korotkov
On Tue, Mar 1, 2016 at 10:11 PM, Bruce Momjian  wrote:

> On Tue, Mar  1, 2016 at 02:02:44PM -0500, Bruce wrote:
> > On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
> > > Note that I am not saying that other discussed approaches are any
> > > better, I am saying that we should know approximately what we
> > > actually want and not just beat FDWs with a hammer and hope sharding
> > > will eventually emerge and call that the plan.
> >
> > I will say it again --- FDWs are the only sharding method I can think of
> > that has a chance of being accepted into Postgres core.  It is a plan,
> > and if it fails, it fails.  If is succeeds, that's good.  What more do
> > you want me to say?  I know of no other way to answer the questions you
> > asked above.
>
> I guess all I can say is that if FDWs existed when Postgres XC/XL were
> being developed, that they likely would have been used or at least
> considered.  I think we are basically making that attempt now.


If FDWs had existed when Postgres XC/XL were being developed, then I believe
they would have tried to build a full-featured prototype of FDW-based
sharding. If that prototype had succeeded, we could have made a full roadmap.
For now, we don't have a full roadmap; we have only some pieces. This is why
people doubt. When you're speaking about advances that are natural to FDWs,
there is no problem; nobody is against FDW advances. However, other things
remain unclear.
You could try to build a full-featured prototype to convince people. Although
it would take some resources, it would save more in the end, because it would
save us from errors.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Alexander Korotkov
On Wed, Mar 2, 2016 at 9:53 PM, Josh berkus  wrote:

> On 02/24/2016 01:22 AM, Konstantin Knizhnik wrote:
>
>> Sorry, but based on this plan it is possible to make a conclusion that
>> there are only two possible cluster solutions for Postgres:
>> XC/XL and FDW-based.  From my point of view there are many more
>> possible alternatives.
>>
>
> Definitely.
>
> Currently we have five approaches to sharding inside postgres in the
> field, in chronological order:
>
> 1. Greenplum's executor-based approach with motion nodes
>
> 2. Skype's function-based approach (PL/proxy)
>
> 3. XC/XL's approach, which I believe is also query executor-based
>
> 4. CitusDB's pg_shard which is based on query hooks
>
> 5. FDW-based (currently theoretical)
>
> One of the things which causes bad reactions and arguments, Bruce, is that
> a lot of your posts and presentations detailing plans for the FDW approach
> carry the subtext that all four of the other approaches are dead ends and
> not worth considering.  Given that the other approaches, whatever their
> limitations, have working code in the field and the FDW approach does not,
> that's more than a little offensive.
>
> If we want to move forwards on serious work on FDW-based sharding, the
> folks working on it should stop treating it as a "fait accompli" that this
> is the Chosen Way for the PostgreSQL project.  Otherwise, you'll spend all
> of your time arguing that point instead of working on features that matter.
>
> Bruce made a long comparison with built-in replication, but there's a big
> difference here.  We decided that WAL-based replication was the way to go
> for built-in as a community decision here on -hackers and at various
> conferences.  Both the plan and the implementation for replication
> transcended company backing, involving even active competitors, and
> involved discussions with maintainers of the older replication projects.
>
> In contrast, this FDW plan *still* feels very much like a small group made
> up of employees of only two companies came up with it in private and
> decided that it should be the plan for the whole project.  I know that
> Bruce and others have good reasons for starting the FDW project, but there
> hasn't been much of an attempt to obtain community consensus around it. If
> Bruce and others want contributors to work on FDWs instead of other
> sharding approaches, then they need to win over those people as to why they
> should do that.  It's how this community works.
>
> Alternately, you can just work on the individual FDW features, which
> *everyone* thinks are a good idea, and when most of them are done,
> FDW-based scaleout will be such an obvious solution that nobody will argue
> with it.


+1

Thank you, Josh. I think this is an excellent summary of the conversation
about FDW-based sharding.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] improving GROUP BY estimation

2016-03-03 Thread Alexander Korotkov
Hi, Tomas!

I've assigned myself to review this patch.

I've checked version estimate-num-groups-v2.txt by Mark Dilger.
It applies to head cleanly and passes the corrected regression tests.

About the correlated/uncorrelated cases: I think the optimizer currently
assumes all distributions to be independent. I think we should follow this
assumption in this case as well, until we have a fundamentally better option
(like your multivariate statistics).

@@ -3438,9 +3438,9 @@ estimate_num_groups(PlannerInfo *root, List
*groupExprs, double input_rows,
  reldistinct = clamp;

  /*
- * Multiply by restriction selectivity.
+ * Estimate number of distinct values expected in given number of rows.
  */
- reldistinct *= rel->rows / rel->tuples;
+ reldistinct *= (1 - powl((reldistinct - 1) / reldistinct, rel->rows));

  /*
  * Update estimate of total distinct groups.

I think we need way more explanation in the comments here (possibly with
external links). For now, it looks like a formula that appears out of nowhere.

Also, note that rel->tuples has gone away from the formula's parameters, so
we actually don't care how many tuples are in the table. This is because the
formula assumes that the same tuple can be selected multiple times. For low
numbers this can lead to some error. Consider selecting 2 out of 3 distinct
tuples. If you can't select the same tuple twice, then you always select 2
distinct tuples; but according to the formula you select 5/3 on average. I
think this error is negligible for small numbers, especially because getting
rid of it would require way more computation. But it's worth mentioning in
the comments.
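
For illustration, here is a minimal standalone sketch (not part of the patch)
of the estimator in question, which also checks the 5/3 example above:

#include <math.h>
#include <stdio.h>

/* Expected number of distinct values seen when drawing 'k' rows from 'd'
 * equally probable values with replacement:
 *     d * (1 - ((d - 1) / d) ^ k)
 * For d = 3, k = 2 this gives 3 * (1 - (2/3)^2) = 5/3 ~= 1.67, rather
 * than 2 as in the without-replacement case. */
static double
expected_distinct(double d, double k)
{
	return d * (1.0 - pow((d - 1.0) / d, k));
}

int
main(void)
{
	printf("%f\n", expected_distinct(3, 2));			/* 1.666667 */
	printf("%f\n", expected_distinct(10000, 10000000));	/* ~10000: nearly all seen */
	return 0;
}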

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] improving GROUP BY estimation

2016-03-03 Thread Alexander Korotkov
On Thu, Mar 3, 2016 at 7:35 PM, Tomas Vondra wrote:

> On 03/03/2016 12:53 PM, Alexander Korotkov wrote:
>
>> I've assigned to review this patch.
>>
>> I've checked version estimate-num-groups-v2.txt by Mark Dilger.
>> It applies to head cleanly, passes corrected regression tests.
>>
>> About the correlated/uncorrelated cases: I think the optimizer currently
>> assumes all distributions to be independent. I think we should follow this
>> assumption in this case as well, until we have a fundamentally better
>> option (like your multivariate statistics).
>>
>> @@ -3438,9 +3438,9 @@ estimate_num_groups(PlannerInfo *root, List
>> *groupExprs,
>> double input_rows,
>> reldistinct = clamp;
>>
>> /*
>> -* Multiply by restriction selectivity.
>> +* Estimate number of distinct values expected in given number of rows.
>> */
>> -reldistinct *= rel->rows / rel->tuples;
>> +reldistinct *= (1 - powl((reldistinct - 1) / reldistinct, rel->rows));
>>
>> /*
>> * Update estimate of total distinct groups.
>>
>> I think we need way more explanation in the comments here (possibly with
>> external links). For now, it looks like a formula that appears out of
>> nowhere.
>>
>
> I thought the details (particularly the link to stackexchange, discussing
> the formula) would be enough, but let me elaborate.
>
> The current formula
>
> reldistinct *= rel->rows / rel->tuples;
>
> simply multiplies the ndistinct estimate with selectivity. So when you
> select 1% of the table, the estimate says we'll see 1% of ndistinct values.
> But clearly that's naive, because for example when you have 10k distinct
> values and you select 10M rows, you should expect nearly all of them in the
> sample.
>
> And that's what the formula does - it gives you the expected number of
> distinct values, when selecting 'k' values from 'd' distinct values with
> replacements:
>
> count(k, d) = d * (1 - ((d-1)/d) ^ k)
>
> It's assuming the distinct values are equally probable (uniform
> distribution) and that the probabilities do not change (that's the "with
> replacement").
>
> I guess we could relax those assumptions and for example use the MCV
> statistics to further improve the estimate, and also relax the 'with
> replacement' but that'd make the formula way more complicated.
>
> [1]
> http://math.stackexchange.com/questions/72223/finding-expected-number-of-distinct-values-selected-from-a-set-of-integers


Your explanation in the first mail was good enough; now it's even better.
But what I actually mean is that we need to include some brief explanation in
the source code (either comments or a README). It would be good if someone
who wants to understand this could do so without searching the mailing list
archives or git history.


>> Also, note that rel->tuples has gone away from the formula's parameters,
>> so we actually don't care how many tuples are in the table. This is
>> because the formula assumes that the same tuple can be selected multiple
>> times. For low numbers this can lead to some error. Consider selecting 2
>> out of 3 distinct tuples. If you can't select the same tuple twice, then
>> you always select 2 distinct tuples; but according to the formula you
>> select 5/3 on average. I think this error is negligible for small
>> numbers, especially because getting rid of it would require way more
>> computation. But it's worth mentioning in the comments.
>>
>
> Well, yeah. That's the consequence of 'with replacement' assumption.
>
> I guess we could handle this somehow, though. For example we can compute
> the minimum possible number of distinct values like this:
>
> average_rows_per_value = (tuples / ndistinct);
> min_estimate = ceil(rows / average_rows_per_value);
>
> and use that as a minimum for the estimate. In the example you've given
> this would say
>
> average_rows_per_value = (3 / 3) = 1;
> min_estimate = ceil(2 / 1) = 2;
>
> So it serves as a correction for the 'with replacement' assumption. With
> only 2 distinct values we'd get this:
>
> average_rows_per_value = (3 / 2) = 1.5;
> min_estimate = ceil(2 / 1.5) = 2;
>
> Of course, it's just a heuristic, so it may fail in some other cases.


I'm not sure we actually need it. Even in the worst case the error doesn't
seem to be very big. But feel free to add this heuristic; it looks OK to me.
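
For concreteness, here is a standalone sketch (hypothetical helper name, not
code from the patch) of the clamp proposed above:

#include <math.h>
#include <stdio.h>

/* Heuristic minimum for the distinct-values estimate: assume each value
 * occupies about (tuples / ndistinct) rows, so selecting 'rows' rows
 * should touch at least ceil(rows / that) distinct values.  As noted
 * above, this is only a heuristic and can misfire for skewed data. */
static double
min_distinct_estimate(double tuples, double ndistinct, double rows)
{
	double		avg_rows_per_value = tuples / ndistinct;

	return ceil(rows / avg_rows_per_value);
}

int
main(void)
{
	printf("%.0f\n", min_distinct_estimate(3, 3, 2));	/* 2 */
	printf("%.0f\n", min_distinct_estimate(3, 2, 2));	/* 2 */
	return 0;
}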

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] improving GROUP BY estimation

2016-03-03 Thread Alexander Korotkov
On Thu, Mar 3, 2016 at 10:16 PM, Tomas Vondra wrote:

> So yes, each estimator works great for exactly the opposite cases. But
> notice that typically, the results of the new formula is much higher than
> the old one, sometimes by two orders of magnitude (and it shouldn't be
> difficult to construct examples of much higher differences).
>
> The table also includes the 'average' estimator you propose, but it's
> rather obvious that the result is always much closer to the new value,
> simply because
>
>    (small number) + (huge number)
>    ------------------------------
>                  2
>
> is always much closer to the huge number. We're usually quite happy when
> the estimates are within the same order of magnitude, so whether it's K or
> K/2 makes pretty much no difference.


I believe that Mark means the geometric average, i.e. sqrt((small number) *
(huge number)), which averages the two on a log scale: e.g. sqrt(10 *
1000000) is about 3162, rather than the ~500000 the arithmetic average gives.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-04 Thread Alexander Korotkov
On Fri, Mar 4, 2016 at 7:05 AM, Amit Kapila  wrote:

> > I think the wait event types should be documented - and the wait
> > events too, perhaps.
> >
>
> As discussed upthread, I have added documentation for all the possible
> wait events and an example.  Some of the LWLocks, like AsyncQueueLock and
> AsyncCtlLock, are used for quite similar purposes, so I have kept their
> explanations the same.
>

Do you think it's worth grouping the rows in the "wait_event Description"
table by wait event type? It would be nice if the user knew a priori which
wait event has which type. The readability of this large table would also be
improved.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-04 Thread Alexander Korotkov
On Fri, Mar 4, 2016 at 4:20 PM, Amit Kapila  wrote:

> On Fri, Mar 4, 2016 at 4:01 PM, Alexander Korotkov wrote:
>
>> On Fri, Mar 4, 2016 at 7:05 AM, Amit Kapila wrote:
>>
>>> > I think the wait event types should be documented - and the wait
>>> > events too, perhaps.
>>> >
>>>
>>> As discussed upthread, I have added documentation for all the possible
>>> wait events and an example.  Some of the LWLocks, like AsyncQueueLock and
>>> AsyncCtlLock, are used for quite similar purposes, so I have kept their
>>> explanations the same.
>>>
>>
>> Do you think it's worth grouping the rows in the "wait_event Description"
>> table by wait event type?
>>
>
> They are already grouped (implicitly), do you mean to say that we should
> add wait event type name as well in that table?
>

Yes.


> If yes, then the only slight worry is that there will be a lot of
> repetition in the wait_event_type column; otherwise it is okay.
>

There is a morerows attribute of the entry tag in DocBook SGML; it behaves
like rowspan in HTML.
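
For example, a hypothetical fragment of such a table, where one
wait_event_type cell spans three wait_event rows (entries invented for
illustration):

<row>
 <entry morerows="2">LWLockNamed</entry>
 <entry>ShmemIndexLock</entry>
 <entry>Waiting to find or allocate space in shared memory.</entry>
</row>
<row>
 <entry>OidGenLock</entry>
 <entry>Waiting to allocate or assign an OID.</entry>
</row>
<row>
 <entry>XidGenLock</entry>
 <entry>Waiting to allocate or assign a transaction id.</entry>
</row>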

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] pg_resetxlog reference page reorganization

2016-03-04 Thread Alexander Korotkov
Hi, Peter!

I've assigned myself as a reviewer of this patch.

On Tue, Mar 1, 2016 at 2:53 AM, Peter Eisentraut  wrote:

> The pg_resetxlog reference page has grown over the years into an
> unnavigable jungle, so here is a patch that reorganizes it to be more in
> the style of the other ref pages, with a normal options list.
>

The patch applies cleanly on head, and the documentation compiles with no
problems. The pg_resetxlog page definitely looks much better than it did
before. I don't see any problems or issues with this patch, so I'm marking it
"Ready for Committer".

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] syslog configurable line splitting behavior

2016-03-04 Thread Alexander Korotkov
On Sat, Feb 27, 2016 at 6:49 AM, Peter Eisentraut  wrote:

> Writing log messages to syslog caters to ancient syslog implementations
> in two ways:
>
> - sequence numbers
> - line splitting
>
> While these are arguably reasonable defaults, I would like a way to turn
> them off, because they get in the way of doing more interesting things
> with syslog (e.g., logging somewhere that is not just a text file).
>
> So I propose the two attached patches that introduce new configuration
> Boolean parameters syslog_sequence_numbers and syslog_split_lines that
> can toggle these behaviors.
>

Would it be useful to make PG_SYSLOG_LIMIT configurable (-1 to disable)
instead of introducing a boolean?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: Upper planner pathification

2016-03-09 Thread Alexander Korotkov
Hi, Tom!

I have a question about the Sort path. AFAICS this question wasn't raised in
the upthread discussion.
We produce Sort plans in two ways: from explicit Sort paths, and from other
paths which implicitly assume sorting (like the MergeJoin path).
Would it be better to produce Sort plans only from explicit Sort paths? The
MergeJoin path would then add explicit child Sort paths. That would be a more
unified approach.
I'm asking from the point of view of my "Partial Sort" patch. The absence of
implicit sorts would help make that patch simpler and cleaner.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: Upper planner pathification

2016-03-09 Thread Alexander Korotkov
On Wed, Mar 9, 2016 at 5:47 PM, Tom Lane  wrote:

> Alexander Korotkov  writes:
> > I have a question about the Sort path. AFAICS this question wasn't
> > raised in the upthread discussion.
> > We produce Sort plans in two ways: from explicit Sort paths, and from
> > other paths which implicitly assume sorting (like the MergeJoin path).
> > Would it be better to produce Sort plans only from explicit Sort paths?
> > The MergeJoin path would then add explicit child Sort paths. That would
> > be a more unified approach.
>
> Meh.  I think the only real effect that would have is to make it slower to
> build MergeJoin paths (and in a typical join problem, we build a lot of
> those).  The entire point of the Path/Plan distinction is to postpone
> until createplan.c any work that's not necessary to arrive at a cost
> estimate.  So the way MergeJoinPath works now seems fine to me.
>
> > I'm asking from the point of view of my "Partial Sort" patch. The
> > absence of implicit sorts would help make that patch simpler and
> > cleaner.
>
> There are other implicit sorts besides those.  Admittedly, the efficiency
> argument for not making the sort explicit in UniquePath or MergeAppendPath
> is a lot weaker than it is for MergeJoin, just because those are less
> common path types.  But getting rid of the behavior entirely would be
> a lot of work, and I'm not convinced it'd be an improvement.
>

Thank you for the explanation. I'll try to rebase "Partial Sort" leaving
this situation as is.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] utils/misc: Simplify search of end of argv in save_ps_display_args()

2016-03-10 Thread Alexander Kuleshov
Hello,

The attached patch provides a trivial simplification of the search for the
end of the argv[] area, replacing the for() loop with a calculation based on
argc.
diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c
index 892a810..d8f2105 100644
--- a/src/backend/utils/misc/ps_status.c
+++ b/src/backend/utils/misc/ps_status.c
@@ -134,11 +134,8 @@ save_ps_display_args(int argc, char **argv)
 		/*
 		 * check for contiguous argv strings
 		 */
-		for (i = 0; i < argc; i++)
-		{
-			if (i == 0 || end_of_area + 1 == argv[i])
-end_of_area = argv[i] + strlen(argv[i]);
-		}
+		if (argc)
+			end_of_area = argv[argc - 1] + strlen(argv[argc - 1]);
 
 		if (end_of_area == NULL)	/* probably can't happen? */
 		{



Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-10 Thread Alexander Korotkov
On Mon, Mar 7, 2016 at 6:19 PM, Robert Haas  wrote:

> On Sat, Mar 5, 2016 at 7:22 AM, Dilip Kumar  wrote:
> > On Wed, Mar 2, 2016 at 11:05 AM, Dilip Kumar wrote:
> >> And this latest result (no regression) is on X86, but on my local
> >> machine.
> >>
> >> I did not see exactly what this new version of the patch does
> >> differently, so I will test this version on other machines as well and
> >> check the results.
> >
> >
> > I tested this on PPC again, this time in various orders (sometimes patch
> > first and then base first).
> > I tested with the latest patch, pinunpin-cas-2.patch, on Power8.
> >
> > Shared Buffer = 8GB
> > ./pgbench  -j$ -c$ -T300 -M prepared -S postgres
> >
> > BASE
> > ----------------------------
> > Clients   run1    run2    run3
> > 1         21200   18754   20537
> > 2         40331   39520   38746
> >
> >
> > Patch
> > ----------------------------
> > Clients   run1    run2    run3
> > 1         20225   19806   19778
> > 2         39830   41898   36620
> >
> > I think here we cannot see any regression (if I take the median then it
> > may look low with the patch, so I'm posting all 3 readings).
>
> If the median looks low, how is that not a regression?
>

I don't think we can rely on the median that much if we have only 3 runs.
With 3 runs we can only apply the Kornfeld method, which takes the interval
between the lowest and highest values as the confidence interval.
Since the confidence intervals for the master and patched versions overlap,
we can't conclude that the expected TPS numbers are different.
Dilip, could you do more runs? 10, for example. With such statistics we
would be able to conclude something.
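
To make the reasoning concrete, here is a small sketch using the 1-client
numbers from the table above (for n = 3, [min, max] serves as the crude
confidence interval for the median):

#include <stdio.h>

/* Print the [min, max] interval of three runs. */
static void
interval(const char *label, const double r[3])
{
	double		lo = r[0],
				hi = r[0];
	int			i;

	for (i = 1; i < 3; i++)
	{
		if (r[i] < lo) lo = r[i];
		if (r[i] > hi) hi = r[i];
	}
	printf("%s: [%.0f, %.0f]\n", label, lo, hi);
}

int
main(void)
{
	double		base[3] = {21200, 18754, 20537};
	double		patch[3] = {20225, 19806, 19778};

	interval("base ", base);	/* [18754, 21200] */
	interval("patch", patch);	/* [19778, 20225]: inside the base
								 * interval, so no conclusion */
	return 0;
}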

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Background Processes and reporting

2016-03-11 Thread Alexander Korotkov
On Sat, Mar 12, 2016 at 12:22 AM, Andres Freund  wrote:

> On 2016-03-11 23:53:15 +0300, Vladimir Borodin wrote:
> > It was many times stated in threads about waits monitoring [0, 1, 2]
> > and supported by different people, but ultimately waits information
> > was stored in PgBackendStatus.
>
> Only that it isn't. It's stored in PGPROC.


Yes, and this is good. Robert's original concept of a single byte in
PgBackendStatus looks incompatible with any further development of wait
monitoring.


> This criticism is true of
> the progress reporting patch, but a quick scan of the thread doesn't
> show authors of the wait events patch participating there.
>
>
> > Can’t we think one more time about implementation provided by Ildus
> > and Alexander here [3]?
>
> I don't think so. Afaics the proposed patch tried to do too many things
> at once, and it's authors didn't listen well to criticism.


The original patch really did too many things at once, but it was good as a
prototype. We should move forward by splitting it into many smaller parts.
We didn't, because of disagreement about the design.

The idea of individually measuring the time of every wait event met criticism
because it might have high overhead [1].  This is really so, at least for
Windows [2].  But exposing only current values wouldn't be very useful; we
need to gather some statistics anyway.  Gathering them by sampling would be
both more expensive and less accurate on the majority of systems.  This is
why I proposed hooks, to make platform-dependent extensions possible.  Robert
rejects hooks because he is "not a big fan of hooks as a way of resolving
disagreements about the design" [3].  Besides, these are actually not design
issues but platform issues...

Another question is wait parameters.  We want to expose wait events with
some parameters.  Robert rejects that because it *might* add additional
overhead [3]. When I proposed to fit something useful into the hard-won
4 bytes, Robert claimed that it is "too clever" [4].

So, the situation looks like a dead end.  I have no idea how to convince
Robert about adding any kind of advanced wait-monitoring functionality to
PostgreSQL.  I'm thinking about implementing a sampling extension over the
current infrastructure just to make the community see how much it falls
short. Andres, it would be very nice if you have any idea how to move this
situation forward.

Another aspect is that EnterpriseDB offers wait monitoring in its proprietary
fork [5]:

> SQL Session/System wait tuning diagnostics
> The Dynamic Runtime Instrumentation Tools Architecture (DRITA) allows a
> DBA to query catalog views to determine the wait events that affect the
> performance of individual sessions or the system as a whole. DRITA records
> the *number of times each event occurs as well as the time spent waiting*;
> you can use this information to diagnose performance problems.

And they declare that they measure the time of individual wait events, which
is exactly the thing Robert resists so much.  So, I suspect some kind of
hidden motivation here.  Probably the EDB guys just don't want to lose an
advantage of their proprietary product over open source PostgreSQL.

1.
http://www.postgresql.org/message-id/ca+tgmobdc4fbfpa49j6ok-ikazyvhw71vssotpvmf4tdyzr...@mail.gmail.com
2.
http://www.postgresql.org/message-id/capphfdvflevfetzgjprnrexfliyrksgkdu98xo5xx0hw5gu...@mail.gmail.com
3.
http://www.postgresql.org/message-id/ca+tgmoz-8zpoum9bgtbup1u4duqhc-9epedlzyk0dg4pkmd...@mail.gmail.com
4.
http://www.postgresql.org/message-id/ca+tgmobyf2vchghvoks7yrxa2wde+wmmrmxyughvx_w-0wx...@mail.gmail.com
5.
http://www.enterprisedb.com/products-services-training/products/postgres-plus-advanced-server?quicktabs_advanceservertab=3#quicktabs-advanceservertab

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] [PATCH] Use MemoryContextAlloc() in the MemoryContextAllocZero() and MemoryContextAllocZeroAligned()

2016-03-11 Thread Alexander Kuleshov
Hello all,

The attached patch simplifies MemoryContextAllocZero() and
MemoryContextAllocZeroAligned(). These functions do almost the same thing
that MemoryContextAlloc() does; additionally, they fill the allocated memory
with zeros via MemSetAligned() or MemSetLoop(). Let's call
MemoryContextAlloc() in these functions, instead of setting isReset to false,
calling the context's alloc() callback, and so on, to prevent code
duplication.
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 2bfd364..2845089 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -788,26 +788,7 @@ MemoryContextAllocZero(MemoryContext context, Size size)
 {
 	void	   *ret;
 
-	AssertArg(MemoryContextIsValid(context));
-	AssertNotInCriticalSection(context);
-
-	if (!AllocSizeIsValid(size))
-		elog(ERROR, "invalid memory alloc request size %zu", size);
-
-	context->isReset = false;
-
-	ret = (*context->methods->alloc) (context, size);
-	if (ret == NULL)
-	{
-		MemoryContextStats(TopMemoryContext);
-		ereport(ERROR,
-(errcode(ERRCODE_OUT_OF_MEMORY),
- errmsg("out of memory"),
- errdetail("Failed on request of size %zu.", size)));
-	}
-
-	VALGRIND_MEMPOOL_ALLOC(context, ret, size);
-
+	ret = MemoryContextAlloc(context, size);
 	MemSetAligned(ret, 0, size);
 
 	return ret;
@@ -825,26 +806,7 @@ MemoryContextAllocZeroAligned(MemoryContext context, Size size)
 {
 	void	   *ret;
 
-	AssertArg(MemoryContextIsValid(context));
-	AssertNotInCriticalSection(context);
-
-	if (!AllocSizeIsValid(size))
-		elog(ERROR, "invalid memory alloc request size %zu", size);
-
-	context->isReset = false;
-
-	ret = (*context->methods->alloc) (context, size);
-	if (ret == NULL)
-	{
-		MemoryContextStats(TopMemoryContext);
-		ereport(ERROR,
-(errcode(ERRCODE_OUT_OF_MEMORY),
- errmsg("out of memory"),
- errdetail("Failed on request of size %zu.", size)));
-	}
-
-	VALGRIND_MEMPOOL_ALLOC(context, ret, size);
-
+	ret = MemoryContextAlloc(context, size);
 	MemSetLoop(ret, 0, size);
 
 	return ret;



Re: [HACKERS] Background Processes and reporting

2016-03-12 Thread Alexander Korotkov
On Sat, Mar 12, 2016 at 2:45 AM, Andres Freund  wrote:

> On 2016-03-12 02:24:33 +0300, Alexander Korotkov wrote:
> > The idea of individually measuring the time of every wait event met
> > criticism because it might have high overhead [1].
>
> Right. And that's actually one of the point which I meant with "didn't
> listen to criticism". There've been a lot of examples, on and off list,
> where taking timings triggered significant slowdowns.  Yes, in some
> bare-metal environments, with a coherent tsc, the overhead can be
> low. But that doesn't make it ok to have a high overhead on a lot of
> other systems.
>
> Just claiming that that's not a problem will only lead to your position
> not being taken serious.
>
>
> > This is really so, at least for Windows [2].
>
> Measuring timing overhead for a simplistic workload on a single system
> doesn't mean that.  Try doing such a test on a vmware esx virtualized
> windows machine, on a multi-socket server; in a lot of instances you'll
> see two-three orders of magnitude longer average times; with peaks going
> into 4-5 orders of magnitude.  And, as sad it is, realistically most
> postgres instances will run in virtualized environments.
>
>
> > But exposing only current values wouldn't be very useful; we need to
> > gather some statistics anyway.  Gathering them by sampling would be both
> > more expensive and less accurate on the majority of systems.  This is
> > why I proposed hooks, to make platform-dependent extensions possible.
> > Robert rejects hooks because he is "not a big fan of hooks as a way of
> > resolving disagreements about the design" [3].
>
> I think I agree with Robert here. Providing hooks into very low level
> places tends to lead to problems in my experience; tight control over
> what happens is often important - I certainly don't want any external
> code to run while we're waiting for an lwlock.
>

So, I take away the following:

1) Detailed wait monitoring might cause high overhead on some systems.
2) We want wait monitoring to be always on, and we don't want options that
enable additional wait-monitoring features.
3) We don't want a wait-event hook to be exposed.

Can I conclude that we reject detailed wait monitoring by design?
If that is so, and not only Robert thinks so, then let's just admit it, add
it to the FAQ, and so on.

> > Besides, these are actually not design issues but platform issues...
>
> I don't see how that's the case.
>
>
> > Another question is wait parameters.  We want to expose wait events with
> > some parameters.  Robert rejects that because it *might* add additional
> > overhead [3]. When I proposed to fit something useful into the hard-won
> > 4 bytes, Robert claimed that it is "too clever" [4].
>
> I think stopping to treat this as "Robert/EDB vs. pgpro" would be a good
> first step to make progress here.
>
>
> It seems entirely possible to extend the current API in an incremental
> fashion, either allowing to disable the individual pieces, or providing
> sufficient measurements that it's not needed.
>
>
> > So, the situation looks like a dead end.  I have no idea how to convince
> > Robert about adding any kind of advanced wait-monitoring functionality
> > to PostgreSQL.  I'm thinking about implementing a sampling extension
> > over the current infrastructure just to make the community see how much
> > it falls short. Andres, it would be very nice if you have any idea how
> > to move this situation forward.
>
> I've had my share of conflicts with Robert. But if I were in his shoes,
> targeted by this kind of rhetoric, I'd be very tempted to just ignore
> any further arguments from the origin.  So I think the way forward is
> for everyone to cool off, and to see how we can incrementally make
> progress from here on.
>
>
> > Another aspect is that EnterpriseDB offers wait monitoring in its
> > proprietary fork [5].
>

> So?


So, we'll end up with every company providing a fork with detailed wait
monitoring, while community PostgreSQL refrains from providing such
functionality.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] PoC: Partial sort

2016-03-13 Thread Alexander Korotkov
Hi!

Tom committed the upper planner pathification patch.
The partial sort patch rebased to master is attached.  It was quite a huge
rebase in the planner part of the patch, but I think the patch has now become
better and much more logical.
Probably something was missed during the rebase; I'm going to examine this
patch more carefully next week.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


partial-sort-basic-7.patch
Description: Binary data



Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-13 Thread Alexander Korotkov
On Fri, Mar 11, 2016 at 7:08 AM, Dilip Kumar  wrote:

>
> On Thu, Mar 10, 2016 at 8:26 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> I don't think we can rely on the median that much if we have only 3 runs.
>> With 3 runs we can only apply the Kornfeld method, which takes the
>> interval between the lowest and highest values as the confidence interval.
>> Since the confidence intervals for the master and patched versions
>> overlap, we can't conclude that the expected TPS numbers are different.
>> Dilip, could you do more runs? 10, for example. With such statistics we
>> would be able to conclude something.
>>
>
> Here are the readings for 10 runs.
>
>
> Median Result
> -------------
>
> Client   Base    Patch
> ----------------------
> 1        19873   19739
> 2        38658   38276
> 4        68812   62075
>
> Full results of 10 runs...
>
> Base
> --------------------------------------
> Runs   1 Client   2 Client   4 Client
> --------------------------------------
> 1      19442      34866      49023
> 2      19605      35139      51695
> 3      19726      37104      53899
> 4      19835      38439      55708
> 5      19866      38638      67473
> 6      19880      38679      70152
> 7      19973      38720      70920
> 8      20048      38737      71342
> 9      20057      38815      71403
> 10     20344      41423      77953
> --------------------------------------
>
>
> Patch
> --------------------------------------
> Runs   1 Client   2 Client   4 Client
> --------------------------------------
> 1      18881      30161      54928
> 2      19415      32031      59741
> 3      19564      35022      61060
> 4      19627      36712      61839
> 5      19670      37659      62011
> 6      19808      38894      62139
> 7      19857      39081      62983
> 8      19910      39923      75358
> 9      20169      39937      77481
> 10     20181      40003      78462
> --------------------------------------
>

I've drawn graphs for these measurements. The variation doesn't look random
here: TPS goes higher from measurement to measurement.  I bet you did the
measurements sequentially.
I think we should do more measurements, until TPS stops growing and beyond,
to accumulate average statistics.  Could you please do the same tests with
50 runs?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Background Processes and reporting

2016-03-15 Thread Alexander Korotkov
On Tue, Mar 15, 2016 at 12:57 AM, Robert Haas  wrote:

> On Mon, Mar 14, 2016 at 4:42 PM, Andres Freund  wrote:
> > On 2016-03-14 16:16:43 -0400, Robert Haas wrote:
> >> > I have already shown [0, 1] the overhead of measuring timings in
> linux on
> >> > representative workload. AFAIK, these tests were the only one that
> showed
> >> > any numbers. All other statements about terrible performance have
> been and
> >> > remain unconfirmed.
> >>
> >> Of course, those numbers are substantial regressions which would
> >> likely make it impractical to turn this on on a heavily-loaded
> >> production system.
> >
> > A lot of people operating production systems are fine with trading a <=
> > 10% impact for more insight into the system; especially if that
> > configuration can be changed without a restart.  I know a lot of systems
> > that use pg_stat_statements, track_io_timing = on, etc; just to get
> > that. In fact there's people running perf more or less continuously in
> > production environments; just to get more insight.
> >
> > I think it's important to get as much information out there without
> > performance overhead, so it can be enabled by default. But I don't think
> > it makes sense to not allow features in that cannot be enabled by
> > default, *if* we tried to make them cheap enough beforehand.
>
> Hmm, OK.  I would have expected you to be on the other side of this
> question, so maybe I'm all wet.  One point I am concerned about is
> that, right now, we have only a handful of types of wait events.  I'm
> very interested in seeing us add more, like I/O and client wait.  So
> any overhead we pay here is likely to eventually be paid in a lot of
> places - thus it had better be extremely small.
>

OK. Let's start to produce light, not heat.

As I understand it, we have two features which we suspect of introducing
overhead:
1) Recording parameters of wait events, which requires some kind of
synchronization protocol.
2) Recording the time of wait events, because time measurements might be
expensive on some platforms.

At the same time, there are machines and workloads where neither of these
features produces measurable overhead.  And we're not talking about toy
databases: Vladimir is a DBA at Yandex, which is among the top 20 internet
companies in the world by traffic.  They run both of these features in a
high-load production database without noticing any overhead.

It would be great progress if we decided that we could add both of these
features, controlled by GUCs (off by default).
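
Hypothetically, such settings could look like this in postgresql.conf (these
GUC names are invented for illustration; they are not existing parameters):

# hypothetical, illustrative names only -- not existing GUCs
track_wait_timing = off         # measure time of individual wait events
track_wait_parameters = off     # record parameters of wait events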

If we decide so, then let's start working on this. First, we should construct
a list of machines and workloads for testing. Any such list would not be
comprehensive, but let's find something that would be enough for testing
GUC-controlled, off-by-default features. Then we can turn our conversation
from theoretical thoughts to particular benchmarks, which would be objective
and convincing to everybody.

Otherwise, let's just add these features to the list of unwanted
functionality and close this question.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Declarative partitioning

2016-03-19 Thread Alexander Korotkov
Hi, Amit!

I tried to apply your patch.  It still applies, but has some duplicate OIDs.
After fixing the duplicate OIDs, I noticed the following warnings during
compilation with clang-700.1.81.

scan.c:10308:23: warning: unused variable 'yyg' [-Wunused-variable]
struct yyguts_t * yyg = (struct yyguts_t*)yyscanner; /* This var may be
unused depending upon options. */
  ^
tablecmds.c:12922:6: warning: variable 'is_subpart' is used uninitialized
whenever 'if' condition is false [-Wsometimes-uninitialized]
if (parent != NULL)
^~
tablecmds.c:12931:12: note: uninitialized use occurs here
partKey = is_subpart ? list_nth(rootParent->rd_partkeys,
parent_level) :
  ^~
tablecmds.c:12922:2: note: remove the 'if' if its condition is always true
if (parent != NULL)
^~~
tablecmds.c:12912:19: note: initialize the variable 'is_subpart' to silence
this warning
boolis_subpart;
  ^
   = '\0'
tablecmds.c:13375:37: warning: variable 'operoid' is uninitialized when
used here [-Wuninitialized]
comp_left_expr =
make_opclause(operoid, BOOLOID, false,

 ^~~
tablecmds.c:13326:17: note: initialize the variable 'operoid' to silence
this warning
Oid operoid;
   ^
= 0

Regression tests passed cleanly for me.  I also examined the code a bit.  As
I understand it, for DML queries declarative partitioning works like
inheritance; it just provides an alternative way of collecting the
append_rel_list.
We're working on the other side of the partitioning problem.  Leaving the
syntax problem aside, we're solving performance problems in the pg_pathman
extension: https://github.com/postgrespro/pg_pathman.  We already have
interesting results, which you can see in the blog posts [1], [2], [3].

We already have a fast algorithm for partition selection in the optimizer [1]
and effective optimization of filter conditions [3]. And we're planning to
implement the following features:

   - Execution-time partition selection (useful for nested loops and
   prepared statements);
   - Optimization of ordered output from partitioned tables (use Append
   instead of MergeAppend when possible);
   - Optimization of hash join when both tables are partitioned by the join
   key.

The 9.6 feature freeze is coming, and we're planning our development
resources for 9.7. Besides providing an extension, we would like these
features to eventually arrive in core.  In order to achieve this, we need to
port our functionality over to your declarative syntax.  First, we would need
some way of caching partition metadata that is suitable for fast partition
selection.  For range partitioning it could be a sorted array or an RB-tree
of partition bounds.  When we have this infrastructure, we can start porting
pieces of pg_pathman functionality to declarative partitioning.

So, our draft plan of patches would be the following:

   - Implement a partition metadata cache suitable for fast partition
   selection.
   - Fast partition selection using the metadata cache.
   - Optimization of filter conditions passed to partitions.
   - Execution-time partition selection (useful for nested loops and
   prepared statements);
   - Optimization of ordered output from partitioned tables (use Append
   instead of MergeAppend when possible);
   - Optimization of hash join when both tables are partitioned by the join
   key.

I'd like to confirm that this development plan doesn't overlap with your
plans.  If our plans don't overlap, then let's accept this plan of work for
9.7.

Links.

[1] http://www.postgrespro.com/blog/pgsql/pg_pathman
[2] http://akorotkov.github.io/blog/2016/03/04/pg_pathman-beta-release/
[3]
http://akorotkov.github.io/blog/2016/03/14/pg_pathman-condition-processing/

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] initdb: introduce PG_CMD_PRINTF intestead of PG_CMD_PRINTF{1,2,3}

2016-03-19 Thread Alexander Kuleshov
Hello all,

src/bin/initdb/initdb.c provides three macros to write data to cmdfd. All of
these macros do the same thing, but with a different number of arguments for
fprintf().

The attached patch introduces a PG_CMD_PRINTF macro which takes a set of
variadic arguments via __VA_ARGS__, replacing the PG_CMD_PRINTF{1,2,3}
macros. (Note that with plain __VA_ARGS__ the macro still requires at least
one argument after the format string; that is fine here, since every
converted call site passes one, and PG_CMD_PUTS covers the no-argument case.)

Any objections?
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ed3ba7b..cd0ea72 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -307,21 +307,9 @@ do { \
 		output_failed = true, output_errno = errno; \
 } while (0)
 
-#define PG_CMD_PRINTF1(fmt, arg1) \
+#define PG_CMD_PRINTF(fmt, ...) \
 do { \
-	if (fprintf(cmdfd, fmt, arg1) < 0 || fflush(cmdfd) < 0) \
-		output_failed = true, output_errno = errno; \
-} while (0)
-
-#define PG_CMD_PRINTF2(fmt, arg1, arg2) \
-do { \
-	if (fprintf(cmdfd, fmt, arg1, arg2) < 0 || fflush(cmdfd) < 0) \
-		output_failed = true, output_errno = errno; \
-} while (0)
-
-#define PG_CMD_PRINTF3(fmt, arg1, arg2, arg3)		\
-do { \
-	if (fprintf(cmdfd, fmt, arg1, arg2, arg3) < 0 || fflush(cmdfd) < 0) \
+	if (fprintf(cmdfd, fmt, __VA_ARGS__) < 0 || fflush(cmdfd) < 0) \
 		output_failed = true, output_errno = errno; \
 } while (0)
 
@@ -1604,8 +1592,8 @@ get_set_pwd(FILE *cmdfd)
 
 	}
 
-	PG_CMD_PRINTF2("ALTER USER \"%s\" WITH PASSWORD E'%s';\n\n",
-   username, escape_quotes(pwd1));
+	PG_CMD_PRINTF("ALTER USER \"%s\" WITH PASSWORD E'%s';\n\n",
+  username, escape_quotes(pwd1));
 
 	free(pwd1);
 }
@@ -1724,8 +1712,8 @@ setup_description(FILE *cmdfd)
 "	objsubid int4, "
 "	description text) WITHOUT OIDS;\n\n");
 
-	PG_CMD_PRINTF1("COPY tmp_pg_description FROM E'%s';\n\n",
-   escape_quotes(desc_file));
+	PG_CMD_PRINTF("COPY tmp_pg_description FROM E'%s';\n\n",
+  escape_quotes(desc_file));
 
 	PG_CMD_PUTS("INSERT INTO pg_description "
 " SELECT t.objoid, c.oid, t.objsubid, t.description "
@@ -1737,8 +1725,8 @@ setup_description(FILE *cmdfd)
 " classname name, "
 " description text) WITHOUT OIDS;\n\n");
 
-	PG_CMD_PRINTF1("COPY tmp_pg_shdescription FROM E'%s';\n\n",
-   escape_quotes(shdesc_file));
+	PG_CMD_PRINTF("COPY tmp_pg_shdescription FROM E'%s';\n\n",
+  escape_quotes(shdesc_file));
 
 	PG_CMD_PUTS("INSERT INTO pg_shdescription "
 " SELECT t.objoid, c.oid, t.description "
@@ -1881,8 +1869,8 @@ setup_collation(FILE *cmdfd)
 
 		quoted_locale = escape_quotes(localebuf);
 
-		PG_CMD_PRINTF3("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n\n",
-	   quoted_locale, quoted_locale, enc);
+		PG_CMD_PRINTF("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n\n",
+	  quoted_locale, quoted_locale, enc);
 
 		/*
 		 * Generate aliases such as "en_US" in addition to "en_US.utf8" for
@@ -1893,15 +1881,15 @@ setup_collation(FILE *cmdfd)
 		{
 			char	   *quoted_alias = escape_quotes(alias);
 
-			PG_CMD_PRINTF3("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n\n",
-		   quoted_alias, quoted_locale, enc);
+			PG_CMD_PRINTF("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n\n",
+		  quoted_alias, quoted_locale, enc);
 			free(quoted_alias);
 		}
 		free(quoted_locale);
 	}
 
 	/* Add an SQL-standard name */
-	PG_CMD_PRINTF1("INSERT INTO tmp_pg_collation VALUES ('ucs_basic', 'C', %d);\n\n", PG_UTF8);
+	PG_CMD_PRINTF("INSERT INTO tmp_pg_collation VALUES ('ucs_basic', 'C', %d);\n\n", PG_UTF8);
 
 	/*
 	 * When copying collations to the final location, eliminate aliases that
@@ -1914,6 +1902,7 @@ setup_collation(FILE *cmdfd)
 	 * Also, eliminate any aliases that conflict with pg_collation's
 	 * hard-wired entries for "C" etc.
 	 */
+
 	PG_CMD_PUTS("INSERT INTO pg_collation (collname, collnamespace, collowner, collencoding, collcollate, collctype) "
 " SELECT DISTINCT ON (collname, encoding)"
 "   collname, "
@@ -2058,16 +2047,16 @@ setup_schema(FILE *cmdfd)
 
 	free(lines);
 
-	PG_CMD_PRINTF1("UPDATE information_schema.sql_implementation_info "
-   "  SET character_value = '%s' "
-   "  WHERE implementation_info_name = 'DBMS VERSION';\n\n",
-   infoversion);
+	PG_CMD_PRINTF("UPDATE information_schema.sql_implementation_info "
+  "  SET character_value = '%s' "
+  "  WHERE implementation_info_name = 'DBMS VERSION';\n\n",
+  infoversion);
 
-	PG_CMD_PRINTF1("COPY information_schema.sql_features "
-   "  (feature_id, feature_name, sub_feature_id, "
-   "  sub_feature_name, is_supported, comments) "
-   " FROM E'%s';\n\n",
-   escape_quotes(features_file));
+	PG_CMD_PRINTF("COPY information_schema.sql_features "
+  "  (feature_id, feature_name, sub_feature_id, "
+  "  sub_feature_name, is_supported, comments) "
+  " FROM E'%s';\n\n",
+  escape_quotes(features_file));
 }
 
 /*



Re: [HACKERS] Declarative partitioning

2016-03-19 Thread Alexander Korotkov
On Wed, Mar 16, 2016 at 5:29 PM, Amit Langote wrote:

> > The 9.6 feature freeze is coming, and we're planning our development
> > resources for 9.7. Besides providing an extension, we would like these
> > features to eventually arrive in core.  In order to achieve this, we
> > need to port our functionality over to your declarative syntax.  First,
> > we would need some way of caching partition metadata that is suitable
> > for fast partition selection.  For range partitioning it could be a
> > sorted array or an RB-tree of partition bounds.  When we have this
> > infrastructure, we can start porting pieces of pg_pathman functionality
> > to declarative partitioning.
>
> I had to think about the internal metadata representation (and its
> caching) when developing the tuple routing solution.  I am hopeful
> that it is suitable for other executor mechanisms we will build for
> partitioned tables.
>

Yes, it appears that I missed it.  You already have a sorted array for range
partitioning and a binary search implementation.  This is good.
I'm a bit worried about list partitioning, because in that case the array is
scanned sequentially.  We could optimize this case for better handling of
many list partitions (a sketch below illustrates the idea), although this is
probably not the most common case.
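
For illustration, with list-partition bounds kept in a sorted array,
selection could become a binary search.  This is an illustrative sketch
only; the names, types and comparison-function plumbing are assumptions,
not pg_pathman or core code:

	/* Return the index of the partition accepting 'key', or -1. */
	static int
	select_list_partition(Datum *sorted_values, int *part_index,
						  int nvalues, Datum key, FmgrInfo *cmpfn)
	{
		int			lo = 0,
					hi = nvalues - 1;

		while (lo <= hi)
		{
			int			mid = lo + (hi - lo) / 2;
			int32		cmp = DatumGetInt32(FunctionCall2(cmpfn, key,
														  sorted_values[mid]));

			if (cmp == 0)
				return part_index[mid];		/* partition containing the key */
			else if (cmp < 0)
				hi = mid - 1;
			else
				lo = mid + 1;
		}
		return -1;				/* no partition accepts this value */
	}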

> So, our draft plan of patches would be the following:
> >
> > Implement a partition metadata cache suitable for fast partition selection.
> > Fast partition selection using the metadata cache.
> > Optimization of filter conditions passed to partitions.
> > Execution-time partition selection (useful for nested loops and prepared
> > statements);
> > Optimization of ordered output from partitioned tables (use Append
> instead of
> > MergeAppend when possible);
> > Optimization of hash join when both tables are partitioned by the join key.
> >
> > I'd like to validate that this development plan doesn't overlap with
> your
> > plans.  If our plans don't overlap, then let's accept this plan of
> work
> > for 9.7.
>
> It looks OK to me.  Thanks for sharing it.
>

Great! Let's work together.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-19 Thread Alexander Korotkov
On Sat, Mar 19, 2016 at 3:22 PM, Dilip Kumar  wrote:

>
> On Mon, Mar 14, 2016 at 3:09 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> I've drawn graphs for these measurements. The variation doesn't look
>> random here.  TPS is going higher from measurement to measurement.  I bet
>> you did measurements sequentially.
>> I think we should do more measurements until TPS stop growing and beyond
>> to see accumulate average statistics.  Could you please do the same tests
>> but 50 runs.
>>
>
> I have taken readings at different client counts (1, 2 and 4).
>
> 1. Avg: the average is taken after discarding the lowest 5 and highest 5 readings.
> 2. With 4 clients I have only 30 readings.
>
> Summary:
>
> I think if we check the avg or the median, at least at 1 or 2 clients head
> always looks the winner, but the same is not the case with 4 clients. And
> with 4 clients I can see much more fluctuation in the readings.
>
>
>          Head     Patch    Head     Patch    Head     Patch
>          (1 cl.)  (1 cl.)  (2 cl.)  (2 cl.)  (4 cl.)  (4 cl.)
> Avg:     19628    19578    37180    36536    70044    70731
> Median:  19663    19581    37967    37484    73003    75376
>
> Below are all the readings, arranged in sorted order
> (columns are Head/Patch at 1, 2 and 4 clients):
>
> Runs  Head(1)  Patch(1)  Head(2)  Patch(2)  Head(4)  Patch(4)
>  1    18191    18153     29454    26128     49931    47210
>  2    18365    18768     31218    26956     53358    47287
>  3    19053    18769     31808    29286     53575    55458
>  4    19128    18915     31860    30558     54282    55794
>  5    19160    18948     32411    30945     56629    56253
>  6    19177    19055     32453    31316     57595    58147
>  7    19351    19232     32571    31703     59490    58543
>  8    19353    19239     33294    32029     63887    58990
>  9    19361    19255     33936    32217     64718    60233
> 10    19390    19297     34361    32767     65737    68210
> 11    19452    19368     34563    34487     65881    71409
> 12    19478    19387     36110    34907     67151    72108
> 13    19488    19388     36221    34936     70974    73977
> 14    19490    19395     36461    35068     72212    74891
> 15    19512    19406     36712    35298     73003    75376
> 16    19538    19450     37104    35492     74842    75468
> 17    19547    19487     37246    36648     75400    75515
> 18    19592    19521     37567    37263     75573    75626
> 19    19627    19536     37707    37430     75798    75745
> 20    19661    19556     37958    37461     75834    75868
> 21    19666    19607     37976    37507     76240    76161
> 22    19701    19624     38060    37557     76426    76162
> 23    19708    19643     38244    37717     76770    76333
> 24    19748    19684     38272    38285     77011    77009
> 25    19751    19694     38467    38294     77114    77168
> 26    19776    19695     38524    38469     77630    77318
> 27    19781    19709     38625    38642     77865    77550
> 28    19786    19765     38756    38643     77912    77904
> 29    19796    19823     38939    38649     78242    78736
> 30    19826    19847     39139    38659
> 31    19837    19899     39208    38713
> 32    19849    19909     39211    38837
> 33    19854    19932     39230    38876
> 34    19867    19949     39249    39088
> 35    19891    19990     39259    39148
> 36    20038    20085     39286    39453
> 37    20083    20128     39435    39563
> 38    20143    20166     39448    39959
> 39    20191    20198     39475    40495
> 40    20437    20455     40375    40664
>
So, I think there really is some regression here at small client counts.  I
have the following hypothesis about that: in some cases we now use more
atomic operations than before.  For instance, in BufferAlloc we have the
following block of code.  The previous code handled this without atomic
operations, relying on the spinlock alone.  So we have 2 extra atomic
operations here, and a similar situation in other places.

pg_atomic_fetch_and_u32(&buf->state,
						~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED |
						  BM_CHECKPOINT_NEEDED | BM_IO_ERROR |
						  BM_PERMANENT | BUF_USAGECOUNT_MASK));
if (relpersistence == RELPERSISTENCE_PERMANENT)
	pg_atomic_fetch_or_u32(&buf->state,
						   BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE);
else
	pg_atomic_fetch_or_u32(&buf->state,
						   BM_TAG_VALID | BUF_USAGECOUNT_ONE);
Actually, we can behave like the old code and make such modifications
without increasing the number of atomic operations: just calculate the new
value of state (including clearing the BM_LOCKED flag) and write it to the
buffer header in a single store.  In this case we don't require extra
atomic operations.  However, it then becomes unsafe to use an atomic
decrement in UnpinBuffer(); there we can use a CAS loop which waits for the
buffer header to be unlocked, like PinBuffer() does.  I'm not sure whether
this affects multicore scalability, but I think it's worth trying.
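
To illustrate the idea (only a sketch of the approach, not the attached
patch; it assumes, as in the patch, that LockBufHdr() returns the state
value with BM_LOCKED set):

	/*
	 * Apply all flag changes with one plain atomic store; the same store
	 * clears BM_LOCKED, so no separate unlock and no extra atomic ops.
	 */
	uint32		state = LockBufHdr(buf);

	state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | BM_CHECKPOINT_NEEDED |
			   BM_IO_ERROR | BM_PERMANENT | BUF_USAGECOUNT_MASK);
	state |= BM_TAG_VALID | BUF_USAGECOUNT_ONE;
	if (relpersistence == RELPERSISTENCE_PERMANENT)
		state |= BM_PERMANENT;
	pg_atomic_write_u32(&buf->state, state & ~BM_LOCKED);

	/*
	 * In UnpinBuffer() an atomic decrement is then unsafe; instead, loop
	 * with compare-and-swap, waiting for the header to be unlocked.
	 */
	uint32		oldstate = pg_atomic_read_u32(&buf->state);

	for (;;)
	{
		if (oldstate & BM_LOCKED)
		{
			oldstate = pg_atomic_read_u32(&buf->state);		/* spin */
			continue;
		}
		if (pg_atomic_compare_exchange_u32(&buf->state, &oldstate,
										   oldstate - BUF_REFCOUNT_ONE))
			break;
		/* on failure, oldstate now holds the current value; retry */
	}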

So, this idea is implemented in the attached patch.  Please try it, both for
the regression at lower client counts and for scalability at large client
counts.

Other changes in this patch:
1) PinBuffer() and UnpinBuffer() use exponential backoff in the spin delay,
like LockBufHdr() does.
2) The previous patch contained a revert
of ac1d7945f866b1928c2554c0f80fd52d7f92.  That was not intentional.
Now it doesn't contain this revert.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pinunpin-cas-4.patch
Description: Binary data



Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'

2016-03-21 Thread Alexander Korotkov
On Mon, Mar 21, 2016 at 9:34 AM, Abhijit Menon-Sen 
wrote:

> At 2016-03-19 17:46:25 -0300, alvhe...@2ndquadrant.com wrote:
> >
> > I don't think the first patch is acceptable standalone -- we need both
> > things together.
>
> OK.
>
> > But in reality, pg_depend handling is mixed up with other changes all
> > over the place.
>
> Yes, I noticed that. :-/
>
> > Anyway I think this should be something along the lines of
> > ALTER FUNCTION foo() DEPENDS ON EXTENSION bar;
>
> OK. That's reasonable.
>
> > ALTER FUNCTION foo() OWNED BY EXTENSION bar;
>
> If the function is really OWNED BY EXTENSION, then the right way to
> declare it would seem to be ALTER EXTENSION … ADD FUNCTION. I prefer
> DEPENDS ON EXTENSION for this reason, there's no ambiguity about what
> we're declaring.
>

I'm not sure why we want to create the new dependency type with an ALTER
FUNCTION command rather than with ALTER EXTENSION.
Since we already add 'e' dependencies with the ALTER EXTENSION command, why
should it be different for 'x' dependencies?
The argument could be that the 'x' dependency type would be used for
objects other than extensions.  But that is a much more general problem,
and it's unclear that we would end up with this behaviour and this
dependency type.

So, I would prefer this patch to extend the ALTER EXTENSION command, since
it's aimed at this particular problem.

I even think we could extend the existing grammar rule

| ALTER EXTENSION name add_drop FUNCTION function_with_argtypes
> *** AlterExtensionContentsStmt:
> *** 3982,3987 
> --- 3987,3993 
> n->objtype = OBJECT_FUNCTION;
> n->objname = $6->funcname;
> n->objargs = $6->funcargs;
> +   n->deptype = 'e';
> $$ = (Node *)n;
> }


instead of adding another

+   | ALTER EXTENSION name add_drop DEPENDENT FUNCTION
> function_with_argtypes
> +   {
> +   AlterExtensionContentsStmt *n =
> makeNode(AlterExtensionContentsStmt);
> +   n->extname = $3;
> +   n->action = $4;
> +   n->objtype = OBJECT_FUNCTION;
> +   n->objname = $7->funcname;
> +   n->objargs = $7->funcargs;
> +   n->deptype = 'x';
> $$ = (Node *)n;
>     }


by introducing a separate rule, extension_dependency_type.

In the same way we could add a dependency type parameter to each ALTER
EXTENSION grammar rule.  Thus the existing functionality would be extended
in a natural way, without large changes in the code.
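
For illustration, with the grammar above both dependency types would be
declared through ALTER EXTENSION (extension and function names here are
hypothetical; DEPENDENT is the keyword from the patch):

	-- existing behaviour: an 'e' dependency, the extension owns foo()
	ALTER EXTENSION bar ADD FUNCTION foo(integer);

	-- under this proposal: an 'x' dependency, foo() merely depends on bar
	ALTER EXTENSION bar ADD DEPENDENT FUNCTION foo(integer);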

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] improving GROUP BY estimation

2016-03-21 Thread Alexander Korotkov
Hi, Dean!

On Fri, Mar 18, 2016 at 1:20 PM, Dean Rasheed 
wrote:

> Probably a better URL to give is
> http://www.adellera.it/investigations/distinct_balls/ which has a link
> to the PDF version of the paper and also some supporting material.
>
> However, while that paper is in general very clear, I don't think it
> gives a very clear explanation of that particular formula, and it
> doesn't explain what it represents. It merely says that if "i" can be
> ignored "for some reason (e.g. i << Nr)", then that formula is an
> approximation to the exact "without replacement" formula, which is the
> subject of that paper.
>
> But actually, that formula is the exact formula for the expected
> number of distinct values when selecting *with replacement* from a
> collection where each of the distinct values occurs an equal number of
> times. So I think we should say that.
>
> Perhaps something along the lines of:
>
> /*
>  * Update the estimate based on the restriction selectivity.
>  *
>  * This uses the formula for the expected number of distinct
> values
>  * when selecting with replacement from a collection where
> each of
>  * the distinct values occurs an equal number of times (a
> uniform
>  * distribution of values). This is a very close approximation
> to
>  * the more correct method of selecting without replacement
> when
>  * the number of distinct values is quite large --- for
> example,
>  * see http://www.adellera.it/investigations/distinct_balls/.
>  * Additionally, it also turns out to work very well even when
> the
>  * number of distinct values is small.
>  */
>

+1
Thank you for your work on this patch.  The formula you propose and the
explanation look great!
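
For reference, the expectation described in that comment can be derived
directly (notation mine: n = number of selected rows, d = number of
distinct values, each occurring an equal number of times).  Every draw hits
a particular value with probability 1/d, so selecting n rows with
replacement misses that value with probability (1 - 1/d)^n, giving

    E[\text{distinct values}] = d \left( 1 - \left( 1 - \frac{1}{d} \right)^{n} \right)

which approaches d as n grows, and behaves like n when n is much smaller
than d.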

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-03-21 Thread Alexander Korotkov
Hi!

I see that the patch changes existing regression tests in tsearch2.out.

*** a/contrib/tsearch2/expected/tsearch2.out
--- b/contrib/tsearch2/expected/tsearch2.out
*** SELECT '(!1|2)&3'::tsquery;
*** 278,292 
  (1 row)

  SELECT '1|(2|(4|(5|6)))'::tsquery;
!  tsquery
! -
!  '1' | ( '2' | ( '4' | ( '5' | '6' ) ) )
  (1 row)

  SELECT '1|2|4|5|6'::tsquery;
!  tsquery
! -
!  ( ( ( '1' | '2' ) | '4' ) | '5' ) | '6'
  (1 row)

  SELECT '1&(2&(4&(5&6)))'::tsquery;
--- 278,292 
  (1 row)

  SELECT '1|(2|(4|(5|6)))'::tsquery;
!tsquery
! -
!  '1' | '2' | '4' | '5' | '6'
  (1 row)

  SELECT '1|2|4|5|6'::tsquery;
!tsquery
! -
!  '1' | '2' | '4' | '5' | '6'
  (1 row)


This change looks like an improvement: without the parentheses, tsquery
readability is much better.

*** select rewrite('moscow & hotel', 'select
*** 461,469 
  (1 row)

  select rewrite('bar &  new & qq & foo & york', 'select keyword, sample
from test_tsquery'::text );
!rewrite
!
-
!  'citi' & 'foo' & ( 'bar' | 'qq' ) & ( 'nyc' | ( 'big' & 'appl' | 'new' &
'york' ) )
  (1 row)

  select rewrite( ARRAY['moscow', keyword, sample] ) from test_tsquery;
--- 461,469 
  (1 row)

  select rewrite('bar &  new & qq & foo & york', 'select keyword, sample
from test_tsquery'::text );
!  rewrite
!
-
!  ( 'nyc' | 'big' & 'appl' | 'new' & 'york' ) & 'citi' & 'foo' & ( 'bar' |
'qq' )
  (1 row)

  select rewrite( ARRAY['moscow', keyword, sample] ) from test_tsquery;

However, such reorderings look unclear to me and need motivation.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'

2016-03-21 Thread Alexander Korotkov
On Mon, Mar 21, 2016 at 2:19 PM, Abhijit Menon-Sen 
wrote:

> At 2016-03-21 13:04:33 +0300, a.korot...@postgrespro.ru wrote:
> >
> > I'm not sure why we want to make new dependency type by ALTER FUNCTION
> > command, not ALTER EXTENSION?
>
> It's a matter of semantics. It means something very different than what
> an 'e' dependency means. The extension doesn't own the function (and so
> pg_dump shouldn't ignore it), but the function depends on the extension
> (and so dropping the extension should drop it).
>
> > The argument could be that 'x' dependency type would be used for other
> > objects not extensions.
>
> I suppose this is possible, but yes, I agree with you that it's not
> clear how or why this would be useful.
>
> > So, I would prefer this patch to extend ALTER EXTENSION command while
> > it's aimed to this particular problem.
>
> OK, so that's what the patch does, and it's certainly the simplest
> approach for reasons discussed earlier (though perhaps not as clear
> semantically as the ALTER FUNCTION approach). But:
>
> > I even think we could extend existent grammar rule
> >
> > | ALTER EXTENSION name add_drop FUNCTION
> function_with_argtypes
> > > *** AlterExtensionContentsStmt:
> > > *** 3982,3987 
> > > --- 3987,3993 
> > > n->objtype = OBJECT_FUNCTION;
> > > n->objname = $6->funcname;
> > > n->objargs = $6->funcargs;
> > > +   n->deptype = 'e';
> > > $$ = (Node *)n;
> > > }
>
> How exactly do you propose to do this, i.e., what would the final
> command to declare an 'x' dependency look like?
>

I propose something like this.

extension_dependency_type:
			DEPENDENT						{ $$ = 'x'; }
			| /*EMPTY*/						{ $$ = 'e'; }
		;

...

			| ALTER EXTENSION name add_drop extension_dependency_type FUNCTION function_with_argtypes
				{
					AlterExtensionContentsStmt *n = makeNode(AlterExtensionContentsStmt);

					n->extname = $3;
					n->action = $4;
					n->objtype = OBJECT_FUNCTION;
					n->objname = $7->funcname;
					n->objargs = $7->funcargs;
					n->deptype = $5;
					$$ = (Node *) n;
				}

I haven't tried it; it probably causes grammar conflicts, in which case I
don't insist on it.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: Access method extendability

2016-03-22 Thread Alexander Korotkov
On Tue, Mar 22, 2016 at 1:26 AM, Alvaro Herrera 
wrote:

> So.  Before this version of the patch was posted in Nov 4th 2015, both
> Tom and Heikki had said essentially "CREATE ACCESS METHOD is worthless,
> let's pursue this stuff without those commands".
> http://www.postgresql.org/message-id/54730dfd.2060...@vmware.com (Nov
> 2014)
>

Yes, that's it.  In that email Heikki asserts that extensions' INSERTs into
pg_am would survive pg_upgrade, because pg_dump will dump the CREATE
EXTENSION command, which executes the extension script.
In my response I pointed out that this is not correct: in binary upgrade
mode pg_dump works in such a way that we would miss the records in pg_am.
I haven't received any answer to that.
http://www.postgresql.org/message-id/capphfdsbkntlchypngr0ffu9wlhh__nukdp13qlqukgtnv_...@mail.gmail.com
And, as Petr Jelinek mentioned, there is also the problem of dependencies
in DROP EXTENSION.


> http://www.postgresql.org/message-id/26822.1414516...@sss.pgh.pa.us (Oct
> 2014)
>

> And then Alexander posted this version, without any discussion that
> evidenced that those old objections were overridden.  What happened
> here?  Did somebody discuss this in unarchived meetings?  If so, it
> would be good to know about that in this thread.


After that, Tom Lane committed a very large patch reworking the AM API.
One of the aims of that patch was support for the CREATE ACCESS METHOD
command.
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=65c5fcd353a859da9e61bfb2b92a99f12937de3b
I initially positioned this patch as refactoring in support of the CREATE
ACCESS METHOD command.
http://www.postgresql.org/message-id/capphfdtlisxmxk2b4tw+4+no_1-t0rao5foyszho6+sn2om...@mail.gmail.com
During the discussion Tom mentioned the future CREATE ACCESS METHOD command.
http://www.postgresql.org/message-id/16164.1439222...@sss.pgh.pa.us
I don't think Tom would have put so much effort in this direction if he
remained opposed to this command.

> I noticed this state of affairs because I started reading the complete
> thread in order to verify whether a consensus had been reached regarding
> the move of IndexQualInfo and GenericCosts to selfuncs.h.  But I only
> see that Jim Nasby mentioned it and Alexander said "yes they move";
> nothing else.  The reason for raising this is that, according to
> Alexander, new index AMs will want to use those structs; but current
> index AMs only include index_selfuncs.h, and none of them includes
> selfuncs.h, so it seems completely wrong to be importing all the planner
> knowledge embodied in selfuncs.h into extension index AMs.
>
> One observation is that the bloom AM patch (0003 of this series) uses
> GenericCosts but not IndexQualInfo.  But btcostestimate() and
> gincostestimate() both use IndexQualInfo, so other AMs might well want
> to use it too.
>
> We could move GenericCosts and the prototype for genericcostestimate()
> to index_selfuncs.h; that much is pretty reasonable.  But I'm not sure
> what to do about IndexQualInfo.  We could just add forward struct
> declarations for RestrictInfo and Node to index_selfuncs.h.  But then
> the extension code is going to have to import the includes were those
> are defined anyway.  Certainly it seems that deconstruct_indexquals()
> (which returns a list of IndexQualInfo) is going to be needed by
> extension index AMs, at least ...
>

Thank you for highlighting these issues.


> I've made a few edits to the patch already, but I need to do some more
> testing.


Did you already fix the issues above or do you need me to fix them?


> Particularly I would like to understand the relcache issues to
> figure out whether the current one is right.


Good point.  I'll also try to find relcache issues.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-22 Thread Alexander Korotkov
On Tue, Mar 22, 2016 at 7:57 AM, Dilip Kumar  wrote:

>
> On Tue, Mar 22, 2016 at 12:31 PM, Dilip Kumar 
> wrote:
>
>> ! pg_atomic_write_u32(&bufHdr->state, state);
>>   } while (!StartBufferIO(bufHdr, true));
>>
>> Better write some comment about clearing BM_LOCKED from the state
>> directly, so that we need not call UnlockBufHdr explicitly;
>> otherwise it's confusing.
>>
>
> Few more comments..
>
> *** 828,837 
> */
>do
>{
> !  LockBufHdr(bufHdr);
> !  Assert(bufHdr->flags & BM_VALID);
> !  bufHdr->flags &= ~BM_VALID;
> !  UnlockBufHdr(bufHdr);
>} while (!StartBufferIO(bufHdr, true));
>}
>}
> --- 826,834 
> */
>do
>{
> !  uint32 state = LockBufHdr(bufHdr);
> !  state &= ~(BM_VALID | BM_LOCKED);
> !  pg_atomic_write_u32(&bufHdr->state, state);
>} while (!StartBufferIO(bufHdr, true));
>
> 1. Previously there was an Assert; any reason why we removed it?
>  Assert(bufHdr->flags & BM_VALID);
>

It was missed.  In the attached patch I've put it back.

> 2. Also, if we don't need the assert, can't we directly clear the BM_VALID
> flag from the state variable (if not locked) like pin/unpin buffer does,
> without taking the buffer header lock?
>

In this version of the patch it could also be done as a CAS loop.  But I
don't intend to do so, because it would significantly complicate the code,
and it's not yet clear that the traffic in this place is high enough to
justify such optimizations.
The v4 patch implements a slightly different approach.  Could you please
test it?  We need to check whether this approach is worth putting more
effort into, or throw it away otherwise.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pinunpin-cas-5.patch
Description: Binary data



Re: [HACKERS] pg_dump dump catalog ACLs

2016-03-22 Thread Alexander Korotkov
Hi, Stephen!

I'm signed up to review this patch.  At first the patch didn't apply
cleanly; it had a conflict at the end of system_views.sql.  But that was
very easy to fix.

On Mon, Mar 14, 2016 at 7:00 PM, Stephen Frost  wrote:

> I've not included it in this patch, but my thinking here would be to add
> a check to the GRANT functions which checks 'if (creating_extension)'
> and records/updates the entries in pg_init_privs for those objects.
> This is similar to how we handle dependencies for extensions currently.
> That way, extensions don't have to do anything and if the user changes
> the permissions on an extension's object, the permissions for that
> object would then be dumped out.
>
> This still requires more testing, documentation, and I'll see about
> making it work completely for extensions also, but wanted to provide an
> update and solicit for review/comments.
>
> The patches have been broken up in what I hope is a logical way for
> those who wish to review/comment:
>
> #1 - Add the pg_init_privs catalog
> #2 - Change pg_dump to use a bitmap instead of boolean for 'dump'
> #3 - Split 'dump' into 'dump' and 'dump_contains'
> #4 - Make pg_dump include changed ACLs in dump of 'pg_catalog'
>

+  "CASE WHEN pip.initprivs IS NOT DISTINCT FROM n.nspacl "
+  "THEN false ELSE true END as dumpacl "

Why not just write "pip.initprivs IS DISTINCT FROM n.nspacl AS dumpacl"?
I think there are a few places where the CASE could be eliminated.

+ appendPQExpBuffer(query, "SELECT tableoid, oid, proname AS aggname, "
+  "pronamespace AS aggnamespace, "
+  "pronargs, proargtypes, "
+  "(%s proowner) AS rolname, "
+  "proacl AS aggacl "
+  "FROM pg_proc p "
+  "WHERE proisagg AND ("
+  "pronamespace != "
+  "(SELECT oid FROM pg_namespace "
+  "WHERE nspname = 'pg_catalog') OR "
+  "EXISTS (SELECT * FROM pg_init_privs pip "
+  "WHERE p.oid = pip.objoid AND pip.classoid = "
+  "(SELECT oid FROM pg_class "
+  "WHERE relname = 'pg_proc') "
+  "AND p.proacl <> pip.initprivs)",
+  username_subquery);

Why do we compare ACLs using <> here, but IS DISTINCT FROM for other
objects?  Is that intentional?  Given that most proacl values are NULL,
when you change such an ACL it wouldn't be dumped, since NULL <> value
yields NULL.
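
To illustrate the NULL semantics in play (a generic example, not the
patch's query):

	SELECT NULL <> 'x';                 -- NULL, so a WHERE clause drops the row
	SELECT NULL IS DISTINCT FROM 'x';   -- true, the changed ACL is noticed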

In general, these checks using subqueries look complicated to me, are hard
to validate, and need a lot of testing.  Could we implement a function
"bool need_acl_dump(oid objoid, oid classoid, int objsubid, proacl acl)"
and call it?
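
A minimal sketch of what such a helper could look like at the SQL level
(hypothetical, not part of the patch; I assume ACLs come in as aclitem[]
and that pg_init_privs has objoid/classoid/objsubid/initprivs columns as in
patch #1):

	CREATE FUNCTION need_acl_dump(objoid oid, classoid oid, objsubid int4,
								  acl aclitem[]) RETURNS bool
	LANGUAGE sql STABLE AS $$
		-- dump only if the ACL differs from the recorded initial privileges
		SELECT $4 IS DISTINCT FROM
			   (SELECT pip.initprivs
				  FROM pg_init_privs pip
				 WHERE pip.objoid = $1
				   AND pip.classoid = $2
				   AND pip.objsubid = $3)
	$$;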


> #5 - Remove 'if (!superuser()) ereport()' calls and REVOKE EXECUTE
>

The other things in the patches look good to me, except that they need
tests and documentation.
I'm marking this "Waiting on Author".

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] improving GROUP BY estimation

2016-03-22 Thread Alexander Korotkov
Hi, Tomas!

On Mon, Mar 21, 2016 at 2:39 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Fri, Mar 18, 2016 at 1:20 PM, Dean Rasheed 
> wrote:
>
>> Probably a better URL to give is
>> http://www.adellera.it/investigations/distinct_balls/ which has a link
>> to the PDF version of the paper and also some supporting material.
>>
>> However, while that paper is in general very clear, I don't think it
>> gives a very clear explanation of that particular formula, and it
>> doesn't explain what it represents. It merely says that if "i" can be
>> ignored "for some reason (e.g. i << Nr)", then that formula is an
>> approximation to the exact "without replacement" formula, which is the
>> subject of that paper.
>>
>> But actually, that formula is the exact formula for the expected
>> number of distinct values when selecting *with replacement* from a
>> collection where each of the distinct values occurs an equal number of
>> times. So I think we should say that.
>>
>> Perhaps something along the lines of:
>>
>> /*
>>  * Update the estimate based on the restriction selectivity.
>>  *
>>  * This uses the formula for the expected number of distinct
>> values
>>  * when selecting with replacement from a collection where
>> each of
>>  * the distinct values occurs an equal number of times (a
>> uniform
>>  * distribution of values). This is a very close
>> approximation to
>>  * the more correct method of selecting without replacement
>> when
>>  * the number of distinct values is quite large --- for
>> example,
>>  * see http://www.adellera.it/investigations/distinct_balls/.
>>  * Additionally, it also turns out to work very well even
>> when the
>>  * number of distinct values is small.
>>  */
>>
>
> +1
> Thank you for work on this patch. The formula you propose and explanation
> look great!
>

I think you should send a revision of the patch including the comments
proposed by Dean Rasheed.
I'm switching the patch status to "Waiting on Author" in the commitfest.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: Access method extendability

2016-03-23 Thread Alexander Korotkov
On Tue, Mar 22, 2016 at 11:53 PM, Alvaro Herrera 
wrote:

> Alexander Korotkov wrote:
> > On Tue, Mar 22, 2016 at 1:26 AM, Alvaro Herrera <
> alvhe...@2ndquadrant.com>
> > wrote:
>
> > > I noticed this state of affairs because I started reading the complete
> > > thread in order to verify whether a consensus had been reached
> regarding
> > > the move of IndexQualInfo and GenericCosts to selfuncs.h.  But I only
> > > see that Jim Nasby mentioned it and Alexander said "yes they move";
> > > nothing else.  The reason for raising this is that, according to
> > > Alexander, new index AMs will want to use those structs; but current
> > > index AMs only include index_selfuncs.h, and none of them includes
> > > selfuncs.h, so it seems completely wrong to be importing all the
> planner
> > > knowledge embodied in selfuncs.h into extension index AMs.
> > >
> > > One observation is that the bloom AM patch (0003 of this series) uses
> > > GenericCosts but not IndexQualInfo.  But btcostestimate() and
> > > gincostestimate() both use IndexQualInfo, so other AMs might well want
> > > to use it too.
> > >
> > > We could move GenericCosts and the prototype for genericcostestimate()
> > > to index_selfuncs.h; that much is pretty reasonable.  But I'm not sure
> > > what to do about IndexQualInfo.  We could just add forward struct
> > > declarations for RestrictInfo and Node to index_selfuncs.h.  But then
> > > the extension code is going to have to import the includes were those
> > > are defined anyway.  Certainly it seems that deconstruct_indexquals()
> > > (which returns a list of IndexQualInfo) is going to be needed by
> > > extension index AMs, at least ...
> >
> > Thank you for highlighting these issues.
>
> After thinking some more about this, I think it's all right to move both
> IndexQualInfo and GenericCosts to selfuncs.h.  The separation that we
> want is this: the selectivity functions themselves need access to a lot
> of planner knowledge, and so selfuncs.h knows a lot about planner.  But
> the code that _calls_ the selectivity function doesn't; and
> index_selfuncs.h is about the latter.  Properly architected extension
> index AMs are going to have their selectivity functions in a separate .c
> file which knows a lot about planner, and which includes selfuncs.h (and
> maybe index_selfuncs.h if it wants to piggyback on the existing
> selecticity functions, but that's unlikely); only the prototype for the
> selfunc is going to be needed elsewhere, and the rest of the index code
> is not going to know anything about planner stuff so it will not need to
> include selfuncs.h nor index_selfuncs.h.


Right, the purposes of the headers are:
 * selfuncs.h – for those who are going to estimate selectivity,
 * index_selfuncs.h – for those who are going to call the selectivity
estimation functions of built-in AMs.

From this point of view we should keep both IndexQualInfo and GenericCosts
in selfuncs.h.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-24 Thread Alexander Korotkov
Hi!

Since the patch exposing current wait event information in PGPROC was
committed, it has become possible to collect wait event statistics using
sampling.  Although I'm not a fan of this approach, it is still useful and
definitely better than nothing.
In PostgresPro, we actually already had this.  Now it's too late to include
something new in 9.6.  This is why I've reworked it and published it on
github as an extension for 9.6: https://github.com/postgrespro/pg_wait_sampling/
Hopefully, it can be considered as contrib for 9.7.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-03-24 Thread Alexander Korotkov
On Thu, Mar 24, 2016 at 5:17 PM, Robert Haas  wrote:

> On Fri, Mar 18, 2016 at 1:19 PM, Anastasia Lubennikova
>  wrote:
> > Please, find the new version of the patch attached. Now it has WAL
> > functionality.
> >
> > Detailed description of the feature you can find in README draft
> > https://goo.gl/50O8Q0
> >
> > This patch is pretty complicated, so I ask everyone, who interested in
> this
> > feature,
> > to help with reviewing and testing it. I will be grateful for any
> feedback.
> > But please, don't complain about code style, it is still work in
> progress.
> >
> > Next things I'm going to do:
> > 1. More debugging and testing. I'm going to attach in next message
> couple of
> > sql scripts for testing.
> > 2. Fix NULLs processing
> > 3. Add a flag into pg_index, that allows to enable/disable compression
> for
> > each particular index.
> > 4. Recheck locking considerations. I tried to write code as less
> invasive as
> > possible, but we need to make sure that algorithm is still correct.
> > 5. Change BTMaxItemSize
> > 6. Bring back microvacuum functionality.
>
> I really like this idea, and the performance results seem impressive,
> but I think we should push this out to 9.7.  A btree patch that didn't
> have WAL support until two and a half weeks into the final CommitFest
> just doesn't seem to me like a good candidate.  First, as a general
> matter, if a patch isn't code-complete at the start of a CommitFest,
> it's reasonable to say that it should be reviewed but not necessarily
> committed in that CommitFest.  This patch has had some review, but I'm
> not sure how deep that review is, and I think it's had no code review
> at all of the WAL logging changes, which were submitted only a week
> ago, well after the CF deadline.  Second, the btree AM is a
> particularly poor place to introduce possibly destabilizing changes.
> Everybody depends on it, all the time, for everything.  And despite
> new tools like amcheck, it's not a particularly easy thing to debug.
>

It's all true.  But:
1) It's a great feature many users dream about.
2) The patch is not very big.
3) The patch doesn't introduce significant infrastructural changes.  It
just changes some well-isolated places.

Let's give it a chance.  I've signed up as an additional reviewer, and I'll
do my best to spot all possible issues in this patch.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: Access method extendability

2016-03-25 Thread Alexander Korotkov
On Thu, Mar 24, 2016 at 6:19 PM, Alvaro Herrera 
wrote:

> .. Oh crap.  I just noticed I forgot to update a comment in pg_dump's
> getAccessMethods.  And we're missing psql tab-complete support for the
> new commands.


Attached patches fix both these issues.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


fix-pg_dump-comment.patch
Description: Binary data


fix-am-tab-complete.patch
Description: Binary data



Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-25 Thread Alexander Korotkov
On Tue, Mar 22, 2016 at 1:08 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Tue, Mar 22, 2016 at 7:57 AM, Dilip Kumar 
> wrote:
>
>>
>> On Tue, Mar 22, 2016 at 12:31 PM, Dilip Kumar 
>> wrote:
>>
>>> ! pg_atomic_write_u32(&bufHdr->state, state);
>>>   } while (!StartBufferIO(bufHdr, true));
>>>
>>> Better write some comment about clearing BM_LOCKED from the state
>>> directly, so that we need not call UnlockBufHdr explicitly;
>>> otherwise it's confusing.
>>>
>>
>> Few more comments..
>>
>> *** 828,837 
>> */
>>do
>>{
>> !  LockBufHdr(bufHdr);
>> !  Assert(bufHdr->flags & BM_VALID);
>> !  bufHdr->flags &= ~BM_VALID;
>> !  UnlockBufHdr(bufHdr);
>>} while (!StartBufferIO(bufHdr, true));
>>}
>>}
>> --- 826,834 
>> */
>>do
>>{
>> !  uint32 state = LockBufHdr(bufHdr);
>> !  state &= ~(BM_VALID | BM_LOCKED);
>> !  pg_atomic_write_u32(&bufHdr->state, state);
>>} while (!StartBufferIO(bufHdr, true));
>>
>> 1. Previously there was an Assert; any reason why we removed it?
>>  Assert(bufHdr->flags & BM_VALID);
>>
>
> It was missed.  In the attached patch I've put it back.
>
>> 2. Also, if we don't need the assert, can't we directly clear the BM_VALID
>> flag from the state variable (if not locked) like pin/unpin buffer does,
>> without taking the buffer header lock?
>>
>
> In this version of the patch it could also be done as a CAS loop.  But I
> don't intend to do so, because it would significantly complicate the code,
> and it's not yet clear that the traffic in this place is high enough to
> justify such optimizations.
> The v4 patch implements a slightly different approach.  Could you please
> test it?  We need to check whether this approach is worth putting more
> effort into, or throw it away otherwise.
>

Could anybody run benchmarks?  Feature freeze is soon, but it would be
*very nice* to fit it into the 9.6 release cycle, because it greatly
improves scalability on large machines.  Without this patch PostgreSQL 9.6
will be significantly behind competitors like MySQL 5.7.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-03-25 Thread Alexander Korotkov
On Fri, Mar 25, 2016 at 6:42 PM, David Steele  wrote:

> On 3/16/16 12:38 PM, Dmitry Ivanov wrote:
>
> I've made an attempt to fix some of the issues you've listed, although
>> there's
>> still much work to be done. I'll add some comments later.
>>
>
> Do you know when you'll have a chance to respond to reviews and provide a
> new patch?
>
> Time is short and it's not encouraging that you say there is "still much
> work to be done".  Perhaps it would be best to mark this "returned with
> feedback"?
>

Dmitry will provide a new version of the patch today.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-25 Thread Alexander Korotkov
Hi, Dilip!

On Fri, Mar 25, 2016 at 8:32 PM, Dilip Kumar  wrote:

> On Fri, Mar 25, 2016 at 8:09 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> Could anybody run benchmarks?  Feature freeze is soon, but it would be
>> *very nice* to fit it into the 9.6 release cycle, because it greatly improves
>> scalability on large machines.  Without this patch PostgreSQL 9.6 will be
>> significantly behind competitors like MySQL 5.7.
>
>
> I have run the performance tests and here are the results.  With the
> latest patch I did not see any regression at lower client counts (median
> of 3 readings).
>
> scale factor 1000, shared_buffers 8GB, read-only:
>
> Client   Base     Patch
> 1        12957    13068
> 2        24931    25816
> 4        46311    48767
> 32       300921   310062
> 64       387623   493843
> 128      249635   583513
>
> scale factor 300, shared_buffers 8GB, read-only:
>
> Client   Base     Patch
> 1        14537    14586   --> the one-thread number looks a little low; I
>                               generally get ~18000 (will recheck)
> 2        34703    33929   --> may be run-to-run variance (once I get
>                               time, will recheck)
> 4        67744    69069
> 32       312575   336012
> 64       213312   539056
> 128      190139   380122
>
> Summary:
>
> Actually, with 64 clients we have seen ~470,000 TPS on head as well, by
> reverting commit 6150a1b0.
> Refer to this thread: (
> http://www.postgresql.org/message-id/caa4ek1+zeb8pmwwktf+3brs0pt4ux6rs6aom0uip8c6shjw...@mail.gmail.com
> )
>
> I haven't tested this patch with commit 6150a1b0 reverted, so I'm not sure
> whether this patch can give even better performance.
>
> It also points to the case that Andres mentioned in this thread.
>
>
> http://www.postgresql.org/message-id/20160226191158.3vidtk3ktcmhi...@alap3.anarazel.de
>

Thank you very much for testing!
I also got access to a 4 x 18-core Intel server with 144 threads.  I'm
going to post the results of tests on this server next Monday.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-27 Thread Alexander Korotkov
On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Thank you very much for testing!
> I also got access to 4 x 18 Intel server with 144 threads. I'm going to
> post results of tests on this server in next Monday.
>

I've run pgbench tests on this machine: pgbench -s 1000 -c $clients -j 100
-M prepared -T 300.
See results in the table and chart.

clients master  v3  v5
1   11671   12507   12679
2   24650   26005   25010
4   49631   48863   49811
8   96790   96441   99946
10  121275  119928  124100
20  243066  243365  246432
30  359616  342241  357310
40  431375  415310  441619
50  489991  489896  500590
60  538057  636473  554069
70  588659  714426  738535
80  405008  923039  902632
90  295443  1181247 1155918
100 258695  1323125 1325019
110 238842  1393767 1410274
120 226018  1432504 1474982
130 215102  1465459 1503241
140 206415  1470454 1505380
150 197850  1475479 1519908
160 190935  1420915 1484868
170 185835  1438965 1453128
180 182519  1416252 1453098

My conclusions are the following:
1) We don't observe any regression in v5 in comparison to master.
2) v5 in most cases slightly outperforms v3.

I'm going to do some code cleanup of v5 on Monday.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-27 Thread Alexander Korotkov
On Sun, Mar 27, 2016 at 3:10 PM, Andres Freund  wrote:

> On 2016-03-27 12:38:25 +0300, Alexander Korotkov wrote:
> > On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <
> > a.korot...@postgrespro.ru> wrote:
> >
> > > Thank you very much for testing!
> > > I also got access to 4 x 18 Intel server with 144 threads. I'm going to
> > > post results of tests on this server in next Monday.
> > >
> >
> > I've run pgbench tests on this machine: pgbench -s 1000 -c $clients -j
> 100
> > -M prepared -T 300.
> > See results in the table and chart.
> >
> > clients master  v3  v5
> > 1   11671   12507   12679
> > 2   24650   26005   25010
> > 4   49631   48863   49811
> > 8   96790   96441   99946
> > 10  121275  119928  124100
> > 20  243066  243365  246432
> > 30  359616  342241  357310
> > 40  431375  415310  441619
> > 50  489991  489896  500590
> > 60  538057  636473  554069
> > 70  588659  714426  738535
> > 80  405008  923039  902632
> > 90  295443  1181247 1155918
> > 100 258695  1323125 1325019
> > 110 238842  1393767 1410274
> > 120 226018  1432504 1474982
> > 130 215102  1465459 1503241
> > 140 206415  1470454 1505380
> > 150 197850  1475479 1519908
> > 160 190935  1420915 1484868
> > 170 185835  1438965 1453128
> > 180 182519  1416252 1453098
> >
> > My conclusions are the following:
> > 1) We don't observe any regression in v5 in comparison to master.
> > 2) v5 in most cases slightly outperforms v3.
>
> What commit did you base these tests on? I guess something recent, after
> 98a64d0bd?
>

Yes, more recent than 98a64d0bd. It was based on 676265eb7b.


> > I'm going to do some code cleanup of v5 on Monday
>
> Ok, I'll try to do a review and possibly commit after that.
>

Sounds good.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-03-28 Thread Alexander Korotkov
Hi!

On Sat, Mar 26, 2016 at 12:02 AM, David Steele  wrote:

> On 3/25/16 3:54 PM, Artur Zakirov wrote:
> > On 25.03.2016 21:42, Dmitry Ivanov wrote:
> >> Sorry for the delay, I desperately needed some time to finish a bunch of
> >> dangling tasks.
> >>
> >> I've added some new comments and clarified the ones that were obscure.
> >> Moreover, I felt an urge to recheck most parts of the code since
> >> apparently
> >> nobody (besides myself) has gone so far yet.
> >>
> >> On 25.03.16 18:42 MSK, David Steele wrote:
> >>> Time is short and it's not encouraging that you say there is "still
> much
> >> work to be done".
> >>
> >> Indeed, I was inaccurate. I am more than interested in the positive
> >> outcome.
> >>
> >
> > I tested the patch and took a look at it. All regression tests passed.
> > The code looks good and the patch introduces great functionality.
> >
> > I think the patch can be marked as "Ready for Committer". But I do not
> > feel the force to do that myself.
> >
> > Also I agree with you about tsvector_setweight(). There is not a problem
> > with it, because these weights are immutable and so there is no benefit
> > from a new function.
>
> Ideally Alexander can also look at it.  If not, then you should mark it
> "ready for committer" since I doubt any more reviewers will sign on this
> late in the CF.
>

I've checked the latest version of the patch.  The patch applies cleanly on
master, builds without warnings, and regression tests pass.
The issue I reported about the tsquery textual representation is fixed.
I'm going to mark this "Ready for Committer".

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] Autovacuum to prevent wraparound tries to consume xid

2016-03-28 Thread Alexander Korotkov
Hackers,

one of our customers met a near-xid-wraparound situation.  The xid counter
reached the xidStopLimit value, so no transactions could be executed in
normal mode.  But what I noticed is the strange behaviour of the
anti-wraparound autovacuum.  It vacuums tables and updates pg_class and
pg_database, but then fails with the "database is not accepting commands to
avoid wraparound data loss in database" message.  We ended up in a
situation where, according to pg_database, the maximum age of any database
was less than 200 million, but transactions couldn't be executed, because
ShmemVariableCache wasn't updated (checked with gdb).

I've reproduced this situation on my laptop as following:

1) Connect gdb, do "set ShmemVariableCache->nextXid =
ShmemVariableCache->xidStopLimit"
2) Stop postgres
3) Make some fake clog: "dd bs=1m if=/dev/zero
of=/usr/local/pgsql/data/pg_clog/07FF count=1024"
4) Start postgres

Then I found the same situation as in the customer's database.  The
anti-wraparound autovacuum regularly produced the following messages in the log:

ERROR:  database is not accepting commands to avoid wraparound data loss in
database "template1"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions.

Finally all databases were frozen:

# SELECT datname, age(datfrozenxid) FROM pg_database;
  datname  │   age
───┼──
 template1 │0
 template0 │0
 postgres  │ 5000
(3 rows)

but no transactions could be executed (ShmemVariableCache wasn't updated).

After some debugging I found that vac_truncate_clog consumes an xid just to
produce a warning.  I wrote a simple patch which replaces
GetCurrentTransactionId() with ShmemVariableCache->nextXid.  That
completely fixes the situation for me: ShmemVariableCache is successfully
updated.
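
The gist of the change is a one-liner along these lines (a sketch only, the
attached patch is authoritative; ReadNewTransactionId() simply returns
ShmemVariableCache->nextXid without assigning a new xid):

	 static void
	 vac_truncate_clog(...)
	 {
	-	TransactionId myXID = GetCurrentTransactionId();
	+	TransactionId myXID = ReadNewTransactionId();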

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


fix_vac_truncate_clog_xid_consume.patch
Description: Binary data



Re: [HACKERS] WIP: Access method extendability

2016-03-28 Thread Alexander Korotkov
Hi, Petr!

On Thu, Mar 17, 2016 at 10:56 AM, Petr Jelinek  wrote:

> On 16/03/16 15:31, Teodor Sigaev wrote:
>
>> Good catch, thanks! Tests were added.
>>>
>>
>> I don't see any objection, is consensus reached? I'm close to comiit
>> that...
>>
>>
> I did only a cursory review of the bloom contrib module and don't really
> have complaints there, but I know you can review that one. I'd like the
> English of the generic_xlog.c description improved, but I won't get to it
> before the weekend.


What are your plans regarding the English of the generic_xlog.c description?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-28 Thread Alexander Korotkov
On Sun, Mar 27, 2016 at 4:31 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Sun, Mar 27, 2016 at 3:10 PM, Andres Freund  wrote:
>
>> On 2016-03-27 12:38:25 +0300, Alexander Korotkov wrote:
>> > On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <
>> > a.korot...@postgrespro.ru> wrote:
>> >
>> > > Thank you very much for testing!
>> > > I also got access to 4 x 18 Intel server with 144 threads. I'm going
>> to
>> > > post results of tests on this server in next Monday.
>> > >
>> >
>> > I've run pgbench tests on this machine: pgbench -s 1000 -c $clients -j
>> 100
>> > -M prepared -T 300.
>> > See results in the table and chart.
>> >
>> > clients master  v3  v5
>> > 1   11671   12507   12679
>> > 2   24650   26005   25010
>> > 4   49631   48863   49811
>> > 8   96790   96441   99946
>> > 10  121275  119928  124100
>> > 20  243066  243365  246432
>> > 30  359616  342241  357310
>> > 40  431375  415310  441619
>> > 50  489991  489896  500590
>> > 60  538057  636473  554069
>> > 70  588659  714426  738535
>> > 80  405008  923039  902632
>> > 90  295443  1181247 1155918
>> > 100 258695  1323125 1325019
>> > 110 238842  1393767 1410274
>> > 120 226018  1432504 1474982
>> > 130 215102  1465459 1503241
>> > 140 206415  1470454 1505380
>> > 150 197850  1475479 1519908
>> > 160 190935  1420915 1484868
>> > 170 185835  1438965 1453128
>> > 180 182519  1416252 1453098
>> >
>> > My conclusions are the following:
>> > 1) We don't observe any regression in v5 in comparison to master.
>> > 2) v5 in most cases slightly outperforms v3.
>>
>> What commit did you base these tests on? I guess something recent, after
>> 98a64d0bd?
>>
>
> Yes, more recent than 98a64d0bd. It was based on 676265eb7b.
>
>
>> > I'm going to do some code cleanup of v5 on Monday
>>
>> Ok, I'll try to do a review and possibly commit after that.
>>
>
> Sounds good.
>

Here is the next revision of the patch.  It contains mostly cosmetic
changes.  Comments are adjusted to reflect the changes in behaviour.
I also changed the atomic AND/OR for local buffers to a read/write pair,
which should be cheaper, I suppose.  However, I don't insist on it, and it
could be reverted.
The patch is ready for your review.  It's especially interesting what you
think about the way I abstracted the exponential backoff of the spinlock.
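
For reviewers skimming the thread, the abstraction is along these lines (a
simplified sketch with illustrative constants and a hypothetical function
name, not the patch's exact code):

	/*
	 * Spin on the buffer-header lock bit, re-reading the state and, once
	 * spinning too long, sleeping with exponentially growing delays.
	 */
	static uint32
	wait_buf_hdr_unlocked(BufferDesc *buf)
	{
		uint32		state = pg_atomic_read_u32(&buf->state);
		int			spins = 0;
		long		delay_us = 1000;	/* illustrative initial delay */

		while (state & BM_LOCKED)
		{
			if (++spins > 100)			/* illustrative threshold */
			{
				pg_usleep(delay_us);
				delay_us = Min(delay_us * 2, 1000 * 1000);	/* backoff */
			}
			state = pg_atomic_read_u32(&buf->state);
		}
		return state;
	}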

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pinunpin-cas-6.patch
Description: Binary data



Re: [HACKERS] WIP: Access method extendability

2016-03-29 Thread Alexander Korotkov
On Tue, Mar 29, 2016 at 6:20 PM, David Steele  wrote:

> On 3/28/16 1:26 PM, Alvaro Herrera wrote:
>
>> Alexander Korotkov wrote:
>>
>>> On Thu, Mar 24, 2016 at 6:19 PM, Alvaro Herrera <
>>> alvhe...@2ndquadrant.com>
>>> wrote:
>>>
>>> .. Oh crap.  I just noticed I forgot to update a comment in pg_dump's
>>>> getAccessMethods.  And we're missing psql tab-complete support for the
>>>> new commands.
>>>>
>>>
>>> Attached patches fix both these issues.
>>>
>>
>> I see Teodor just pushed these two fixes.  Thanks!
>>
>
> Does that mean this patch should be closed or is there more remaining to
> commit?
>

There are still two pending patches:
 * Generic WAL records
 * Bloom filter contrib

Teodor is going to commit them, but he is waiting for Petr Jelinek to
review the English of the first one.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] PoC: Partial sort

2016-03-29 Thread Alexander Korotkov
On Tue, Mar 29, 2016 at 4:56 PM, David Steele  wrote:

> On 3/23/16 8:39 PM, Peter Geoghegan wrote:
>
> This looks like an old change you missed:
>>
>> - * compare_path_fractional_costs
>> + * compare_fractional_path_costs
>>
>> All in all, this looks significantly better. Thanks for your work on
>> this. Sorry for the delay in my response, and that my review was
>> relatively rushed, but I'm rather busy at the moment with fighting
>> fires.
>>
>
> It looks like a new patch is required before this can be marked "ready for
> committer".  Will you have that ready soon?
>

Yes, that's it.  I'm working on it now.  I'm going to post it by tomorrow.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: Access method extendability

2016-03-29 Thread Alexander Korotkov
On Tue, Mar 29, 2016 at 8:29 PM, Teodor Sigaev  wrote:

> I incorporated your changes and did some additional refinements on top of
>> them
>> still.
>>
>> Attached is delta against v12, that should cause less issues when merging
>> for
>> Teodor.
>>
>
> But last version is 13th?
>
> BTW, it would be cool to add docs in the VII. Internals chapter; the
> description should be similar to the index interface description, i.e.
> how to use the generic WAL interface.


We already have a description of the generic WAL interface in the comments
in generic_xlog.c.  I'm not a fan of duplicating things.  What about moving
the interface description from the comments to the docs completely?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] SQL access to access method details

2016-03-30 Thread Alexander Korotkov
On Mon, Mar 28, 2016 at 11:02 PM, Jim Nasby 
wrote:

> While working on a tool to capture catalog/stats info and looking at cross
> version compatibility, I noticed that the pg_am changes removed SQL access
> to a bunch of AM info. [1] indicates that's part of the purpose of the
> patch; are we sure no tools are using this info?


The idea is that this kind of information is internal API and shouldn't be
exposed at the SQL level.  I'm not quite sure that no tool is using this
info.  Do you know of such tools?


> Unlike previous catalog compatibility breaks, this one completely removes
> that information, so if someone was using it they're now completely hosed.
>
> [1] http://www.postgresql.org/message-id/55fec1ab.8010...@2ndquadrant.com


Do you have any example of how this information is used?  If so, we can
think about providing some way of exposing it.  If not, the default
assumption is that this information shouldn't be exposed.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: Access method extendability

2016-03-30 Thread Alexander Korotkov
On Tue, Mar 29, 2016 at 8:34 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Tue, Mar 29, 2016 at 8:29 PM, Teodor Sigaev  wrote:
>
>> I incorporated your changes and did some additional refinements on top of
>>> them
>>> still.
>>>
>>> Attached is delta against v12, that should cause less issues when
>>> merging for
>>> Teodor.
>>>
>>
>> But last version is 13th?
>>
>> BTW, it would be cool to add docs in the VII. Internals chapter; the
>> description should be similar to the index interface description, i.e.
>> how to use the generic WAL interface.
>
>
> We already have a description of the generic WAL interface in the comments
> in generic_xlog.c.  I'm not a fan of duplicating things.  What about moving
> the interface description from the comments to the docs completely?
>

I heard no objections.  Here is a revision of the patch where the generic
WAL interface description has been moved to the documentation.  The
description contains improvements by Petr Jelinek, Alvaro Herrera and
Markus Nullmeier (who sent them to me privately).

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0002-generic-xlog.14.patch
Description: Binary data



Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-30 Thread Alexander Korotkov
On Wed, Mar 30, 2016 at 10:16 AM, Andres Freund  wrote:

> On 2016-03-30 07:13:16 +0530, Dilip Kumar wrote:
> > On Tue, Mar 29, 2016 at 10:43 PM, Andres Freund 
> wrote:
> >
> > > My gut feeling is that we should do both 1) and 2).
> > >
> > > Dilip, could you test performance of reducing ppc's spinlock to 1 byte?
> > > Cross-compiling suggest that doing so "just works".  I.e. replace the
> > > #if defined(__ppc__) typedef from an int to a char.
> > >
> >
> > I set that, but after that it hangs, even Initdb hangs..
>
> Yea, as Tom pointed out that's not going to work.  I'll try to write a
> patch for approach 1).
>

Great!  Do you need any improvements for pinunpin-cas-7.patch from me?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] PoC: Partial sort

2016-03-30 Thread Alexander Korotkov
> instead. That would call ExecMaterializesOutput() for non-Sort cases.
>

I've added an ExecPlanMaterializesOutput() function.


> >> Optimizer
> >> -
> >>
>
> >> * cost_sort() needs way way more comments.  Doesn't even mention
> >> indexes. Not worth commenting further on until I know what it's
> >> *supposed* to do, though.
> >
> >
> > I've added some comments.
>
> Looking at cost_sort() now, it's a bit clearer. I think that you
> should make sure that everything is costed as a quicksort, though, if
> you accept that we should try and make every small sort done by the
> partial sort a quicksort. Which I think is a good idea. The common
> case is that groups are small, but the qsort() insertion sort will be
> very very fast for that case.
>

I'm not sure that partial sort should estimate the sorting of one group
differently from the way a simple sort does.  I see the following reasons:
1) I'd like partial sort to be able to use other sorting methods, to
provide graceful degradation in the case of planner mistakes, as I pointed
out above.
2) The planner should not choose partial sort plans in some cases.  That
should happen because of the higher cost of those plans, including the high
cost of the partial sort itself.  Estimating the other sort methods looks
like a good way of reporting such high costs.


> This looks like an old change you missed:
>
> - * compare_path_fractional_costs
> + * compare_fractional_path_costs
>

I think this is rather a typo fix, because currently the comment doesn't
match the function name.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


partial-sort-basic-8.patch
Description: Binary data



Re: [HACKERS] PoC: Partial sort

2016-01-24 Thread Alexander Korotkov
Hi, Tomas!

On Sat, Jan 23, 2016 at 3:07 PM, Tomas Vondra 
wrote:

> On 10/20/2015 01:17 PM, Alexander Korotkov wrote:
>
>> On Fri, Oct 16, 2015 at 7:11 PM, Alexander Korotkov  wrote:
>>
>> On Sun, Jun 7, 2015 at 11:01 PM, Peter Geoghegan  wrote:
>>
>> On Sun, Jun 7, 2015 at 8:10 AM, Andreas Karlsson  wrote:
>> > Are you planning to work on this patch for 9.6?
>>
>> FWIW I hope so. It's a nice patch.
>>
>>
>> I'm trying to whisk the dust off. A rebased version of the patch is attached.
>> This patch isn't passing regression tests because of plan changes.
>> I'm not yet sure about those changes: why do they happen, and are they
>> really regressions?
>> Since I'm not very familiar with the planning of INSERT ON CONFLICT and
>> RLS, any help is appreciated.
>>
>>
>> The planner regression is fixed in the attached version of the patch. It
>> appears that get_cheapest_fractional_path_for_pathkeys() behaved wrongly
>> when no ordering is required.
>>
>>
> Alexander, are you working on this patch? I'd like to look at the patch,
> but the last available version (v4) no longer applies - there's plenty of
> bitrot. Do you plan to send an updated / rebased version?
>

I'm sorry that I haven't found time for this yet. I'm certainly planning to
get back to it in the near future. The attached version is just rebased,
without any optimization.

> The main thing I'm particularly interested in is how much this is coupled
> with the Sort node, and whether it's possible to feed partially sorted
> tuples into other nodes.
>
> I'm particularly thinking about Hash Aggregate, because the partial sort
> allows keeping only the "current group" in a hash table, making it much
> more memory efficient / faster. What do you think?
>

This seems to me a very reasonable optimization.  It would also be nice to
implement some generalized way of processing presorted groups.  For
instance, we could have a special node, say "Group Scan", which has two
children: the source and a node which processes each group.  For partial
sort the second node would be a Sort node, but it could be a Hash Aggregate
node as well.
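
For concreteness, a rough sketch of what such a node might look like (purely
hypothetical; the type and field names are invented for illustration and
come from no actual patch):

#include "postgres.h"
#include "nodes/plannodes.h"

/*
 * Hypothetical "Group Scan" plan node: the source child delivers tuples
 * presorted on a key prefix, and the per-group child (a Sort node for
 * partial sort, or a Hash Aggregate) is re-run once per presorted group.
 */
typedef struct GroupScan
{
    Plan        plan;           /* standard plan-node header */
    Plan       *source;         /* child delivering presorted groups */
    Plan       *per_group;      /* child re-run once per group */
    int         nPresortedCols; /* length of the presorted key prefix */
} GroupScan;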

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


partial-sort-basic-5.patch
Description: Binary data



Re: [HACKERS] PoC: Partial sort

2016-01-24 Thread Alexander Korotkov
Hi!

On Sat, Jan 23, 2016 at 10:07 PM, Peter Geoghegan  wrote:

> On Sat, Jan 23, 2016 at 4:07 AM, Tomas Vondra
>  wrote:
> > The main thing I'm particularly interested in is how much is this coupled
> > with the Sort node, and whether it's possible to feed partially sorted
> > tuples into other nodes.
>
> That's cool, but I'm particularly interested in seeing Alexander get
> back to this because it's an important project on its own. We should
> really have this.
>

Thank you for your review of and interest in this patch! I'm sorry for the
huge delay. I'm going to get back to this soon.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2016-01-29 Thread Alexander Korotkov
On Thu, Jan 21, 2016 at 12:37 AM, Alvaro Herrera 
wrote:

> So far as I can tell, there are three patches in flight here:
>
> * replication slot IO lwlocks
> * ability of extensions to request tranches dynamically
> * PGPROC
>
> The first one hasn't been reviewed at all, but the other two have seen a
> bit of discussion and evolution.  Is anyone doing any more reviewing?


I'd like to add another one: a fixed tranche id for each SLRU.
And I'm going to review the others.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


slru_tranche_ids_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2016-01-29 Thread Alexander Korotkov
On Mon, Jan 4, 2016 at 5:58 PM, Robert Haas  wrote:

> On Mon, Jan 4, 2016 at 1:20 AM, Amit Kapila 
> wrote:
> > If we do it that way, then the user of the API needs to maintain the
> > interlock guarantee that the requested number of locks is the same as the
> > allocations done from that tranche, and also, if it is not allocating all
> > the LWLocks in one function, it needs to save the base address of the
> > array and then allocate from it by maintaining some counter.
> > I agree that looking up tranche names multiple times is not good
> > if there are many lwlocks, but that is done at startup
> > time and not at operation-level.  Also, if we look currently at
> > the extensions in contrib, then just one of them allocates one
> > LWLock in this fashion; now there could be extensions apart from the
> > extensions in contrib, but I am not sure if they require many
> > LWLocks, so I feel it is better to give an API which is more
> > convenient for the user to use.
>
> Well, I agree with you that we should provide the most convenient API
> possible, but I don't agree that yours is more convenient than the one
> I proposed.  I think it is less convenient.  In most cases, if the
> user asks for a large number of LWLocks, they aren't going to be each
> for a different purpose - they will all be for the same purpose, like
> one per buffer or one per hash partition.  The case where the user
> wants to allocate 8 lwlocks from an extension, each for a different
> purpose, and spread those allocations across a bunch of different
> functions probably does not exist.  *Maybe* there is somebody
> allocating lwlocks from an extension for unrelated purposes, but I'd
> be willing to bet that, if so, all of those allocations happen one
> right after the other in a single function, because anything else
> would be completely nuts.
>

I'm not sure if we should return LWLockPadded*.

+LWLockPadded *
> +GetLWLockAddinTranche(const char *tranche_name)


It would be nice to isolate extensions from knowledge about padding. Could
we one day change it from LWLockPadded * to LWLockMinimallyPadded * or
anything else?

+LWLock **
> +GetLWLockAddinTranche(const char *tranche_name)


Could we use this signature?


> >> > Also I have retained NUM_USER_DEFINED_LWLOCKS in
> >> > MainLWLockArray so that if any extensions want to assign a LWLock
> >> > after startup, it can be used from this pool.  The tranche for such
> >> > locks
> >> > will be reported as main.
> >>
> >> I don't like this.  I think we should get rid of
> >> NUM_USER_DEFINED_LWLOCKS altogether.  It's just an accident waiting to
> >> happen, and I don't think there are enough people using LWLocks from
> >> extensions that we'll annoy very many people by breaking backward
> >> compatibility here.  If we are going to care about backward
> >> compatibility, then I think the old-style lwlocks should go in their
> >> own tranche, not main.  But personally, I think it'd be better to just
> >> force a hard break and make people switch to using the new APIs.
> >
> > One point to consider before removing it altogether is that the new APIs
> > provide a way to allocate LWLocks at startup time, and if any user
> > wants to allocate a new LWLock after start-up, it will not be possible
> > with the proposed APIs.  Do you think for such cases we should ask
> > the user to use it the way we have exposed, or do you have something else
> > in mind?
>
> Allocating a new lock after startup mostly doesn't work, because there
> will be at most 3 available and sometimes less.  And even if it does
> work, it often isn't very useful because you probably need some other
> shared memory space as well - otherwise, what is the lock protecting?
> And that space might not be available either.
>
> I'd be interested in knowing whether there are cases where useful
> extensions can be loaded apart from shared_preload_libraries because
> of NUM_USER_DEFINED_LWLOCKS and our tendency to allocate a little bit
> of extra shared memory, but my suspicion is that it rarely works out
> and is too flaky to be useful to anybody.


+1 for dropping the old API.
I don't think it's useful to have LWLocks without shared memory.

There is another thing I'd like extensions to be able to do. It would be
nice if extensions could use dynamic shared memory instead of static. Then
extensions could use shared memory without being in
shared_preload_libraries. But if an extension registers a DSM segment,
there is no way to tell other backends to use it for that extension. Also,
a DSM segment is deallocated when all backends have detached from it. This
is out of scope for this patch, though.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2016-01-29 Thread Alexander Korotkov
On Tue, Jan 5, 2016 at 4:04 PM, Amit Kapila  wrote:

> On Mon, Jan 4, 2016 at 8:28 PM, Robert Haas  wrote:
> >
> > On Mon, Jan 4, 2016 at 1:20 AM, Amit Kapila 
> wrote:
> > > On Mon, Jan 4, 2016 at 4:49 AM, Robert Haas 
> wrote:
> > >> On Thu, Dec 31, 2015 at 3:36 AM, Amit Kapila
> > >> wrote:
> > >> > LWLock *LWLockAssignFromTranche(const char *tranche_name) will
> > >> > assign a lock for the specified tranche.  This also ensures that no
> > >> > more than requested number of LWLocks can be assigned from a
> > >> > specified tranche.
> > >>
> > >> However, this seems like an awkward API, because if there are many
> > >> LWLocks you're going to need to look up the tranche name many times,
> > >> and then you're going to need to make an array of pointers to them
> > >> somehow, introducing a thoroughly unnecessary layer of indirection.
> > >> Instead, I suggest just having a function LWLockPadded
> > >> *GetLWLockAddinTranche(const char *tranche_name) that returns a
> > >> pointer to the base of the array.
> > >
> > > If we do it that way, then the user of the API needs to maintain the
> > > interlock guarantee that the requested number of locks is the same as the
> > > allocations done from that tranche, and also, if it is not allocating all
> > > the LWLocks in one function, it needs to save the base address of the
> > > array and then allocate from it by maintaining some counter.
> > > I agree that looking up tranche names multiple times is not good
> > > if there are many lwlocks, but that is done at startup
> > > time and not at operation-level.  Also, if we look currently at
> > > the extensions in contrib, then just one of them allocates one
> > > LWLock in this fashion; now there could be extensions apart from the
> > > extensions in contrib, but I am not sure if they require many
> > > LWLocks, so I feel it is better to give an API which is more
> > > convenient for the user to use.
> >
> > Well, I agree with you that we should provide the most convenient API
> > possible, but I don't agree that yours is more convenient than the one
> > I proposed.  I think it is less convenient.  In most cases, if the
> > user asks for a large number of LWLocks, they aren't going to be each
> > for a different purpose - they will all be for the same purpose, like
> > one per buffer or one per hash partition.  The case where the user
> > wants to allocate 8 lwlocks from an extension, each for a different
> > purpose, and spread those allocations across a bunch of different
> > functions probably does not exist.
>
> Probably, but the point is to let the user of the API accomplish what he or
> she wants without much knowledge of the underlying stuff.  However,
> I think it is not too much detail for the user to know, so I have changed the
> API as per your suggestion.
>
> >
> >   *Maybe* there is somebody
> > allocating lwlocks from an extension for unrelated purposes, but I'd
> > be willing to bet that, if so, all of those allocations happen one
> > right after the other in a single function, because anything else
> > would be completely nuts.
> >
> > >> > Also I have retained NUM_USER_DEFINED_LWLOCKS in
> > >> > MainLWLockArray so that if any extensions want to assign a LWLock
> > >> > after startup, it can be used from this pool.  The tranche for such
> > >> > locks
> > >> > will be reported as main.
> > >>
> >
> > I'd be interested in knowing whether there are cases where useful
> > extensions can be loaded apart from shared_preload_libraries because
> > of NUM_USER_DEFINED_LWLOCKS and our tendency to allocate a little bit
> > of extra shared memory, but my suspicion is that it rarely works out
> > and is too flaky to be useful to anybody.
> >
>
> I am not aware of such cases; however, the reason I had kept it was
> backward compatibility, but now I have removed it in the attached patch.
>
> Apart from that, I have updated the docs to reflect the changes related
> to the new APIs.
>
> Few things to note -
> Some change is needed in LWLockCounter[1] if this goes in after the 2
> other patches (separate tranche for PGPROCs and separate tranche
> for ReplicationSlots).  Also, LWLockAssign() can be removed after
> those patches.
>

Also, a couple of minor comments from me.

I think this

+ StrNCpy(LWLockTrancheRequestArray[LWLockTrancheRequestsCount].tranche_name,
> tranche_name, strlen(tranche_name) + 1);


should be

+ StrNCpy(LWLockTrancheRequestArray[LWLockTrancheRequestsCount].tranche_name,
> tranche_name,
> sizeof(LWLockTrancheRequestArray[LWLockTrancheRequestsCount].tranche_name));
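
For illustration, here is a tiny self-contained sketch of why bounding the
copy by the destination's size is the safe variant (the struct and the copy
helper below are simplified stand-ins, not the actual PostgreSQL
definitions):

#include <stdio.h>
#include <string.h>

#define NAMEDATALEN 64

typedef struct LWLockTrancheRequest
{
    char tranche_name[NAMEDATALEN];     /* fixed-size destination buffer */
    int  num_lwlocks;
} LWLockTrancheRequest;

/* Like StrNCpy: copy at most siz - 1 bytes and always NUL-terminate. */
static void
str_n_cpy(char *dst, const char *src, size_t siz)
{
    if (siz > 0)
    {
        strncpy(dst, src, siz - 1);
        dst[siz - 1] = '\0';
    }
}

int
main(void)
{
    LWLockTrancheRequest req;
    const char *name =
        "an_extension_tranche_name_that_is_far_longer_than_the_sixty_four_byte_buffer";

    /* Bounding by sizeof(destination) can never overflow the buffer;
     * bounding by strlen(source) + 1 would overflow for long names. */
    str_n_cpy(req.tranche_name, name, sizeof(req.tranche_name));
    printf("%s\n", req.tranche_name);   /* prints the safely truncated name */
    return 0;
}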


And as far as I know, in English "it's" should be "its" in the sentence below.

+ from _PG_init.  Tranche repersents an array of LWLocks
> and
> + can be accessed by it's name.  First parameter
> tranche_name


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2016-01-29 Thread Alexander Korotkov
On Tue, Dec 29, 2015 at 11:56 AM, Amit Kapila 
wrote:

> On Wed, Dec 16, 2015 at 12:26 AM, Robert Haas 
> wrote:
> >
> >
> > In terms of this project overall, NumLWLocks() now knows about only
> > four categories of stuff: fixed lwlocks, backend locks (proc.c),
> > replication slot locks, and locks needed by extensions.  I think it'd
> > probably be fine to move the backend locks into PGPROC directly, and
> > the replication slot locks into ReplicationSlot.
> >
>
> Ildus has written a patch to move backend locks into PGPROC which
> I am reviewing and will do performance tests on once he responds to
> my review comments, and I have written a patch to move replication
> slot locks into ReplicationSlot which is attached to this mail.
>

This patch looks good to me.
It compiles without warnings and passes regression tests.
I also did some light testing of replication slots to check that it
works correctly.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Sequence Access Method WIP

2016-01-29 Thread Alexander Korotkov
On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera 
wrote:

> Petr Jelinek wrote:
> > On 29 January 2016 at 14:48, Tom Lane  wrote:
> > > Alvaro Herrera  writes:
>
> > > Uh, what?  Surely we would provide a bespoke command for each possible
> > > sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to
> check
> > > that the provided function has the right signature, and then it would
> put
> > > the correct amkind into the pg_am entry automatically.
>
> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
> like that.
>

OK! Let's nail down the syntax and I can integrate it into my createam
patch.
I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
handler;", but I don't insist.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Proposal: Generic WAL logical messages

2016-01-29 Thread Alexander Korotkov
Hi, Petr!

On Sat, Jan 23, 2016 at 1:22 AM, Petr Jelinek  wrote:

> here is an updated version of this patch, calling the messages logical
> (decoding) messages consistently everywhere and removing any connection to
> standby messages. Moving this to its own module gave me a place to write
> some brief explanation about this, so the code documentation has hopefully
> improved as well.
>
> The functionality itself didn't change.


I'd like to mention that there is my upcoming patch named generic WAL
records:
http://www.postgresql.org/message-id/capphfdsxwzmojm6dx+tjnpyk27kt4o7ri6x_4oswcbyu1rm...@mail.gmail.com
But it has to be a feature distinct from your generic WAL logical messages.
Theoretically, we could have generic messages with arbitrary content, both
having a custom WAL replay function and being decoded by an output plugin.
But a custom WAL replay function would let an extension bug break recovery,
archiving, and physical replication, and that doesn't seem to be acceptable.
This is why we have to develop these as separate features.

Should we think more about naming?  Would two kinds of generic records
confuse people?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-01-29 Thread Alexander Korotkov
Hi, Dilip!

On Tue, Jan 19, 2016 at 6:00 PM, Dilip Kumar  wrote:

>
> On Tue, Jan 19, 2016 at 5:44 PM, Michael Paquier <
> michael.paqu...@gmail.com> wrote:
>
>> > Test3:
>> > pgbench -i -s 100 postgres
>> > pgbench -c$ -j$ -Mprepared -S postgres
>> >
>> > Client  Base    Patched
>> >
>> > 1       20555   19404
>> > 32      375919  332670
>> > 64      509067  440680
>> > 128     431346  415121
>> > 256     380926  379176
>>
>> It seems like you did a copy-paste of the results with s=100 and
>> s=300. Both are showing the exact same numbers.
>
>
> Oops, my mistake, re-pasting the correct results for s=100
>
> pgbench -i -s 100 postgres
> pgbench -c$ -j$ -Mprepared -S postgres
>
> Client  Base    Patched
>
> 1       20548   20791
> 32      372633  355356
> 64      532052  552148
> 128     412755  478826
> 256     346701  372057
>

Could you please re-run these tests a few times, just to be sure that it's a
reproducible regression with s=300 and not a statistical error?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Optimizer questions

2016-01-29 Thread Alexander Korotkov
On Fri, Jan 8, 2016 at 11:58 AM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> Attached please find an improved version of the optimizer patch for the
> LIMIT clause.
> Now I try to apply this optimization only for non-trivial columns
> requiring evaluation.
> Maybe it would be better to take into account the cost of evaluating these
> columns, but right now I just detect non-variable columns.
>

We may think about a more general feature: LIMIT pushdown.  In Konstantin's
patch, the planner pushes the LIMIT down before the tlist calculation.
But there are other cases when calculating the LIMIT first would be
beneficial.  For instance, we can apply the LIMIT before JOINs.  That is
possible only for a LEFT JOIN which is not used in the filter and ordering
clauses.  See the example below.

# create table t1 as (select i, random() v from generate_series(1,1000000) i);
SELECT 1000000

# create table t2 as (select i, random() v from generate_series(1,1000000) i);
SELECT 1000000

# explain analyze select * from t1 left join t2 on t1.i = t2.i order by t1.v limit 10;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=87421.64..87421.67 rows=10 width=24) (actual time=1486.276..1486.278 rows=10 loops=1)
   ->  Sort  (cost=87421.64..89921.64 rows=1000000 width=24) (actual time=1486.275..1486.275 rows=10 loops=1)
         Sort Key: t1.v
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Hash Left Join  (cost=27906.00..65812.00 rows=1000000 width=24) (actual time=226.180..1366.238 rows=1000000 loops=1)
               Hash Cond: (t1.i = t2.i)
               ->  Seq Scan on t1  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.010..77.041 rows=1000000 loops=1)
               ->  Hash  (cost=15406.00..15406.00 rows=1000000 width=12) (actual time=226.066..226.066 rows=1000000 loops=1)
                     Buckets: 131072  Batches: 1  Memory Usage: 46875kB
                     ->  Seq Scan on t2  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.007..89.002 rows=1000000 loops=1)
 Planning time: 0.123 ms
 Execution time: 1492.118 ms
(12 rows)

# explain analyze select * from (select * from t1 order by t1.v limit 10) t1 left join t2 on t1.i = t2.i;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=37015.89..56171.99 rows=10 width=24) (actual time=198.478..301.278 rows=10 loops=1)
   Hash Cond: (t2.i = t1.i)
   ->  Seq Scan on t2  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.005..74.303 rows=1000000 loops=1)
   ->  Hash  (cost=37015.77..37015.77 rows=10 width=12) (actual time=153.584..153.584 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Limit  (cost=37015.64..37015.67 rows=10 width=12) (actual time=153.579..153.580 rows=10 loops=1)
               ->  Sort  (cost=37015.64..39515.64 rows=1000000 width=12) (actual time=153.578..153.578 rows=10 loops=1)
                     Sort Key: t1.v
                     Sort Method: top-N heapsort  Memory: 25kB
                     ->  Seq Scan on t1  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.012..78.828 rows=1000000 loops=1)
 Planning time: 0.132 ms
 Execution time: 301.308 ms
(12 rows)

In this example, LIMIT pushdown makes the query 5 times faster. It would be
very nice if the optimizer did this automatically.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2016-01-30 Thread Alexander Korotkov
On Sat, Jan 30, 2016 at 10:58 AM, Amit Kapila 
wrote:

> On Fri, Jan 29, 2016 at 6:47 PM, Robert Haas 
> wrote:
> >
> > On Fri, Jan 29, 2016 at 6:59 AM, Alexander Korotkov
> >  wrote:
> > > On Thu, Jan 21, 2016 at 12:37 AM, Alvaro Herrera <
> alvhe...@2ndquadrant.com>
> > > wrote:
> > >> So far as I can tell, there are three patches in flight here:
> > >>
> > >> * replication slot IO lwlocks
> > >> * ability of extensions to request tranches dynamically
> > >> * PGPROC
> > >>
> > >> The first one hasn't been reviewed at all, but the other two have
> seen a
> > >> bit of discussion and evolution.  Is anyone doing any more reviewing?
> > >
> > > I'd like to add another one: fixed tranche id for each SLRU.
> >
> > +1 for this.  The patch looks good and I will commit it if nobody
> objects.
> >
>
> +1. Patch looks good to me as well, but I have one related question:
> Is there a reason why we should not assign ReplicationOrigins a
> fixed tranche id, and then we might even be able to get away without
> LWLockRegisterTranche()?
>

+1. I think we should do this.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-02-01 Thread Alexander Korotkov
On Mon, Feb 1, 2016 at 7:05 AM, Dilip Kumar  wrote:

>
> On Sun, Jan 31, 2016 at 11:44 AM, Dilip Kumar 
> wrote:
>
>> By looking at the results with scale factor 1000 and 100 I don't see any
>> reason why it would regress with scale factor 300.
>>
>> So I will run the test again with scale factor 300, and this time I am
>> planning to run 2 cases:
>> 1. when data fits in shared buffers
>> 2. when data doesn't fit in shared buffers.
>>
>
> I have run the test again with 300 S.F. and found no regression; in fact,
> there is an improvement with the patch, like we saw with the 1000 scale factor.
>
> Shared Buffer= 8GB
> max_connections=150
> Scale Factor=300
>
> ./pgbench  -j$ -c$ -T300 -M prepared -S postgres
>
> Client  Base    Patch
> 1       19744   19382
> 8       125923  126395
> 3231393151
> 64      387339  496830
> 128     306412  350610
>
> Shared Buffer= 512MB
> max_connections=150
> Scale Factor=300
>
> ./pgbench  -j$ -c$ -T300 -M prepared -S postgres
>
> Client  Base    Patch
> 1       17169   16454
> 8       108547  105559
> 32      241619  262818
> 64      206868  233606
> 128     137084  217013
>

Great, thanks!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Proposal: Generic WAL logical messages

2016-02-01 Thread Alexander Korotkov
On Sat, Jan 30, 2016 at 11:58 AM, Simon Riggs  wrote:

> On 29 January 2016 at 21:11, Alexander Korotkov  > wrote:
>
>> Hi, Petr!
>>
>> On Sat, Jan 23, 2016 at 1:22 AM, Petr Jelinek 
>> wrote:
>>
>>> here is an updated version of this patch, calling the messages logical
>>> (decoding) messages consistently everywhere and removing any connection to
>>> standby messages. Moving this to its own module gave me a place to write
>>> some brief explanation about this, so the code documentation has hopefully
>>> improved as well.
>>>
>>> The functionality itself didn't change.
>>
>>
>> I'd like to mention that there is my upcoming patch named generic WAL
>> records:
>> http://www.postgresql.org/message-id/capphfdsxwzmojm6dx+tjnpyk27kt4o7ri6x_4oswcbyu1rm...@mail.gmail.com
>> But it has to be a feature distinct from your generic WAL logical messages.
>> Theoretically, we could have generic messages with arbitrary content, both
>> having a custom WAL replay function and being decoded by an output plugin.
>> But a custom WAL replay function would let an extension bug break recovery,
>> archiving, and physical replication, and that doesn't seem to be acceptable.
>> This is why we have to develop these as separate features.
>>
>> Should we think more about naming?  Would two kinds of generic records
>> confuse people?
>>
>
> Logical messages
>
> Generic WAL records
>
> Seems like I can tell them apart. Worth checking, but I think we're OK.
>

I was worried because the topic name is "Generic WAL logical messages". But
if we name them just "Logical messages" then it's OK with me.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2016-02-01 Thread Alexander Korotkov
On Mon, Feb 1, 2016 at 10:26 AM, Amit Kapila 
wrote:

> On Sat, Jan 30, 2016 at 2:15 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
> >
> > On Sat, Jan 30, 2016 at 10:58 AM, Amit Kapila 
> wrote:
> >>
> >> On Fri, Jan 29, 2016 at 6:47 PM, Robert Haas 
> wrote:
> >> >
> >> > On Fri, Jan 29, 2016 at 6:59 AM, Alexander Korotkov
> >> >  wrote:
> >> > > On Thu, Jan 21, 2016 at 12:37 AM, Alvaro Herrera <
> alvhe...@2ndquadrant.com>
> >> > > wrote:
> >> > >> So far as I can tell, there are three patches in flight here:
> >> > >>
> >> > >> * replication slot IO lwlocks
> >> > >> * ability of extensions to request tranches dynamically
> >> > >> * PGPROC
> >> > >>
> >> > >> The first one hasn't been reviewed at all, but the other two have
> seen a
> >> > >> bit of discussion and evolution.  Is anyone doing any more
> reviewing?
> >> > >
> >> > > I'd like to add another one: fixed tranche id for each SLRU.
> >> >
> >> > +1 for this.  The patch looks good and I will commit it if nobody
> objects.
> >> >
> >>
> >> +1. Patch looks good to me as well, but I have one related question:
> >> Is there a reason why we should not assign ReplicationOrigins a
> >> fixed tranche id, and then we might even be able to get away without
> >> LWLockRegisterTranche()?
> >
> >
> > +1. I think we should do this.
> >
>
> Okay, the attached patch assigns a fixed tranche id for ReplicationOrigins.
> I don't think we can remove LWLockRegisterTranche(), as that will be
> required for assigning tranche ids for extensions, so I didn't change that
> part of the code.
>

OK. This one looks good to me too.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-02-01 Thread Alexander Korotkov
On Mon, Feb 1, 2016 at 11:34 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Mon, Feb 1, 2016 at 7:05 AM, Dilip Kumar  wrote:
>
>>
>> On Sun, Jan 31, 2016 at 11:44 AM, Dilip Kumar 
>> wrote:
>>
>>> By looking at the results with scale factor 1000 and 100 I don't see any
>>> reason why it would regress with scale factor 300.
>>>
>>> So I will run the test again with scale factor 300, and this time I am
>>> planning to run 2 cases:
>>> 1. when data fits in shared buffers
>>> 2. when data doesn't fit in shared buffers.
>>>
>>
>> I have run the test again with 300 S.F. and found no regression; in fact,
>> there is an improvement with the patch, like we saw with the 1000 scale factor.
>>
>> Shared Buffer= 8GB
>> max_connections=150
>> Scale Factor=300
>>
>> ./pgbench  -j$ -c$ -T300 -M prepared -S postgres
>>
>> Client  Base    Patch
>> 1       19744   19382
>> 8       125923  126395
>> 3231393151
>> 64      387339  496830
>> 128     306412  350610
>>
>> Shared Buffer= 512MB
>> max_connections=150
>> Scale Factor=300
>>
>> ./pgbench  -j$ -c$ -T300 -M prepared -S postgres
>>
>> Client  Base    Patch
>> 1       17169   16454
>> 8       108547  105559
>> 32      241619  262818
>> 64      206868  233606
>> 128     137084  217013
>>
>
> Great, thanks!
>

The attached patch is rebased and has better comments.
Also, there is one comment which survives from the original version by
Andres:

/* Add exponential backoff? Should seldomly be contended tho. */


Andres, did you mean we should double the delay with each unsuccessful
attempt to lock?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



pinunpin-cas-2.patch
Description: Binary data



Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-01 Thread Alexander Korotkov
On Sun, Jan 31, 2016 at 6:55 AM, Amit Kapila 
wrote:

> On Thu, Jan 28, 2016 at 9:16 AM, Amit Kapila 
> wrote:
> >
> > On Thu, Jan 28, 2016 at 2:12 AM, Robert Haas 
> wrote:
> > >
> > > On Tue, Jan 26, 2016 at 3:10 AM, and...@anarazel.de <
> and...@anarazel.de> wrote:
> > > > I do think there's a considerable benefit in improving the
> > > > instrumentation here, but this strikes me as making life more complex for
> > > > more users than it makes it easier. At the very least this should be
> > > > split into two fields (type & what we're actually waiting on). I also
> > > > strongly suspect we shouldn't use in band signaling ("process not
> > > > waiting"), but rather make the field NULL if we're not waiting on
> > > > anything.
> > >
> > > +1 for splitting it into two fields.
> > >
> >
> > I will take care of this.
> >
>
> As discussed, I have added a new field, wait_event_type, along with
> wait_event in pg_stat_activity.  Changed the code to return NULL if the
> backend is not waiting.  Updated the docs as well.
>

I wonder if we can use the 4-byte wait_event_info more efficiently.
The LWLock number within the tranche would also be useful information to
expose.  Using the lwlock number, a user can determine whether there is high
concurrency on a single lwlock in a tranche or whether it is spread over
multiple lwlocks.
I think it would be enough to have 6 bits for the event class id and 10 bits
for the event id.  That allows a maximum of 64 event classes and 1024 events
per class, and these limits seem fair enough to me.
That leaves 16 bits for the lock number.  It's certainly not enough for some
tranches; for instance, the number of buffers can easily be more than 2^16.
However, we could expose at least the lower 16 bits, which would be at least
something.  Using this information, a user can at least draw a conclusion
like "it MIGHT be high concurrency on a single buffer's content; otherwise
it is a coincidence that a lot of different buffers have the same 16 lower
bits".
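
A minimal sketch of that packing (the 6/10/16 split is just the proposal
above; the names are invented and none of this is existing PostgreSQL code):

#include <stdint.h>
#include <stdio.h>

#define WAIT_EVENT_CLASS_BITS  6    /* up to 64 event classes */
#define WAIT_EVENT_ID_BITS     10   /* up to 1024 events per class */
#define WAIT_LOCK_BITS         16   /* low-order bits of the lock number */

static uint32_t
pack_wait_event(uint32_t classId, uint32_t eventId, uint32_t lockNum)
{
    return ((classId & ((1U << WAIT_EVENT_CLASS_BITS) - 1))
            << (WAIT_EVENT_ID_BITS + WAIT_LOCK_BITS)) |
           ((eventId & ((1U << WAIT_EVENT_ID_BITS) - 1)) << WAIT_LOCK_BITS) |
           (lockNum & ((1U << WAIT_LOCK_BITS) - 1));
}

int
main(void)
{
    /* buffer 70000 exceeds 2^16, so only its low 16 bits survive */
    uint32_t info = pack_wait_event(3, 42, 70000);

    printf("class=%u event=%u lock_low16=%u\n",
           info >> (WAIT_EVENT_ID_BITS + WAIT_LOCK_BITS),
           (info >> WAIT_LOCK_BITS) & ((1U << WAIT_EVENT_ID_BITS) - 1),
           info & ((1U << WAIT_LOCK_BITS) - 1));
    return 0;
}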

Any thoughts?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] Fix KNN GiST ordering type

2016-02-01 Thread Alexander Korotkov
Hi!

KNN GiST detects which type it should return from the return type of the
ordering operator.
But it appears that the type of sk_func is detected after it was replaced
with the distance function.  That is wrong: the distance function should
always return float8.
I think it is just a typo.
This should be backpatched to 9.5.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


fix-knn-gist-ordering-type.patch
Description: Binary data



Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-02-02 Thread Alexander Korotkov
On Tue, Feb 2, 2016 at 12:43 AM, Andres Freund  wrote:

> On 2016-02-01 13:06:57 +0300, Alexander Korotkov wrote:
> > On Mon, Feb 1, 2016 at 11:34 AM, Alexander Korotkov <
> > a.korot...@postgrespro.ru> wrote:
> > >> Client  Base    Patch
> > >> 1       19744   19382
> > >> 8       125923  126395
> > >> 3231393151
> > >> 64      387339  496830
> > >> 128     306412  350610
> > >>
> > >> Shared Buffer= 512MB
> > >> max_connections=150
> > >> Scale Factor=300
> > >>
> > >> ./pgbench  -j$ -c$ -T300 -M prepared -S postgres
> > >>
> > >> Client  Base    Patch
> > >> 1       17169   16454
> > >> 8       108547  105559
> > >> 32      241619  262818
> > >> 64      206868  233606
> > >> 128     137084  217013
>
> So, there's a small regression on low client counts. That's worth
> addressing.
>

Interesting. I'll try to reproduce it.


> > The attached patch is rebased and has better comments.
> > Also, there is one comment which survives from the original version by
> > Andres:
> >
> > /* Add exponential backoff? Should seldomly be contended tho. */
> >
> >
> > Andres, did you mean we should double the delay with each unsuccessful
> > attempt to lock?
>
> Spinning on a lock as fast as possible leads to rapid cacheline bouncing
> without anybody making progress. See s_lock() in s_lock.c.
>

I didn't realize that s_lock() behaves that way. Thank you.
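
For reference, a simplified sketch of that style of loop (names assumed; the
real s_lock.c additionally spins a while before sleeping and grows the delay
by a randomized factor rather than strictly doubling it):

#include <unistd.h>

#define MIN_DELAY_USEC 1000
#define MAX_DELAY_USEC 1000000

static void
spin_lock_with_backoff(volatile int *lock)
{
    int delay = MIN_DELAY_USEC;

    /* GCC atomic test-and-set: returns the previous value of *lock */
    while (__sync_lock_test_and_set(lock, 1))
    {
        usleep(delay);          /* back off instead of hammering the line */
        delay *= 2;             /* exponential backoff */
        if (delay > MAX_DELAY_USEC)
            delay = MAX_DELAY_USEC;
    }
}

static void
spin_unlock(volatile int *lock)
{
    __sync_lock_release(lock);
}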

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Fix KNN GiST ordering type

2016-02-02 Thread Alexander Korotkov
On Mon, Feb 1, 2016 at 7:31 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> KNN GiST detects which type it should return from the return type of the
> ordering operator.
> But it appears that the type of sk_func is detected after it was replaced
> with the distance function.  That is wrong: the distance function should
> always return float8.
> I think it is just a typo.
>

I found that this was introduced by this commit:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=821b821a2421beaa58225ff000833df69fb962c5;hp=284bef297733e553c73f1c858e0ce1532f754d18

However, the commit message doesn't say anything special about this change.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2016-02-02 Thread Alexander Korotkov
On Tue, Feb 2, 2016 at 2:54 PM, Robert Haas  wrote:

> On Mon, Feb 1, 2016 at 3:47 AM, Alexander Korotkov
>  wrote:
> > OK. This one looks good for me too.
>
> All right, I pushed both this and the other one as a single commit,
> since they were so closely related and the second only one line.
>

Great, thanks!
It seems that only the extension tranches patch is pending in this thread.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Sequence Access Method WIP

2016-02-16 Thread Alexander Korotkov
On Sat, Jan 30, 2016 at 3:37 PM, Petr Jelinek  wrote:

> On 29 January 2016 at 23:59, Robert Haas  wrote:
> > On Fri, Jan 29, 2016 at 5:24 PM, Tom Lane  wrote:
> >> Alexander Korotkov  writes:
> >>> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <
> alvhe...@2ndquadrant.com>
> >>> wrote:
> >>>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or
> something
> >>>> like that.
> >>
> >>> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name
> HANDLER
> >>> handler;", but I don't insist.
> >>
> >> I think that Alvaro's idea is less likely to risk future grammar
> >> conflicts.  We've had enough headaches from CREATE [ UNIQUE ] INDEX
> >> [ CONCURRENTLY ] to make me really wary of variables in the
> statement-name
> >> part of the syntax.
> >
> > Strong +1.  If we put the type of access method immediately after
> > CREATE, I'm almost positive we'll regret it for exactly that reason.
> >
>
> Just as a note, CREATE SEQUENCE ACCESS METHOD already causes a grammar
> conflict now; that's why my proposal was different, as I didn't want to
> add more keywords. I think Alvaro's proposal is fine as well.
>
> The other point is that we are creating an ACCESS METHOD object, so that's
> what should come after CREATE.
>
> In any case this is slightly premature IMHO, as DDL is somewhat useless
> until we have a sequence access methods implementation we can agree on,
> or the generic WAL logging so that custom indexes can be crash safe.


I've changed the CREATE ACCESS METHOD syntax in the thread about index
access method extensibility.
Now it is "CREATE ACCESS METHOD name TYPE INDEX HANDLER handler;".  A new
column of pg_am, amtype, stores the access method type.
http://www.postgresql.org/message-id/capphfdu9gln7kuicwegsp50caamwx8q-jwzbpehc92rvfhz...@mail.gmail.com
It could easily be extended to "CREATE ACCESS METHOD name TYPE SEQUENCE
HANDLER handler;".

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Figures in docs

2016-02-17 Thread Alexander Lakhin
17.02.2016 09:17, Tatsuo Ishii wrote:
>> Hi. 
>>
>> In the DocBook 4.2 SGML DTD, the figure tag is already supported;
>> it was implemented for multiple output formats.
> OK, there are no technical problems with figures then.  The MySQL docs have
> some nice figures. I am jealous.
The "figure" tag is just a placeholder in the Docbook
(http://www.docbook.org/tdg/en/html/figure.html).
The question is what to place inside this tag: "graphic" (not inline),
"mediaobject/imageobject" (alternative object, supports inline contents
and SVG), or something else.
So you surely can insert some picture from an external file (.png,
.jpg), but if it could contain title or some labels that should be
translated, it's not a best solution.





Re: [HACKERS] Figures in docs

2016-02-17 Thread Alexander Lakhin

Hello,

In fact, we use "make postgres.xml" to get a single XML file, which we
translate.  We convert it to .po using xml2po (with some modifications), and
then convert it back to XML (and then to SGML files with our custom script).
So we have not changed the SGML files to XML, and if that is more
appropriate than having a single large XML file, then we will propose a
procedure to perform such a conversion (which will result in a corresponding
mega patch).


Best regards,
Alexander Lakhin

17.02.2016 08:41, Tatsuo Ishii writes:

On Wed, Feb 17, 2016 at 4:17 AM, Tatsuo Ishii  wrote:


It seems there are no figures/diagrams in our docs. I vaguely recall that
we used to have a few diagrams in our docs. If so, was there any
technical reason to remove them?


I don't know the reason, but it's a shame we are still in SGML!

We already do our translations (as others do) in XML using custom scripting.
XML provides us better integration with available tools and the ability to
insert graphics.  Last time we asked in -docs about moving to XML,
Alexander demonstrated acceptable speed of the XML build, but there was no
reply from Peter, who is (?) responsible for our documentation
infrastructure.  Probably we should just create a patch and submit it to the
commitfest.  You can check this thread:
http://www.postgresql.org/message-id/1428009501118.85...@postgrespro.ru

Well, there are some PostgreSQL doc translation projects running, including
the translation to Japanese, which I am working on.

If we are going to change the manual format and/or the build system, I
need to confirm it does work for Japanese as well. In theory because
the Japanese translation project uses UTF-8, there should be no
problem as long as the whole build system works for UTF-8. But I want
to confirm first...

BTW, are you going to propose a mega patch which changes all the sgml
files to xml files?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp






Re: [HACKERS] about google summer of code 2016

2016-02-18 Thread Alexander Korotkov
On Wed, Feb 17, 2016 at 10:40 AM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> On 2016/02/17 16:27, Shubham Barai wrote:
> > Hello everyone,
> >
> > I am currently pursuing my bachelor of engineering in computer science
> > at the Maharashtra Institute of Technology, Pune, India. I am very
> > excited about contributing to Postgres through the Google Summer of Code
> > program.
> >
> > Is Postgres applying for GSoC 2016 as a mentoring organization?
>
> I think it does.  Track this page for updates:
> http://www.postgresql.org/developer/summerofcode/
>
> You can contact one of the people listed on that page for the latest.
>
> I didn't find one for 2016, but here is the PostgreSQL wiki page for last
> year's GSoC: https://wiki.postgresql.org/wiki/GSoC_2015#Project_Ideas


I've created a wiki page for GSoC 2016.  It contains the unimplemented ideas
from the 2015 page.
GSoC is now accepting proposals from organizations.  Typically, we have a
call for mentors on the -hackers mailing list in this period.
Thom, do we apply this year?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Alexander Korotkov
> Postgres XC has always used the Postgres license.)
>
> If we are not willing to add code for the missing Postgres XC/XL
> features, Postgres XC/XL will probably remain a separate fork of
> Postgres.  I don't think anyone knows the answer to this question, and I
> don't know how to find the answer except to keep going with our current
> FDW sharding approach.
>

I have nothing against particular FDW advances.  However, it's unclear to me
that FDWs should be the only sharding approach.
It's unproven that FDWs can do the work that Postgres XC/XL does.  With FDWs
we can pick some low-hanging fruit.  That's good.
But it's unclear whether we can get the high-hanging fruit (like data
redistribution) with the FDW approach.  And if we can, it's unclear whether
it would be easier than with other approaches.
Let's just not call this the community-chosen plan for implementing
sharding.  Until we have the full picture, we can't select one way and
reject the others.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Relation cache invalidation on replica

2016-02-26 Thread Alexander Korotkov
On Fri, Feb 26, 2016 at 3:41 PM, Васильев Дмитрий  wrote:

> A session opened on a replica doesn't see indexes concurrently created on
> the master at this time.
>

As I understand it, on the standby the index is visible when you run SQL
queries on catalog tables (which is what \d+ does), but the planner doesn't
pick it.  Assuming that in a new session the planner picks this index, this
seems like a bug to me.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Pluggable storage

2016-08-18 Thread Alexander Korotkov
On Thu, Aug 18, 2016 at 10:58 AM, Simon Riggs  wrote:

> On 16 August 2016 at 19:46, Andres Freund  wrote:
> > On 2016-08-15 12:02:18 -0400, Robert Haas wrote:
> >> Thanks for taking a stab at this.  I'd like to throw out a few concerns.
> >>
> >> One, I'm worried that adding an additional layer of pointer-jumping is
> >> going to slow things down and make Andres' work to speed up the
> >> executor more difficult.  I don't know that there is a problem there,
> >> and if there is a problem I don't know what to do about it, but I
> >> think it's something we need to consider.
> >
> > I'm quite concerned about that as well.
>
> This objection would apply to all other proposals as well, FDW etc.
>
> Do you see some way to add flexibility yet without adding a branch
> point in the code?
>

It's impossible without a branch point in the code.  The question is where
this branch should be located.
In particular, we can put this branch point into the planner by defining
distinct executor nodes for each pluggable storage.  In this case, each
storage would have its own optimized executor nodes.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] Should we cacheline align PGXACT?

2016-08-18 Thread Alexander Korotkov
Hackers,

Originally this idea was proposed by Andres Freund while experimenting with
lockfree Pin/UnpinBuffer [1].
The patch is attached, as well as results of pgbench -S on a 72-core
machine.  As before, it shows a huge benefit in this case.
For sure, we should validate that it doesn't cause performance regressions
in other cases.  At least we should test read-write workloads and smaller
machines.
Any other ideas?

1.
https://www.postgresql.org/message-id/20160411214029.ce3fw6zxim5k6...@alap3.anarazel.de

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pgxact-align-1.patch
Description: Binary data



Re: [HACKERS] Should we cacheline align PGXACT?

2016-08-19 Thread Alexander Korotkov
On Fri, Aug 19, 2016 at 4:46 PM, Tom Lane  wrote:

> Alexander Korotkov  writes:
> > originally this idea was proposed by Andres Freund while experimenting
> with
> > lockfree Pin/UnpinBuffer [1].
> > The patch is attached as well as results of pgbench -S on 72-cores
> > machine.  As before it shows huge benefit in this case.
>
> That's one mighty ugly patch.  Can't you do it without needing to
> introduce the additional layer of struct nesting?
>

That's worrying me too.
We could use an anonymous struct, but that seems to be prohibited in C89,
which we stick to.
Another idea which comes to my mind is to manually calculate the size of the
padding and insert it directly into the PGXACT struct.  That seems rather
ugly too; however, it would be an ugly definition, not ugly usage...
Do you have better ideas?
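
For concreteness, a sketch of that manual-padding idea (the field names and
the 64-byte cache line are assumptions for illustration, not actual
PostgreSQL code):

#define MY_CACHE_LINE_SIZE 64

typedef struct MyPGXACT
{
    unsigned int  xid;          /* 4 bytes */
    unsigned int  xmin;         /* 4 bytes */
    unsigned char vacuumFlags;  /* 1 byte */
    unsigned char overflowed;   /* 1 byte */
    unsigned char delayChkpt;   /* 1 byte */
    unsigned char nxids;        /* 1 byte */
    /* Pad the 12 bytes above out to a full cache line.  This count must
     * be recomputed by hand whenever a field changes, which is what
     * makes the definition ugly. */
    char          pad[MY_CACHE_LINE_SIZE - 12];
} MyPGXACT;

/* C89-compatible compile-time check that the arithmetic stays right */
typedef char my_pgxact_size_check[sizeof(MyPGXACT) == MY_CACHE_LINE_SIZE ? 1 : -1];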

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Should we cacheline align PGXACT?

2016-08-19 Thread Alexander Korotkov
On Fri, Aug 19, 2016 at 3:19 PM, Amit Kapila 
wrote:

> On Fri, Aug 19, 2016 at 11:42 AM, Alexander Korotkov
>  wrote:
> > Hackers,
> >
> > originally this idea was proposed by Andres Freund while experimenting
> with
> > lockfree Pin/UnpinBuffer [1].
> > The patch is attached as well as results of pgbench -S on 72-cores
> machine.
> > As before it shows huge benefit in this case.
> > For sure, we should validate that it doesn't cause performance
> regression in
> > other cases.  At least we should test read-write and smaller machines.
> > Any other ideas?
> >
>
> Maybe test on a Power machine as well.
>

Good idea.  I don't have any such machine at hand now.  Do you have one?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Tracking wait event for latches

2016-08-22 Thread Alexander Korotkov
Hi, Michael!

On Thu, Aug 4, 2016 at 8:26 AM, Michael Paquier 
wrote:

> On Tue, Aug 2, 2016 at 10:31 PM, Michael Paquier
>  wrote:
> > Attached is an updated patch.
>
> Updated version for 2 minor issues:
> 1) s/stram/stream/
> 2) Docs used incorrect number
>

I took a look at your patch. A couple of notes from me.

const char *
> GetEventIdentifier(uint16 eventId)
> {
> const char *res;
> switch (eventId)
> {
> case EVENT_ARCHIVER_MAIN:
> res = "ArchiverMain";
> break;
> ... long long list of events ...
> case EVENT_WAL_SENDER_WRITE_DATA:
> res = "WalSenderWriteData";
> break;
> default:
> res = "???";
> }
> return res;
> }


Would it be better to use an array here?
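
Something like the following sketch, for instance (abridged to two events;
the entries must be kept in the same enum order, which is the usual
trade-off of this approach):

typedef enum WaitEventId
{
    EVENT_ARCHIVER_MAIN,
    /* ... all the other events, in enum order ... */
    EVENT_WAL_SENDER_WRITE_DATA,
    EVENT_IDENTIFIER_COUNT      /* must be last */
} WaitEventId;

static const char *const EventIdentifierNames[] = {
    "ArchiverMain",             /* EVENT_ARCHIVER_MAIN */
    /* ... */
    "WalSenderWriteData"        /* EVENT_WAL_SENDER_WRITE_DATA */
};

const char *
GetEventIdentifier(unsigned short eventId)
{
    if (eventId >= EVENT_IDENTIFIER_COUNT)
        return "???";
    return EventIdentifierNames[eventId];
}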

typedef enum EventIdentifier
> {


EventIdentifier seems like too general a name to me, doesn't it?  Could we
name it WaitEventIdentifier?  Or WaitEventId for short?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Tracking wait event for latches

2016-08-22 Thread Alexander Korotkov
On Mon, Aug 22, 2016 at 12:09 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> I took a look at your patch. Couple of notes from me.
>
> const char *
>> GetEventIdentifier(uint16 eventId)
>> {
>> const char *res;
>> switch (eventId)
>> {
>> case EVENT_ARCHIVER_MAIN:
>> res = "ArchiverMain";
>> break;
>> ... long long list of events ...
>> case EVENT_WAL_SENDER_WRITE_DATA:
>> res = "WalSenderWriteData";
>> break;
>> default:
>> res = "???";
>> }
>> return res;
>> }
>
>
> Would it be better to use an array here?
>
> typedef enum EventIdentifier
>> {
>
>
> EventIdentifier seems like too general a name to me, doesn't it?  Could we
> name it WaitEventIdentifier?  Or WaitEventId for short?
>

I'm also not sure about the handling of the secure_read() and secure_write()
functions.
In the current patch we're tracking the latch event wait inside them.  But
we set up the latch only for blocking sockets, and we can do that multiple
times in a loop.
Would it be better to make separate wait events NETWORK_READ and
NETWORK_WRITE and set them up for the whole time spent in secure_read()
and secure_write()?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Block level parallel vacuum WIP

2016-08-23 Thread Alexander Korotkov
On Tue, Aug 23, 2016 at 3:32 PM, Tom Lane  wrote:

> Claudio Freire  writes:
> > Not only that, but from your description (I haven't read the patch,
> > sorry), you'd be scanning the whole index multiple times (one per
> > worker).
>
> What about pointing each worker at a separate index?  Obviously the
> degree of concurrency during index cleanup is then limited by the
> number of indexes, but that doesn't seem like a fatal problem.
>

+1
We could eventually need some effective way of parallelizing the vacuum of a
single index.
But pointing each worker at a separate index seems fair enough for the
majority of cases.
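
A trivial sketch of that assignment (the function names here are
hypothetical, not taken from the patch):

/*
 * Worker `worker_id` of `nworkers` takes every nworkers-th index, so
 * each index is cleaned by exactly one worker and no index is scanned
 * more than once.
 */
static void
vacuum_assigned_indexes(int nindexes, int nworkers, int worker_id,
                        void (*vacuum_one_index)(int indexno))
{
    int i;

    for (i = 0; i < nindexes; i++)
    {
        if (i % nworkers == worker_id)
            vacuum_one_index(i);
    }
}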

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Proposal for CSN based snapshots

2016-08-24 Thread Alexander Korotkov
On Wed, Aug 24, 2016 at 11:54 AM, Heikki Linnakangas 
wrote:

> On 08/23/2016 06:18 PM, Heikki Linnakangas wrote:
>
>> On 08/22/2016 08:38 PM, Andres Freund wrote:
>>
>>> On 2016-08-22 20:32:42 +0300, Heikki Linnakangas wrote:
>>>
>>>> I
>>>> remember seeing ProcArrayLock contention very visible earlier, but I
>>>> can't
>>>> hit that now. I suspect you'd still see contention on bigger hardware,
>>>> though, my laptop has oly 4 cores. I'll have to find a real server for
>>>> the
>>>> next round of testing.
>>>>
>>>
>>> Yea, I think that's true. I can just about see ProcArrayLock contention
>>> on my more powerful laptop, to see it really bad you need bigger
>>> hardware / higher concurrency.
>>>
>>
>> As soon as I sent my previous post, Vladimir Borodin kindly offered
>> access to a 32-core server for performance testing. Thanks Vladimir!
>>
>> I installed Greg Smith's pgbench-tools kit on that server, and ran some
>> tests. I'm seeing some benefit on "pgbench -N" workload, but only after
>> modifying the test script to use "-M prepared", and using Unix domain
>> sockets instead of TCP to connect. Apparently those things add enough
>> overhead to mask out the little difference.
>>
>> Attached is a graph with the results. Full results are available at
>> https://hlinnaka.iki.fi/temp/csn-4-results/. In short, the patch
>> improved throughput, measured in TPS, with >= 32 or so clients. The
>> biggest difference was with 44 clients, which saw about 5% improvement.
>>
>> So, not phenomenal, but it's something. I suspect that with more cores,
>> the difference would become more clear.
>>
>> Like on a cue, Alexander Korotkov just offered access to a 72-core
>> system :-). Thanks! I'll run the same tests on that.
>>
>
> And here are the results on the 72 core machine (thanks again,
> Alexander!). The test setup was the same as on the 32-core machine, except
> that I ran it with more clients since the system has more CPU cores. In
> summary, in the best case, the patch increases throughput by about 10%.
> That peak is with 64 clients. Interestingly, as the number of clients
> increases further, the gain evaporates, and the CSN version actually
> performs worse than unpatched master. I don't know why that is. One theory
> is that by eliminating one bottleneck, we're now hitting another bottleneck
> which doesn't degrade as gracefully when there's contention.
>

Did you try to identify this second bottleneck with perf or something
similar?
It would be nice to also run pgbench -S.  Also, it would be nice to check
something like 10% writes, 90% reads (which is quite a typical workload in
real life, I believe).

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] PoC: Partial sort

2016-09-13 Thread Alexander Korotkov
On Fri, Apr 8, 2016 at 10:09 PM, Peter Geoghegan  wrote:

> On Wed, Mar 30, 2016 at 8:02 AM, Alexander Korotkov
>  wrote:
> > Hmm... I don't completely agree with that. In typical usage partial sort
> > should definitely use quicksort.  However, falling back to other sort
> > methods is very useful.  The decision to use partial sort is made by the
> > planner.  But the planner makes mistakes.  For example, our HashAggregate
> > is purely in-memory.  In the case of a planner mistake it causes OOM.  I
> > have met such situations in production more than once.  This is why I'd
> > like partial sort to degrade gracefully in such cases.
>
> I think that this should be moved to the next CF, unless a committer
> wants to pick it up today.
>

The patch was rebased onto the current master.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


partial-sort-basic-9.patch
Description: Binary data



Re: [HACKERS] [PATCH] Incremental sort

2017-10-03 Thread Alexander Korotkov
On Tue, Oct 3, 2017 at 2:52 PM, Robert Haas  wrote:

> On Mon, Oct 2, 2017 at 12:37 PM, Alexander Korotkov
>  wrote:
> > I've applied the patch on top of c12d570f and rerun the same benchmarks.
> > A CSV file with results is attached.  There are no dramatic changes.
> > There is still a minority of cases with performance regression, while the
> > majority of cases show improvement.
>
> Yes, I think these results look pretty good.  But are these times in
> seconds?  You might need to do some testing with bigger sorts.


Good point.  I'll rerun the benchmarks with a larger dataset size.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] 64-bit queryId?

2017-10-03 Thread Alexander Korotkov
On Tue, Oct 3, 2017 at 5:55 PM, Robert Haas  wrote:

> On Mon, Oct 2, 2017 at 8:07 PM, Alexander Korotkov
>  wrote:
> > +1,
> > I see 3 options there:
> > 1) Drop high-order bit, as you proposed.
> > 2) Allow negative queryIds.
> > 3) Implement unsigned 64-type.
> >
> > #1 causes a minor loss of precision, which looks rather insignificant in
> > the given context.
> > #2 might be rather unexpected for users who previously had non-negative
> > queryIds.  Changing queryId from 32-bit to 64-bit itself might require
> > some adaptation from monitoring software.  But queryIds are user-visible,
> > and negative queryIds would look rather illogical.
> > #3 would be taking on a hard, long-term problem for insufficient reason.
> > Thus, #1 looks like the most harmless solution.
>
> I think we should just allow negative queryIds.  I mean, the hash
> functions have behaved that way for a long time:
>
> rhaas=# select hashtext('');
>   hashtext
> -
>  -1477818771
> (1 row)
>
> It seems silly to me to throw away a perfectly good bit from the hash
> value just because of some risk of minor user confusion.  I do like
> Michael's suggestion of outputting hex-like text, but changing the
> types of the user-visible output columns seems like a job for another
> patch.  Similarly, if we were to adopt Andres's suggestions of a new
> type or using numeric or Alexander's suggestion of implementing a
> 64-bit unsigned type, I think it should be a separate patch from this
> one.
>
> I would note that such a patch will actually create real
> incompatibility -- extension upgrades might fail if there are
> dependent views, for example -- whereas merely having query IDs start
> to sometimes be negative only creates an incompatibility for people
> who assumed that the int64 type wouldn't actually use its full range
> of allowable values.  I don't deny that someone may have done that, of
> course, but I wouldn't guess that it's a common thing... maybe I'm
> wrong.
>

BTW, you didn't comment on Tom's suggestion about dropping the high-order
bit, which trades minor user confusion for a minor loss of precision.
TBH, it's not so important to me whether we allow negative queryIds or
drop the high-order bit.  It would be very good in any case to have
64- (or 63-)bit queryIds committed.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] Add TOAST to system tables with ACL?

2017-10-03 Thread Alexander Korotkov
Hi!

This topic was already discussed (at least once) in 2011; see [1] for
details.  I'd like to raise it again.

Currently, we have system catalog tables with ACL columns but without
TOAST.  Thus, it can happen that a new ACL item doesn't fit because the
row becomes too large for in-line storage.

You can easily reproduce this situation in psql.

create table t (col int);
\copy (select 'create user u' || i || ';' from generate_series(1,10000) i) to 'script.sql'
\i script.sql
\copy (select 'grant all on t to u' || i || ';' from generate_series(1,10000) i) to 'script.sql'
\i script.sql

Eventually the GRANT statements start to raise an error:
psql:script.sql:2447: ERROR:  row is too big: size 8168, maximum size 8160

I understand that we shouldn't encourage users to behave like this.  We
should rather advise them to avoid adding too many ACL items to a single
object by using roles, which would also be much easier to manage.
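
As a sketch of that advice (role and user names are illustrative):

-- one ACL item on the table, granted through a role
create role t_users;
grant all on t to t_users;
-- users get access via membership, not via per-user table ACL items
grant t_users to u1, u2, u3;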

However, the current PostgreSQL behavior is rather unexpected and
undocumented, so I would call it a bug.  It would be nice to fix it even
if long ACL lists work slower than normal.

In the discussion of post [1], Tom commented that he isn't excited about
out-of-line ACL entries unless it's proven that performance doesn't
completely tank in this case.

I've done some basic experiments in this field on my laptop.  The attached
draft patch adds TOAST to all system catalog tables with ACL.  I've run
pgbench with the custom script "SELECT * FROM t;", where t is an empty
table with a long ACL entry.  I've compared results with 1000 ACL items
(in-line storage) and 10000 ACL items (out-of-line storage).

Also, I've noticed performance degradation of GRANT statements themselves:
1000 GRANT statements execute in 1.5 seconds, while 10000 GRANT statements
execute in 42 seconds.  On average, a single GRANT statement becomes 2.8
times slower.  That's significant degradation, but it doesn't seem fatal
to me.

The pgbench results are presented in the following table.

     Number of ACL items     | -M simple | -M prepared
-----------------------------+-----------+-------------
 1000 (in-line storage)      |      6623 |        7242
 10000 (out-of-line storage) |     14498 |       17827

So, it's a 2.1-2.4x degradation in this case.  That's very significant
degradation, but I wouldn't say that "performance completely tanks".

Any thoughts?  Should we consider TOASTing ACL entries, or should we give
up on this?

Links:

 1. https://www.commandprompt.com/blog/grant_schema_usage_to_2500_users_no_can_do/

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


acl-toast-1.patch
Description: Binary data



Re: [HACKERS] Add TOAST to system tables with ACL?

2017-10-03 Thread Alexander Korotkov
On Tue, Oct 3, 2017 at 9:19 PM, Tom Lane  wrote:

> Alexander Korotkov  writes:
> > This topic was already discussed (at least one time) in 2011.  See [1]
> for
> > details.  I'd like to raise that again.
>
> I'm a bit worried about adding a toast table to pg_class, and more so
> about pg_database, because both of those have to be accessed in situations
> where it's not clear that we could successfully fetch from a toast table,
> because too little of the catalog access infrastructure is alive.
>
> pg_class is probably all right as long as only the ACL field could ever
> get toasted, since it's unlikely that any low-level accesses would be
> paying attention to that field anyway.
>
> For pg_database, you'd have to make sure that the startup-time check of
> database CONNECT privilege still works if the ACL's been pushed out of
> line.
>

Thank you for pointing that out.  I'll check this.

> > Also, I've noticed performance degradation of GRANT statements
> > themselves: 1000 GRANT statements execute in 1.5 seconds, while 10000
> > GRANT statements execute in 42 seconds.  On average, a single GRANT
> > statement becomes 2.8 times slower.  That's significant degradation,
> > but it doesn't seem fatal to me.
>
> Seems all right, since we could just say "we don't really recommend that
> usage pattern".
>

Yes, sure.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel

2017-10-03 Thread Alexander Korotkov
On Wed, Sep 13, 2017 at 10:57 AM, Ashutosh Sharma 
wrote:

> On Wed, Sep 13, 2017 at 1:15 PM, Alexey Chernyshov
>  wrote:
> > On Sat, 9 Sep 2017 13:53:35 +0530
> > Ashutosh Sharma  wrote:
> >
> >>
> >> Finally, I got some time to look into this patch and, surprisingly, I
> >> didn't find any function returning information at the page level;
> >> instead, all the SQL functions return information at the index level.
> >> Therefore, I too feel that it should either be integrated with
> >> pgstattuple, which could be a better match as Tomas also mentioned, or
> >> be added as a new contrib module.  I think adding a new contrib module
> >> looks like the better option.  The reason being, it doesn't simply
> >> include functions showing index-level statistics (e.g. index size,
> >> number of rows, values, etc.) like pgstattuple does, but also displays
> >> information contained in a page, e.g. the object stored in the page,
> >> its tid, etc.  So, more or less, I would say it's a mixture of the
> >> pageinspect and pgstattuple modules and, therefore, I feel adding it
> >> as a new extension would be a better choice.  Thoughts?
> >
> > Thank you for your interest; I will add a new contrib module named,
> > say, indexstat.  I think we can add statistics on other indexes in the
> > future.
> >
>
> I think we should wait for an expert's opinion and then take a call.  I
> am no expert; I just gave my opinion since I have worked in this area
> earlier, on the hash index.  Thanks.


I'm not sure that we should port these functions from gevel directly.  We
could try to re-implement similar functionality in a way that fits the
pageinspect approach better.  In particular, we could implement low-level
functions that show detailed information at the page level, and on top of
them implement analogues of the gevel functions in SQL or PL/pgSQL.  Is
that feasible?
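
As a rough sketch of that layering (gist_page_stats() is hypothetical and
assumed to parse a single page image and return per-page tuple counts;
only pageinspect's get_raw_page() is an existing function, and the default
8kB block size is assumed):

-- index-level statistic built on top of a hypothetical per-page function
create function gist_index_tuples(idx regclass) returns bigint
language plpgsql as $$
declare
    total bigint := 0;
    blkno int;
begin
    for blkno in 0 .. (pg_relation_size(idx) / 8192 - 1)::int loop
        -- gist_page_stats(bytea) is assumed, for illustration only
        total := total + (select ntuples
                          from gist_page_stats(
                                   get_raw_page(idx::text, blkno)));
    end loop;
    return total;
end;
$$;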

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Add TOAST to system tables with ACL?

2017-10-04 Thread Alexander Korotkov
On Tue, Oct 3, 2017 at 9:19 PM, Tom Lane  wrote:

> For pg_database, you'd have to make sure that the startup-time check of
> database CONNECT privilege still works if the ACL's been pushed out of
> line.
>

I've tried this case.

First, create a database temp with no CONNECT privilege for public, and
10000 users.
create database temp;
revoke connect on database temp from public;
\copy (select 'create user u' || i || ';' from generate_series(1,10000) i) to 'script.sql'
\i script.sql

I've checked that user u1 can't log in to database temp.
$ psql temp -U u1
psql: FATAL:  permission denied for database "temp"
DETAIL:  User does not have CONNECT privilege.

Then I grant the CONNECT privilege to all those 10000 users.
\copy (select 'grant connect on database temp to u' || i || ';' from generate_series(1,10000) i) to 'script.sql'
\i script.sql

Then user u10000 can log in successfully.
$ psql temp -U u10000
psql (11devel)
Type "help" for help.

u10000@temp=#

Thus, in this simple case, the database CONNECT privilege check works with
an out-of-line ACL for me.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] PoC: full merge join on comparison clause

2017-10-04 Thread Alexander Kuzmenkov
As discussed earlier, I changed the way we work with mergeopfamilies. I 
use the "is_equality" flag to indicate whether the clause is an equality 
one, and fill mergeopfamilies for both equality and inequality operators.

The updated patch is attached (rebased to 20b6552242).
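
For context, here is a hypothetical query of the shape this patch targets
(t1 and t2 are illustrative tables); without the patch the planner rejects
it, since FULL JOIN requires merge-joinable or hash-joinable conditions:

-- unpatched: ERROR:  FULL JOIN is only supported with merge-joinable
--            or hash-joinable join conditions
explain (costs off)
select * from t1 full join t2 on t1.a <= t2.b;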

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 925b4cf553..73e6a4ca74 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -172,31 +172,32 @@ typedef enum
  * to the opfamily and collation, with nulls at the indicated end of the range.
  * This allows us to obtain the needed comparison function from the opfamily.
  */
-static MergeJoinClause
+static void
 MJExamineQuals(List *mergeclauses,
 			   Oid *mergefamilies,
 			   Oid *mergecollations,
 			   int *mergestrategies,
 			   bool *mergenullsfirst,
-			   PlanState *parent)
+			   MergeJoinState *parent)
 {
-	MergeJoinClause clauses;
 	int			nClauses = list_length(mergeclauses);
 	int			iClause;
 	ListCell   *cl;
 
-	clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData));
+	parent->mj_Clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData));
+	parent->mj_UseEqual = (bool *) palloc0(nClauses * sizeof(bool));
+	parent->mj_UseLesser = (bool *) palloc0(nClauses * sizeof(bool));
 
 	iClause = 0;
 	foreach(cl, mergeclauses)
 	{
 		OpExpr	   *qual = (OpExpr *) lfirst(cl);
-		MergeJoinClause clause = &clauses[iClause];
+		MergeJoinClause clause = &parent->mj_Clauses[iClause];
 		Oid			opfamily = mergefamilies[iClause];
 		Oid			collation = mergecollations[iClause];
-		StrategyNumber opstrategy = mergestrategies[iClause];
+		StrategyNumber sort_op_strategy = mergestrategies[iClause];
 		bool		nulls_first = mergenullsfirst[iClause];
-		int			op_strategy;
+		int			join_op_strategy;
 		Oid			op_lefttype;
 		Oid			op_righttype;
 		Oid			sortfunc;
@@ -207,28 +208,55 @@ MJExamineQuals(List *mergeclauses,
 		/*
 		 * Prepare the input expressions for execution.
 		 */
-		clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), parent);
-		clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), parent);
+		clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), (PlanState *) parent);
+		clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), (PlanState *) parent);
 
 		/* Set up sort support data */
 		clause->ssup.ssup_cxt = CurrentMemoryContext;
 		clause->ssup.ssup_collation = collation;
-		if (opstrategy == BTLessStrategyNumber)
+		if (sort_op_strategy == BTLessStrategyNumber)
 			clause->ssup.ssup_reverse = false;
-		else if (opstrategy == BTGreaterStrategyNumber)
+		else if (sort_op_strategy == BTGreaterStrategyNumber)
 			clause->ssup.ssup_reverse = true;
 		else	/* planner screwed up */
-			elog(ERROR, "unsupported mergejoin strategy %d", opstrategy);
+			elog(ERROR, "unsupported mergejoin strategy %d", sort_op_strategy);
 		clause->ssup.ssup_nulls_first = nulls_first;
 
 		/* Extract the operator's declared left/right datatypes */
 		get_op_opfamily_properties(qual->opno, opfamily, false,
-								   &op_strategy,
+								   &join_op_strategy,
 								   &op_lefttype,
 								   &op_righttype);
-		if (op_strategy != BTEqualStrategyNumber)	/* should not happen */
-			elog(ERROR, "cannot merge using non-equality operator %u",
-				 qual->opno);
+
+		/*
+		 * Determine whether we accept lesser and/or equal tuples of the inner
+		 * relation.
+		 */
+		switch (join_op_strategy)
+		{
+			case BTEqualStrategyNumber:
+				parent->mj_UseEqual[iClause] = true;
+				break;
+
+			case BTLessEqualStrategyNumber:
+				parent->mj_UseEqual[iClause] = true;
+				/* fall through */
+
+			case BTLessStrategyNumber:
+				parent->mj_UseLesser[iClause] = true;
+				break;
+
+			case BTGreaterEqualStrategyNumber:
+				parent->mj_UseEqual[iClause] = true;
+				/* fall through */
+
+			case BTGreaterStrategyNumber:
+				parent->mj_UseLesser[iClause] = true;
+				break;
+
+			default:
+				elog(ERROR, "unsupported join strategy %d", join_op_strategy);
+		}
 
 		/*
 		 * sortsupport routine must know if abbreviation optimization is
@@ -265,8 +293,6 @@ MJExamineQuals(List *mergeclauses,
 
 		iClause++;
 	}
-
-	return clauses;
 }
 
 /*
@@ -378,6 +404,14 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot)
 	return result;
 }
 
+/* Tuple comparison result */
+typedef enum
+{
+	MJCR_NextInner = 1,
+	MJCR_NextOuter = -1,
+	MJCR_Join = 0
+} MJCompareResult;
+
 /*
  * MJCompare
  *
@@ -388,10 +422,10 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot)
  * MJEvalOuterValues and MJEvalInnerValues must already have been called
  * for the current outer and inner tuples, re

Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages

2017-10-06 Thread Alexander Kuzmenkov



> Analysis: The estimated value of the lossy_pages is way higher than
> its actual value, and the reason is that the total_pages calculated by
> the "Mackert and Lohman formula" is not correct.


I think the problem might be that the total_pages includes cache effects 
and rescans. For bitmap entries we should use something like relation 
pages * selectivity.
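
As a rough illustration of that suggestion (table name and selectivity are
assumed for the example):

-- distinct heap pages a bitmap heap scan can touch is bounded by
-- relation pages * selectivity, unlike the Mackert-Lohman total_pages,
-- which also folds in rescans and cache effects
select relpages,
       0.001                  as assumed_selectivity,
       ceil(relpages * 0.001) as estimated_bitmap_heap_pages
from pg_class
where relname = 'lineitem';   -- hypothetical TPC-H table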


Meanwhile, I ran TPC-H briefly with the v3 patch: scale factor 25, 2 
workers, SSD storage.

It shows a significant improvement with 4MB work_mem and no change with 2GB.

Here are the results (execution time in seconds, average of 5 runs):
        | work_mem = 4MB  | work_mem = 2GB
 Query  | master | patch  | master | patch
--------+--------+--------+--------+--------
     4  |    179 |    174 |    168 |    167
     5  |    190 |    168 |    155 |    156
     6  |    280 |     87 |    227 |    229
     8  |    197 |    114 |    179 |    172
    10  |    269 |    227 |    190 |    192
    14  |    110 |    108 |    106 |    105

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Predicate Locks for writes?

2017-10-09 Thread Alexander Korotkov
On Sat, Oct 7, 2017 at 2:26 PM, Simon Riggs  wrote:

> SERIALIZABLE looks for chains of rw cases.
>
> When we perform UPDATEs and DELETEs we search for rows and then modify
> them. The current implementation views that as a read followed by a
> write because we issue PredicateLockTuple() during the index fetch.
>
> Is it correct that a statement that only changes data will add
> predicate locks for the rows that it modifies?
>

I'm not closely familiar with how this SI code works exactly.  But it
seems that once we update a row, the SIReadLock on that tuple is released,
because we're already holding a stronger lock.  I made a simple experiment.

smagen@postgres=# begin;
BEGIN
Time: 0,456 ms
smagen@postgres=*# select * from test where id = 1;
 id │ val
────┼─────
  1 │ xyz
(1 row)

*# select locktype, database, relation, page, tuple, mode, granted from
pg_locks where pid = pg_backend_pid();
  locktype  │ database │ relation │ page │ tuple │      mode       │ granted
────────────┼──────────┼──────────┼──────┼───────┼─────────────────┼─────────
 relation   │    12558 │    11577 │ NULL │  NULL │ AccessShareLock │ t
 relation   │    12558 │    65545 │ NULL │  NULL │ AccessShareLock │ t
 relation   │    12558 │    65538 │ NULL │  NULL │ AccessShareLock │ t
 virtualxid │     NULL │     NULL │ NULL │  NULL │ ExclusiveLock   │ t
 page       │    12558 │    65545 │    1 │  NULL │ SIReadLock      │ t
 tuple      │    12558 │    65538 │    0 │     3 │ SIReadLock      │ t
(6 rows)

*# update test set val = 'xyz' where id = 1;

*# select locktype, database, relation, page, tuple, mode, granted from
pg_locks where pid = pg_backend_pid();
   locktype    │ database │ relation │ page │ tuple │       mode       │ granted
───────────────┼──────────┼──────────┼──────┼───────┼──────────────────┼─────────
 relation      │    12558 │    11577 │ NULL │  NULL │ AccessShareLock  │ t
 relation      │    12558 │    65545 │ NULL │  NULL │ AccessShareLock  │ t
 relation      │    12558 │    65545 │ NULL │  NULL │ RowExclusiveLock │ t
 relation      │    12558 │    65538 │ NULL │  NULL │ AccessShareLock  │ t
 relation      │    12558 │    65538 │ NULL │  NULL │ RowExclusiveLock │ t
 virtualxid    │     NULL │     NULL │ NULL │  NULL │ ExclusiveLock    │ t
 transactionid │     NULL │     NULL │ NULL │  NULL │ ExclusiveLock    │ t
 page          │    12558 │    65545 │    1 │  NULL │ SIReadLock       │ t
(8 rows)

Once we update the same tuple we read before, the SIReadLock on that tuple
disappears.

PredicateLockTuple() specifically avoids adding an SIRead lock if the
> tuple already has a write lock on it, so surely it must also be
> correct to skip the SIRead lock if we are just about to update the
> row?
>
> I am suggesting that we ignore PredicateLockTuple() for cases where we
> are about to update or delete the found tuple.
>

If my thoughts above are correct, then it would be a reasonable
optimization.  We would skip the cycle of taking an SIReadLock on the
tuple and then releasing it, without any change in behavior.


> ISTM that a Before Row Trigger would need to add an SIRead lock since
> that is clearly a read.
>

Yes, because depending on the result of the before-row trigger, the update
might not actually happen, so that SIReadLock on the tuple is necessary.
However, ISTM that we could take the SIReadLock on the tuple after the
before-row trigger, and only in the case when the trigger has cancelled
the update.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index

2017-10-09 Thread Alexander Korotkov
On Thu, Oct 5, 2017 at 9:48 PM, Shubham Barai 
wrote:

> On 3 October 2017 at 00:32, Alexander Korotkov 
> wrote:
>
>> On Mon, Oct 2, 2017 at 9:11 PM, Andrew Borodin 
>> wrote:
>>
>>> On Mon, Oct 2, 2017 at 8:00 PM, Alexander Korotkov
>>>  wrote:
>>> > What happen if exactly this "continue" fires?
>>> >
>>> >> if (GistFollowRight(stack->page))
>>> >> {
>>> >>     if (!xlocked)
>>> >>     {
>>> >>         LockBuffer(stack->buffer, GIST_UNLOCK);
>>> >>         LockBuffer(stack->buffer, GIST_EXCLUSIVE);
>>> >>         xlocked = true;
>>> >>         /* someone might've completed the split when we unlocked */
>>> >>         if (!GistFollowRight(stack->page))
>>> >>             continue;
>>> >
>>> >
>>> > In this case we might get xlocked == true without calling
>>> > CheckForSerializableConflictIn().
>>> Indeed! I had overlooked it. I remember this issue; we were
>>> considering not fixing splits when in serializable isolation, but
>>> dropped the idea.
>>>
>>
>> Yeah, the current insert algorithm assumes that a split must be fixed
>> before we can correctly traverse the tree downwards.
>>
>>
>>> CheckForSerializableConflictIn() must be after every exclusive lock.
>>>
>>
>> I'm not sure that fixing a split is a case where we necessarily need to
>> call CheckForSerializableConflictIn().  This lock on the leaf page is
>> not taken to modify the page; it's just taken to ensure that nobody else
>> is fixing this split at the same time.  After fixing the split, it might
>> turn out that the insert goes to another page.
>>
>>> > I think it would be rather safe and easy to understand to move
>>> > CheckForSerializableConflictIn() directly before gistinserttuple().
>>> The difference is that after the lock we have the conditions to change
>>> the page, and before gistinserttuple() we have the actual intent to
>>> change the page.
>>>
>>> From the point of view of future development, the first version is
>>> better (in case some new calls appear later), but it has 3 calls while
>>> your proposal has 2.
>>> It seems to me that CheckForSerializableConflictIn() before
>>> gistinserttuple() is better for now.
>>>
>>
>> Agree.
>>
>
> I have updated the location of CheckForSerializableConflictIn() and
> changed the status of the patch to "needs review".

Now, ISTM that the predicate locks and conflict checks are placed
correctly.  However, it would be good to add a couple of comments to
gistdoinsert() stating why we call CheckForSerializableConflictIn() in
these particular places.

I also took a look at the isolation tests.  You made two separate test
specs: one to verify that serialization failures do fire, and another to
check that there are no false positives.
I wonder if we could merge these two test specs into one, using a greater
variety of statements with different keys for both inserts and selects
(see the sketch below).
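
A rough sketch of what a merged spec might look like (table, index, and
step SQL are illustrative, not taken from the patch):

# overlapping keys in one permutation, disjoint keys in another, so
# serialization failures both fire and stay absent
setup
{
    create table gist_tbl (id int4, p point);
    create index gist_idx on gist_tbl using gist (p);
    insert into gist_tbl select g, point(g, g) from generate_series(1, 1000) g;
}
teardown { drop table gist_tbl; }

session "s1"
setup      { begin isolation level serializable; }
step "r1"  { select count(*) from gist_tbl where p << point(500, 500); }
step "w1"  { insert into gist_tbl values (1001, point(600, 600)); }
step "c1"  { commit; }

session "s2"
setup      { begin isolation level serializable; }
step "r2"  { select count(*) from gist_tbl where p >> point(500, 500); }
step "w2a" { insert into gist_tbl values (1002, point(400, 400)); }
step "w2b" { insert into gist_tbl values (1003, point(700, 700)); }
step "c2"  { commit; }

permutation "r1" "r2" "w1" "w2a" "c1" "c2"  # rw-conflict cycle: must fail
permutation "r1" "r2" "w1" "w2b" "c1" "c2"  # disjoint keys: must succeed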

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Pluggable storage

2017-10-09 Thread Alexander Korotkov
On Wed, Sep 27, 2017 at 7:51 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> I took a look at this patch.  I have the following notes for now.
>
> tuple_insert_hook tuple_insert; /* heap_insert */
>> tuple_update_hook tuple_update; /* heap_update */
>> tuple_delete_hook tuple_delete; /* heap_delete */
>
>
> I don't think this set of functions provides a good enough level of
> abstraction for a storage AM.  These functions encapsulate only the
> low-level work of inserting/updating/deleting tuples in the heap itself.
> However, it is still assumed that indexes are managed outside of the
> storage AM.  I don't think this is right, given that one of the most
> demanded usages of a storage API would be a different MVCC implementation
> (for instance, UNDO-log-based, as discussed upthread).  A different MVCC
> implementation is likely going to manage indexes in a different way.  For
> example, a storage AM utilizing UNDO would implement in-place update even
> when indexed columns are modified.  Therefore this piece of code in
> ExecUpdate() wouldn't be relevant anymore.
>
> /*
>> * insert index entries for tuple
>> *
>> * Note: heap_update returns the tid (location) of the new tuple in
>> * the t_self field.
>> *
>> * If it's a HOT update, we mustn't insert new index entries.
>> */
>> if ((resultRelInfo->ri_NumIndices > 0) &&
>>     !storage_tuple_is_heaponly(resultRelationDesc, tuple))
>>     recheckIndexes = ExecInsertIndexTuples(slot, &(slot->tts_tid),
>>                                            estate, false, NULL, NIL);
>
>
> I'm firmly convinced that this logic should be encapsulated into the
> storage AM, together with inserting new index tuples on storage insert.
> Also, HOT should be completely encapsulated into heapam.  It's quite
> evident to me that a storage utilizing UNDO wouldn't have anything like
> our current HOT.  Therefore, I think there shouldn't be a
> hot_search_buffer() API function; tuple_fetch() may cover
> hot_search_buffer().  That might require some signature change of
> tuple_fetch() (probably, extra arguments).
>

For me, it's a crucial point that pluggable storages should be able to
have different MVCC implementations, and correspondingly have full control
over their interactions with indexes.
Thus, it would be good to reach consensus on that point.  I'd like the
other discussion participants to comment on whether they agree or
disagree, and why.
Any comments?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

