Re: [HACKERS] Clock with Adaptive Replacement
On 05.05.2018 09:16, Andrey Borodin wrote:
> Hi!
>
>> On 4 May 2018, at 16:05, Юрий Соколов wrote:
>>
>> I didn't suggest a log scale of usages, but rather a
>> "replacement-period based" increment: the usage count can be
>> incremented at most once per NBlocks/32 replaced items. Once it is
>> incremented, its "replacement time" is remembered, and for the next
>> NBlocks/32 replacements this buffer's usage count is not incremented
>> again. This way, the increment is synchronized with replacement
>> activity.
>
> But you lose the difference between "touched once" and "actively
> used". A log scale of usage solves this: the usage count grows
> logarithmically, but drains linearly.

No, I don't lose the difference. But instead of an absolute value (log
scale or linear) I count how often in time a block is used:
- if a buffer was touched 1000 times just after being placed into
  shared_buffers, should it live 500 times longer than a neighbor that
  was touched only 2 times? Or 10 times longer? Or 5 times longer?
- but what if that "1000 times" buffer is not touched in the next
  period of time, while the neighbor is touched 2 times again?

All effective algorithms give the same answer: the "1000 times" buffer
should be evicted first, while its neighbor is a really hot buffer that
should be kept for a longer period. A log scale doesn't solve this, but
an increment "once per period" does, especially if a block is first
placed with a zero count (instead of 1 as currently).

>> Digging further, I suggest as an improvement of the GClock
>> algorithm:
>> - placing a new buffer with usage count = 0 (and for the next
>>   NBlocks/32 replacements its usage count is not increased)
>> - incrementing not by 1, but by 8 (it simulates the "hot queue" of
>>   popular algorithms), with a limit of 32
>> - scanning at most 25 buffers for eviction; if no buffer with a zero
>>   usage count is found, the least-used buffer (among the 25 scanned)
>>   is evicted (new buffers are not evicted during their first
>>   NBlocks/32 replacements)
>
> I do not understand where these numbers come from...

I found these numbers by testing with several artificial traces found
on the web. I don't claim these numbers are the best. Even on those
traces the best values vary with cache size: for small caches the
increment and limit tend to be higher, for huge caches smaller. But
these were the most balanced. And I don't claim those traces are
representative of PostgreSQL workloads; that is why I'm pushing this
discussion, to collect more real-world PostgreSQL traces and make them
public.

And I believe my algorithm is not the best. Clock-Pro and ARC show
better results on those traces. Tiny-LFU, a cache admission algorithm,
may be even more efficient (in terms of evictions). But the results
should be rechecked with PostgreSQL traces. My algorithm would just be
the least invasive for the current code, imho.
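To make the proposal concrete, here is a minimal toy sketch of the
variant described above. This is not PostgreSQL code: the names and
constants are illustrative, and the linear drain as the clock hand
passes is an assumption carried over from plain GClock.

#include <stdint.h>

#define NBLOCKS      1024            /* cache size in buffers, illustrative */
#define PERIOD       (NBLOCKS / 32)  /* min replacements between increments */
#define BUMP         8               /* increment applied on a counted hit */
#define USAGE_LIMIT  32              /* ceiling for the usage count */
#define SCAN_LIMIT   25              /* max candidates scanned per eviction */

typedef struct
{
    uint32_t    usage;      /* grows by BUMP, drains as the hand passes */
    uint64_t    last_bump;  /* 'replacements' value at the last increment */
    uint64_t    placed_at;  /* 'replacements' value when buffer was filled */
} Buf;

static Buf      bufs[NBLOCKS];
static uint64_t replacements;   /* global count of evictions so far */
static int      hand;           /* clock hand */

/* On a buffer hit: bump the usage count at most once per PERIOD. */
static void
on_access(Buf *b)
{
    if (replacements - b->last_bump >= PERIOD)
    {
        b->usage = (b->usage + BUMP > USAGE_LIMIT) ? USAGE_LIMIT
                                                   : b->usage + BUMP;
        b->last_bump = replacements;
    }
}

/* Pick a victim: the first zero-usage buffer among at most SCAN_LIMIT
 * candidates, otherwise the least-used candidate seen.  Buffers still
 * inside their first PERIOD after placement are skipped. */
static int
evict(void)
{
    int         victim = -1;
    uint32_t    best = UINT32_MAX;

    for (int scanned = 0; scanned < SCAN_LIMIT; scanned++)
    {
        int     i = hand;
        Buf    *b = &bufs[i];

        hand = (hand + 1) % NBLOCKS;

        if (replacements - b->placed_at < PERIOD)
            continue;           /* new buffer, protected from eviction */

        if (b->usage == 0)
        {
            victim = i;         /* cold buffer found, stop scanning */
            break;
        }
        if (b->usage < best)
        {
            best = b->usage;
            victim = i;
        }
        b->usage--;             /* linear drain as the hand passes */
    }

    if (victim < 0)
        victim = hand;          /* everything scanned was protected */

    replacements++;
    bufs[victim].usage = 0;     /* a new page starts with a zero count */
    bufs[victim].placed_at = replacements;
    bufs[victim].last_bump = replacements;
    return victim;
}

With regards,
Sokolov Yura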
Re: Explain buffers wrong counter with parallel plans
On Fri, May 4, 2018 at 10:35 PM, Robert Haas wrote:
> On Wed, May 2, 2018 at 11:37 AM, Adrien Nayrat wrote:
>> In 9.6 the gather node reports the sum of buffers for the main
>> process + workers. In 10, the gather node only reports buffers from
>> the main process.
>
> Oh, really? Well, that sounds like a bug. Amit seems to think it's
> expected behavior, but I don't know why it should be.

The reason why I think the current behavior is okay is that it was only
coincidental that the counters were displayed correctly. We have not
made any effort to percolate them to upper nodes. For example, before
that commit they were also not displayed for Gather Merge, or for
Gather under some node like 'Limit' where we have to stop before
reaching the end of the result.

> The commit message makes it sound like it's just refactoring, but in
> fact it seems to have made a significant behavior change that doesn't
> look very desirable.

I think it is the below part of that commit which has made this
difference, basically shutting down the workers:

    if (readerdone)
    {
        Assert(!tup);
        ..
        if (gatherstate->nreaders == 0)
-       {
-           ExecShutdownGatherWorkers(gatherstate);
            return NULL;
-       }

The reason why we were getting different results due to the above code
is that while shutting down the workers we gather the buffer usage of
all workers into 'pgBufferUsage' via InstrAccumParallelQuery, and the
upper-level node (Gather in this case) then gets a chance to use those
stats via ExecProcNodeInstr->InstrStopNode. However, this won't hold in
other cases where we need to exit before reaching the end of the
results, as with a 'Limit' node: in such cases, after shutting down the
workers via ExecShutdownNode, we won't do InstrStopNode for the
upper-level nodes.

I think if we want all the stats collected by workers to percolate to
Gather and the nodes above it, then we need to somehow ensure that we
always shut down Gather/GatherMerge before we do InstrStopNode for
those nodes.
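To illustrate that ordering dependency, here is a self-contained toy
model. It is not executor code: the helper names and buffer counts are
made up, and only the order of the two calls is the point.

#include <stdio.h>

static long pgBufferUsage;      /* leader's running buffer counter */

typedef struct { long start; long total; } Instr;

static void instr_start(Instr *in) { in->start = pgBufferUsage; }
static void instr_stop(Instr *in)  { in->total += pgBufferUsage - in->start; }

/* Models InstrAccumParallelQuery: worker counters are folded into the
 * leader's running total only when the workers are shut down. */
static void shutdown_workers(long worker_buffers)
{
    pgBufferUsage += worker_buffers;
}

int main(void)
{
    Instr gather = { 0, 0 };

    instr_start(&gather);
    pgBufferUsage += 10;        /* leader's own buffer accesses */
    shutdown_workers(90);       /* workers read 90 buffers */
    instr_stop(&gather);
    printf("shutdown before stop: %ld buffers\n", gather.total);  /* 100 */

    /* In the Limit case, the node's instrumentation is finalized
     * before the workers' counters are folded in, so those are lost. */
    Instr gather2 = { 0, 0 };
    instr_start(&gather2);
    pgBufferUsage += 10;
    instr_stop(&gather2);
    shutdown_workers(90);
    printf("stop before shutdown: %ld buffers\n", gather2.total);  /* 10 */
    return 0;
}

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com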
Compiler warnings with --enable-dtrace
Hi hackers,

--enable-dtrace produces compiler warnings about const correctness,
except on macOS. That's because Apple's dtrace produces function
declarations in probes.h that take strings as const char *, whereas
AFAIK on all other operating systems they take char * (you can see that
possibly recent difference in Apple's version of dt_header_decl() in
dt_program.c). People have complained before[1].

Maybe we should do what the Perl people do[2] and post-process the
generated header file to add const qualifiers? Please see attached.

I have just added --enable-dtrace to my build farm animal elver, so
these warnings should appear at the next build. I wonder if the owners
of damselfly, castoroides and protosciurus (CCed) would consider adding
it for them too, so that we could get some coverage of this build
option on Illumos and Solaris.

[1] https://www.postgresql.org/message-id/flat/38D06FCCB225BA1C6699D4E7%40amenophis
[2] https://github.com/Perl/perl5/blob/a385812b685b3164e706880a72ee60c9cc9573e4/Makefile.SH#L870
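For anyone who hasn't built with the switch, the warning pattern is
roughly this (a hypothetical illustration; the probe name and signature
are invented, not the ones actually generated):

void probe_query_start(char *query);   /* shape of a non-Apple probes.h decl */

void
report_query_start(const char *query_string)
{
    /* warns: passing 'const char *' where 'char *' is expected
     * discards qualifiers */
    probe_query_start(query_string);
}

Post-processing the generated header to declare such arguments const,
as the Perl Makefile linked above does, makes the declarations match
the call sites.

-- 
Thomas Munro
http://www.enterprisedb.com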
Re: Python 3.7 support
Peter Eisentraut writes:
> I have committed this now, since the release of Python 3.7 is soon.
> I'll let the build farm have a pass at it, then backport it for the
> upcoming minor releases.

If you're intending to push this into the back branches before Monday's
releases, please do it *today*, the sooner the better. We are rapidly
closing in on the point where we'd not have a full set of buildfarm
runs done before the wrap. That is not a good situation for a patch
with portability implications.

			regards, tom lane
Re: genbki.pl not quoting keywords in postgres.bki output
Mark Dilger writes:
> There are not yet any examples in the postgres sources where this
> oversight causes problems, but in genbki.pl, where it makes the
> decision whether to quote a token: ...
> it should also quote anything that is a keyword, such as "open", as
> otherwise you get a syntax error during initdb.

Good point. Up to now, I'd have written that off as one of the many
undocumented gotchas involved in construction of DATA lines. However, I
think we've made a conscious effort in the bootstrap conversion work to
eliminate undocumented special cases, so we oughta do something to make
such cases work without extra hacking.

> This patch is not that complicated, but it does create a new coding
> requirement to keep bootparse.y and genbki.pl from getting out of
> sync. It might be simpler to just change genbki.pl to quote
> everything rather than applying this patch. I don't have an opinion
> on that.

I don't like adding a lot of unnecessary quoting to the .bki file. The
patch as you have it isn't that awful from a maintenance perspective; I
don't think we've added new keywords to bootparse.y very often, so we
could surely cope with keeping genbki.pl in sync.

However, really this ought to be solved in bootparse.y itself, I think:
it should be possible to have it treat keywords like identifiers in
INSERT commands, just as (most) keywords aren't reserved in the main
grammar. Let me go poke at that idea. If it doesn't work nicely I'll
use your patch.
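Roughly, I'm imagining a production along these lines in the bootstrap
grammar, so that keyword tokens are accepted anywhere an identifier is
(the token names here are illustrative, not necessarily the actual
bootparse.y symbols):

boot_ident:
          ID          { $$ = $1; }
        | OPEN        { $$ = pstrdup("open"); }
        | XCLOSE      { $$ = pstrdup("close"); }
        | XCREATE     { $$ = pstrdup("create"); }
        /* ... and so on for the remaining keyword tokens ... */
        ;

			regards, tom lane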
citext function overloads for text parameters
Hi hackers.

The following works well of course:

test=# select strpos('Aa'::citext, 'a');
 strpos
--------
      1

However, if I pass a typed text parameter for the substring, I get
case-sensitive behavior instead:

test=# select strpos('Aa'::citext, 'a'::text);
 strpos
--------
      2

This seems like surprising behavior - my expectation was that the first
parameter being citext would be enough to trigger case-insensitive
behavior. The same may be happening with other string functions (e.g.
regexp_matches). This is causing some difficulties in a real scenario
where SQL and parameters are getting generated by an O/RM, and changing
them isn't trivial.

Do the above seem like problematic behavior like it does to me, or is
it the expected behavior?

Shay
Re: citext function overloads for text parameters
2018-05-06 8:26 GMT+02:00 Shay Rojansky :
> Hi hackers.
>
> The following works well of course:
>
> test=# select strpos('Aa'::citext, 'a');
>  strpos
> --------
>       1
>
> However, if I pass a typed text parameter for the substring, I get
> case-sensitive behavior instead:
>
> test=# select strpos('Aa'::citext, 'a'::text);
>  strpos
> --------
>       2
>
> This seems like surprising behavior - my expectation was that the
> first parameter being citext would be enough to trigger
> case-insensitive behavior. The same may be happening with other
> string functions (e.g. regexp_matches). This is causing some
> difficulties in a real scenario where SQL and parameters are getting
> generated by an O/RM, and changing them isn't trivial.
>
> Do the above seem like problematic behavior like it does to me, or is
> it the expected behavior?

This is expected - it is a side effect of PostgreSQL's implementation
of function overloading and type conversions. After installing citext,
you will have more instances of the function strpos:

strpos(citext, citext)
strpos(text, text)

The call strpos('aa'::citext, 'a') is effectively
strpos('aa'::citext, 'a'::unknown), so strpos(citext, citext) is chosen
in this case.

strpos('aa'::citext, 'a'::text) is ambiguous (both functions could be
used with the necessary conversion - casts citext<->text are
available), and usually such a call fails with a related error message
- but there is an exception: the text type is PREFERRED, which means
strpos(text, text) is selected.

The PostgreSQL type system is very generic and usually works well, but
sometimes there can be unwanted effects when some functions are
overloaded. In this case it is better to implement your own uniquely
named function and use only it, something like:

create or replace function strpos_ci(text, text)
returns int as $$
  select strpos($1::citext, $2::citext)
$$ language sql;

create or replace function strpos_ci(citext, citext)
returns int as $$
  select strpos($1, $2)
$$ language sql;

Regards

Pavel