Re: Tab complete for CREATE OR REPLACE TRIGGER statement
On 2020-11-18 06:06, Michael Paquier wrote: On Mon, Nov 16, 2020 at 10:14:10PM -0500, Tom Lane wrote: Michael Paquier writes: I don't think that this is a requirement for this thread, though. Agreed, I'm not trying to block this patch. Just wishing there were a better way. Okay. I have tweaked a couple of things in the patch and applied it. I am wondering if libreadline gives the possibility to implement an optional grouping of words to complete, but diving into its documentation I have not found something that we could use. To me the code looks like a prime candidate for "data-driven" refactoring. It should be redone as generic code that reads a table of rules with params and then checks and applies each. Thus the repetitive code would be replaced by a bit more generic code and a lot of code-free data. -- Best regards, Tels
Re: Tab complete for CREATE OR REPLACE TRIGGER statement
Hello Tom, On 2020-11-18 16:49, Tom Lane wrote: Tels writes: On 2020-11-18 06:06, Michael Paquier wrote: On Mon, Nov 16, 2020 at 10:14:10PM -0500, Tom Lane wrote: Agreed, I'm not trying to block this patch. Just wishing there were a better way. To me the code looks like a prime candidate for "data-driven" refactoring. It should be redone as generic code that reads a table of rules with params and then checks and applies each. Thus the repetitive code would be replaced by a bit more generic code and a lot of code-free data. In the past I've looked into whether the rules could be autogenerated from the backend's grammar. It did not look very promising though. The grammar isn't really factorized appropriately -- for instance, tab-complete has a lot of knowledge about which kinds of objects can be named in particular places, while the Bison productions only know that's a "name". Still, the precedent of ECPG suggests it might be possible to process the grammar rules somehow to get to a useful result. Hm, that would be even better, for now I was just thinking that code like this: IF RULE_A_MATCHES THEN DO STUFF A ELSE IF RULE_B_MATCHES THEN DO_STUFF_B ELSE IF RULE_C_MATCHES THEN DO_STUFF_C ... should be replaced by RULE_A MATCH STUFF RULE_B MATCH STUFF RULE_C MATCH STUFF ... FOREACH RULE DO IF RULE.MATCH THEN DO RULE.STUFF END FOREACH Even if the rules would be manually created (converted from the current code), that would be more compact and probably less error-prone. Creating the rule automatically turns this into a completely different story. -- Best regards, Tels
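PS: To make "a bit more generic code and a lot of code-free data" a little more concrete, here is a rough, compilable C sketch; the structure and helper names are invented for illustration and are not psql's actual tab-complete.c API:

#include <stdio.h>
#include <string.h>

/* One rule: if the previous words match, offer these completions.
 * NULL in a prev slot means "don't care". */
typedef struct
{
    const char *prev3;
    const char *prev2;
    const char *prev1;
    const char *completions;
} CompletionRule;

static const CompletionRule rules[] = {
    {"CREATE", "OR", "REPLACE", "FUNCTION PROCEDURE TRIGGER VIEW"},
    {NULL, "CREATE", "TRIGGER", "<trigger name>"},
    {NULL, "DROP", "TABLE", "<table name>"},
};

static const char *
lookup_completions(const char *p3, const char *p2, const char *p1)
{
    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
    {
        const CompletionRule *r = &rules[i];

        if ((r->prev3 == NULL || strcmp(r->prev3, p3) == 0) &&
            (r->prev2 == NULL || strcmp(r->prev2, p2) == 0) &&
            (r->prev1 == NULL || strcmp(r->prev1, p1) == 0))
            return r->completions;
    }
    return NULL;
}

int
main(void)
{
    const char *c = lookup_completions("CREATE", "OR", "REPLACE");

    printf("%s\n", c ? c : "(no rule matched)");
    return 0;
}

The repetitive if/else chain then collapses into the single loop, and adding a new completion becomes one more line of data.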
Re: Parallel Aggregates for string_agg and array_agg
Moin, On Wed, April 4, 2018 11:41 pm, David Rowley wrote: > Hi Tomas, > > Thanks for taking another look. > > On 5 April 2018 at 07:12, Tomas Vondra > wrote: >> Other than that, the patch seems fine to me, and it's already marked as >> RFC so I'll leave it at that. > > Thanks. I have one more comment - sorry for not writing sooner, the flu got to me ... Somewhere in the code there is a new allocation of memory when the string grows beyond the current size - and that doubles the size. This can lead to a lot of wasted space (think: constructing a string that is a bit over 1 GByte, which would presumably allocate 2 GByte). The same issue happens when each worker allocates 512 MByte for a 256 MByte + 1 result. IMHO a factor of like 1.4 or 1.2 would work better here - not sure what the current standard in PG is for situations like this. >> The last obstacle seems to be the argument about the risks of the patch >> breaking queries of people relying on the ordering. Not sure what's are >> the right next steps in this regard ... > > yeah, seems like a bit of a stalemate. > > Personally, think if the group of people Tom mentions do exist, then > they've already been through some troubled times since Parallel Query > was born. I'd hope they've already put up their defenses due to the > advent of that feature. I stand by my thoughts that it's pretty > bizarre to draw the line here when we've probably been causing these > people issues for many years already. I said my piece on this already > so likely not much point in going on about it. These people are also > perfectly capable of sidestepping the whole issue with setting > max_parallel_workers_per_gather TO 0. Coming from the Perl side, this issue has popped up there a lot with the ordering of hash keys (e.g. "for my $key (keys %$hash) { ... }") and even tho there was never a guarantee, it often caught people by surprise. Mostly in testsuites, tho, because real output often needs some ordering, anyway. However, changes were nevertheless made, and code had to be adapted. But the world didn't end, so to speak. For PG, I think the benefits greatly outweigh the problems with the sorting order - after all, if you have code that relies on an implicit order, you've already got a problem, no? So my favourite would be something along these lines: * add the string_agg * document it in the release notes, and document workarounds/solutions (add ORDER BY, disable workers etc.) * if not already done, stress in the documentation that if you don't ORDER things, things might not come out in the order you expect - especially with parallel queries, new processing nodes etc. Best wishes, Tels PS: We use string_agg() in a case where we first agg each row, then string_agg() all rows, and the resulting string is really huge. We did run into the "out of memory" problem, so we now use a LIMIT and assemble the resulting parts on the client side. My wish here would be to better know how large the LIMIT can be; I found it quite difficult to predict at how many rows PG runs out of memory for the string buffer, even tho all rows have almost the same length as text. But that aside, getting the parts faster with parallel agg would be very cool, too.
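PPS: On the growth-factor remark above, a tiny, purely illustrative C sketch of a gentler growth policy (this is not how PostgreSQL's aggregate or StringInfo code is actually written):

#include <stddef.h>

/* Grow by roughly 1.5x instead of doubling: a result just over 1 GByte
 * then ends up in a ~1.5 GByte buffer rather than a 2 GByte one. */
static size_t
next_buffer_size(size_t current, size_t needed)
{
    size_t newsize = current;

    while (newsize < needed)
        newsize += newsize / 2 + 1024;  /* +1024 so tiny buffers still grow */

    return newsize;
}

The trade-off is more (and more expensive) reallocations as the string grows, so the right factor is a balance between wasted memory and copy overhead.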
Re: perltidy version
Moin, On Wed, April 25, 2018 12:35 pm, Tom Lane wrote: > Michael Paquier writes: >> On Mon, Apr 23, 2018 at 12:40:00PM -0400, Tom Lane wrote: >>> Alvaro Herrera writes: >>>> I still vote we use 20170521 which is recent enough that we won't have >>>> to change it in a few years. > >> That's the version available on Debian sid, so from the prospective of >> any Debian user this is a handy version to use when sending patches: >> perltidy/unstable,now 20170521-1 all [installed] > > I'm not hearing any objections or counterproposals, so let's go with > that version. > >>>> I further vote that we change the URL in pgindent/README from >>>> sourceforge to metacpan.org, >>>> https://metacpan.org/pod/distribution/Perl-Tidy/lib/Perl/Tidy.pod > > Agreed on pointing to cpan, but that page is pretty confusing if you're > looking for a non-bleeding-edge version. I suggest linking to > > https://cpan.metacpan.org/authors/id/S/SH/SHANCOCK/ > > which presents a handy directory listing. Linking to the author directory can be pretty confusing, because if a new (co-)maintainer releases something, it will not end up in this directory. But it is possible to point to the specific version PG needs like so: http://search.cpan.org/~shancock/Perl-Tidy-20170521/ That way visitors see the right version, but also have all the other data and can see all other (still existing) versions if they want. Whether that page is "more confusing" than a directory listing where you have to pick the right file is of course debatable. Regards, Tels
Re: Efficient output for integer types
Moin, On 2019-09-22 23:58, David Fetter wrote: On Sat, Sep 21, 2019 at 07:29:25AM +0100, Andrew Gierth wrote: >>>>> "David" == David Fetter writes: Fixed. Good work, more performance is sure nice :) Noticed one more thing in the patch: - *start++ = *a; - *a-- = swap; + memcpy(pos - 2, DIGIT_TABLE + c, 2); + i += 2; } + else + *a = (char) ('0' + value2); + + return olength; } The line "i += 2;" modifies i, but i is never used again nor returned. Best regards, Tels
Re: Transparent Data Encryption (TDE) and encrypted files
Moin, On 2019-09-30 23:26, Bruce Momjian wrote: For full-cluster Transparent Data Encryption (TDE), the current plan is to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem overflow). The plan is: https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, or other files. Is that correct? Do any other PGDATA files contain user data? IMHO the general rule in crypto is: encrypt everything, or don't bother. If you don't encrypt some things, somebody is going to find loopholes and sidechannels and partial-plaintext attacks. Just a silly example: If you trick the DB into putting only one row per page, any "bit-per-page" map suddenly reveals information about a single encrypted row that it shouldn't reveal. Many people with a lot of free time on their hands will sit around, drink a nice cup of tea and come up with all sorts of attacks on these things that you didn't (and couldn't) anticipate now. So IMHO it would be much better to err on the side of caution and encrypt everything possible. Best regards, Tels
Re: Declaring a strict function returns not null / eval speed
Moin, On 2019-10-20 13:30, Andreas Karlsson wrote: On 10/1/19 9:38 AM, Andres Freund wrote: We spend a surprising amount of time during expression evaluation to reevaluate whether input to a strict function (or similar) is not null, even though the value either comes from a strict function, or a column declared not null. Now you can rightfully say that a strict function still can return NULL, even when called with non-NULL input. But practically that's quite rare. Most of the common byvalue type operators are strict, and approximately none of those return NULL when actually called. That makes me wonder if it's worthwhile to invent a function property declaring strict strictness or such. It'd allow for some quite noticable improvements for e.g. queries aggregating a lot of rows, we spend a fair time checking whether the transition value has "turned" not null. I'm about to submit a patch making that less expensive, but it's still expensive. I can also imagine that being able to propagate NOT NULL further up the parse-analysis tree could be beneficial for planning, but I've not looked at it in any detail. Agreed, this sounds like something useful to do since virtually all strict functions cannot return NULL, especially the ones which are used in tight loops. The main design issue seems to be to think up a name for this new level of strictness which is not too confusing for end users. STRICT NONULL? That way you could do CREATE FUNCTION f1 ... STRICT; CREATE FUNCTION f2 ... STRICT NONULL; CREATE FUNCTION f3 ... NONULL; and the last would throw "not implemented yet"? "NEVER RETURNS NULL" would also rhyme with the existing "RETURNS NULL ON NULL INPUT", but I find the verbosity too high. Best regards, Tels
Re: Declaring a strict function returns not null / eval speed
Moin, On 2019-10-20 16:27, Tom Lane wrote: Tels writes: On 2019-10-20 13:30, Andreas Karlsson wrote: Agreed, this sounds like something useful to do since virtually all strict functions cannot return NULL, especially the ones which are used in tight loops. The main design issue seems to be to think up a name for this new level of strictness which is not too confusing for end users. STRICT NONULL? That way you could do CREATE FUNCTION f1 ... STRICT; CREATE FUNCTION f2 ... STRICT NONULL; CREATE FUNCTION f3 ... NONULL; and the last would throw "not implemented yet"? "NEVER RETURNS NULL" would also rhyme with the existing "RETURNS NULL ON NULL INPUT", but I find the verbosity too high. "RETURNS NOT NULL", perhaps? That'd have the advantage of not requiring any new keyword. Hm, yes, that would be a good compromise on verbosity and align even better with the other "RETURNS ..." variants. I'm a little bit skeptical of the actual value of adding this additional level of complexity, but I suppose we can't determine that reliably without doing most of the work :-( Maybe it would be possible to simulate the effect somehow? Or at least we could try to find practical queries where the additional information results in a much better plan if RETURNS NOT NULL was set. Best regards, Tels
Re: What does Time.MAX_VALUE actually represent?
Moin, On Sat, December 30, 2017 4:25 pm, Gavin Flower wrote: > On 12/31/2017 03:07 AM, Dave Cramer wrote: >> We are having a discussion on the jdbc project about dealing with >> 24:00:00. >> >> https://github.com/pgjdbc/pgjdbc/pull/992#issuecomment-354507612 >> >> Dave Cramer > > In Dublin (I was there 2001 to 2004), Time tables show buses just after > midnight, such as 1:20am as running at the time 2520 - so there are > visible close to the end of the day. If you are looking for buses > around midnight this is very user friendly - better than looking at the > other end of the time table for 0120. > > I think logically that 24:00:00 is exactly one day later than 00:00:00 - > but I see from following the URL, that there are other complications... Careful here, if "24:00:00" always means literally "00:00:00 one day later", that could work, but you can't just have it meaning "add 24 hours to the clock". For instance, during daylight saving time changes, days can be 23 hours or 25 hours long... Best wishes, Tels
Re: What does Time.MAX_VALUE actually represent?
Moin, On Sun, December 31, 2017 12:50 pm, Tom Lane wrote: > Peter Eisentraut writes: >> select timestamp '2017-12-30 24:00:00'; >> returns >> 2017-12-31 00:00:00 >> which makes some sense. > >> I don't know why we accept that and not '24:00:01' and beyond, but it's >> probably historical. > > We also accept > > regression=# select timestamp '2017-12-30 23:59:60'; > timestamp > - > 2017-12-31 00:00:00 > (1 row) > > which is maybe a little bit more defensible as a common behavior > to allow for leap seconds. (It's far from a correct implementation of > leap seconds, of course, but I think most versions of mktime() and > friends do likewise.) > > Digging into the git history, it looks like this choice dates to > > commit a93bf4503ffc6d7cd6243a6324fb2ef206b10adf > Author: Bruce Momjian > Date: Fri Oct 14 11:47:57 2005 + > > Allow times of 24:00:00 to match rounding behavior: > > regression=# select '23:59:59.9'::time(0); >time > -- > 24:00:00 > (1 row) > > This is bad because: > > regression=# select '24:00:00'::time(0); > ERROR: date/time field value out of range: "24:00:00" > > The last example now works. > > > There may at the time have been some argument about imprecision of > float timestamps involved, but it works the same way today once > you exceed the precision of our integer timestamps: > > regression=# select time '23:59:59.99'; > time > - > 23:59:59.99 > (1 row) > > regression=# select time '23:59:59.999'; >time > -- > 24:00:00 > (1 row) > > If we didn't allow '24:00:00' as a valid value then we'd need to > throw an error for '23:59:59.999', which doesn't seem nice. Hm, but shouldn't the result then be "00:00:00" instead of "24:00:00"? With addition it seems to work different: postgres=# select time '23:59:59.99' + interval '0.01 seconds'; ?column? -- 00:00:00 (1 row) Best regards, Tels
Re: Faster inserts with mostly-monotonically increasing values
Moin, On Tue, January 2, 2018 7:51 am, Pavan Deolasee wrote: > On Sun, Dec 31, 2017 at 4:36 PM, Peter Geoghegan wrote: > >> On Sun, Dec 31, 2017 at 6:44 AM, Pavan Deolasee >> wrote: >> > Here is a patch that implements the idea. If the last insert happens >> to >> be >> > in the rightmost block of an index, then we cache the block and check >> that >> > first for the next insert. For the cached block to be usable for the >> insert, >> > it should still be the rightmost, the to-be-inserted key should go >> into >> the >> > cached block and there is enough free space in the block. If any of >> these >> > conditions do not meet, we fall back to the regular code path, i.e. >> descent >> > down the index from the top. [snip] >> > So as the size of the index increases, the benefits of the patch also >> tend >> > to increase. This is most likely because as the index becomes taller >> and >> > taller, the costs associated with index descent becomes higher. >> >> FWIW, I think that int4 indexes have to be extremely large before they >> exceed 4 levels (where the leaf level counts as a level). My >> conservative back-of-an-envelope estimate is 9 billion index tuples. >> >> > Hmm Ok. I am not entirely sure whether it's the depth or just purely > binary > searching through 3-4 index pages and/or pinning/unpinning buffers result > in much overhead. I will run some more tests and collect evidences. Just a question trying to understand how btree indexes work: If one inserts ever-increasing value, is the tree a balanced tree with a minimum (or at least not-as-high) number of levels, or does it increase in height every insert and creates a "tall stack"? @Peter: Could you please share your back-of-the-envelope calculation, I'd love to get some insights into the innards. All the best, Tels
Re: [HACKERS] [PATCH] Incremental sort
Hello Alexander, On Thu, January 4, 2018 4:36 pm, Alexander Korotkov wrote: > On Fri, Dec 8, 2017 at 4:06 PM, Alexander Korotkov < > a.korot...@postgrespro.ru> wrote: > >> Thank you for pointing that. Sure, both cases are better. I've added >> second case as well as comments. Patch is attached. I had a quick look, this isn't a full review, but a few things struck me on a read through the diff: There are quite a few places where lines are broken like so: + ExecIncrementalSortInitializeWorker((IncrementalSortState *) planstate, + pwcxt); Or like this: + result = (PlanState *) ExecInitIncrementalSort( + (IncrementalSort *) node, estate, eflags); e.g. a param is on the next line, but aligned to the very same place where it would be w/o the linebreak. Or is this just some sort of artefact because I viewed the diff with tabspacing = 8? I'd fix the grammar here: + * Incremental sort is specially optimized kind of multikey sort when + * input is already presorted by prefix of required keys list. Like so: "Incremental sort is a specially optimized kind of multikey sort used when the input is already presorted by a prefix of the required keys list." + * Consider following example. We have input tuples consisting from "Consider the following example: We have ..." +* In incremental sort case we also have to cost to detect sort groups. "we also have to cost the detection of sort groups." "+ * It turns out into extra copy and comparison for each tuple." "This turns out to be one extra copy and comparison per tuple." + "Portions Copyright (c) 1996-2017" Should probably be 2018 now - time flies fast :) return_value = _readMaterial(); else if (MATCH("SORT", 4)) return_value = _readSort(); + else if (MATCH("INCREMENTALSORT", 7)) + return_value = _readIncrementalSort(); else if (MATCH("GROUP", 5)) return_value = _readGroup(); I think the ", 7" here is left-over from when it was named "INCSORT", and it should be MATCH("INCREMENTALSORT", 15)), shouldn't it? + space, fase when it's value for in-memory typo: "space, false when ..." + boolcmp; + cmp = cmpSortSkipCols(node, node->sampleSlot, slot); + + if (cmp) In the above, the variable cmp could be optimized away with: + if (cmpSortSkipCols(node, node->sampleSlot, slot)) (not sure if modern compilers won't do this, anway, though) +typedef struct IncrementalSortState +{ + ScanState ss; /* its first field is NodeTag */ + boolbounded;/* is the result set bounded? */ + int64 bound; /* if bounded, how many tuples are needed */ If I'm not wrong, the layout of the struct will include quite a bit of padding on 64 bit due to the mixing of bool and int64, maybe it would be better to sort the fields differently, e.g. pack 4 or 8 bools together? Not sure if that makes much of a difference, though. That's all for now :) Thank you for your work, Tels
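PS: To illustrate the padding remark with a self-contained example (typical 64-bit layout; the field names are made up and this is of course not the actual IncrementalSortState):

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* bools interleaved with int64s: each bool drags 7 bytes of padding along */
typedef struct
{
    bool    bounded;
    int64_t bound;
    bool    bounded_Done;
    int64_t bound_Done;
} Interleaved;              /* typically 32 bytes */

/* same fields, bools grouped at the end: less padding */
typedef struct
{
    int64_t bound;
    int64_t bound_Done;
    bool    bounded;
    bool    bounded_Done;
} Grouped;                  /* typically 24 bytes */

int
main(void)
{
    printf("interleaved: %zu bytes, grouped: %zu bytes\n",
           sizeof(Interleaved), sizeof(Grouped));
    return 0;
}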
Re: Faster inserts with mostly-monotonically increasing values
Hello Alvaro, On Thu, January 4, 2018 7:35 am, Alvaro Herrera wrote: > Pavan Deolasee wrote: >> On Tue, Jan 2, 2018 at 7:15 PM, Tels >> wrote: >> >> > Just a question trying to understand how btree indexes work: >> > >> > If one inserts ever-increasing value, is the tree a balanced tree with >> a >> > minimum (or at least not-as-high) number of levels, or does it >> increase in >> > height every insert and creates a "tall stack"? >> >> AFAIK all pages will be half-filled in this case. Imagine if one index >> page >> can accommodate N entries, the page will be split when (N+1)th entry is >> added. The left page will have half the entries and the right page will >> get >> the remaining. The next split will happen when the right page is full >> i.e. >> when another N/2 entries are added. Thus there will be a split at every >> N/2 >> inserts, creating an index with half filled pages. > > Not really -- quoth _bt_findsplitloc(): > > * If the page is the rightmost page on its level, we instead try to > arrange > * to leave the left split page fillfactor% full. In this way, when we > are > * inserting successively increasing keys (consider sequences, timestamps, > * etc) we will end up with a tree whose pages are about fillfactor% full, > * instead of the 50% full result that we'd get without this special case. > > > To answer Tels question: the tree is balanced by splitting pages and > propagating the splits upwards to the parent pages, all the way up to > the root. When the root page gets full, it is also split and a new root > is created on top of the old root plus its new sibling page, which is > the only point at which the tree grows in depth. Thank you for the explanation! You learn something new every time :) All the best, Tels
Re: [Patch] Make block and file size for WAL and relations defined at cluster creation
On Wed, January 3, 2018 4:04 pm, Robert Haas wrote: > On Wed, Jan 3, 2018 at 3:43 PM, Remi Colinet > wrote: >> Justifications are: > > I think this is all missing the point. If varying the block size (or > whatever) is beneficial, then having it configurable at initdb is > clearly useful. But, as Andres says, you haven't submitted any > evidence that this is the case. You need to describe scenarios in > which (1) a non-default blocksize performs better and (2) there's no > reasonable way to obtain the same performance improvement without > changing the block size. (Note: I gather that "block size" here is the page size, but I'm not entirely sure. So this detail might make my arguments moot :) First, I do think there is a chicken-and-egg problem here. If you can't vary the values except by re-compiling, almost no-one does. So you don't get any benchmarks, or data. And even if the author of the patch does this, he can only provide very spotty data, which might not be enough to draw the right conclusions. Plus, if the goal is to "select the optimal size" (as opposed to Remi's goal, which seems to me to be "make it easier to select the optimal size for one system at hand"), you might not be able to do this, as the "optimal" size is different for different conditions, but you can't try different values if the value is compiled in... Plus, isn't almost all advancement in computing that you replace "1 + 1" with "x + y"? :) So, I do think the patch has some merits, because at least it allows much easier benchmarking. And if the value was configurable at initdb time, I think more people would experiment and settle on their optimum. So for me the question here isn't "does it benefit everyone", but "does it benefit some, and what is the cost for others". Plus, I stumbled just by accident over a blog post from Tomas Vondra[0] from 2015, who seems to agree that different page sizes can be useful: https://blog.pgaddict.com/posts/postgresql-on-ssd-4kb-or-8kB-pages Methinks the next steps would be that more benchmarks are done on more distinct hardware to see what benefits we can see, and weigh this against the complexity the patch introduces into the code base. Hope that does make sense, Tels [0]: Tomas, great blog, btw!
Re: proposal: alternative psql commands quit and exit
On Mon, January 15, 2018 11:10 am, Robert Haas wrote: > On Mon, Jan 15, 2018 at 10:57 AM, Tom Lane wrote: >> Robert Haas writes: >>> I've discovered one thing about this design that is not so good, which >>> is that if you open a single, double, or dollar quote, then the >>> instructions that are provided under that design do not work: >> >> I was kind of imagining that we could make the hint text vary depending >> on the parsing state. Maybe that's too hard to get to --- but if the >> prompt-creating code knows what it is, perhaps this can too. > > prompt_status does seem to be available in psql's MainLoop(), so I > think that could be done, but the problem is that I don't know exactly > what message would be useful to print when we're in a "in quotes" > state. If I had a good message text for that state, I might just > choose to use it always rather than multiplying the number of > messages. > > More broadly, I think what is needed here is less C-fu than > English-fu. If we come up with something good, we can make it print > that thing. Maybe that's a Linux-specific thing, but I always use Ctrl-D to exit a console, and this works with psql, too, even when in the middle of a query typed. So maybe this could be suggested? Best wishes, Tels
Re: proposal: alternative psql commands quit and exit
Moin, On Mon, January 15, 2018 2:34 pm, Tels wrote: > Maybe that's a Linux-specific thing, but I always use Ctrl-D to exit a > console, and this works with psql, too, even when in the middle of a query > typed. > > So maybe this could be suggested? Sorry, should have really read the thread until the very end, please ignore my noise. Best wishes, Tels
Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?
Moin, On Fri, January 26, 2018 2:30 am, David Rowley wrote: > On 21 January 2018 at 19:21, David Rowley > wrote: >> On 20 January 2018 at 18:50, Tom Lane wrote: >>> Stephen Froehlich writes: >>>> Are custom statistics in PG10 retained in LIKE (INCLUDING ALL) or do I >>>> need to recreate the statistics by hand each time I create a new >>>> table? >>> >>> LIKE doesn't know about those, no. Perhaps it should. >> >> Agreed. ALL does not mean MOST. > > (Moving to -hackers) > > The attached implements this. > > Looking at "LIKE ALL" in more detail in the docs it claims to be > equivalent to "INCLUDING DEFAULTS INCLUDING IDENTITY INCLUDING > CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING COMMENTS", > so it didn't seem right to magically have LIKE ALL include something > that there's no option to include individually, so I added INCLUDING > STATISTICS. Note sure if someone want's to exclude statistics, but it is good that one can. So +1 from me. Looking at the patch, at first I thought the order was sorted and you swapped STORAGE and STATISTICS by accident. But then, it seems the order is semi-random. Should that list be sorted or is it already sorted by some criteria that I don't see? - INCLUDING DEFAULTS INCLUDING IDENTITY INCLUDING CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING COMMENTS. + INCLUDING DEFAULTS INCLUDING IDENTITY INCLUDING CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING STATISTICS INCLUDING COMMENTS. Best wishes, Tels
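PS: Assuming the patch goes in as posted, usage would presumably look something like this (untested sketch, table and statistics names invented):

CREATE TABLE orders (a int, b int);
CREATE STATISTICS orders_ab (dependencies) ON a, b FROM orders;

-- with the patch, the extended statistics object is copied as well:
CREATE TABLE orders_copy  (LIKE orders INCLUDING STATISTICS);
-- or implicitly, since INCLUDING ALL would now cover it:
CREATE TABLE orders_copy2 (LIKE orders INCLUDING ALL);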
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Mon, February 5, 2018 4:27 pm, Robert Haas wrote: > On Mon, Feb 5, 2018 at 1:03 PM, Peter Geoghegan wrote: >> It certainly is common. In the case of logtape.c, we almost always >> write out some garbage bytes, even with serial sorts. The only >> difference here is the *sense* in which they're garbage: they're >> uninitialized bytes, which Valgrind cares about, rather than byte from >> previous writes that are left behind in the buffer, which Valgrind >> does not care about. > > /me face-palms. > > So, I guess another option might be to call VALGRIND_MAKE_MEM_DEFINED > on the buffer. "We know what we're doing, trust us!" > > In some ways, that seems better than inserting a suppression, because > it only affects the memory in the buffer. > > Anybody else want to express an opinion here? Are the uninitialized bytes that are written out "whatever was in the memory previously" or just some "0x00 bytes from the allocation but not yet overwritten by the PG code"? Because the first sounds like it could be a security problem - if random junk bytes go out to the disk, and stay there, information could inadvertently leak to permanent storage. Best regards, Tels
Re: Using scalar function as set-returning: bug or feature?
Moin Tom, On Mon, February 12, 2018 5:03 pm, Tom Lane wrote: > Pavel Stehule writes: >> 2018-02-09 12:02 GMT+01:00 Marko Tiikkaja : >>> This is quite short-sighted. The better way to do this is to complain >>> if >>> the number of expressions is different from the number of target >>> variables >>> (and the target variable is not a record-ish type). There's been at >>> least >>> two patches for this earlier (one my me, and one by, I think Pavel >>> Stehule). I urge you to dig around in the archives to avoid wasting >>> your >>> time. [snip a bit] > As things stand today, we would have a hard time tightening that up > without producing unwanted complaints about the cases mentioned in > this comment, because the DTYPE_ROW logic is used for both "INTO a,b,c" > and composite-type variables. However, my pending patch at > https://commitfest.postgresql.org/17/1439/ > gets rid of the use of DTYPE_ROW for composite types, and once that > is in it might well be reasonable to just throw a flat-out error for > wrong number of source values for a DTYPE_ROW target. I can't > immediately think of any good reason why you'd want to allow for > the number of INTO items not matching what the query produces. Perl lets you set a fixed number of multiple variables from an array and discard the rest like so: my ($a, $b) = (1,2,3); and the right hand side can also be a function: my ($c, $d) = somefunc( 123 ); Where somefunc() returns more than 2 params. This is quite handy if you sometimes need more ore less return values, but don't want to create almost-identical functions for each case. I'm not sure if you mean exactly the scenario as in the attached test case, but this works in plpgsql, too, and would be a shame to lose. OTOH, one could also write: SELECT INTO ba, bb a,b FROM foo(1); and it would still work, or wouldn't it? Best regards, Tels test.psql Description: Binary data test.pl Description: Perl program
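PS: For reference, a minimal sketch of the scenario I have in mind (function and variable names are invented; this only roughly mirrors the attached test case):

CREATE FUNCTION somefunc(i int) RETURNS TABLE (a int, b int, c int)
  LANGUAGE sql AS $$ SELECT $1, $1 + 1, $1 + 2 $$;

DO $body$
DECLARE
  ba int;
  bb int;
BEGIN
  -- two targets, three source columns: today the extra column is
  -- silently discarded, which is exactly the behaviour under discussion
  SELECT * INTO ba, bb FROM somefunc(123);
  RAISE NOTICE 'ba=%, bb=%', ba, bb;
END;
$body$;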
Re: pglz performance
Hello Andrey, On 2019-11-02 12:30, Andrey Borodin wrote: On 1 Nov 2019, at 18:48, Alvaro Herrera wrote: PFA two patches: v4-0001-Use-memcpy-in-pglz-decompression.patch (known as 'hacked' in test_pglz extension) v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch (known as 'hacked8') Looking at the patches, it seems only the case of a match is changed. But when we observe a literal byte, this is copied byte-by-byte with:

else
{
    /*
     * An unset control bit means LITERAL BYTE. So we just
     * copy one from INPUT to OUTPUT.
     */
    *dp++ = *sp++;
}

Maybe we can optimize this, too. For instance, you could just increase a counter:

else
{
    /*
     * An unset control bit means LITERAL BYTE. We count
     * these and copy them later.
     */
    literal_bytes++;
}

and in the case of a match:

if (ctrl & 1)
{
    /* First copy all the pending literal bytes */
    if (literal_bytes > 0)
    {
        memcpy(dp, sp, literal_bytes);
        sp += literal_bytes;
        dp += literal_bytes;
        literal_bytes = 0;
    }

(Code untested!) The same would need to be done at the very end, if the input ends without any new CTRL byte. Whether that gains us anything depends on how common literal bytes are. It might be that highly compressible input has almost none, while input that is a mix of incompressible strings and compressible ones might have longer stretches. One example would be something like an SHA-256 hash that is repeated twice: the first instance would be incompressible, the second one would be just a copy. This might not happen that often in practical inputs, though. I wonder if you agree and what would happen if you try this variant on your corpus tests. Best regards, Tels
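PS: Here is a slightly more complete (but still rough and untested) C sketch of the idea, NOT the real pglz_decompress(): bounds checks and the match case are elided, and the names are made up. A literal run has to be flushed before the next control byte or match descriptor is read, because those follow the pending literal bytes in the source stream:

#include <stddef.h>
#include <string.h>

static void
decompress_sketch(const unsigned char *sp, const unsigned char *srcend,
                  unsigned char *dp)
{
    size_t      literal_bytes = 0;

    while (sp + literal_bytes < srcend)
    {
        unsigned char ctrl;

        /* pending literals precede the next control byte: flush them */
        if (literal_bytes > 0)
        {
            memcpy(dp, sp, literal_bytes);
            dp += literal_bytes;
            sp += literal_bytes;
            literal_bytes = 0;
        }

        ctrl = *sp++;

        for (int bit = 0; bit < 8 && sp + literal_bytes < srcend;
             bit++, ctrl >>= 1)
        {
            if (ctrl & 1)
            {
                /* a match follows the pending literals: flush them first */
                if (literal_bytes > 0)
                {
                    memcpy(dp, sp, literal_bytes);
                    dp += literal_bytes;
                    sp += literal_bytes;
                    literal_bytes = 0;
                }
                /* ... decode the match from *sp and copy it, as today ... */
            }
            else
                literal_bytes++;    /* just remember it for now */
        }
    }

    /* input may end in the middle of a literal run */
    if (literal_bytes > 0)
        memcpy(dp, sp, literal_bytes);
}

With this structure a single memcpy() covers at most 8 literal bytes (one control byte's worth); batching across control bytes would require peeking past the pending run for the next control byte, which is probably not worth the complexity.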
Re: backup manifests
y the backup. A few percentage points is probably not a big deal, but a user who has an 8-hour window to get the backup done overnight will not be happy if it's taking 6 hours now and we tack 40%-50% on to that. So I think that we either have to disable backup checksums by default, or figure out a way to get the overhead down to something a lot smaller than what current tests are showing -- which we could possibly do without changing the algorithm if we can somehow make it a lot cheaper, but otherwise I think the choice is between disabling the functionality altogether by default and adopting a less-expensive algorithm. Maybe someday when delta restore is in core and widely used and CPUs are faster, it'll make sense to revise the default, and that's cool, but I can't see imposing a big overhead by default to enable a feature core doesn't have yet... Modern algorithms are amazingly fast on modern hardware, some are even implemented in hardware nowadays: https://software.intel.com/en-us/articles/intel-sha-extensions Quote from: https://neosmart.net/blog/2017/will-amds-ryzen-finally-bring-sha-extensions-to-intels-cpus/ "Despite the extremely limited availability of SHA extension support in modern desktop and mobile processors, crypto libraries have already upstreamed support to great effect. Botan’s SHA extension patches show a significant 3x to 5x performance boost when taking advantage of the hardware extensions, and the Linux kernel itself shipped with hardware SHA support with version 4.4, bringing a very respectable 3.6x performance upgrade over the already hardware-assisted SSE3-enabled code." If you need to load the data from disk and shove it over a network, the hashing will certainly be very little overhead; it might even be completely invisible, since it can run in parallel to all the other things. Sure, there is the thing called zero-copy networking, but if you have to compress the data before sending it to the network, you have to put it through the CPU anyway. And if you have more than one core, the second one can do the hashing in parallel to the first one doing the compression. To get a feeling, one can use: openssl speed md5 sha1 sha256 sha512 On my really-not-fast desktop CPU (i5-4690T CPU @ 2.50GHz) it says:

The 'numbers' are in 1000s of bytes per second processed.
type          16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5         122638.55k   277023.96k   487725.57k   630806.19k   683892.74k   688553.98k
sha1        127226.45k   313891.52k   632510.55k   865753.43k   960995.33k   977215.19k
sha256       77611.02k   173368.15k   325460.99k   412633.43k   447022.92k   448020.48k
sha512       51164.77k   205189.87k   361345.79k   543883.26k   638372.52k   645933.74k

Or in other words, it can hash nearly 931 MByte/s with SHA-1 and about 427 MByte/s with SHA-256 (if I haven't miscalculated something). You'd need a pretty fast disk (aka M.2 SSD) and network (aka > 1 Gbit) to top these speeds, and then you'd use a real CPU for your server, not some poor Intel powersaving surfing thingy-majingy :)
Re: backup manifests
Moin, On 2019-11-22 23:30, David Steele wrote: On 11/22/19 5:15 PM, Tels wrote: On 2019-11-22 20:01, Robert Haas wrote: On Fri, Nov 22, 2019 at 1:10 PM David Steele wrote: > Phrased more positively, if you want a cryptographic hash > at all, you should probably use one that isn't widely viewed as too > weak. Sure. There's another advantage to picking an algorithm with lower collision rates, though. CRCs are fine for catching transmission errors (as caveated above) but not as great for comparing two files for equality. With strong hashes you can confidently compare local files against the path, size, and hash stored in the manifest and save yourself a round-trip to the remote storage to grab the file if it has not changed locally. I agree in part. I think there are two reasons why a cryptographically strong hash is desirable for delta restore. First, since the checksums are longer, the probability of a false match happening randomly is lower, which is important. Even if the above analysis is correct and the chance of a false match is just 2^-32 with a 32-bit CRC, if you back up ten million files every day, you'll likely get a false match within a few years or less, and once is too often. Second, unlike what I supposed above, the contents of a PostgreSQL data file are not chosen at random, unlike transmission errors, which probably are more or less random. It seems somewhat possible that there is an adversary who is trying to choose the data that gets stored in some particular record so as to create a false checksum match. A CRC is a lot easier to fool than a crytographic hash, so I think that using a CRC of *any* length for this kind of use case would be extremely dangerous no matter the probability of an accidental match. Agreed. See above. However, if you choose a hash, please do not go below SHA-256. Both MD5 and SHA-1 already had collision attacks, and these only got to be bound to be worse. I don't think collision attacks are a big consideration in the general case. The manifest is generally stored with the backup files so if a file is modified it is then trivial to modify the manifest as well. That is true. However, a simple way around this is to sign the manifest with a public key l(GPG or similiar). And if the manifest contains strong, hard-to-forge hashes, we got a mure more secure backup, where (almost) nobody else can alter the manifest, nor can he mount easy collision attacks against the single files. Without the strong hashes it would be pointless to sign the manifest. Of course, you could store the manifest separately or even just know the hash of the manifest and store that separately. In that case SHA-256 might be useful and it would be good to have the option, which I believe is the plan. I do wonder if you could construct a successful collision attack (even in MD5) that would also result in a valid relation file. Probably, at least eventually. With MD5, certainly. One way is to have two block of 512 bits that hash to the different MD5s. It is trivial to re-use one already existing from the known examples. Here is one, where the researchers constructed 12 PDFs that all have the same MD5 hash: https://www.win.tue.nl/hashclash/Nostradamus/ If you insert one of these blocks into a relation and dump it, you could swap it (probably?) out on disk for the other block. I'm not sure this is of practical usage as an attack, tho. It would, however, cast doubt on the integrity of the backup and prove that MD5 is useless. 
OTOH, finding a full collision with MD5 should also be in reach with todays hardware. It is hard find exact numbers but this: https://www.win.tue.nl/hashclash/SingleBlock/ gives the following numbers for 2008/2009: "Finding the birthday bits took 47 hours (expected was 3 days) on the cluster of 215 Playstation 3 game consoles at LACAL, EPFL. This is roughly equivalent to 400,000 hours on a single PC core. The single near-collision block construction took 18 hours and 20 minutes on a single PC core." Today one can probably compute it on a single GPU in mere hours. And you can rent massive amounts of them in the cloud for real cheap. Here are a few, now a bit dated, references: https://blog.codinghorror.com/speed-hashing/ http://codahale.com/how-to-safely-store-a-password/ Best regards, Tels
Re: backup manifests
On 2019-11-24 15:38, David Steele wrote: On 11/23/19 4:34 PM, Andrew Dunstan wrote: On 11/23/19 3:13 AM, Tels wrote: Without the strong hashes it would be pointless to sign the manifest. I guess I must have missed where we are planning to add a cryptographic signature. I don't think we were planning to, but the user could do so if they wished. That was what I meant. Best regards, Tels
Re: backup manifests
Moin, sorry for the very late reply. There was a discussion about the specific format of the backup manifests, and maybe that was already discussed and I just overlooked it: 1) Why invent your own format, and not just use a machine-readable format that already exists? It doesn't have to be full-blown XML, or even JSON; something as simple as YAML would already be better. That way not everyone has to write their own parser. Or maybe it is already YAML and just the different keywords were under discussion? 2) It would be very wise to add a version number to the format. That will make extending it later much easier and avoid the "we need to add X, but that breaks compatibility with all software out there" situations that often arise a few years down the line. Best regards, and a happy New Year 2020 Tels
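PS: Purely as an illustration of both points (an existing machine-readable format plus an explicit version field) - this is NOT the format being proposed in the thread, just a sketch of what a YAML manifest could look like:

manifest-version: 1
files:
  - path: base/13593/16384
    size: 8192
    last-modified: "2020-01-01 12:00:00 GMT"
    checksum-algorithm: sha256
    checksum: "<hex digest>"

Any standard parser can read that, and the version field lets later releases add fields without breaking existing tools.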
Re: Translations contributions urgently needed
On Thu, February 22, 2018 11:04 pm, Tom Lane wrote: > Alvaro Herrera writes: >> Please join pgsql-translat...@postgresql.org. > > What surprises me about this thread is that apparently the sad state > of the v10 translations wasn't already discussed on that list? The list is more or less dead - over the past 6 months there where only between 1 and 5 messages per month. > I have no objection to calling for more translation volunteers on > this list --- in fact, probably it'd be a good idea to call for > more help on pgsql-general, too. But it doesn't seem like it's > quite on-topic for -hackers otherwise. I do think it would be a good idea to at least send a translation summary once a quarter to the translators list and maybe include hackers and general - many might not be aware of the 80% rule. Also an automated summary one ore two months before a major release as a "call for action" would be good to avoid such mishaps. Last but not least I'd be able to help with the german translation, but I have no clear idea how, or what the actual status is. Are German translators actually needed? And while I have no problem entering some texts on a website or edit a file with an editor, the process described here: https://www.postgresql.org/docs/10/static/nls-translator.html sounds complicated and leaves me unclear on quite a few things, for instance how would I submit whatever work I've done? Do I need git? An account? Where? And how does it all work? I guess a lot of potential translators who aren't programmers would be left baffled, too. Best regards, Tels
Re: [HACKERS] [POC] Faster processing at Gather node
Hello Robert, On Fri, March 2, 2018 12:22 pm, Robert Haas wrote: > On Wed, Feb 28, 2018 at 10:06 AM, Robert Haas > wrote: >> [ latest patches ] > > Committed. Thanks for the review. Cool :) There is a typo, tho: + /* +* If the counterpary is known to have attached, we can read mq_receiver +* without acquiring the spinlock and assume it isn't NULL. Otherwise, +* more caution is needed. +*/ s/counterpary/counterparty/; Sorry, only noticed while re-reading the thread. Also, either a double space is missing, or one is too many: + /* +* Separate prior reads of mq_ring from the increment of mq_bytes_read +* which follows. Pairs with the full barrier in shm_mq_send_bytes(). We +* only need a read barrier here because the increment of mq_bytes_read is +* actually a read followed by a dependent write. +*/ (" Pairs ..." vs. ". We only ...") Best regards, Tels
Re: 2018-03 Commitfest Summary (Andres #1)
Moin, On Fri, March 2, 2018 2:48 pm, Andres Freund wrote: > Hi, > > On 2018-03-02 11:06:12 +0100, Fabien COELHO wrote: >> A lot of patches do not even get a review: no immediate interest or more >> often no ressources currently available, patch stays put, I'm fine with >> that. > > Well, even if that's the case that's not free of cost to transport them > from fest to fest. Going through them continually isn't free. At some > point we just gotta decide it's an undesirable feature. Erm, I don't think that from A: "There are not enough reviewer capacities" necessarily follows B: "This is an undesirable feature." That PG doesn't have enough reviewers doesn't mean users/authors want or need features - it just means there are not enough people who are capable of doing a review, or they haven't been found yet, or worse, turned away by the current process. Sure, sometimes a lack of interest just means a lack of interest - but not every user of PG reads or follows the -hackers list or does (or can) even contribute to PG. And, the most strongest point to me is: Somebody showed soo much interest they got up and wrote a patch. Even if nobody reviewed it, that is already a high hurdle cleared, isn't it? >> Now, some happy patches actually get reviews and are switched to ready, >> which shows that somebody saw enough interest in them to spend some time >> to >> improve them. >> >> If committers ditch these reviewed patches on weak ground (eg "I do not >> need >> this feature so nobody should need it"), it is both in contradiction >> with >> the fact that someone took the time to review it, and is highly >> demotivating >> for people who do participate to the reviewing process and contribute to >> hopefully improve these patches, because the reviewing time just goes to >> the >> drain in the end even when the patch is okay. > > The consequence of this appears to be that we should integrate > everything that anybody decided worthy enough to review. And likewise, I don't think the reverse follows, either. [snipabit] >> So for me killing ready patches in the end of the process and on weak >> ground >> can only make the process worse. The project is shooting itself in the >> foot, >> and you cannot complain later that there is not enough reviewers. That would also be my opinion. > How would you want it to work otherwise? Merge everything that somebody > found review worthy? Have everything immediately reviewed by committers? > Neither of those seem realistic. True, but the process could probably be streamlined quite a bit. As an outsider, I do wonder why sometimes long time periods go by without any action - but when then a deadline comes, the action is taken suddenly without the involved parties having had the time to respond first (I mean "again", of course they could have responded earlier. But you know how it is in a busy world, and with PG being a "side-project"). That is unsatisfactory for the authors (who either rebase patches needlessly, or find their patch not getting a review because it has bitrotted again) or reviewers (because they find patches have bitrotted) and everyone else (because even the simplest features or problem-solving fixes can take easily a year or two, if they don't get rejected outright). Maybe an idea would be to send automatic notifications about patches that need rebasing, or upcoming deadlines like starting commit fests? In addition, clear rules and well-formulated project goals would help a lot. Also, the discussion about "needs of the project" vs. 
the "needs of the users" [0] should be separate from the "what do we do about the lack of manpower" discussion. Because if you argue what you want with the what you have, you usually end up with nothing in both departments. Best regards, Tels [0]: E.g. how much is it worth to have a clean, pristine state of source code vs. having some features and fixes for the most pressing items in a somewhat timely manner? I do think that there should be at least different levels between PG core and utilities like pgbench.
Re: [WIP PATCH] Index scan offset optimisation using visibility map
Hello Michail, On Tue, March 6, 2018 4:03 pm, Michail Nikolaev wrote: > Hello, Andrey. > > Thanks for review. > > I have updated comments according your review also renamed some fields for > consistency. > Additional some notes added to documentation. > > Updated patch in attach, github updated too. That is a cool idea, and can come in very handy if you regularly need large offsets. Cannot comment on the code, but there is at least one recurring typo:

+ * Decrement counter for remaning skipped tuples.
+ * If last tuple skipped - release the buffer.
+ */
+ if (node->iss_TuplesSkippedRemaning > 0)
+     node->iss_TuplesSkippedRemaning--;

The English word is "remaining", with "ai" instead of "a" :) Also, the variable name "iss_TuplesSkipped" has its grammar at odds with what the code actually does. It could be named "SkipTuples" (e.g. this is the number of tuples we need to skip, not the number we have skipped), and the other one then "iss_SkipTuplesRemaining", so they are consistent with each other. Others might have a better name for these two, of course. Best wishes, Tels
Re: [HACKERS] plpgsql - additional extra checks
Hello Pavel and Tomas, On Tue, March 20, 2018 12:36 am, Pavel Stehule wrote: > 2018-03-19 21:47 GMT+01:00 Tomas Vondra : > >> Hi, >> >> I'm looking at the updated patch (plpgsql-extra-check-180316.patch), and >> this time it applies and builds OK. The one thing I noticed is that the >> documentation still uses the old wording for strict_multi_assignement: >> >> WARNING: Number of evaluated fields does not match expected. >> HINT: strict_multi_assignement check of extra_warnings is active. >> WARNING: Number of evaluated fields does not match expected. >> HINT: strict_multi_assignement check of extra_warnings is active. >> >> This was reworded to "Number of source and target fields in assignment >> does not match." >> I believe the correct wording should be: "Number of source and target fields in assignment do not match." ecause comparing one number to the other means "the number A and B _do_ not match", not "the number A does not match number B". Also there is an inconsistent quoting here: + +Setting plpgsql.extra_warnings, or +plpgsql.extra_errors, as appropriate, to all no quotes for 'all'. +is encouraged in development and/or testing environments. + + + +These additional checks are enabled through the configuration variables +plpgsql.extra_warnings for warnings and +plpgsql.extra_errors for errors. Both can be set either to +a comma-separated list of checks, "none" or +"all". quotes here around '"all"'. I think it should be one or the other in both cases. Also: + Currently +the list of available checks includes only one: but then it lists more than one check? Best wishes, Tels
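PS: For anyone wanting to try the check while reviewing, something along these lines should exercise it, assuming the checks are committed under the names used in the patch (untested sketch):

SET plpgsql.extra_warnings TO 'strict_multi_assignment';

DO $$
DECLARE
  x int;
  y int;
BEGIN
  -- three source fields, two targets: should raise the new warning
  SELECT 1, 2, 3 INTO x, y;
END;
$$;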
Re: truncating timestamps on arbitrary intervals
Hello John, this looks like a nice feature. I'm wondering how it relates to the following use-case: When drawing charts, the user can select pre-defined time widths (like "15 min", "1 hour"). The data for these slots is always fitted to intervals that start at the slot boundary, e.g. if the user selects "15 min", the interval always starts at xx:00, xx:15, xx:30 or xx:45. This is to aid caching of the resulting data, and so that requesting the same chart at xx:00 and xx:01 actually draws the same chart, and not some bar with only one minute of data at the end. In SQL, this works out to using this as GROUP BY and then summing up all values: SELECT FLOOR(EXTRACT(EPOCH FROM thetime) / 3600) * 3600, SUM(events) FROM mytable ... GROUP BY 1; where 3600 means "hourly". It is of course easy for things like "1 hour", but for yearly I had to come up with things like: EXTRACT(YEAR FROM thetime) * 12 + EXTRACT(MONTH FROM thetime) and it gets even more involved for weeks, weekdays or odd things like fortnights. So would that mean one could replace this by: date_trunc_interval('3600 seconds'::interval, thetime) and it would be easier, and (hopefully) faster? Best regards, Tels
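PS: To spell out the two variants side by side (using the function name and argument order from the examples in this thread, which may of course still change):

-- today: hourly buckets via epoch arithmetic
SELECT FLOOR(EXTRACT(EPOCH FROM thetime) / 3600) * 3600 AS bucket, SUM(events)
  FROM mytable
 GROUP BY 1;

-- with the proposed function: any width works, e.g. 15-minute buckets
SELECT date_trunc_interval('15 minutes'::interval, thetime) AS bucket, SUM(events)
  FROM mytable
 GROUP BY 1;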
Re: Speed up Hash Join by teaching ExprState about hashing
Hello David, you wrote: v4 patch attached. If nobody else wants to look at this then I'm planning on pushing it soon. Had a very brief look at this, and one bit caught my attention:

+ EEO_CASE(EEOP_HASHDATUM_NEXT32_STRICT)
+ {
+     FunctionCallInfo fcinfo = op->d.hashdatum.fcinfo_data;
+     uint32 existing_hash = DatumGetUInt32(*op->resvalue);
+     uint32 hashvalue;
+
+     /* combine successive hash values by rotating */
+     existing_hash = pg_rotate_left32(existing_hash, 1);
+
+     if (fcinfo->args[0].isnull)
+     {

Is it necessary to rotate existing_hash here before checking for isnull? Because in case of isnull, isn't the result of the rotate thrown away? Or in other words, maybe this bit here can be moved to after the isnull check:

+     /* combine successive hash values by rotating */
+     existing_hash = pg_rotate_left32(existing_hash, 1);

-- Best regards, Tels
Re: Tighten up a few overly lax regexes in pg_dump's tap tests
Moin, On Wed, February 6, 2019 8:10 pm, Michael Paquier wrote: > On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote: >> Correct. One could argue that the regex is still suboptimal since >> “COMMENT ON >> DATABASE postgres IS ;” will be matched as well, but there I think the >> tradeoff >> for readability wins. > > Okay, that looks like an improvement anyway, so committed after going > over the tests for similar problems, and there was one for CREATE > DATABASE and DROP ROLE. It is possible to have a regex which tells as > at least one non-whitespace character, but from what I can see these > don't really improve the readability. Sorry for being late to the party, but it just occured to me that while ".+" is definitely an improvement over ".*", it isn't foolproof either, as it also matches ";". Thus: regexp => qr/^COMMENT ON DATABASE postgres IS .+;/m, matches things like: 'COMMENT ON DATABASE postgres IS ;;' 'COMMENT ON DATABASE postgres IS ;' 'COMMENT ON DATABASE postgres IS --;' etc. I'm not sure it is really necessary to deal with these cases, but one possibility would be to pre-compute regexps: $QR_COMMENT = qr/[^ ;]+/; $QR_IDENTIFIER = qr/[^ ;]+/; # etc or whataver is the thing that should actually be matched here and use them like so: regexp => qr/^COMMENT ON DATABASE postgres IS $QR_COMMENT;/m, That way it is easily changable and quite readable. Oh, one more question. Shouldn't these regexps that start with "^" also end with "$"? Or can there be output like: 'COMMENT ON DATABASE postgres IS $QR_IDENTIFIER; SELECT 1;' ? Best regards, Tels
Re: In PG12, query with float calculations is slower than PG11
Moin, On 2020-02-07 15:42, Emre Hasegeli wrote: > The patch looks unduly invasive to me, but I think that it might be > right that we should go back to a macro-based implementation, because > otherwise we don't have a good way to be certain that the function > parameter won't get evaluated first. I'd first like to see some actual evidence of this being a problem, rather than just the order of the checks. There seem to be enough evidence of this being the problem. We are better off going back to the macro-based implementation. I polished Keisuke Kuroda's patch commenting about the performance issue, removed the check_float*_val() functions completely, and added unlikely() as Tom Lane suggested. It is attached. I confirmed with different compilers that the macro, and unlikely() makes this noticeably faster. Hm, the diff has the macro tests as: + if (unlikely(isinf(val) && !(inf_is_valid))) ... + if (unlikely((val) == 0.0 && !(zero_is_valid))) But the comment does not explain that this test has to be in that order, or the compiler will for non-constant arguments evalute the (now) right-side first. E.g. if I understand this correctly: + if (!(zero_is_valid) && unlikely((val) == 0.0) would have the same problem of evaluating "zero_is_valid" (which might be an isinf(arg1) || isinf(arg2)) first and so be the same thing we try to avoid with the macro? Maybe adding this bit of info to the comment makes it clearer? Also, a few places use the macro as: + CHECKFLOATVAL(result, true, true); which evaluates to a complete NOP in both cases. IMHO this could be replaced with a comment like: + // No CHECKFLOATVAL() needed, as both inf and 0.0 are valid (or something along the lines of "no error can occur"), as otherwise CHECKFLOATVAL() implies to the casual reader that there are some checks done, while in reality no real checks are done at all (and hopefully the compiler optimizes everything away, which might not be true for debug builds). -- Best regards, TelsFrom e869373ad093e668872f08833de2c5c614aab673 Mon Sep 17 00:00:00 2001 From: Emre Hasegeli Date: Fri, 7 Feb 2020 10:27:25 + Subject: [PATCH] Bring back CHECKFLOATVAL() macro The inline functions added by 6bf0bc842b caused the conditions of overflow/underflow checks to be evaluated when no overflow/underflow happen. This slowed down floating point operations. This commit brings back the macro that was in use before 6bf0bc842b to fix the performace regression. 
Reported-by: Keisuke Kuroda Author: Keisuke Kuroda Discussion: https://www.postgresql.org/message-id/CANDwggLe1Gc1OrRqvPfGE%3DkM9K0FSfia0hbeFCEmwabhLz95AA%40mail.gmail.com --- src/backend/utils/adt/float.c | 66 ++--- src/backend/utils/adt/geo_ops.c | 2 +- src/include/utils/float.h | 75 ++--- 3 files changed, 66 insertions(+), 77 deletions(-) diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c index a90d4db215..5885719850 100644 --- a/src/backend/utils/adt/float.c +++ b/src/backend/utils/adt/float.c @@ -1184,21 +1184,21 @@ ftod(PG_FUNCTION_ARGS) /* * dtof - converts a float8 number to a float4 number */ Datum dtof(PG_FUNCTION_ARGS) { float8 num = PG_GETARG_FLOAT8(0); - check_float4_val((float4) num, isinf(num), num == 0); + CHECKFLOATVAL((float4) num, isinf(num), num == 0); PG_RETURN_FLOAT4((float4) num); } /* * dtoi4 - converts a float8 number to an int4 number */ Datum dtoi4(PG_FUNCTION_ARGS) @@ -1438,36 +1438,36 @@ dsqrt(PG_FUNCTION_ARGS) float8 arg1 = PG_GETARG_FLOAT8(0); float8 result; if (arg1 < 0) ereport(ERROR, (errcode(ERRCODE_INVALID_ARGUMENT_FOR_POWER_FUNCTION), errmsg("cannot take square root of a negative number"))); result = sqrt(arg1); - check_float8_val(result, isinf(arg1), arg1 == 0); + CHECKFLOATVAL(result, isinf(arg1), arg1 == 0); PG_RETURN_FLOAT8(result); } /* * dcbrt - returns cube root of arg1 */ Datum dcbrt(PG_FUNCTION_ARGS) { float8 arg1 = PG_GETARG_FLOAT8(0); float8 result; result = cbrt(arg1); - check_float8_val(result, isinf(arg1), arg1 == 0); + CHECKFLOATVAL(result, isinf(arg1), arg1 == 0); PG_RETURN_FLOAT8(result); } /* * dpow - returns pow(arg1,arg2) */ Datum dpow(PG_FUNCTION_ARGS) { @@ -1525,40 +1525,40 @@ dpow(PG_FUNCTION_ARGS) /* The sign of Inf is not significant in this case. */ result = get_float8_infinity(); else if (fabs(arg1) != 1) result = 0; else result = 1; } else if (errno == ERANGE && result != 0 && !isinf(result)) result = get_float8_infinity(); - check_float8_val(result, isinf(arg1) || isinf(arg2), arg1 == 0); + CHECKFLOATVAL(result, isinf(arg1) || isinf(arg2), arg1 == 0); PG_RETURN_FLOAT8(result); } /* * dexp - returns the exponential function of arg1 */ Datum dexp(PG_FUNCTION_ARGS) { float8 arg1 = PG_GETARG_FLOAT8(0); float8 result; errno = 0;
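PS, coming back to the evaluation-order point above: here is a stripped-down, compilable illustration of why the macro form is cheaper in the common case; the report_*() stubs stand in for ereport() and unlikely() is left out:

#include <math.h>
#include <stdbool.h>

static void report_overflow(void)  { /* stand-in for ereport(ERROR, ...) */ }
static void report_underflow(void) { /* stand-in for ereport(ERROR, ...) */ }

/* function form: the inf_is_valid/zero_is_valid *arguments* are computed
 * by the caller on every call, even when val is perfectly finite */
static inline void
check_float8_val_fn(double val, bool inf_is_valid, bool zero_is_valid)
{
    if (isinf(val) && !inf_is_valid)
        report_overflow();
    if (val == 0.0 && !zero_is_valid)
        report_underflow();
}

/* macro form: the *_is_valid expressions sit to the right of && and are
 * short-circuited away whenever the cheap test on the left is false */
#define CHECKFLOATVAL_SKETCH(val, inf_is_valid, zero_is_valid) \
    do { \
        if (isinf(val) && !(inf_is_valid)) \
            report_overflow(); \
        if ((val) == 0.0 && !(zero_is_valid)) \
            report_underflow(); \
    } while (0)

static double
dmul_demo(double arg1, double arg2)
{
    double result = arg1 * arg2;

    /* here isinf(arg1) || isinf(arg2) is always evaluated ... */
    check_float8_val_fn(result, isinf(arg1) || isinf(arg2),
                        arg1 == 0.0 || arg2 == 0.0);
    /* ... while here it is only evaluated if result itself is infinite */
    CHECKFLOATVAL_SKETCH(result, isinf(arg1) || isinf(arg2),
                         arg1 == 0.0 || arg2 == 0.0);
    return result;
}

In principle the compiler could inline the function and sink those argument computations as well, but the benchmarks in this thread suggest it does not do so reliably, which is the whole point of going back to the macro.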
Re: Some improvements to numeric sqrt() and ln()
Dear Dean, On 2020-03-01 20:47, Dean Rasheed wrote: On Fri, 28 Feb 2020 at 08:15, Dean Rasheed wrote: It's possible that there are further gains to be had in the sqrt() algorithm on platforms that support 128-bit integers, but I haven't had a chance to investigate that yet. Rebased patch attached, now using 128-bit integers for part of sqrt_var() on platforms that support them. This turned out to be well worth it (1.5 to 2 times faster than the previous version if the result has less than 30 or 40 digits). Thank you for these patches, these sound like really nice improvements. One thing can to my mind while reading the patch: +* If r < 0 Then +* Let r = r + 2*s - 1 +* Let s = s - 1 + /* s is too large by 1; let r = r + 2*s - 1 and s = s - 1 */ + r_int64 += 2 * s_int64 - 1; + s_int64--; This can be reformulated as: +* If r < 0 Then +* Let r = r + s +* Let s = s - 1 +* Let r = r + s + /* s is too large by 1; let r = r + 2*s - 1 and s = s - 1 */ + r_int64 += s_int64; + s_int64--; + r_int64 += s_int64; which would remove one mul/shift and the temp. variable. Mind you, I have not benchmarked this, so it might make little difference, but maybe it is worth trying it. Best regards, Telsdiff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c new file mode 100644 index 10229eb..9e6bb80 --- a/src/backend/utils/adt/numeric.c +++ b/src/backend/utils/adt/numeric.c @@ -393,16 +393,6 @@ static const NumericVar const_ten = #endif #if DEC_DIGITS == 4 -static const NumericDigit const_zero_point_five_data[1] = {5000}; -#elif DEC_DIGITS == 2 -static const NumericDigit const_zero_point_five_data[1] = {50}; -#elif DEC_DIGITS == 1 -static const NumericDigit const_zero_point_five_data[1] = {5}; -#endif -static const NumericVar const_zero_point_five = -{1, -1, NUMERIC_POS, 1, NULL, (NumericDigit *) const_zero_point_five_data}; - -#if DEC_DIGITS == 4 static const NumericDigit const_zero_point_nine_data[1] = {9000}; #elif DEC_DIGITS == 2 static const NumericDigit const_zero_point_nine_data[1] = {90}; @@ -518,6 +508,8 @@ static void div_var_fast(const NumericVa static int select_div_scale(const NumericVar *var1, const NumericVar *var2); static void mod_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result); +static void div_mod_var(const NumericVar *var1, const NumericVar *var2, + NumericVar *quot, NumericVar *rem); static void ceil_var(const NumericVar *var, NumericVar *result); static void floor_var(const NumericVar *var, NumericVar *result); @@ -7712,6 +7704,7 @@ div_var_fast(const NumericVar *var1, con NumericVar *result, int rscale, bool round) { int div_ndigits; + int load_ndigits; int res_sign; int res_weight; int *div; @@ -7766,9 +7759,6 @@ div_var_fast(const NumericVar *var1, con div_ndigits += DIV_GUARD_DIGITS; if (div_ndigits < DIV_GUARD_DIGITS) div_ndigits = DIV_GUARD_DIGITS; - /* Must be at least var1ndigits, too, to simplify data-loading loop */ - if (div_ndigits < var1ndigits) - div_ndigits = var1ndigits; /* * We do the arithmetic in an array "div[]" of signed int's. Since @@ -7781,9 +7771,16 @@ div_var_fast(const NumericVar *var1, con * (approximate) quotient digit and stores it into div[], removing one * position of dividend space. A final pass of carry propagation takes * care of any mistaken quotient digits. + * + * Note that div[] doesn't necessarily contain all of the digits from the + * dividend --- the desired precision plus guard digits might be less than + * the dividend's precision. 
This happens, for example, in the square + * root algorithm, where we typically divide a 2N-digit number by an + * N-digit number, and only require a result with N digits of precision. */ div = (int *) palloc0((div_ndigits + 1) * sizeof(int)); - for (i = 0; i < var1ndigits; i++) + load_ndigits = Min(div_ndigits, var1ndigits); + for (i = 0; i < load_ndigits; i++) div[i + 1] = var1digits[i]; /* @@ -7844,9 +7841,15 @@ div_var_fast(const NumericVar *var1, con maxdiv += Abs(qdigit); if (maxdiv > (INT_MAX - INT_MAX / NBASE - 1) / (NBASE - 1)) { -/* Yes, do it */ +/* + * Yes, do it. Note that if var2ndigits is much smaller than + * div_ndigits, we can save a significant amount of effort + * here by noting that we only need to normalise those div[] + * entries touched where prior iterations subtracted multiples + * of the divisor. + */ carry = 0; -for (i = div_ndigits; i > qi; i--) +for (i = Min(qi + var2ndigits - 2, div_ndigits); i > qi; i--) { newdig = div[i] + carry; if (
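A quick standalone check of the reformulation suggested in the message above (a verification sketch only, not PostgreSQL code): starting from the same r and s with r < 0, both the original correction step and the two-addition variant must end with r equal to r + 2*s - 1 and s equal to s - 1:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    for (int64_t s = 1; s < 1000; s++)
    {
        for (int64_t r = -2 * s; r < 0; r++)    /* r < 0 triggers the fixup */
        {
            /* original: one multiplication (or shift) plus a constant */
            int64_t r1 = r + 2 * s - 1;
            int64_t s1 = s - 1;

            /* reformulated: two additions, no multiplication, no temp */
            int64_t r2 = r;
            int64_t s2 = s;

            r2 += s2;
            s2--;
            r2 += s2;

            assert(r1 == r2 && s1 == s2);
        }
    }
    printf("both formulations agree\n");
    return 0;
}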
Re: Non-portable shell code in pg_upgrade tap tests
Moin, On Fri, July 20, 2018 10:55 am, Victor Wagner wrote: > On Fri, 20 Jul 2018 10:25:47 -0400 > Tom Lane wrote: > >> Victor Wagner writes: >> > I've discovered that in the branch REL_11_STABLE there is shell >> > script src/bin/pg_upgrade/test.sh which doesn't work under Solaris >> > 10. (it uses $(command) syntax which is not compatible with original >> > Solaris /bin/sh) > >> >> Please send a patch. Most of us do not have access to old shells > > Here it goes. Previous letter was written before fixed tests were > completed, because this old machine is slow. + *) if [ `find ${PGDATA} -type f ! -perm 640 | wc -l` -ne 0 ]; then Shouldn't ${PGDATA} in the above, as an argument to find, be quoted? Otherwise the shell would get confused if it contains spaces or other special characters. Regards, Tels
Re: Have an encrypted pgpass file
Moin, On Wed, July 18, 2018 7:25 pm, Tom Lane wrote: > Alvaro Herrera writes: >> Seems to me that passing %-specifiers to the command would make it more >> useful (%u for "user", "host" etc) -- your command could refuse to give >> you a password for the superuser account for instance but grant one for >> a read-only user. > > It would also provide a *very* fertile source of shell-script-injection > vulnerabilities. (Whaddya mean, you tried to use a user name with a > quote mark in it?) Little Bobby Tables, we call him. :) I'm also concerned that this would let anybody who can alter the environment run arbitrary code as user postgres. Is this something that poses a risk in addition to the current situation? Best regards, Tels
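For what it's worth, the usual way to keep user-controlled values like %u out of shell syntax entirely is to pass them as separate argv entries to exec, never as a single string handed to a shell. Below is a minimal sketch; the /bin/echo command and the argument layout are made up for illustration and are not a proposal for how libpq would actually invoke such a command:

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run a user-configured command with the expanded values passed as
 * separate argv entries. No shell is involved, so quotes, spaces or
 * semicolons in "user" cannot change what gets executed.
 */
static int
run_password_command(const char *cmd, const char *user, const char *host)
{
    pid_t pid = fork();

    if (pid == 0)
    {
        execlp(cmd, cmd, user, host, (char *) NULL);
        _exit(127);             /* exec failed */
    }
    else if (pid > 0)
    {
        int status;

        waitpid(pid, &status, 0);
        return status;
    }
    return -1;
}

int
main(void)
{
    /* the troublesome user name is just one argument string here */
    return run_password_command("/bin/echo",
                                "Robert'); DROP TABLE Students;--",
                                "db.example.com");
}

Anything the shell would normally interpret (quotes, semicolons, backticks) is then just bytes inside one argument.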
Re: Non-portable shell code in pg_upgrade tap tests
Moin, On Sat, July 21, 2018 12:47 pm, Dagfinn Ilmari Mannsåker wrote: > Tom Lane writes: > >> "Tels" writes: >>> + *) if [ `find ${PGDATA} -type f ! -perm 640 | wc -l` -ne 0 ]; then >> >>> Shouldn't ${PGDATA} in the above as argument to find be quoted, >>> otherwise >>> the shell would get confused if it contains spaces or other special >>> characters? >> >> Hmm. Yeah, probably. I don't think this script is meant to be run with >> arbitrary values of PGDATA, but most of the other uses are quoted, so >> for consistency's sake this should be too. > > PGDATA is built from `pwd`, so it breaks if the build directory has a > space in it. Because it's testing for the absence of files, it doesn't > actually break the test, but would fail to detect the bugs it's trying > to. Thanx for the fix. But perhaps I should have been a bit more specific in my first message: spaces are not the only characters this can break on. Looking at your new patch, I notice you used "" for quoting, not ''. (Not sure which variant Tom used when pushing a patch.) I'm not a shell expert, but I think '' are safer, as "" still has some interpolation from the shell (at least on the systems I use regularly): te@pc:~$ mkdir 'test$test' te@pc:~$ touch 'test$test/test' te@pc:~$ find 'test$test/test' -type f test$test/test te@pc:~$ find "test$test/test" -type f find: ‘test/test’: No such file or directory So I've grown the habit of always using '' instead of "". Not sure if this is specific to bash vs. sh or PC vs Mac, tho. Best wishes, Tels
Re: perlcritic script
Moin, On Tue, May 8, 2018 5:03 pm, Peter Eisentraut wrote: > On 5/8/18 16:51, Tom Lane wrote: >> Peter Eisentraut writes: >>> On 5/8/18 13:57, Andrew Dunstan wrote: >>>> + # take executable files that file(1) thinks are perl files >>>> + find . -type f -perm -100 -exec file {} \; -print | >>>> + egrep -i ':.*perl[0-9]*\>' | >> >>> How portable is that? >> >> Well, it's the same code that's in pgperltidy ... but I agree that >> it's making a lot of assumptions about the behavior of file(1). > > OK, but then it's not a problem for this thread. If I'm not mistaken, the first line in the "find" code could be more compact like so: find . -type f -iname '*.p[lm]' (-print is the default, and the -name argument is a shell glob pattern, anyway. And IMHO it could be "-iname" so we catch "test.PM", too?). Also, "-print" does not handle filenames with newlines well, so "-print0" should be used; however, this can be tricky when the next step isn't xargs, but sort. Looking at the man page, on my system this would be: find . -type f -name '*.p[lm]' -print0 | sort -u -z | xargs -0 ... Not sure if that is more, or less, portable than the original -print variant, tho. Best regards, Tels
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Hello Rushabh, On Fri, December 8, 2017 2:28 am, Rushabh Lathia wrote: > Thanks for review. > > On Fri, Dec 8, 2017 at 6:27 AM, Peter Geoghegan wrote: > >> On Thu, Dec 7, 2017 at 12:25 AM, Rushabh Lathia >> wrote: >> > 0001-Add-parallel-B-tree-index-build-sorting_v14.patch I've looked only at patch 0002, here are some comments. > + * leaderasworker indicates whether leader going to participate as worker or > + * not. The grammar is a bit off, and the "or not" seems obvious. IMHO this could be: + * leaderasworker indicates whether the leader is going to participate as worker The argument leaderasworker is only used once and for one temp. variable that is only used once, too. So the temp. variable could maybe go. And not sure what the verdict was from the const-discussion threads, I did not follow it through. If "const" is what should be done generally, then the argument could be consted, as to not create more "unconsted" code. E.g. so: +plan_create_index_workers(Oid tableOid, Oid indexOid, const bool leaderasworker) and later: - sort_mem_blocks / (parallel_workers + 1) < min_sort_mem_blocks) + sort_mem_blocks / (parallel_workers + (leaderasworker ? 1 : 0)) < min_sort_mem_blocks) Thank you for working on this patch! All the best, Tels
Re: Bitmap table scan cost per page formula
Moin, On Wed, December 20, 2017 11:51 pm, Jeff Janes wrote: > On Wed, Dec 20, 2017 at 2:18 PM, Robert Haas > wrote: > >> On Wed, Dec 20, 2017 at 4:20 PM, Jeff Janes >> wrote: >>> >>> It is not obvious to me that the parabola is wrong. I've certainly >>> seen >>> cases where reading every 2nd or 3rd block (either stochastically, or >>> modulus) actually does take longer than reading every block, because it >>> defeats read-ahead. But it depends on a lot on your kernel version and >>> your kernel settings and your file system and probably other things as >>> well. >>> >> >> Well, that's an interesting point, too. Maybe we need another graph >> that >> also shows the actual runtime of a bitmap scan and a sequential scan. >> > > I've did some low level IO benchmarking, and I actually get 13 times > slower > to read every 3rd block than every block using CentOS6.9 with ext4 and the > setting: > blockdev --setra 8192 /dev/sdb1 > On some virtualized storage which I don't know the details of, but it > behaves as if it were RAID/JBOD with around 6 independent spindles.. Repeated this here on my desktop, linux-image-4.10.0-42 with a Samsung SSD 850 EVO 500 Gbyte, on an encrypted / EXT4 partition: $ dd if=/dev/zero of=zero.dat count=130 bs=8192 130+0 records in 130+0 records out 1064960 bytes (11 GB, 9,9 GiB) copied, 22,1993 s, 480 MB/s All blocks: $ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches" $ time perl -le 'open my $fh, "rand" or die; foreach (1..130) {$x="";next if $_%3>5; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024; print length $x} ' | uniq -c 129 8192 real0m20,841s user0m0,960s sys 0m2,516s Every 3rd block: $ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches" $ time perl -le 'open my $fh, "rand" or die; foreach (1..130) {$x=""; next if $_%3>0; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024; print length $x} '|uniq -c 43 8192 real0m50,504s user0m0,532s sys 0m2,972s Every 3rd block random: $ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches" $ time perl -le 'open my $fh, "rand" or die; foreach (1..130) {$x=""; next if rand()> 0.; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024; print length $x} ' | uniq -c 432810 8192 real0m26,575s user0m0,540s sys 0m2,200s So it does get slower, but only about 2.5 times respectively about 30%. Hope this helps, Tels
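For reference, the same kind of strided-read microbenchmark can be written in C using pread(). This is only a sketch under the same assumptions as the Perl one-liners above (a pre-created test file, read in 8 kB blocks); it is not PostgreSQL code:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLCKSZ 8192

int
main(int argc, char **argv)
{
    char        buf[BLCKSZ];
    int         fd;
    long        nblocks,
                stride,
                i;

    if (argc != 4)
    {
        fprintf(stderr, "usage: %s file nblocks stride\n", argv[0]);
        return 1;
    }
    nblocks = atol(argv[2]);
    stride = atol(argv[3]);

    fd = open(argv[1], O_RDONLY);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /* read every stride-th 8 kB block; stride == 1 reads the whole file */
    for (i = 0; i < nblocks; i += stride)
    {
        if (pread(fd, buf, BLCKSZ, (off_t) i * BLCKSZ) < 0)
        {
            perror("pread");
            return 1;
        }
    }
    close(fd);
    return 0;
}

Timing it with a stride of 1 versus 3 (after dropping the page cache, as above) shows the same readahead effect.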
Re: [HACKERS] Replication status in logical replication
Moin, On Mon, December 25, 2017 7:26 pm, Masahiko Sawada wrote: > On Tue, Dec 26, 2017 at 1:10 AM, Petr Jelinek > wrote: >> On 21/11/17 22:06, Masahiko Sawada wrote: >>> >>> After investigation, I found out that my previous patch was wrong >>> direction. I should have changed XLogSendLogical() so that we can >>> check the read LSN and set WalSndCaughtUp = true even after read a >>> record without wait. Attached updated patch passed 'make check-world'. >>> Please review it. >>> >> >> Hi, >> >> This version looks good to me and seems to be in line with what we do in >> physical replication. >> >> Marking as ready for committer. (Sorry Masahiko, you'll get this twice, as I fumbled the reply button.) I have not verified that comment and/or code are correct, just a grammar fix: +/* + * If we've sent a record is at or beyond the flushed point, then + * we're caught up. That should read more like this: "If we've sent a record that is at or beyond the flushed point, we have caught up." All the best, Tels
Re: [HACKERS] Replication status in logical replication
Moin, On Tue, December 26, 2017 5:26 am, Masahiko Sawada wrote: > On Tue, Dec 26, 2017 at 6:19 PM, Tels > wrote: >> Moin, >> >> On Mon, December 25, 2017 7:26 pm, Masahiko Sawada wrote: >>> On Tue, Dec 26, 2017 at 1:10 AM, Petr Jelinek >>> wrote: >>>> On 21/11/17 22:06, Masahiko Sawada wrote: >>>>> >>>>> After investigation, I found out that my previous patch was wrong >>>>> direction. I should have changed XLogSendLogical() so that we can >>>>> check the read LSN and set WalSndCaughtUp = true even after read a >>>>> record without wait. Attached updated patch passed 'make >>>>> check-world'. >>>>> Please review it. >>>>> >>>> >>>> Hi, >>>> >>>> This version looks good to me and seems to be in line with what we do >>>> in >>>> physical replication. >>>> >>>> Marking as ready for committer. >> >> (Sorry Masahiko, you'll get this twice, as fumbled the reply button.) >> >> I have not verifed that comment and/or code are correct, just a grammar >> fix: >> >> +/* >> + * If we've sent a record is at or beyond the flushed >> point, then >> + * we're caught up. >> >> That should read more like this: >> >> "If we've sent a record that is at or beyond the flushed point, we have >> caught up." >> > > Thank you for reviewing the patch! > Actually, that comment is inspired by the comment just below comment. > ISTM it's better to fix both if grammar of them is not appropriate. Oh yes. Your attached version reads fine to me. All the best, Tels
Re: [HACKERS] [PATCH] Tap test support for backup with tablespace mapping
Dear Vaishnavi, On Tue, December 26, 2017 8:58 pm, Vaishnavi Prabakaran wrote: > Hi All, > > I have added support in Postgres TAP test framework to backup a data > directory with tablespace mapping. Also added support to move the backup > directory contents to standby node, because current option to init the > standby from backup does not support copying softlinks, which is needed > when tablespace mapping is involved in backup. > > Added a new test to existing streaming replication tap test to demonstrate > the usage of these new APIs. > > Attached the patch, Hope this enhancement is useful. > > Thanks & Regards, > Vaishnavi, > Fujitsu Australia. > Thank you for the patch; I saw these things: * backup_withtablespace() does not have any documentation? * The mkdir calls do not set a mask for the created dir, defaulting to 0777 - is this what is wanted here? * none of the mkdir, chdir, etc. calls check any error code, e.g. what happens if one of them fails? * different indentation between rmdir and move (tab vs. spaces): +rmdir($data_path); + move("$backup_path", "$self->{_basedir}/pgdata") Best regards, Tels
Re: plpgsql function startup-time improvements
Hello Tom, On Wed, December 27, 2017 3:38 pm, Tom Lane wrote: > Attached are patches for two performance-improvement ideas that came > to me while working on [snip] > > Also, I changed PLpgSQL_var.isconst and PLpgSQL_var.notnull from "int" > to "bool", which is what they should have been all along, and relocated > them in the PLpgSQL_var struct. There are two motivations for this. > It saves a whole 8 bytes per PLpgSQL_var, at least on 64-bit machines, > because the fields now fit into what had been wasted padding space; > reducing the size of what we have to copy during copy_plpgsql_datums > has to be worth something. Second, those fields are now adjacent to > the common PLpgSQL_variable fields, which will simplify migrating them > into PLpgSQL_variable, as I anticipate we'll want to do at some point > when we allow composite variables to be marked CONSTANT and maybe NOT > NULL. More performance, especially in plpgsql, is always a good thing :) After a few experiments I've got a question or two, though (and please excuse me if these are stupid questions :) My C is a bit rusty, so I embarked on a mission to learn more. With a short test program printing out the size of PLpgSQL_var to check, I always saw 72 bytes, regardless of bool vs. int. So a bool would be 4 bytes here. Hmm. Googling around, this patch comment from Peter Eisentraut says that "bool" can be more than one byte, and only "bool8" is really one byte. https://www.postgresql.org/message-id/attachment/54267/0006-Add-bool8-typedef-for-system-catalog-structs.patch I used 64-bit Kubuntu, the gcc that came with the system, and installed Postgres 10 to see what MAXIMUM_ALIGNOF should be (it says 8 here). Since I could not get the test program to compile against all the PG header files (which is probably just me being dense...), I used a stripped-down test.c (attached). However, I probably did not have the real compiler settings, which might influence the size of bool or the enum definition? Maybe someone could shed some light on these questions: * Does bool vs. int really save space, or is this maybe system/compiler dependent? * Or should the structs use "bool8" to ensure this? * I dimly remember that access to 1-byte fields might be slower than to 4-byte fields on X86-64. Would the size savings still be worth it? And maybe folding all four bool fields into an "int flags" field with bits would save space and not be much slower (depending on how often the different flags are accessed, due to the ANDing and ORing ops)? 
Best regards, Tels

#include <stdio.h>
#include <stdint.h>

/*
 * #include "access/xact.h"
 * #include "commands/event_trigger.h"
 * #include "commands/trigger.h"
 * #include "executor/spi.h"
 */

#define MAXIMUM_ALIGNOF 8

#define TYPEALIGN(ALIGNVAL,LEN) \
    (((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))
#define MAXALIGN(LEN) TYPEALIGN(MAXIMUM_ALIGNOF, (LEN))

/*
 * Datum array node types
 */
typedef enum PLpgSQL_datum_type
{
    PLPGSQL_DTYPE_VAR,
    PLPGSQL_DTYPE_ROW,
    PLPGSQL_DTYPE_REC,
    PLPGSQL_DTYPE_RECFIELD,
    PLPGSQL_DTYPE_ARRAYELEM,
    PLPGSQL_DTYPE_EXPR
} PLpgSQL_datum_type;

/*
 * Scalar variable
 */
typedef struct PLpgSQL_var
{
    PLpgSQL_datum_type dtype;
    int         dno;
    char       *refname;
    int         lineno;

    /* PLpgSQL_type *datatype; */
    char       *datatype;
    bool        isconst;
    bool        notnull;

    /* PLpgSQL_expr *default_val; PLpgSQL_expr *cursor_explicit_expr; */
    char       *default_val;
    char       *cursor_explicit_expr;
    int         cursor_explicit_argrow;
    int         cursor_options;

    bool        isnull;
    bool        freeval;
} PLpgSQL_var;

int main(void)
{
    int i = MAXALIGN(sizeof(PLpgSQL_var));

    printf("MAXALIGN(sizeof(PLpgSQL_var))=%i\n", i);
    return 0;
}
Re: plpgsql function startup-time improvements
Moin, On Thu, December 28, 2017 5:43 pm, Tom Lane wrote: > "Tels" writes: >> On Wed, December 27, 2017 3:38 pm, Tom Lane wrote: >>> Also, I changed PLpgSQL_var.isconst and PLpgSQL_var.notnull from "int" >>> to "bool", which is what they should have been all along, and relocated >>> them in the PLpgSQL_var struct. > >> With a short test program printing out the size of PLpgSQL_var to check, >> I >> always saw 72 bytes, regardless of bool vs. int. So a bool would be 4 >> bytes here. Hmm. > > Seems fairly unlikely, especially given that we typedef bool as char ;-). Hmn, yes, I can see my confusion. And a test shows, that sizeof(bool) is 1 here. and *char etc are 8. > Which field order were you checking? Are you accounting for alignment > padding? > > By my count, with the existing field order on typical 64-bit hardware, > we ought to have > > PLpgSQL_datum_type dtype; -- 4 bytes [1] > int dno;-- 4 bytes > char *refname;-- 8 bytes > int lineno; -- 4 bytes > -- 4 bytes wasted due to padding here > PLpgSQL_type *datatype; -- 8 bytes > int isconst;-- 4 bytes > int notnull;-- 4 bytes > PLpgSQL_expr *default_val; -- 8 bytes > PLpgSQL_expr *cursor_explicit_expr; -- 8 bytes > int cursor_explicit_argrow; -- 4 bytes > int cursor_options; -- 4 bytes > > Datum value; -- 8 bytes > boolisnull; -- 1 byte > boolfreeval;-- 1 byte > > so at this point we've consumed 74 bytes, but the whole struct has > to be aligned on 8-byte boundaries because of the pointers, so > sizeof(PLpgSQL_var) ought to be 80 --- and that is what I see here. > > With the proposed redesign, > > PLpgSQL_datum_type dtype; -- 4 bytes [1] > int dno;-- 4 bytes > char *refname;-- 8 bytes > int lineno; -- 4 bytes > > boolisconst;-- 1 byte > boolnotnull;-- 1 byte > -- 2 bytes wasted due to padding here > PLpgSQL_type *datatype; -- 8 bytes > PLpgSQL_expr *default_val; -- 8 bytes > PLpgSQL_expr *cursor_explicit_expr; -- 8 bytes > int cursor_explicit_argrow; -- 4 bytes > int cursor_options; -- 4 bytes > > Datum value; -- 8 bytes > boolisnull; -- 1 byte > boolfreeval;-- 1 byte > > so we've consumed 66 bytes, which rounds up to 72 with the addition > of trailing padding. Sounds logical, thanx for the detailed explanation. In my test the first 4 padding bytes are probably not there, because I probably miss something that aligns ptrs on 8-byte boundaries. At least that would explain the sizes seen here. So, if you moved "isnull" and "freeval" right behind "isconst" and "notnull", you could save another 2 byte, land at 64, and thus no extra padding would keep it at 64? >> And maybe folding all four bool fields into an "int flags" field with >> bits >> would save space, and not much slower (depending on often how the >> different flags are accessed due to the ANDing and ORing ops)? > > That'd be pretty messy/invasive in terms of the code changes needed, > and I don't think it'd save any space once you account for alignment > and the fact that my other patch proposes to add another enum at > the end of the struct. Also, I'm not exactly convinced that > replacing byte sets and tests with bitflag operations would be > cheap time-wise. (It would particularly be a mess for isnull, > since then there'd be an impedance mismatch with a whole lotta PG > APIs that expect null flags to be bools.) Already had a hunch the idea wouldn't be popular, and this are all pretty solid arguments against it. Nevermind, then :) Best wishes, Tels
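To make the padding arithmetic above concrete, here is a small standalone program. It is a sketch only: plain pointers stand in for the PLpgSQL_type * and PLpgSQL_expr * members, and the numbers assume a typical LP64 build with a 4-byte enum and a 1-byte bool. Under those assumptions it prints 80 for the old field order and 72 for the reordered one, with isconst landing at offset 20, inside what used to be padding:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uintptr_t Datum;                /* 8 bytes on 64-bit platforms */
typedef enum { DTYPE_VAR } datum_type;  /* usually 4 bytes */

/* old order: isconst/notnull as int, after the datatype pointer */
typedef struct VarOld
{
    datum_type  dtype;
    int         dno;
    char       *refname;
    int         lineno;         /* 4 bytes of padding follow here */
    void       *datatype;
    int         isconst;
    int         notnull;
    void       *default_val;
    void       *cursor_explicit_expr;
    int         cursor_explicit_argrow;
    int         cursor_options;
    Datum       value;
    bool        isnull;
    bool        freeval;        /* 6 bytes of trailing padding */
} VarOld;

/* new order: the two bools slot into the padding after lineno */
typedef struct VarNew
{
    datum_type  dtype;
    int         dno;
    char       *refname;
    int         lineno;
    bool        isconst;
    bool        notnull;        /* only 2 bytes of padding follow here */
    void       *datatype;
    void       *default_val;
    void       *cursor_explicit_expr;
    int         cursor_explicit_argrow;
    int         cursor_options;
    Datum       value;
    bool        isnull;
    bool        freeval;        /* 6 bytes of trailing padding */
} VarNew;

int
main(void)
{
    printf("old order: sizeof = %zu\n", sizeof(VarOld));
    printf("new order: sizeof = %zu (isconst at offset %zu, inside former padding)\n",
           sizeof(VarNew), offsetof(VarNew, isconst));
    return 0;
}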
Re: TAP test module - PostgresClient
On Thu, December 28, 2017 10:14 pm, Tom Lane wrote: > Craig Ringer writes: >> Another option might be to teach the TAP infrastructure and the >> buildfarm >> client how to fetch cpanminus and build DBD::Pg against our build-tree, >> so >> we could use Perl DBI. > > As a buildfarm owner, I'd take a *really* dim view of the buildfarm > trying to auto-fetch code off the internet. As a developer, the > idea that "make check-world" would try to do that is even scarier. > Buildfarm owners are likely to have taken at least some steps to > sandbox their builds, but developers not so much. > > I do not think we should even think about going there. Well, while I couldn't agree more on the "running code from the internet is dangerous" thing, there are a few points for it, tho: * if you use Perl modules on your system, you are likely doing that already, anyway, as the Perl modules come, you guessed it, from the internet :) Just because a random $PackageMaintainer signed it does not mean it is really safe. * And a lot of Perl modules are not in, say, Debian repositories, so you need to use CPAN (or re-invent everything). Unfortunately, the trend for other languages seems to go in the same direction, with Ruby gems, the python package manager, and almost everybody else re-inventing their own packaging system, often poorly. So you might already have fallen into the trap of "use random code from the internet". (Of course, that is not really an argument for doing it, either...) * the other option seems to be "re-invent the wheel, again, locally", which isn't always the best, either. I do agree, tho, that having "setup" or "make check" auto-fetch stuff from the internet is not a good idea. Mostly because it suddenly becomes much harder to run in closed networks w/o access, and such side-loading installations can bypass your system's packaging system, which doesn't sound good, either. OTOH, it is possible to set up local repositories, or maybe even pre-bundle modules into some sort of "approved TAP bundle" hosted on an official server. The last resort would be to pre-bundle the wanted modules, but this has the risk of them becoming outdated fast. Plus, pre-bundled modules are no more security-vetted than the ones from the internet, so you might as well use the CPAN version directly. The best course seems to me to have dependencies on the OS packages for the Perl modules you want to use. Not sure, however, if the build farm client has "proper" Debian etc. packages and if it is even possible to add these dependencies in this way. Best wishes, Tels
Re: Undocumented(?) limits on regexp functions
Moin Andrew, On Tue, August 14, 2018 9:16 am, Andrew Gierth wrote: >>>>>> "Tom" == Tom Lane writes: > > >> Should these limits: > > >> a) be removed > > Tom> Doubt it --- we could use the "huge" request variants, maybe, but > Tom> I wonder whether the engine could run fast enough that you'd want > Tom> to. > > I do wonder (albeit without evidence) whether the quadratic slowdown > problem I posted a patch for earlier was ignored for so long because > people just went "meh, regexps are slow" rather than wondering why a > trivial splitting of a 40kbyte string was taking more than a second. Pretty much this. :) First of all, thank you for working in this area; this is very welcome. We do use UTF-8, and we did notice that regexps are not actually the fastest around, albeit we did not (yet) run into the memory limit. Mostly because the regexp_match* stuff we use is only used in places where the performance is not key and the input/output is small (albeit, now that I mention it, the quadratic behaviour might explain a few slowdowns in other cases I need to investigate). Anyway, in a few places we have functions that use a lot (> a dozen) of regexps that are also moderately complex (e.g. span multiple lines). In these cases the performance was not really up to par, so I experimented and in the end rewrote the functions in plperl, which fixed the performance, so we no longer had this issue. All the best, Tels
Re: JIT compiling with LLVM v12
Moin, On Sat, August 25, 2018 9:34 pm, Robert Haas wrote: > On Wed, Aug 22, 2018 at 6:43 PM, Andres Freund wrote: >> Now you can say that'd be solved by bumping the cost up, sure. But >> obviously the row / cost model is pretty much out of whack here, I don't >> see how we can make reasonable decisions in a trivial query that has a >> misestimation by five orders of magnitude. > > Before JIT, it didn't matter whether the costing was wrong, provided > that the path with the lowest cost was the cheapest path (or at least > close enough to the cheapest path not to bother anyone). Now it does. > If the intended path is chosen but the costing is higher than it > should be, JIT will erroneously activate. If you had designed this in > such a way that we added separate paths for the JIT and non-JIT > versions and the JIT version had a bigger startup cost but a reduced > runtime cost, then you probably would not have run into this issue, or > at least not to the same degree. But as it is, JIT activates when the > plan looks expensive, regardless of whether activating JIT will do > anything to make it cheaper. As a blindingly obvious example, turning > on JIT to mitigate the effects of disable_cost is senseless, but as > you point out, that's exactly what happens right now. > > I'd guess that, as you read this, you're thinking, well, but if I'd > added JIT and non-JIT paths for every option, it would have doubled > the number of paths, and that would have slowed the planner down way > too much. That's certainly true, but my point is just that the > problem is probably not as simple as "the defaults are too low". I > think the problem is more fundamentally that the model you've chosen > is kinda broken. I'm not saying I know how you could have done any > better, but I do think we're going to have to try to figure out > something to do about it, because saying, "check-pg_upgrade is 4x > slower, but that's just because of all those bad estimates" is not > going to fly. Those bad estimates were harmlessly bad before, and now > they are harmfully bad, and similar bad estimates are going to exist > in real-world queries, and those are going to be harmful now too. > > Blaming the bad costing is a red herring. The problem is that you've > made the costing matter in a way that it previously didn't. Hm, no, I don't quite follow this argument. Isn't trying to avoid "bad costing having bad consequences" just hiding the symptoms instead of curing them? It would have a high development cost, and still bad estimates could ruin your day in other places. Wouldn't it be much smarter to look at why and how the bad costing appears and try to fix this? If a query that returns 12 rows was estimated to return about 4 million, something is wrong on a ridiculous scale. If the costing didn't produce so many "to the moon" values, then it wouldn't matter so much what later decisions do depending on it. I mean, JIT is not the only thing here; even choosing the wrong plan can lead to large runtime differences (think of a sort that spills to disk etc.) So, is there a limit on how many rows can be estimated? Maybe based on things like: * how big the table is? E.g. a table with 2 pages can't have a million rows. * what the column types are? E.g. if you do: SELECT * FROM table WHERE id >= 100 AND id < 200; you cannot have more than 100 rows as a result if "id" is a unique integer column. * Index size: You can't pull out more rows from an index than it contains; maybe this helps limit the "worst estimate"? 
These things might also be cheaper to implement than rewriting the entire JIT model. Also, why does PG allow the stats to be that outdated - or missing; I'm not sure which case it is in this example. Shouldn't the system aim to have at least some basic stats, even if the user never runs ANALYZE? Or is this on purpose for these tests, to see what happens? Best regards, Tels
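As a back-of-the-envelope illustration of the first bullet above, a sanity clamp could cap a row estimate by what the table's physical size can possibly hold. This is made-up sketch code, not PostgreSQL's costing code; the 32-byte minimum tuple footprint and the function name are assumptions for illustration only:

#include <stdio.h>

#define BLCKSZ              8192
#define MIN_TUPLE_FOOTPRINT 32      /* assumed lower bound on bytes per heap tuple */

static double
clamp_row_estimate(double est_rows, double rel_pages)
{
    /* a table with rel_pages pages cannot hold more tuples than this */
    double max_possible = rel_pages * (BLCKSZ / MIN_TUPLE_FOOTPRINT);

    if (est_rows > max_possible)
        est_rows = max_possible;
    if (est_rows < 1.0)
        est_rows = 1.0;
    return est_rows;
}

int
main(void)
{
    /* e.g. an estimate of 4 million rows for a 2-page table gets clamped */
    printf("%.0f\n", clamp_row_estimate(4e6, 2.0));
    return 0;
}

With 2 pages, anything above 512 rows is physically impossible under these assumptions, so the 4-million-row estimate collapses to 512 and can no longer push later decisions (JIT, plan choice) off a cliff.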