small cleanup in unicode_norm.c
We've had get_canonical_class() for a while as a backend-only function. There is some ad-hoc code elsewhere that implements the same logic in a couple places, so it makes sense for all sites to use this function instead, as in the attached. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company use-canonical-class-func.patch Description: Binary data
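For readers without the patch at hand, the helper being consolidated looks roughly like this (a sketch reconstructed from memory; the struct and field names may not match the tree exactly):

	/*
	 * Get the combining class of a code point by looking it up in the
	 * generated Unicode decomposition table.  Codepoints with no entry have
	 * combining class 0.  (Sketch only; see unicode_norm.c for the real
	 * definition.)
	 */
	static uint8
	get_canonical_class(pg_wchar code)
	{
		const pg_unicode_decomposition *entry = get_code_entry(code);

		if (!entry)
			return 0;
		return entry->comb_class;
	}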
Re: small cleanup in unicode_norm.c
On Tue, Dec 8, 2020 at 5:45 AM Michael Paquier wrote: > > On Mon, Dec 07, 2020 at 03:24:56PM -0400, John Naylor wrote: > > We've had get_canonical_class() for a while as a backend-only function. > > There is some ad-hoc code elsewhere that implements the same logic in a > > couple places, so it makes sense for all sites to use this function > > instead, as in the attached. > > Thanks John for caring about that. This is a nice simplification, and > it looks fine to me. > > -static uint8 > -get_canonical_class(pg_wchar ch) > -{ > Two nits here. I would use "code" for the name of the argument for > consistency with get_code_entry(), and add a description at the top of > this helper routine (say a simple "get the combining class of given > code"). Anything else you can think of? Thanks for taking a look. Sounds good, I've made those adjustments and wrote a commit message. I took another look and didn't see anything else to address. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company v2-0001-Simplify-ad-hoc-code-for-getting-a-Unicode-codepo.patch Description: Binary data
Re: cutting down the TODO list thread
Hi,

Continuing with TODO list maintenance, first a couple of things to clean up:

- Allow ALTER INDEX ... RENAME concurrently
This was in the wrong section, but that's now irrelevant: the lock level was lowered in commit 1b5d797cd4f, so I went ahead and removed this already.

- Add CORRESPONDING BY to UNION/INTERSECT/EXCEPT
The link titled "how not to write this patch" points to a web archive of the author's description of how he implemented the rejected patch. That doesn't seem useful, since it was...rejected. I propose to replace that with the -hackers thread, where there is discussion of the design problem:
https://www.postgresql.org/message-id/flat/CAJZSWkWN3YwQ01C3%2Bcq0%2BeyZ1DmK%3D69_6vryrsVGMC%3D%2BfWrSZA%40mail.gmail.com

Now, for the proposed items to move to "Not Worth Doing". As before, let me know of any objections. I plan to move these early next week:

*Views and Rules

- Allow VIEW/RULE recompilation when the underlying tables change
The entry itself says "This is both difficult and controversial." and the linked threads confirm that.

- Make it possible to use RETURNING together with conditional DO INSTEAD rules, such as for partitioning setups
This was from before we got native partitioning, so the stated rationale is outdated.

*SQL Commands
(this is a huge section, for now just doing the miscellany at the top before the various subsections)

- Add a GUC variable to warn about non-standard SQL usage in queries
I don't see the reason for this, and it sounds difficult anyway.

- Add NOVICE output level for helpful messages
This would only be useful if turned on, so it is going to be least used where it might help the most. It also sounds like a lot of slow menial work to implement.

- Allow DISTINCT to work in multiple-argument aggregate calls
Tom suggested this in 2006 for the sake of orthogonality. Given the amount of time that has passed, it seems not very important.

- Allow DELETE and UPDATE to be used with LIMIT and ORDER BY
Some use cases were mentioned, but nearly all have some kind of workaround already.

-- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
On Thu, Dec 10, 2020 at 3:29 PM John Naylor wrote: > > *Views and Rules > *SQL Commands Hearing no objections, the items mentioned have been moved over. -- John Naylor EDB: http://www.enterprisedb.com
Re: Perform COPY FROM encoding conversions in larger chunks
On Wed, Dec 16, 2020 at 8:18 AM Heikki Linnakangas wrote: > > Currently, COPY FROM parses the input one line at a time. Each line is > converted to the database encoding separately, or if the file encoding > matches the database encoding, we just check that the input is valid for > the encoding. It would be more efficient to do the encoding > conversion/verification in larger chunks. At least potentially; the > current conversion/verification implementations work one byte a time so > it doesn't matter too much, but there are faster algorithms out there > that use SIMD instructions or lookup tables that benefit from larger inputs. Hi Heikki, This is great news. I've seen examples of such algorithms and that'd be nice to have. I haven't studied the patch in detail, but it looks fine on the whole. In 0004, it seems you have some doubts about upgrade compatibility. Is that because user-defined conversions would no longer have the right signature? -- John Naylor EDB: http://www.enterprisedb.com
Re: Perform COPY FROM encoding conversions in larger chunks
On Wed, Dec 23, 2020 at 3:41 AM Heikki Linnakangas wrote: > > I'm not sure it's worth the trouble, though. Custom conversions are very > rare. And I don't think any other object can depend on a conversion, so > you can always drop the conversion before upgrade, and re-create it with > the new function signature afterwards. A note in the release notes and a > check in pg_upgrade, with instructions to drop and recreate the > conversion, are probably enough. > That was my thought as well. -- John Naylor EDB: http://www.enterprisedb.com
Re: [POC] verifying UTF-8 using SIMD instructions
On Mon, Feb 8, 2021 at 6:17 AM Heikki Linnakangas wrote:
>
> I also tested the fallback implementation from the simdjson library
> (included in the patch, if you uncomment it in simdjson-glue.c):
>
>  mixed | ascii
> -------+-------
>    447 |    46
> (1 row)
>
> I think we should at least try to adopt that. At a high level, it looks
> pretty similar to your patch: you load the data 8 bytes at a time, check if
> they are all ASCII. If there are any non-ASCII chars, you check the
> bytes one by one, otherwise you load the next 8 bytes. Your patch should
> be able to achieve the same performance, if done right. I don't think
> the simdjson code forbids \0 bytes, so that will add a few cycles, but
> still.

Attached is a patch that does roughly what the simdjson fallback did, except I use straight tests on the bytes and only calculate code points in assertion builds.

In the course of doing this, I found that my earlier concerns about putting the ascii check in a static inline function were due to my suboptimal loop implementation. I had assumed that if the chunked ascii check failed, it had to check all those bytes one at a time. As it turns out, that's a waste of the branch predictor. In the v2 patch, we do the chunked ascii check every time we loop. With that, I can also confirm the claim in the Lemire paper that it's better to do the check on 16-byte chunks:

(MacOS, Clang 10)

master:
 chinese | mixed | ascii
---------+-------+-------
    1081 |   761 |   366

v2 patch, with 16-byte stride:
 chinese | mixed | ascii
---------+-------+-------
     806 |   474 |    83

patch but with 8-byte stride:
 chinese | mixed | ascii
---------+-------+-------
     792 |   490 |   105

I also included the fast path in all other multibyte encodings, and that is also pretty good performance-wise. It regresses from master on pure multibyte input, but that case is still faster than PG13, which I simulated by reverting 6c5576075b0f9 and b80e10638e3:

~PG13:
 chinese | mixed | ascii
---------+-------+-------
    1565 |   848 |   365

ascii fast-path plus pg_*_verifychar():
 chinese | mixed | ascii
---------+-------+-------
    1279 |   656 |    94

v2 has a rough start to having multiple implementations in src/backend/port. Next steps are:

1. Add more tests for utf-8 coverage (in addition to the ones to be added by the noError argument patch)
2. Add SSE4 validator -- it turns out the demo I referred to earlier doesn't match the algorithm in the paper. I plan to only copy the lookup tables from simdjson verbatim, but the code will basically be written from scratch, using simdjson as a hint.
3. Adjust configure.ac

-- John Naylor EDB: http://www.enterprisedb.com

v2-add-portability-stub-and-new-fallback.patch Description: Binary data
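A minimal sketch of the loop shape described above (illustrative only, not the attached patch; check_ascii() stands for the patch's chunked check, and verify_one_char() is a placeholder for the per-character verifier):

	/*
	 * Sketch of the fallback loop: try the chunked ASCII check on every
	 * iteration, and fall back to verifying a single multibyte character
	 * only when the chunk is not all-ASCII.  Returns the number of valid
	 * bytes found.  Illustration of the approach, not the patch itself.
	 */
	static int
	validate_fallback_sketch(const unsigned char *s, int len)
	{
		const unsigned char *start = s;

		while (len > 0)
		{
			int			l = check_ascii(s, len);	/* 16 if all-ASCII, else 0 */

			if (l == 0)
			{
				l = verify_one_char(s, len);		/* length of one valid char, or 0 */
				if (l == 0)
					break;							/* invalid byte sequence */
			}
			s += l;
			len -= l;
		}

		return (int) (s - start);	/* count of valid bytes */
	}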
Re: Preventing free space from being reused
On Fri, Feb 12, 2021 at 6:21 PM Noah Bergbauer wrote: > > A btree index on the same column is 700x the size of BRIN, or 10% of relation itself. It does not perform significantly better than BRIN. The issue here is twofold: not only does slotting these tuples into older pages significantly reduce the effectiveness of BRIN, it also causes fragmentation on disk. Ultimately, this is why CLUSTER exists. One way to look at this situation is that my data is inserted exactly in index order, but Postgres keeps un-clustering it for reasons that are valid in general (don't waste disk space) but don't apply at all in this case (the file system uses compression, no space is wasted). > > Any alternative ideas would of course be much appreciated! But at the moment HEAP_INSERT_SKIP_FSM seems like the most practical solution to me. I would suggest to take a look at the BRIN opclass multi-minmax currently in development. It's designed to address that exact situation, and more review would be welcome: https://commitfest.postgresql.org/32/2523/ -- John Naylor EDB: http://www.enterprisedb.com
Re: [POC] verifying UTF-8 using SIMD instructions
dows support, but it's not developed. - I had to add a large number of casts to get rid of warnings in the magic constants macros. That needs some polish. I also attached a C file that visually demonstrates every step of the algorithm following the example found in Table 9 in the paper. That contains the skeleton coding I started with and got abandoned early, so it might differ from the actual patch. -- John Naylor EDB: http://www.enterprisedb.com v3-SSE4-with-autoconf-support.patch Description: Binary data test-utf8.c Description: Binary data
Re: [POC] verifying UTF-8 using SIMD instructions
I wrote: > [v3] > - It's not smart enough to stop at the last valid character boundary -- it's either all-valid or it must start over with the fallback. That will have to change in order to work with the proposed noError conversions. It shouldn't be very hard, but needs thought as to the clearest and safest way to code it. In v4, it should be able to return an accurate count of valid bytes even when the end crosses a character boundary. > - This is my first time hacking autoconf, and it still seems slightly broken, yet functional on my machine at least. It was actually completely broken if you tried to pass the special flags to configure. I redesigned this part and it seems to work now. -- John Naylor EDB: http://www.enterprisedb.com v4-SSE4-with-autoconf-support.patch Description: Binary data
Re: [POC] verifying UTF-8 using SIMD instructions
On Mon, Feb 15, 2021 at 9:32 PM John Naylor wrote:
>
> On Mon, Feb 15, 2021 at 9:18 AM Heikki Linnakangas wrote:
> >
> > I'm guessing that's because the unaligned access in check_ascii() is
> > expensive on this platform.
> Some possible remedies:
> 3) #ifdef out the ascii check for 32-bit platforms.
> 4) Same as the non-UTF8 case -- only check for ascii 8 bytes at a time. I'll probably try this first.

I've attached a couple patches to try on top of v4; maybe they'll help the Arm32 regression. 01 reduces the stride to 8 bytes, and 02 applies on top of v1 to disable the fallback fast path entirely on 32-bit platforms. A bit of a heavy hammer, but it'll confirm (or not) your theory about unaligned loads.

Also, I've included patches to explain more fully how I modeled non-UTF-8 performance while still using the UTF-8 tests. I think it was a useful thing to do, and I have a theory that might predict how a non-UTF8 encoding will perform with the fast path. 03A and 03B are independent of each other and conflict, but both apply on top of v4 (don't need 02). Both replace the v4 fallback with the ascii fastpath + pg_utf8_verifychar() in the loop, similar to utf-8 on master. 03A has a local static copy of pg_utf8_islegal(), and 03B uses the existing global function. (On x86, you can disable SSE4 by passing USE_FALLBACK_UTF8=1 to configure.)

While Clang 10 regressed for me on pure multibyte in a similar test upthread, on Linux gcc 8.4 there isn't a regression at all. IIRC, gcc wasn't as good as Clang when the API changed a few weeks ago, so its regression from v4 is still faster than master. Clang only regressed with my changes because it somehow handled master much better to begin with.

x86-64 Linux gcc 8.4

master:
 chinese | mixed | ascii
---------+-------+-------
    1453 |   857 |   428

v4 (fallback verifier written as a single function):
 chinese | mixed | ascii
---------+-------+-------
     815 |   514 |    82

v4 plus addendum 03A -- emulate non-utf-8 using a copy of pg_utf8_islegal() as a static function:
 chinese | mixed | ascii
---------+-------+-------
    1115 |   547 |    87

v4 plus addendum 03B -- emulate non-utf-8 using pg_utf8_islegal() as a global function:
 chinese | mixed | ascii
---------+-------+-------
    1279 |   604 |    82

(I also tried the same on ppc64le Linux, gcc 4.8.5, and while not great, it never got worse than master either on pure multibyte.)

This is supposed to model the performance of a non-utf8 encoding, where we don't have a bespoke function written from scratch. Here's my theory: If an encoding has pg_*_mblen(), a global function, inside pg_*_verifychar(), it seems it won't benefit as much from an ascii fast path as one whose pg_*_verifychar() has no function calls. I'm not sure whether a compiler can inline a global function's body into call sites in the unit where it's defined. (I haven't looked at the assembly.) But recall that you didn't commit 0002 from the earlier encoding change, because it wasn't performing. I looked at that patch again, and while it inlined the pg_utf8_verifychar() call, it still called the global function pg_utf8_islegal(). If the above is anything to go by, on gcc at least, I don't think we need to worry about a regression when adding an ascii fast path to non-utf-8 multibyte encodings.

Regarding SSE, I've added an ascii fast path in my local branch, but it's not going to be as big a difference because 1) the check is more expensive in terms of branches than the C case, and 2) because the general case is so fast already, it's hard to improve upon. I just need to do some testing and cleanup on the whole thing, and that'll be ready to share.
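As an aside, the HAS_ZERO() test used by check_ascii() in the diff below is assumed to be the standard word-wise zero-byte trick; a sketch of that macro (the actual definition in the patch may be written differently):

	/*
	 * Nonzero if any byte in the 64-bit chunk is 0x00.  Subtracting 0x01
	 * from every byte sets a byte's high bit only if that byte was 0x00 or
	 * at least 0x81; masking with ~chunk removes the latter, leaving high
	 * bits set only for bytes that were zero.
	 */
	#define HAS_ZERO(chunk) \
		(((chunk) - UINT64CONST(0x0101010101010101)) & \
		 ~(chunk) & \
		 UINT64CONST(0x8080808080808080))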
-- John Naylor EDB: http://www.enterprisedb.com diff --git a/src/include/port/pg_utf8.h b/src/include/port/pg_utf8.h index dac2afc130..a0c94dd4f3 100644 --- a/src/include/port/pg_utf8.h +++ b/src/include/port/pg_utf8.h @@ -53,26 +53,25 @@ extern int pg_validate_utf8_fallback(const unsigned char *s, int len); static inline int check_ascii(const unsigned char *s, int len) { - uint64 half1, half2, + uint64 chunk, highbit_mask; - if (len >= 2 * sizeof(uint64)) + if (len >= sizeof(uint64)) { - memcpy(&half1, s, sizeof(uint64)); - memcpy(&half2, s + sizeof(uint64), sizeof(uint64)); + memcpy(&chunk, s, sizeof(uint64)); /* * If there are any zero bytes, bail and let the slow * path handle it. */ - if (HAS_ZERO(half1) || HAS_ZERO(half2)) + if (HAS_ZERO(chunk)) return 0; /* Check if any bytes in this chunk have the high bit set. */ - highbit_mask = ((half1 | half2) & UINT64CONST(0x8080808080808080)); + highbit_mask = (chunk & UINT64CONST(0x8080808080808080)); if (!highbit_mask) - return 2 * sizeof(uint64); + return sizeof(uint64); else return 0; } diff -
Re: WIP: BRIN multi-range indexes
On Mon, Feb 15, 2021 at 10:37 AM Tomas Vondra wrote: > [v20210215] Hi Tomas, This time I only looked at the cumulative changes in the multiminmax opclass, since I'm pretty sure all the bloom issues have been addressed. * XXX CombineRange name seems a bit weird. Consider renaming, perhaps to * something ExpandedRange or so. The time to do it is now, if you like, or remove the XXX. I agree with the comment, FWIW. In has_matching_range(): * So we know it's in the general min/max, the question is whether it * falls in one of the ranges or gaps. We'll use a binary search on * the ranges. * * it's in the general range, but is it actually covered by any * of the ranges? Repeat the check for each range. * * XXX We simply walk the ranges sequentially, but maybe we could * further leverage the ordering and non-overlap and use bsearch to * speed this up a bit. It looks to me like you already implemented binary search and the last part is out of date, or am I missing something? Same in range_contains_value(): * XXX This might benefit from the fact that both the intervals and exact * values are sorted - we might do bsearch or something. Currently that * does not make much difference (there are only ~32 intervals), but if * this gets increased and/or the comparator function is more expensive, * it might be a huge win. Below that it does binary search if the number of elements > 16. In merge_combine_ranges(): There are a couple assert-related TODOs. In brin_minmax_multi_distance_timetz(): * XXX Does this need to consider the time zones? I wouldn't think so, because the stored values are in UTC -- the time zone calculation only happens during storage and retrieval, and they've already been stored, IIUC. Also, I think you need to copy this part from brin_minmax_multi_distance_timestamp() here as well: if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2)) PG_RETURN_FLOAT8(0); At this point, I think it's pretty close to commit-ready. I thought maybe I would create a small index with every type, and make sure it looks sane in page_inspect, but that's about it. -- John Naylor EDB: http://www.enterprisedb.com
Re: non-HOT update not looking at FSM for large tuple update
On Wed, Feb 24, 2021 at 10:44 AM Floris Van Nee wrote:
> The problem here is two-folded: for any non-HOT update of a tuple that’s larger than the size of the fillfactor, the fsm will not be used, but instead a new page will be chosen.

I confirmed this for not only non-HOT updates, but also regular inserts, which are the same thing in this context.

> This seems to rely on the false assumption that every existing page has at least one tuple on it.

Yep.

> Secondly, and this is a bit trickier.. Even if we would ask the FSM to come up with a free page with a free size of “MaxHeapTupleSize”, it wouldn’t find anything… This is, because the FSM tracks free space excluding any unused line pointers.
> There are 7 line pointers on this page, consuming 28 bytes. Plus the 24 byte header, that means that lower=52. However, all line pointers are unused, so the page really is empty. The FSM does not see the page as empty though, as it only looks at “upper-lower”.
>
> When asking the FSM for slightly less space (MaxHeapTupleSize – 50 for example), it does find the free pages. I’ve confirmed that with such a hack the table is not growing indefinitely anymore. However, this number 50 is rather arbitrary obviously, as it depends on the number of unused line items on a page, so that’s not a proper way to fix things.
>
> In any case, the behavior feels like a bug to me, but I don’t know what the best way would be to fix it. Thoughts?

One idea is to take your -50 idea and make it more general and safe, by scaling the fudge factor based on fillfactor, such that if fillfactor is less than 100, the requested freespace is a bit smaller than the max. It's still a bit of a hack, though. I've attached a draft of this idea.

-- John Naylor EDB: http://www.enterprisedb.com

allow-inserting-tuples-into-almost-empty-pages.patch Description: Binary data
Re: non-HOT update not looking at FSM for large tuple update
On Wed, Feb 24, 2021 at 4:52 PM Floris Van Nee wrote: > I also understand the temptation to define it based on the relation's fill factor, as you did in the patch. However, upon some further thought I wonder if that's useful? A relation with a higher fill factor will have a lower 'saveFreeSpace' variable, so it's less likely to run into issues in finding a fresh page, except if the tuple you're inserting/updating is even larger. However, if that case happens, you'll still be wanting to look for a page that's completely empty (except for the line items). So the only proper metric is 'how many unused line items do we expect on empty pages' and the fillfactor doesn't say much about this. Since this is probably difficult to estimate at all, we may be better off just defining it off MaxHeapTupleSize completely? > For example, we expect 1.5% of the page could be line items, then: > > targetFreeSpace = MaxHeapTupleSize * 0.985 That makes sense, although the exact number seems precisely tailored to your use case. 2% gives 164 bytes of slack and doesn't seem too large. Updated patch attached. -- John Naylor EDB: http://www.enterprisedb.com v2-allow-inserting-tuples-into-almost-empty-pages.patch Description: Binary data
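A minimal sketch of the approach as described above (illustrative; the helper name and exact slack follow the discussion, not necessarily the final patch):

	/*
	 * Sketch of the v2 idea: compute the free-space request for a new tuple,
	 * capping it slightly below MaxHeapTupleSize so that pages containing
	 * nothing but unused line pointers can still satisfy the request.  The
	 * 2% slack accounts for the line pointer array on otherwise-empty pages.
	 */
	static Size
	target_free_space(Size len, Size saveFreeSpace)
	{
		const Size	maxPaddedFsmRequest = MaxHeapTupleSize - (MaxHeapTupleSize * 2 / 100);

		if (len + saveFreeSpace > MaxHeapTupleSize)
			return Max(len, maxPaddedFsmRequest);
		else
			return len + saveFreeSpace;
	}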
Re: Removing support for COPY FROM STDIN in protocol version 2
On Thu, Feb 4, 2021 at 11:47 AM Heikki Linnakangas wrote: > > On 04/02/2021 17:35, Tom Lane wrote: > > Alvaro Herrera writes: > >> Yeah, the changes I was thinking about are all in libpq-int.h so that's > >> not really a problem. But one enum in libpq-fe.h renumbers values, and > >> I think it's better to keep the old value labelled as "unused" to avoid > >> any changes. > > > > Oh, yeah, can't do that. libpq-fe.h probably shouldn't change at all; > > but certainly we can't renumber existing enum values there. > > Ah, right, there's even a comment above the enum that says that's a no > no. But yeah, fixing that, I see no need for .so version bump. I was able to build libpq and psql on 7.3 with the tooling found on RHEL 7 (the rest of the tree refused to build, but that's not relevant here) and got the expected message when trying to connect: master: Welcome to psql 7.3.21, the PostgreSQL interactive terminal. patch: psql: FATAL: unsupported frontend protocol 2.0: server supports 3.0 to 3.0 I couldn't find any traces of version 2 in the tree with the patch applied. The enum mentioned above seems the only issue that needs to be fixed before commit. -- John Naylor EDB: http://www.enterprisedb.com
Re: non-HOT update not looking at FSM for large tuple update
On Wed, Feb 24, 2021 at 6:29 PM Floris Van Nee wrote: > > > > That makes sense, although the exact number seems precisely tailored to your use case. 2% gives 164 bytes of slack and doesn't seem too large. Updated patch attached. > > Yeah, I tried picking it as conservative as I could, but 2% is obviously great too. :-) I can't think of any large drawbacks either of having a slightly larger value. > Thanks for posting the patch! I've added this to the commitfest as a bug fix and added you as an author. -- John Naylor EDB: http://www.enterprisedb.com
our use of popcount
While reviewing the patch to speed up GiST indexes with tsvectors [1] by smarter use of popcount, I was reminded that for hardware popcount, we do an indirect function call to execute a single CPU instruction, one word at a time. I went ahead and did some microbenchmarks to see how much that buys us, and if we could do better. For the tests below I used the attached C file compiled into a shared library with gcc 8.4, measuring popcount on an 8kB buffer, median of 3 runs:

select drive_popcount64(100, 1024);
select drive_popcount(100, 1024);

Note: pg_popcount64() is passed one 8-byte word, and pg_popcount() is passed a buffer, which, if properly aligned, allows calling the appropriate word-at-a-time popcount function.

master -- indirect call to pg_popcount64_asm(), where available. Note that passing a buffer is faster:

pg_popcount64()  2680ms
pg_popcount()    1640ms

0001 ignores assembly and uses direct calls to popcount64_slow(). It is quite a bit slower, but notice that passing a buffer to pg_popcount() with the slow implementation is not all that much slower than calling the indirect assembly one word at a time (2680ms vs 3190ms):

pg_popcount64()  4930ms
pg_popcount()    3190ms

It's also worth noting that currently all platforms use an indirect function call, regardless of instruction support, so the direct call here is a small win for non-x86-64 platforms.

0002 restores indirection through a function pointer, but chooses the pass-a-buffer function rather than the retail function, and if assembly is available, calls inlined popcount64_asm(). This in itself is not much faster than master (1640ms vs 1430ms):

pg_popcount()    1430ms

However, 0003 takes the advice of [2] and [3] and uses hand-written unrolled assembly, and now we actually have some real improvement, over 3x faster than master:

pg_popcount()    494ms

0004 shows how it would look to use the buffer-passing version for bitmapsets and the visibility map. Bitmapsets would benefit from this even on master, going by the test results above. If we went with something like 0003, bitmapsets would benefit further automatically. However, counting of visibility map bits is a bit more tricky, since we have to mask off the visible bits and frozen bits to count them separately. 0004 has a simple unoptimized function to demonstrate. As I recall, the VM code did not benefit much from the popcount infrastructure to begin with, so a small regression might not be noticeable. If it is, it's just a SMOP to offer an assembly version here also.

The motivation for benchmarking this was to have some concrete numbers in mind while reviewing gist index creation. If the gist patch can benefit from any of the above, it's worth considering, as a more holistic approach. Since it affects other parts of the system, I wanted to share this in a new thread first.

[1] https://commitfest.postgresql.org/32/3023/
[2] https://danluu.com/assembly-intrinsics/
[3] https://stackoverflow.com/questions/25078285/replacing-a-32-bit-loop-counter-with-64-bit-introduces-crazy-performance-deviati

-- John Naylor EDB: http://www.enterprisedb.com

v1-0001-Use-direct-function-calls-for-pg_popcount-32-64.patch Description: Binary data
v1-0002-Use-platform-specific-implementations-of-pg_popco.patch Description: Binary data
v1-0003-Manually-unroll-the-loop-with-hand-written-assemb.patch Description: Binary data
v1-0004-Use-pg_popcount-rather-than-pg_popcount-32-64-whe.patch Description: Binary data
drive_popcount.c Description: Binary data
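For readers unfamiliar with the pass-a-buffer interface being benchmarked above, here is a minimal sketch of the idea (an illustration, not the attached patches):

	/*
	 * Count set bits over a buffer 8 bytes at a time, falling back to a byte
	 * lookup table for any remainder.  pg_popcount64() and
	 * pg_number_of_ones[] are the existing primitives in pg_bitutils; the
	 * rest is an illustrative sketch only.
	 */
	static uint64
	popcount_buffer(const char *buf, int bytes)
	{
		uint64		popcnt = 0;

		while (bytes >= (int) sizeof(uint64))
		{
			uint64		word;

			memcpy(&word, buf, sizeof(uint64)); /* avoids alignment assumptions */
			popcnt += pg_popcount64(word);
			buf += sizeof(uint64);
			bytes -= sizeof(uint64);
		}

		/* remaining bytes, one at a time */
		while (bytes-- > 0)
			popcnt += pg_number_of_ones[(unsigned char) *buf++];

		return popcnt;
	}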
Re: Speeding up GIST index creation for tsvectors
On Wed, Jan 27, 2021 at 11:07 AM Amit Khandekar wrote: Hi Amit, Your performance numbers look like this is a fruitful area to improve. I have not yet tested performance, but I will do so at a later date. I did some microbenchmarking of our popcount implementation, since I wasn't quite sure it's optimal, and indeed, there is room for improvement there [1]. I'd be curious to hear your thoughts on those findings. I think it'd be worth it to test a version of this patch using those idioms here as well, so at some point I plan to post something. Now for the patches: 0001: + /* + * We can process 64-bit chunks only if both are mis-aligned by the same + * number of bytes. + */ + if (b_aligned - b == a_aligned - a) The obvious question here is: how often are they identically misaligned? You don't indicate that your measurements differ in a bimodal fashion, so does that mean they happen to be always (mis)aligned? Is that an accident of the gist coding and could change unexpectedly? And how hard would it be to allocate those values upstream so that the pointers are always aligned on 8-byte boundaries? (I imagine pretty hard, since there are multiple callers, and such tight coupling is not good style) + /* For smaller lengths, do simple byte-by-byte traversal */ + if (bytes <= 32) You noted upthread: > Since for the gist index creation of some of these types the default > value for siglen is small (8-20), I tested with small siglens. For > siglens <= 20, particularly for values that are not multiples of 8 > (e.g. 10, 13, etc), I see a 1-7 % reduction in speed of index > creation. It's probably because of > an extra function call for pg_xorcount(); and also might be due to the > extra logic in pg_xorcount() which becomes prominent for shorter > traversals. So for siglen less than 32, I kept the existing method > using byte-by-byte traversal. I wonder if that can be addressed directly, while cleaning up the loops and alignment checks in pg_xorcount_long() a little. For example, have a look at pg_crc32c_armv8.c -- it might be worth testing a similar approach. Also, pardon my ignorance, but where can I find the default siglen for various types? +/* Count the number of 1-bits in the result of xor operation */ +extern uint64 pg_xorcount_long(const unsigned char *a, const unsigned char *b, + int bytes); +static inline uint64 pg_xorcount(const unsigned char *a, const unsigned char *b, + int bytes) I don't think it makes sense to have a static inline function call a global function. -static int +static inline int hemdistsign(BITVECP a, BITVECP b, int siglen) Not sure what the reason is for inlining all these callers. Come to think of it, I don't see a reason to have hemdistsign() at all anymore. All it does is call pg_xorcount(). I suspect that's what Andrey Borodin meant when he said upthread: > > > Meanwhile there are at least 4 incarnation of hemdistsign() functions that are quite similar. I'd propose to refactor them somehow... 0002: I'm not really happy with this patch. I like the idea of keeping indirect calls out of non-x86 platforms, but I believe it could be done more simply. For one, I don't see a need to invent a third category of retail function. Second, there's no reason to put "nonasm" in the header as a static inline, and then call from there to the new now-global function "slow". Especially since the supposed static inline is still needed as a possible value as a function pointer on x86, so the compiler is not going to inline it on x86 anyway. That just confuses things. 
(I did make sure to remove indirect calls from the retail functions in [1], in case we want to go that route). [1] https://www.postgresql.org/message-id/CAFBsxsFCWys_yfPe4PoF3%3D2_oxU5fFR2H%2BmtM6njUA8nBiCYug%40mail.gmail.com -- John Naylor EDB: http://www.enterprisedb.com
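For context, the shared helper being discussed boils down to an XOR-and-popcount loop over the two signatures; a hedged sketch (not the patch under review):

	/*
	 * Hamming distance between two signatures: XOR corresponding words and
	 * count the set bits.  A sketch of the idea only; the patch under review
	 * differs in naming and in how it handles alignment and short lengths.
	 */
	static uint64
	xor_popcount(const unsigned char *a, const unsigned char *b, int bytes)
	{
		uint64		distance = 0;

		while (bytes >= (int) sizeof(uint64))
		{
			uint64		wa,
						wb;

			memcpy(&wa, a, sizeof(uint64)); /* memcpy sidesteps alignment issues */
			memcpy(&wb, b, sizeof(uint64));
			distance += pg_popcount64(wa ^ wb);
			a += sizeof(uint64);
			b += sizeof(uint64);
			bytes -= sizeof(uint64);
		}

		while (bytes-- > 0)
			distance += pg_number_of_ones[*a++ ^ *b++];

		return distance;
	}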
Re: Force lookahead in COPY FROM parsing
On Thu, Mar 4, 2021 at 5:13 AM Heikki Linnakangas wrote: > > I posted this earlier at > https://www.postgresql.org/message-id/9ec25819-0a8a-d51a-17dc-4150bb3cc...@iki.fi , > and that led to removing FE/BE protocol version 2 support. That's been > committed now, so here's COPY FROM patch again, rebased. Looks good to me. Just a couple minor things: + * Look at the next character. If we're at EOF, c2 will wind + * up as '\0' because of the guaranteed pad of raw_buf. */ - IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0); - - /* get next char */ c = copy_raw_buf[raw_buf_ptr]; The new comment seems copy-pasted from the c2 statements further down. - if (raw_buf_ptr >= copy_buf_len || need_data) +#define COPY_READ_LINE_LOOKAHEAD 4 + if (raw_buf_ptr + COPY_READ_LINE_LOOKAHEAD >= copy_buf_len) Is this #define deliberately put down here rather than at the top of the file? - * of the buffer and then we load more data after that. This case occurs only - * when a multibyte character crosses a bufferload boundary. + * of the buffer and then we load more data after that. Is the removed comment really invalidated by this patch? I figured it was something not affected until the patch to do the encoding conversion in larger chunks. -- John Naylor EDB: http://www.enterprisedb.com
Re: WIP: BRIN multi-range indexes
On Tue, Mar 2, 2021 at 7:32 PM Tomas Vondra wrote: > Ummm, in brin_minmax_multi_distance_timetz()? I don't think timetz can > be infinite, no? I think brin_minmax_multi_distance_date you meant? In timestamp.c there are plenty of checks for TIMESTAMP_NOT_FINITE for TimestampTz types and I assumed that was applicable here. -- John Naylor EDB: http://www.enterprisedb.com
Re: [POC] verifying UTF-8 using SIMD instructions
On Tue, Mar 9, 2021 at 5:00 AM Amit Khandekar wrote: > > Hi, > > Just a quick question before I move on to review the patch ... The > improvement looks like it is only meant for x86 platforms. Actually it's meant to be faster for all platforms, since the C fallback is quite a bit different from HEAD. I've found it to be faster on ppc64le. An earlier version of the patch was a loser on 32-bit Arm because of alignment issues, but if you could run the test script attached to [1] on 64-bit Arm, I'd be curious to see how it does on 0002, and whether 0003 and 0004 make things better or worse. If there is trouble building on non-x86 platforms, I'd want to fix that also. (Note: 0001 is not my patch, and I just include it for the tests) > Can this be > done in a portable way by arranging for auto-vectorization ? Something > like commit 88709176236caf. This way it would benefit other platforms > as well. I'm fairly certain that the author of a compiler capable of doing that in this case would be eligible for some kind of AI prize. :-) [1] https://www.postgresql.org/message-id/06d45421-61b8-86dd-e765-f1ce527a5...@iki.fi -- John Naylor EDB: http://www.enterprisedb.com
Re: non-HOT update not looking at FSM for large tuple update
On Tue, Mar 9, 2021 at 9:40 AM Floris Van Nee wrote: > > Hi, > > > > > This patch fails to consider that len may be bigger than MaxHeapTupleSize * > > 0.98, which in this case triggers a reproducable > > PANIC: > > Good catch! I've adapted the patch with your suggested fix. Thank you both for testing and for the updated patch. It seems we should add a regression test, but it's not clear which file it belongs in. Possibly insert.sql? > > One different question I have, though, is why we can't "just" teach vacuum > > to clean up trailing unused line pointers. As in, can't we trim the line pointer > > array when vacuum detects that the trailing line pointers on the page are all > > unused? That seems like the proper fix, and I see you've started a thread for that. I don't think that change in behavior would be backpatchable, but patch here might have a chance at that. -- John Naylor EDB: http://www.enterprisedb.com
Re: non-HOT update not looking at FSM for large tuple update
I wrote: > That seems like the proper fix, and I see you've started a thread for that. I don't think that change in behavior would be backpatchable, but patch here might have a chance at that. I remembered after the fact that truncating line pointers would only allow for omitting the 2% slack logic (and has other benefits), but the rest of this patch would be needed regardless. -- John Naylor EDB: http://www.enterprisedb.com
Re: WIP: BRIN multi-range indexes
On Sun, Mar 7, 2021 at 8:53 PM Tomas Vondra wrote: [v20210308b] I managed to trap an assertion that somehow doesn't happen during the regression tests. The callers of fill_expanded_ranges() do math like this: /* both ranges and points are expanded into a separate element */ neranges = ranges->nranges + ranges->nvalues; but inside fill_expanded_ranges() we have this assertion: /* Check that the output array has the right size. */ Assert(neranges == (2 * ranges->nranges + ranges->nvalues)); In the regression test data, a debugging elog() shows that nranges is most often zero, so in that case, the math happens to be right either way. I can reliably get nranges above zero by running update brintest_multi set int8col = int8col - 1; a few times, at which point I get the crash. Aside from that, the new changes look good. Just a couple small things: +allowed parameters. Only the bloom operator class +allows specifying parameters: minmax-multi aren't mentioned here, but are mentioned further down. + * Addresses from different families are consider to be in maximum (comment above brin_minmax_multi_distance_inet) s/consider/considered/ > > 2) moving minmax/inclusion changes from 0002 to a separate patch 0003 > > > > I think we should either ditch the 0003 (i.e. keep the existing > > opclasses unchanged) or commit 0003 (in which case I'd vote to just stop > > supporting the old signature of the consistent function). > > > > Still not sure what do to about this. I'm leaning towards keeping 0003 > and just removing the "old" signature entirely, to keep the API cleaner. > It might cause some breakage in out-of-core BRIN opclasses, but that > seems like a reasonable price. Moreover, the opclasses may need some > updating anyway, because of the changes in handling NULL scan keys (0004 > moves that from the opclass to the bringetbitmap function). Keeping 0003 seems reasonable, given the above. > > The remaining part that didn't get much review is the very last patch, > > adding an option to ignore correlation for some BRIN opclases. This is > > needed as the regular BRIN costing is quite sensitive to correlation, > > and the cost gets way too high for poorly correlated data, making it > > unlikely the index will be used. But handling such data sets efficiently > > is the main point of those new opclasses. Any opinions on this? > > > > Not sure about this. I hadn't given it much thought (nor tested), but I just took a look at brincostestimate(). If the table is badly correlated, I'm thinking the number of ranges that need to be scanned will increase regardless. Maybe rather than ignoring correlation, we could clamp it or otherwise tweak it. Not sure about the details, though, that would require some testing. -- John Naylor EDB: http://www.enterprisedb.com
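To spell out the mismatch: the two formulas agree only when nranges is zero. A small illustration with hypothetical values (not code from the patch):

	/*
	 * With nranges = 2 and nvalues = 3, the callers size the output array as
	 * nranges + nvalues = 5, while the assertion in fill_expanded_ranges()
	 * expects 2 * nranges + nvalues = 7.  The two coincide only when
	 * nranges == 0, which is why the regression test data (where nranges is
	 * usually zero) never trips the assertion.
	 */
	int			nranges = 2;
	int			nvalues = 3;
	int			from_callers = nranges + nvalues;		/* 5 */
	int			from_assertion = 2 * nranges + nvalues; /* 7 */

	Assert(from_callers == from_assertion); /* fails whenever nranges > 0 */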
Re: get rid of tags in the docs?
On Thu, Feb 4, 2021 at 11:31 AM Tom Lane wrote: > > John Naylor writes: > > While looking at the proposed removal of the v2 protocol, I noticed that we > > italicize some, but not all, instances of 'per se', 'pro forma', and 'ad > > hoc'. I'd say these are widespread enough in formal registers of English > > that they hardly need to be called out as foreign, so I propose removing > > the tags for those words. > > +1, nobody italicizes those in normal usage. Now that protocol v2 is gone, here's a patch to remove those tags. > > The other case is 'voilà', found in rules.sgml. The case for italics here > > is stronger, but looking at that file, I actually think a more > > generic-sounding phrase here would be preferable. > > Yeah, seeing that we only use that in one place, I think we could do > without it. Looks like something as pedestrian as "The results are:" > would do fine. Done that way. -- John Naylor EDB: http://www.enterprisedb.com v1-0001-Get-rid-of-foreignphrase-tags-in-the-docs.patch Description: Binary data
Re: Speeding up GIST index creation for tsvectors
On Mon, Mar 8, 2021 at 8:43 AM Amit Khandekar wrote: > > On Wed, 3 Mar 2021 at 23:32, John Naylor wrote: > > 0001: > > > > + /* > > + * We can process 64-bit chunks only if both are mis-aligned by the same > > + * number of bytes. > > + */ > > + if (b_aligned - b == a_aligned - a) > > > > The obvious question here is: how often are they identically misaligned? You > > don't indicate that your measurements differ in a bimodal fashion, so does > > that mean they happen to be always (mis)aligned? > > I ran CREATE INDEX on tsvector columns using the tsearch.data that I > had attached upthread, with some instrumentation; here are the > proportions : > 1. In 15% of the cases, only one among a and b was aligned. The other > was offset from the 8-byte boundary by 4 bytes. > 2. 6% of the cases, both were offset by 4 bytes, i.e. identically misaligned. > 3. Rest of the cases, both were aligned. That's somewhat strange. I would have assumed always aligned, or usually not. It sounds like we could require them both to be aligned and not bother with the byte-by-byte prologue. I also wonder if it's worth it to memcpy to a new buffer if the passed pointer is not aligned. > > + /* For smaller lengths, do simple byte-by-byte traversal */ > > + if (bytes <= 32) > > > > You noted upthread: > > > > > Since for the gist index creation of some of these types the default > > > value for siglen is small (8-20), I tested with small siglens. For > > > siglens <= 20, particularly for values that are not multiples of 8 > > > (e.g. 10, 13, etc), I see a 1-7 % reduction in speed of index > > > creation. It's probably because of > > > an extra function call for pg_xorcount(); and also might be due to the > > > extra logic in pg_xorcount() which becomes prominent for shorter > > > traversals. So for siglen less than 32, I kept the existing method > > > using byte-by-byte traversal. > > > > I wonder if that can be addressed directly, while cleaning up the loops and > > alignment checks in pg_xorcount_long() a little. For example, have a look at > > pg_crc32c_armv8.c -- it might be worth testing a similar approach. > > Yeah, we can put the bytes <= 32 condition inside pg_xorcount_long(). > I avoided that to not hamper the <= 32 scenarios. Details explained > below for "why inline pg_xorcount is calling global function" > > > Also, pardon my ignorance, but where can I find the default siglen for various types? > Check SIGLEN_DEFAULT. Okay, so it's hard-coded in various functions in contrib modules. In that case, let's just keep the existing coding for those. In fact, the comments that got removed by your patch specifically say: /* Using the popcount functions here isn't likely to win */ ...which your testing confirmed. The new function should be used only for Gist and possibly Intarray, whose default is 124. That means we can't get rid of hemdistsign(), but that's fine. Alternatively, we could say that a small regression is justifiable reason to refactor all call sites, but I'm not proposing that. > > (I did make sure to remove indirect calls from the retail functions > > in [1], in case we want to go that route). > > Yeah, I quickly had a look at that. I am still going over that thread. > Thanks for the exhaustive analysis there. I'll post a patch soon that builds on that, so you can see what I mean. -- John Naylor EDB: http://www.enterprisedb.com
Re: Speeding up GIST index creation for tsvectors
I wrote: > I'll post a patch soon that builds on that, so you can see what I mean. I've attached where I was imagining this heading, as a text file to avoid distracting the cfbot. Here are the numbers I got with your test on the attached, as well as your 0001, on x86-64 Clang 10, default siglen: master: 739ms v3-0001 692ms attached POC 665ms The small additional speed up is not worth it, given the code churn and complexity, so I don't want to go this route after all. I think the way to go is a simplified version of your 0001 (not 0002), with only a single function, for gist and intarray only, and a style that better matches the surrounding code. If you look at my xor functions in the attached text file, you'll get an idea of what it should look like. Note that it got the above performance without ever trying to massage the pointer alignment. I'm a bit uncomfortable with the fact that we can't rely on alignment, but maybe there's a simple fix somewhere in the gist code. -- John Naylor EDB: http://www.enterprisedb.com src/backend/access/heap/visibilitymap.c | 13 +- src/backend/nodes/bitmapset.c | 23 +-- src/backend/utils/adt/tsgistidx.c | 14 +- src/include/port/pg_bitutils.h | 12 +- src/port/pg_bitutils.c | 240 ++-- 5 files changed, 214 insertions(+), 88 deletions(-) diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c index e198df65d8..d910eb2875 100644 --- a/src/backend/access/heap/visibilitymap.c +++ b/src/backend/access/heap/visibilitymap.c @@ -388,7 +388,6 @@ visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_fro { Buffer mapBuffer; uint64 *map; - int i; /* * Read till we fall off the end of the map. We assume that any extra @@ -409,17 +408,11 @@ visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_fro StaticAssertStmt(MAPSIZE % sizeof(uint64) == 0, "unsupported MAPSIZE"); if (all_frozen == NULL) - { - for (i = 0; i < MAPSIZE / sizeof(uint64); i++) - nvisible += pg_popcount64(map[i] & VISIBLE_MASK64); - } + nvisible = pg_popcount_mask64(map, MAPSIZE, VISIBLE_MASK64); else { - for (i = 0; i < MAPSIZE / sizeof(uint64); i++) - { - nvisible += pg_popcount64(map[i] & VISIBLE_MASK64); - nfrozen += pg_popcount64(map[i] & FROZEN_MASK64); - } + nvisible = pg_popcount_mask64(map, MAPSIZE, VISIBLE_MASK64); + nfrozen = pg_popcount_mask64(map, MAPSIZE, FROZEN_MASK64); } ReleaseBuffer(mapBuffer); diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c index 649478b0d4..5368c72255 100644 --- a/src/backend/nodes/bitmapset.c +++ b/src/backend/nodes/bitmapset.c @@ -452,7 +452,6 @@ bms_is_member(int x, const Bitmapset *a) int bms_member_index(Bitmapset *a, int x) { - int i; int bitnum; int wordnum; int result = 0; @@ -466,14 +465,8 @@ bms_member_index(Bitmapset *a, int x) bitnum = BITNUM(x); /* count bits in preceding words */ - for (i = 0; i < wordnum; i++) - { - bitmapword w = a->words[i]; - - /* No need to count the bits in a zero word */ - if (w != 0) - result += bmw_popcount(w); - } + result = pg_popcount((const char *) a->words, +wordnum * BITS_PER_BITMAPWORD / BITS_PER_BYTE); /* * Now add bits of the last word, but only those before the item. 
We can @@ -645,22 +638,14 @@ bms_get_singleton_member(const Bitmapset *a, int *member) int bms_num_members(const Bitmapset *a) { - int result = 0; int nwords; - int wordnum; if (a == NULL) return 0; nwords = a->nwords; - for (wordnum = 0; wordnum < nwords; wordnum++) - { - bitmapword w = a->words[wordnum]; - /* No need to count the bits in a zero word */ - if (w != 0) - result += bmw_popcount(w); - } - return result; + return pg_popcount((const char *) a->words, + nwords * BITS_PER_BITMAPWORD / BITS_PER_
Re: non-HOT update not looking at FSM for large tuple update
On Thu, Mar 11, 2021 at 9:46 AM Matthias van de Meent <boekewurm+postg...@gmail.com> wrote:
> Regarding the 2% slack logic, could we change it to use increments of
> line pointers instead? That makes it more clear what problem this
> solution is trying to work around; that is to say line pointers not
> (all) being truncated away. The currently subtracted value accounts

That makes sense.

> ...
> - if (len + saveFreeSpace > MaxHeapTupleSize)
> + if (len + saveFreeSpace > maxPaddedFsmRequest)
> ...
> - targetFreeSpace = Max(len, MaxHeapTupleSize - (MaxHeapTupleSize * 2 / 100));
> + targetFreeSpace = Max(len, maxPaddedFsmRequest);
> ...

If we have that convenient constant, it seems equivalent (I think) and a bit more clear to write it this way, but I'm not wedded to it:

	if (len + saveFreeSpace > MaxHeapTupleSize && len <= maxPaddedFsmRequest)
	{
		...
		targetFreeSpace = maxPaddedFsmRequest;
	}
	else
		targetFreeSpace = len + saveFreeSpace;

Also, should I write a regression test for it? The test case is already available, just no obvious place to put it.

-- John Naylor EDB: http://www.enterprisedb.com
Re: [POC] verifying UTF-8 using SIMD instructions
On Fri, Mar 12, 2021 at 9:14 AM Amit Khandekar wrote: > > On my Arm64 VM : > > HEAD : > mixed | ascii > ---+--- > 1091 | 628 > (1 row) > > PATCHED : > mixed | ascii > ---+--- >681 | 119 Thanks for testing! Good, the speedup is about as much as I can hope for using plain C. In the next patch I'll go ahead and squash in the ascii fast path, using 16-byte stride, unless there are objections. I claim we can live with the regression Heikki found on an old 32-bit Arm platform since it doesn't seem to be true of Arm in general. > I guess, if at all we use the equivalent Arm NEON intrinsics, the > "mixed" figures will be close to the "ascii" figures, going by your > figures on x86. I would assume so. > I was not thinking about auto-vectorizing the code in > pg_validate_utf8_sse42(). Rather, I was considering auto-vectorization > inside the individual helper functions that you wrote, such as > _mm_setr_epi8(), shift_right(), bitwise_and(), prev1(), splat(), If the PhD holders who came up with this algorithm thought it possible to do it that way, I'm sure they would have. In reality, simdjson has different files for SSE4, AVX, AVX512, NEON, and Altivec. We can incorporate any of those as needed. That's a PG15 project, though, and I'm not volunteering. -- John Naylor EDB: http://www.enterprisedb.com
Re: Move catalog toast table and index declarations
On Tue, Oct 27, 2020 at 7:43 AM Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
> On 2020-10-24 15:23, John Naylor wrote:
> > Style: In genbki.h, "extern int no_such_variable" is now out of place.
> > Also, the old comments like "The macro definitions are just to keep the
> > C compiler from spitting up." are now redundant in their new setting.
>
> Hmm, I don't really see what's wrong there. Do you mean the macro
> definitions should be different, or the comments are wrong, or something
> else?

There's nothing wrong; it's just a minor point of consistency. For the first part, I mean defined symbols in this file that are invisible to the C compiler are written

#define SOMETHING()

If some are written

#define SOMETHING() extern int no_such_variable

I imagine some future reader will wonder why there's a difference.

As for the comments, the entire file is for things meant for scripts to read, but have to be put in macro form to be invisible to the compiler. The header comment has "genbki.h defines CATALOG(), BKI_BOOTSTRAP and related macros * so that the catalog header files can be read by the C compiler." I'm just saying we don't need to carry over the comments I mentioned from the toasting/indexing headers that were specially for those macros.

-- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
duplicate function oid symbols
Hi, I noticed that the table AM abstraction introduced the symbol HEAP_TABLE_AM_HANDLER_OID, although we already have a convention for defining symbols automatically for builtin functions, which in this case is (currently unused) F_HEAP_TABLEAM_HANDLER. It seems a wart to have two symbols for the same function (seemingly accidentally). I suppose it's unacceptable to remove the non-standard symbol since it's been referred to in code for a while now. We could remove the unused (in core anyway) standard one by arranging fmgroids.h to use explicit symbols from pg_proc.dat where they exist, as well as prevent such symbols from being emitted into pg_proc_d.h. But then again there is no guarantee the standard symbol is not being used elsewhere. Thoughts? -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
cutting down the TODO list thread
As I mentioned in [1], I've volunteered to clear out the TODO list of items that appear too difficult, too controversial, or otherwise not worthwhile enough to warrant being listed there. I'll be working a few sections at a time, and every so often I'll have a list of proposed items for removal. If I don't hear objections, I'll remove the items after a few days while going through the next set.

Where there's an email thread, I've skimmed a few messages to get a sense of the community's thoughts on it. Where easily determined, I've taken age into account, insofar as something from 2017 is going to get much more benefit of the doubt than something from 2008. I've added after each item a phrase that sums up the reason I believe it doesn't belong anymore. Feedback welcome, of course, although I suspect there won't be much.

**Administration

- Have custom variables be transaction-safe
Old and found to be difficult after attempting

- Allow custom variables to appear in pg_settings()
Old and controversial

- Implement the SQL-standard mechanism whereby REVOKE ROLE revokes only the privilege granted by the invoking role, and not those granted by other roles
Old and difficult

- Prevent query cancel packets from being replayed by an attacker, especially when using SSL
Old and difficult

*Configuration files

- Consider normalizing fractions in postgresql.conf, perhaps using '%'
At the time (2007), some gucs used an actual percentage.

- Add external tool to auto-tune some postgresql.conf parameters
There are already out-of-core tools that try to do this.

- Create utility to compute accurate random_page_cost value
Seems outdated: in the current age of SSDs and cloud environments, it's often just set to 1.1, and there hasn't been a demand to be more accurate than that.

- Allow synchronous_standby_names to be disabled after communication failure with all synchronous standby servers exceeds some timeout
Controversial

- Adjust rounding behavior for numeric GUC values
Controversial

*Tablespaces

- Allow WAL replay of CREATE TABLESPACE to work when the directory structure on the recovery computer is different from the original
Thread quote: "part of the difficult, perhaps-not-worth doing impossible problems"

- Allow per-tablespace quotas
This seems to point to the larger problem space of disk space monitoring, and should probably be phrased thusly, and is a much bigger project or set of projects.

- Allow tablespaces on RAM-based partitions for temporary objects
In the thread, what's desired is the ability to have some amount of durability on a RAM-disk without WAL logging.

- Close race in DROP TABLESPACE on Windows
This refers to buildfarm failures from 2014.

*Statistics Collector

- Track number of WAL files ready to be archived in pg_stat_archiver
Thread quote: "pg_stat_archiver already has a column for last_archived_wal and last_failed_wal, so you can already work out how many files there must be between then and now"

*Point-In-Time Recovery

- Allow archive_mode to be changed without server restart
Controversial and old

*Standby server mode

- Allow pg_xlogfile_name() to be used in recovery mode
Controversial and old

- Change walsender so that it applies per-role settings
Old and possibly obsolete

--
[1] https://www.postgresql.org/message-id/CAFBsxsHbqMzDoGB3eAGmpcpB%2B7uae%2BLLi_G%2Bo8HMEECM9CbQcQ%40mail.gmail.com

-- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
On Tue, Oct 27, 2020 at 4:00 PM Thomas Munro wrote: > On Wed, Oct 28, 2020 at 8:36 AM Andres Freund wrote: > > On 2020-10-27 15:24:35 -0400, John Naylor wrote: > > > - Allow WAL replay of CREATE TABLESPACE to work when the directory > > > structure on the recovery computer is different from the original > > > Thread quote: "part of the difficult, perhaps-not-worth doing > impossible > > > problems" > > > > I think we ought to do something here. Mostly because the current > > situation makes it impossible to test many things on a single > > system. And we have a partial solution with the tablespace mapping > > files. > > +1, we need to get something like this working so that we can write > decent replication tests. FWIW there was another little thread on the > topic, not listed there: > > > https://www.postgresql.org/message-id/flat/CALfoeisEF92F5nJ-aAcuWTvF_Aogxq_1bHLem_kVfM_tHc2mfg%40mail.gmail.com > Thanks, I've added this thread to the entry. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
On Tue, Oct 27, 2020 at 3:52 PM Bruce Momjian wrote: > > Do any of these limitations need to be documented before removing them > from the TODO list? > I see two areas that might use a mention: - pg_settings not displaying custom variables - SQL standard difference with REVOKE ROLE (I haven't looked further into this) -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: duplicate function oid symbols
On Tue, Oct 27, 2020 at 9:51 AM Tom Lane wrote: > John Naylor writes: > > I noticed that the table AM abstraction introduced the symbol > > HEAP_TABLE_AM_HANDLER_OID, although we already have a convention for > > defining symbols automatically for builtin functions, which in this case > is > > (currently unused) F_HEAP_TABLEAM_HANDLER. > > Yeah, that seems wrong. I'd just remove HEAP_TABLE_AM_HANDLER_OID. > As long as we're not back-patching the change, it seems like a very > minor thing to fix, if anyone outside core is referencing the old name. > Ok, here is a patch to fix that, and also throw an error if pg_proc.dat has an explicitly defined symbol. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company v1-0001-Use-the-standard-symbol-for-the-builtin-function-.patch Description: Binary data
Re: cutting down the TODO list thread
On Tue, Oct 27, 2020 at 6:05 PM Bruce Momjian wrote: > On Tue, Oct 27, 2020 at 04:54:24PM -0400, John Naylor wrote: > > > > > > On Tue, Oct 27, 2020 at 3:52 PM Bruce Momjian wrote: > > > > > > Do any of these limitations need to be documented before removing > them > > from the TODO list? > > > > > > I see two areas that might use a mention: > > > > - pg_settings not displaying custom variables > > - SQL standard difference with REVOKE ROLE (I haven't looked further > into this) > > OK, thanks. Do you want to work on a doc patch or should I? Having it > the docs at least warns our users. > I'll work on that. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
On Wed, Oct 28, 2020 at 6:52 AM Magnus Hagander wrote: > On Wed, Oct 28, 2020 at 11:15 AM Julien Rouhaud > wrote: > > > > On Wed, 28 Oct 2020, 17:55 Oleksandr Shulgin < > oleksandr.shul...@zalando.de wrote: > >> I'm totally on board with cleaning the list up, but how about marking > as "won't fix" (or similar) instead of actually removing the items? That > should help to prevent the same exact items from appearing on the list > again, which they eventually would, I believe. > > > > > > +1 > > A small technical detail on the topic but if doing that, let's not > mark them as that inline -- create a separate page with those items on > it. > How about a section on the same page at the bottom, near "features we don't want"? -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: duplicate function oid symbols
I wrote: > Ok, here is a patch to fix that, and also throw an error if pg_proc.dat > has an explicitly defined symbol. > It occurred to me I neglected to explain the error with a comment, which I've added in v2. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company v2-0001-Use-the-standard-symbol-for-the-builtin-function-.patch Description: Binary data
Re: duplicate function oid symbols
On Wed, Oct 28, 2020 at 12:25 PM Tom Lane wrote: > I wondered about introducing a similar prohibition for pg_type. > That might be worth doing, since some of the grandfathered macros are clustered together, which could lead to more cases creeping in as people match new types to examples nearby. > The only existing oid_symbol in pg_type that I think has enough > grandfather status to be tough to change is CASHOID for "money". > But we could imagine special-casing that with a handmade macro > > #define CASHOID MONEYOID > > and then getting rid of the oid_symbol entries. (Or perhaps we > could just up and nuke CASHOID too? It's somewhat dubious that > any outside code is really using that macro.) > Yeah, grepping shows that some of those aren't even used in core code. On the other hand, the difference from the heap_am_handler case is the standard macros don't already exist for these pg_type entries. The handmade macro idea could be used for all eight just as easily as for one. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
document pg_settings view doesn't display custom options
Starting separate threads to keep from cluttering the TODO list thread. Here's a patch for the subject, as mentioned in https://www.postgresql.org/message-id/20201027220555.GS4951%40momjian.us -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company doc-pg-settings-no-custom.patch Description: Binary data
Re: document pg_settings view doesn't display custom options
On Wed, Oct 28, 2020 at 2:15 PM John Naylor wrote: > Starting separate threads to keep from cluttering the TODO list thread. > > Here's a patch for the subject, as mentioned in > https://www.postgresql.org/message-id/20201027220555.GS4951%40momjian.us > I just realized I introduced a typo, so here's v2. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company v2-doc-pg-settings-no-custom.patch Description: Binary data
Re: duplicate function oid symbols
On Wed, Oct 28, 2020 at 3:24 PM Tom Lane wrote: > and then the negotiation here is only about whether to make this list > longer. We don't need to complicate genbki.pl with a new facility. > Agreed, and reformat_dat_files.pl must also know about these special attributes. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: duplicate function oid symbols
On Wed, Oct 28, 2020 at 3:24 PM Tom Lane wrote: > > Nah. What I'm imagining is just that pg_type.h contains > > #ifdef EXPOSE_TO_CLIENT_CODE > > /* > * Backwards compatibility for ancient random spellings of OID macros. > * Don't use these macros in new code. > */ > #define CASHOID MONEYOID > #define LSNOID PG_LSNOID > > #endif > Here is a quick patch implementing this much. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company forbid-custom-pg-type-symbols.patch Description: Binary data
Re: document pg_settings view doesn't display custom options
On Wed, Oct 28, 2020 at 11:38 PM Fujii Masao wrote: > > > On 2020/10/29 3:45, John Naylor wrote: > > On Wed, Oct 28, 2020 at 2:15 PM John Naylor < > john.nay...@enterprisedb.com <mailto:john.nay...@enterprisedb.com>> wrote: > > > > Starting separate threads to keep from cluttering the TODO list > thread. > > > > Here's a patch for the subject, as mentioned in > > > https://www.postgresql.org/message-id/20201027220555.GS4951%40momjian.us > > > > > > I just realized I introduced a typo, so here's v2. > > + The pg_settings view does not display > + customized options. > > This is true until the module that defines the customized options is > loaded, > but not after that. No? For example, pg_settings displays > pg_stat_statements.max after pg_stat_statements is loaded. > True, how about this: The pg_settings does not display customized options that have been set before the relevant extension module has been loaded. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
On Wed, Oct 28, 2020 at 1:57 PM Andres Freund wrote: > On 2020-10-28 16:20:03 +0100, Magnus Hagander wrote: > > I would personally prefer a completely seprate page > > Same. > Ok, that's two votes for a separate page, and one for a new section on the same page, so it looks like it's a new page. That being the case, I would think it logical to move "features we don't want" there. As for the name, we should probably encompass both "won't fix" bugs and features not wanted. Maybe "past development ideas" or "not worth doing", but I'm open to better ideas. Once that's agreed upon, I'll make a new page and migrate the items over, minus the two that were mentioned upthread. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: document pg_settings view doesn't display custom options
On Thu, Oct 29, 2020 at 11:51 PM Fujii Masao wrote: > > > On 2020/10/29 21:54, John Naylor wrote: > > > The pg_settings does not display > > customized options > > that have been set before the relevant extension module has been > loaded. > > I guess that someone can misread this as > > customized options that have been set before the relevant extension > module has been loaded are not displayed even after the module is > loaded. > > So what about the following, instead? > > The pg_settings does not display customized options until the > extension > module that defines them has been loaded. > > Also I think this note should be in the different paragraph from the > paragraph > of "The pg_settings view cannot be inserted into or deleted from" > because > they are different topics. Thought? > Agreed on both points. In a separate paragraph, I think it's awkward to start two consecutive sentences with "The pg_settings view". If we put it in the previous paragraph we could phrase it like this: "See Section 20.1 for more information about the various ways to change these parameters. Customized options are not displayed until the extension module that defines them has been loaded. The pg_settings view..." -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: document pg_settings view doesn't display custom options
On Fri, Oct 30, 2020 at 12:07 PM Tom Lane wrote: > John Naylor writes: > > On Thu, Oct 29, 2020 at 11:51 PM Fujii Masao < > masao.fu...@oss.nttdata.com> > > wrote: > >> Also I think this note should be in the different paragraph from the > >> paragraph > >> of "The pg_settings view cannot be inserted into or deleted from" > >> because > >> they are different topics. Thought? > > > Agreed on both points. In a separate paragraph, I think it's awkward to > > start two consecutive sentences with "The pg_settings view". If we put it > > in the previous paragraph we could phrase it like this: > > > "See Section 20.1 for more information about the various ways to change > > these parameters. Customized options are not displayed until the > > extension module that defines them has been loaded. > > That just moves the subject-inconsistency to a different para :-( > I think this item should be its own new para. > > As for the repetitiveness, we could just say "This view ...", in one or > even both paras. > Okay, along those lines here's a patch using "this view" in a new paragraph for simplicity. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company v3-doc-pg-settings-no-custom.patch Description: Binary data
Re: document pg_settings view doesn't display custom options
On Fri, Oct 30, 2020 at 12:48 PM Tom Lane wrote: > John Naylor writes: > > Okay, along those lines here's a patch using "this view" in a new > paragraph > > for simplicity. > > Basically OK with me, but ... > > > It seems fairly weird to use a nonspecific reference first and then a > specific one. That is, I'd expect to read "The pg_settings view ..." > and then "This view ...", not the other way around. So we could > put this para second, or put it first but make this para say > "The pg_settings view ..." while the existing text gets reduced to > "This view ...". > > Or just make them both say "This view ..." so we don't have to have > this discussion again the next time somebody wants to add a para here. > > Okay, how's this? -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company v4-doc-pg-settings-no-custom.patch Description: Binary data
document deviation from standard on REVOKE ROLE
This is the other doc fix as suggested in https://www.postgresql.org/message-id/20201027220555.GS4951%40momjian.us There is already a compatibility section, so I put it there. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company v1-doc-fix-revoke-role.patch Description: Binary data
Re: [proposal] de-TOAST'ing using a iterator
On Mon, Nov 2, 2020 at 1:30 PM Alvaro Herrera wrote: > On 2020-Nov-02, Anastasia Lubennikova wrote: > > > Status update for a commitfest entry. > > > > This entry was inactive for a very long time. > > John, are you going to continue working on this? > Not in the near future. For background, this was a 2019 GSoC project where I was reviewer of record, and the patch is mostly good, but there is some architectural awkwardness. I have tried to address that, but have not had success. > > The last message mentions some open issues, namely backend crashes, so I > move it to "Waiting on author". > > As I understand, the patch he posted is fine -- it only crashes when he > tried a change I suggested. That's my recollection as well. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
I wrote: > > Ok, that's two votes for a separate page, and one for a new section on the > same page, so it looks like it's a new page. That being the case, I would > think it logical to move "features we don't want" there. As for the name, > we should probably encompass both "won't fix" bugs and features not wanted. > Maybe "past development ideas" or "not worth doing", but I'm open to better > ideas. Once that's agreed upon, I'll make a new page and migrate the items > over, minus the two that were mentioned upthread. > Hearing no preference, I've created https://wiki.postgresql.org/wiki/Not_Worth_Doing ...with links between the two. I've moved over the items I suggested upthread, minus the two where I heard feedback otherwise (prevent replay of query cancel packets and improve WAL replay of CREATE TABLESPACE) I have patches for documenting some behavior we won't fix in [1][2]. I was thinking of not having the next updates during commitfest, but it could also be argued this is a type of review, and the things here will be returned with feedback or rejected, in a way. Ultimately, it comes down to "when time permits". [1] https://www.postgresql.org/message-id/flat/CAFBsxsGsBZsG%3DcLM0Op5HFb2Ks6SzJrOc_eRO_jcKSNuqFRKnQ%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAFBsxsEmg=kqrekxrlygy0ujcfyck4vgxzkalrwh_olfj8o...@mail.gmail.com -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Move catalog toast table and index declarations
On Thu, Nov 5, 2020 at 4:24 AM Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 2020-10-27 13:12, John Naylor wrote: > > There's nothing wrong; it's just a minor point of consistency. For the > > first part, I mean defined symbols in this file that are invisible to > > the C compiler are written > > > > #define SOMETHING() > > > > If some are written > > > > #define SOMETHING() extern int no_such_variable > > > > I imagine some future reader will wonder why there's a difference. > > The difference is that CATALOG() is followed in actual use by something > like > > { ... } FormData_pg_attribute; > > so it becomes a valid C statement. For DECLARE_INDEX() etc., we need to > do something else to make it valid. I guess this could be explained in > more detail (as I'm attempting in this email), but this isn't materially > changed by this patch. > I think we're talking past each other. Here's a concrete example: #define BKI_ROWTYPE_OID(oid,oidmacro) #define DECLARE_TOAST(name,toastoid,indexoid) extern int no_such_variable I understand these to be functionally equivalent as far as what the C compiler sees. If not, I'd be curious to know what the difference is. I was thinking this is just a random style difference, and if so, they should be the same now that they're in the same file together: #define BKI_ROWTYPE_OID(oid,oidmacro) #define DECLARE_TOAST(name,toastoid,indexoid) And yes, this doesn't materially change the patch, it's just nitpicking :-). Materially, I believe it's fine. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Move catalog toast table and index declarations
On Thu, Nov 5, 2020 at 2:20 PM Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 2020-11-05 12:59, John Naylor wrote: > > I think we're talking past each other. Here's a concrete example: > > > > #define BKI_ROWTYPE_OID(oid,oidmacro) > > #define DECLARE_TOAST(name,toastoid,indexoid) extern int no_such_variable > > > > I understand these to be functionally equivalent as far as what the C > > compiler sees. > > The issue is that you can't have a bare semicolon at the top level of a > C compilation unit, at least on some compilers. So doing > > #define FOO(stuff) /*empty*/ > > and then > > FOO(123); > > won't work. You need to fill the definition of FOO with some stuff to > make it valid. > > BKI_ROWTYPE_OID on the other hand is not used at the top level like > this, so it can be defined to empty. > Thank you for the explanation. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
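[Editor's note: to spell out the point above with a compilable sketch -- the macro names here are made up, not the ones genbki actually uses -- a macro that expands to nothing leaves a bare semicolon at file scope when invoked like a statement, while expanding to a throwaway declaration keeps the trailing semicolon legal:]

    /* Hypothetical stand-ins for the real catalog-declaration macros. */
    #define EMPTY_MACRO(x)                          /* invisible to the compiler */
    #define DECLARE_SOMETHING(x)    extern int no_such_variable

    /*
     * EMPTY_MACRO(123);
     *
     * The line above, if uncommented, would leave a bare ";" at file scope,
     * which ISO C disallows and some compilers reject or warn about.
     */

    DECLARE_SOMETHING(123);     /* expands to "extern int no_such_variable;" */

    int
    main(void)
    {
        return 0;
    }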
Re: speed up unicode decomposition and recomposition
There is a latent bug in the way code pairs for recomposition are sorted due to a copy-pasto on my part. Makes no difference now, but it could in the future. While looking, it seems pg_bswap.h should actually be backend-only. Both fixed in the attached. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company minor-fixes-in-unicode-norm.patch Description: Binary data
Re: document pg_settings view doesn't display custom options
On Mon, Nov 9, 2020 at 2:12 AM Fujii Masao wrote: > Pushed. Thanks! > Thank you! -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: WIP: BRIN multi-range indexes
The reference for this /* upper bound of number of primes below limit */ /* WIP: reference for this number */ int numprimes = 1.26 * limit / log(limit); is Rosser, J. Barkley; Schoenfeld, Lowell (1962). "Approximate formulas for some functions of prime numbers". Illinois J. Math. 6: 64–94. doi:10.1215/ijm/1255631807 More precisely, it's 30*log(113)/113 rounded up. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
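[Editor's note: for anyone wanting to check the arithmetic, here is a small sketch (not the patch's code) of the bound being cited. Rosser and Schoenfeld show that pi(x) < 1.25506 * x / ln(x) for x > 1, where the constant is 30 * ln(113) / 113 (there are exactly 30 primes up to and including 113), and the code rounds that constant up to 1.26:]

    #include <math.h>

    /*
     * Upper bound on the number of primes below "limit", per Rosser &
     * Schoenfeld (1962): pi(x) < 1.25506 * x / ln(x) for x > 1.  The
     * constant 1.25506... is 30 * ln(113) / 113, rounded up to 1.26 here.
     */
    static int
    max_primes_below(int limit)
    {
        return (int) ceil(1.26 * limit / log(limit));
    }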
Re: WIP: BRIN multi-range indexes
On Mon, Nov 9, 2020 at 1:39 PM Tomas Vondra wrote: > > > While investigating the failures, I've tried increasing the values a > lot, without observing any measurable increase in runtime. IIRC I've > even used (10 * target_partlen) or something like that. That tells me > it's not very sensitive part of the code, so I'd suggest to simply use > something that we know is large enough to be safe. Okay, then it's not worth being clever. > For example, the largest bloom filter we can have is 32kB, i.e. 262kb, > at which point the largest gap is less than 95 (per the gap table). And > we may use up to BLOOM_MAX_NUM_PARTITIONS, so let's just use > BLOOM_MAX_NUM_PARTITIONS * 100 Sure. > FWIW I wonder if we should do something about bloom filters that we know > can get larger than page size. In the example I used, we know that > nbits=575104 is larger than page, so as the filter gets more full (and > thus more random and less compressible) it won't possibly fit. Maybe we > should reject that right away, instead of "delaying it" until later, on > the basis that it's easier to fix at CREATE INDEX time (compared to when > inserts/updates start failing at a random time). Yeah, I'd be inclined to reject that right away. > The problem with this is of course that if the index is multi-column, > this may not be strict enough (i.e. each filter would fit independently, > but the whole index row is too large). But it's probably better to do at > least something, and maybe improve that later with some whole-row check. A whole-row check would be nice, but I don't know how hard that would be. As a Devil's advocate proposal, how awful would it be to not allow multicolumn brin-bloom indexes? -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
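[Editor's note: a rough sketch of the kind of early rejection being discussed, using the textbook Bloom sizing formula rather than the patch's actual parameters. Given the expected number of distinct values per range and the target false positive rate, the filter needs about m = -n * ln(p) / (ln 2)^2 bits; if that already exceeds what fits on a page, the CREATE INDEX could be rejected up front:]

    #include <math.h>
    #include <stdbool.h>

    #define BLCKSZ 8192             /* default PostgreSQL page size in bytes */

    /*
     * Returns false if a bloom filter sized for "ndistinct" values per range
     * at the given false positive rate cannot possibly fit on one index page.
     * Illustrative only; a real check would need slack for page and tuple
     * headers and for other index keys in multi-column indexes.
     */
    static bool
    bloom_filter_can_fit(double ndistinct, double false_positive_rate)
    {
        double      nbits;

        nbits = ceil(-ndistinct * log(false_positive_rate) / (log(2) * log(2)));

        return (nbits / 8.0) < BLCKSZ;
    }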
Re: Clean up optional rules in grammar
On Wed, Nov 11, 2020 at 5:13 AM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > It's obviously confusing to have multiple different styles to do the > same thing. And these extra rules (including the empty ones) also end > up in the output, so they create more work down the line. > > The attached patch cleans this up to make them all look like the first > style. > +1 for standardizing in this area. It's worth noting that Bison 3.0 introduced %empty for this situation, which is self-documenting. Until we get there, this is a good step. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
On Tue, Nov 10, 2020 at 7:08 PM Bruce Momjian wrote: > On Tue, Nov 3, 2020 at 02:06:13PM -0400, John Naylor wrote: > > I was thinking of not having the next updates during commitfest, but it > could > > also be argued this is a type of review, and the things here will be > returned > > with feedback or rejected, in a way. Ultimately, it comes down to "when > time > > permits". > > I don't understand what this is referring to. Thanks for the rest of > the work. > This was awkwardly phrased, but I was concerned future proposals for removal would be easy to miss during commitfest. At this point, I'm thinking it isn't an issue. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
Here is the next section on data types, proposed to be moved to the "not worth doing" page. As before, if there are any objections, do speak up. I'll make the move in a few days. *Datatypes - Fix data types where equality comparison is not intuitive, e.g. box There is likely no way to do this without breaking applications. We already have a big warning about this in the docs. - Add IMMUTABLE column attribute Controversial *Domains (this entire section would go) - Allow functions defined as casts to domains to be called during casting - Allow values to be cast to domain types - Make domains work better with polymorphic functions Old and difficult *Date/Time - Allow TIMESTAMP WITH TIME ZONE to store the original timezone information - Have timestamp subtraction not call justify_hours() Very old - Allow a comma to denote fractional seconds in ISO-8601-compliant times (and timestamps) Apparent lack of interest *Arrays - Add function to detect if an array is empty - Improve handling of NULLs in arrays Lack of interest *Money (this entire section would go) - Add locale-aware MONEY type, and support multiple currencies - MONEY dumps in a locale-specific format making it difficult to restore to a system with a different locale The money datatype seems kind of obsolete anyway, and there doesn't seem to be demand to improve it. *Text Search - Allow dictionaries to change the token that is passed on to later dictionaries - Consider a function-based API for '@@' searches Very old - Improve text search error messages One of the gripes has been fixed already; in any case it's very old - tsearch and tsdicts regression tests fail in Turkish locale on glibc Bug report that refers to locale behavior from 2009 - Improve handling of dash and plus signs in email address user names, and perhaps improve URL parsing Difficult *XML (difficult section -- plenty of bugs which should be fixed, but also old and low interest) - Allow XML arrays to be cast to other data types Very old - Allow reliable XML operation in non-UTF8 server encodings Difficult - Move XSLT from contrib/xml2 to a more reasonable location Lack of consensus - Improve the XSLT parameter passing API Lack of consensus - xpath_table needs to be implemented/implementable to get rid of contrib/xml2 - xpath_table is pretty broken anyway Unclear path forward - better handling of XPath data types - Improve handling of PIs and DTDs in xmlconcat() Zero interest - Restructure XML and /contrib/xml2 functionality As discussed in the thread, it's an unrealistically large project -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
On Wed, Nov 11, 2020 at 4:45 PM John Naylor wrote: > Here is the next section on data types, proposed to be moved to the "not > worth doing" page. As before, if there are any objections, do speak up. > I'll make the move in a few days. > Hearing no objection, these have been moved over. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
Here are the next couple of sections with items proposed to be moved to the "not worth doing" page. As before, if there are any objections, let me know. I'll make the move in a few days. Also, since 13 has been released, I'll change the explanation of Done items to "will appear in the PostgreSQL 14 release." Also, since that wasn't updated, it's not immediately clear to me which release the [D] marking for "improve setting of visibility map bits for read-only and insert-only workloads" refers to. Does anyone know which commit that is? *Functions - Enforce typmod for function inputs, function results and parameters for spi_prepare'd statements called from PLs Lack of consensus - Reduce memory usage of aggregates in set returning functions The issue and proposed patch are likely no longer an important thing to improve in this area nowadays. - Fix /contrib/ltree operator Bug from 2007 with zero followup - Fix /contrib/btree_gist's implementation of inet indexing Bug from 2010 and apparent lack of interest *Character Formatting (this entire section would be removed) - Allow to_date() and to_timestamp() to accept localized month names The following attempts to pick this up from the TODO list in 2008 didn't go anywhere: https://www.postgresql.org/message-id/flat/010401c86788%246f1ddb60%240a01a8c0%40gevmus https://www.postgresql.org/message-id/flat/CA%2BheTbrDQ6b0Am_mk0dJEcwNxwQz%2Br%3Daz_%3DzHTva%2B5BDnfOKjA%40mail.gmail.com - Add missing parameter handling in to_char() Very old - Throw an error from to_char() instead of printing a string of "#" when a number doesn't fit in the desired output format. Lack of consensus - Fix to_number() handling for values not matching the format string Large amount of work for questionable benefit *Multi-Language Support - Add a cares-about-collation column to pg_proc, so that unresolved-collation errors can be thrown at parse time Proposed while listing open items during 9.1. Doesn't seem to justify the amount of work it would take. - Add octet_length_server() and octet_length_client() - Make octet_length_client() the same as octet_length()? Very old - Improve UTF8 combined character handling? Too vague - Fix problems with wrong runtime encoding conversion for NLS message files What problems? There is no discussion thread. - More sensible support for Unicode combining characters, normal forms We have normalization as of PG13, so I propose to mark this Done rather than move it. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
On Wed, Nov 18, 2020 at 2:42 PM Bruce Momjian wrote: > On Wed, Nov 18, 2020 at 02:26:46PM -0400, John Naylor wrote: > > Here are the next couple of sections with items proposed to be moved to > the > > "not worth doing" page. As before, if there are any objections, let me > know. > > I'll make the move in a few days. > > > > Also, since 13 has been released, I'll change the explanation of Done > items to > > "will appear in the PostgreSQL 14 release." Also, since that wasn't > updated, > > Yes, please do that. I didn't go through this for PG 13 since you were > already working in this area. > OK, I'll do that and remove items done during or before the PG13 cycle. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
On Wed, Nov 18, 2020 at 3:05 PM Tom Lane wrote: > John Naylor writes: > > Here are the next couple of sections with items proposed to be moved to > the > > "not worth doing" page. As before, if there are any objections, let me > > know. I'll make the move in a few days. > > > - Fix /contrib/ltree operator > > Bug from 2007 with zero followup > > Actually, I believe this was fixed by 70dc4c509; at least, the > case shown in the bug report now yields "false" as expected. > I'll remove this item. > > - Fix /contrib/btree_gist's implementation of inet indexing > > Bug from 2010 and apparent lack of interest > > This one's pretty clearly a bug. Lack of interest or no, we > should keep it around. > Okay. > > - Allow to_date() and to_timestamp() to accept localized month names > > This is done too, see d67755049. > I'll remove this too. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Should we document IS [NOT] OF?
On Thu, Nov 19, 2020 at 6:43 PM Tom Lane wrote: > After digging a bit more I noticed that we'd discussed removing > IS OF in the 2007 thread, but forebore because there wasn't an easy > replacement. pg_typeof() was added a year later (b8fab2411), so we > could have done this at any point since then. > > Pushed. > Documenting or improving IS OF was a TODO, so I've removed that entry. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
With the exception of "Fix /contrib/btree_gist's implementation of inet indexing", all items above have been either moved over, or removed if they were done already by PG13. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: truncating timestamps on arbitrary intervals
On Thu, Nov 12, 2020 at 9:56 AM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > > On 2020-06-30 06:34, John Naylor wrote: > > In v9, I've simplified the patch somewhat to make it easier for future > > work to build on. > > > > - When truncating on month-or-greater intervals, require the origin to > > align on month. This removes the need to handle weird corner cases > > that have no straightforward behavior. > > - Remove hackish and possibly broken code to allow origin to be after > > the input timestamp. The default origin is Jan 1, 1 AD, so only AD > > dates will behave correctly by default. This is not enforced for now, > > since it may be desirable to find a way to get this to work in a nicer > > way. > > - Rebase docs over PG13 formatting changes. > > This looks pretty solid now. Are there any more corner cases and other > areas with unclear behavior that you are aware of? Hi Peter, Thanks for taking a look! I believe there are no known corner cases aside from not throwing an error if origin > input, but I'll revisit that when we are more firm on what features we want to support. > A couple of thoughts: > > - After reading the discussion a few times, I'm not so sure anymore > whether making this a cousin of date_trunc is the right way to go. As > you mentioned, there are some behaviors specific to date_trunc that > don't really make sense in date_trunc_interval, and maybe we'll have > more of those. As far as the behaviors, I'm not sure exactly what you were thinking of, but here are two issues off the top of my head: - If the new functions are considered variants of date_trunc(), there is the expectation that the options work the same way, in particular the timezone parameter. You asked specifically about that below, so I'll address that separately. - In the "week" case, the boundary position depends on the origin, since a week-long interval is just 7 days. > Also, date_trunc_interval isn't exactly a handy name. > Maybe something to think about. It's obviously fairly straightforward > to change it. Effectively, it puts timestamps into bins, so maybe date_bin() or something like that? > - There were various issues with the stride interval having months and > years. I'm not sure we even need that. It could be omitted unless you > are confident that your implementation is now sufficient. Months and years were a bit tricky, so I'd be happy to leave that out if there is not much demand for it. date_trunc() already has quarters, decades, centuries, and millennia. > - Also, negative intervals could be prohibited, but I suppose that > matters less. Good for the sake of completeness. I think they happen to work in v9 by accident, but it would be better not to expose that. > - I'm curious about the origin being set to 0001-01-01. This seems to > work correctly in that it sets the origin to a Monday, which is what we > wanted, but according to Google that day was a Saturday. Something to > do with Julian vs. Gregorian calendar? Right, working backwards from our calendar today, it's Monday, but at the time it would theoretically be Saturday, barring leap year miscalculations. > Maybe we should choose a date > that is a bit more recent and easier to reason with. 2001-01-01 would also be a Monday aligned with centuries and millennia, so that would be my next suggestion. If we don't care to match with date_trunc() on those larger units, we could also use 1900-01-01, so the vast majority of dates in databases are after the origin.
> - Then again, I'm thinking that maybe we should make the origin > mandatory. Otherwise, the default answers when having strides larger > than a day are entirely arbitrary, e.g., > > => select date_trunc_interval('10 year', '0196-05-20 BC'::timestamp); > 0190-01-01 00:00:00 BC > > => select date_trunc_interval('10 year', '0196-05-20 AD'::timestamp); > 0191-01-01 00:00:00 Right. In the first case, the default origin is also after the input, and crosses the AD/BC boundary. Tricky to get right. > Perhaps the origin could be defaulted if the interval is less than a day > or something like that. If we didn't allow months and years to be units, it seems the default would always make sense? > - I'm wondering whether we need the date_trunc_interval(interval, > timestamptz, timezone) variant. Isn't that the same as > date_trunc_interval(foo AT ZONE xyz, value)? I based this on 600b04d6b5ef6 for date_trunc(), whose message states: date_trunc(field, timestamptz, zone_name) is the same as date_trunc(field, timestamptz at time zone zone_name) at time zone zone_name so without the shorthand, you need to specify the timezone twice, once for the calculation, and once for the output. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: WIP: BRIN multi-range indexes
On Sat, Dec 19, 2020 at 8:15 PM Tomas Vondra wrote: > [12-20 version] Hi Tomas, The measurements look good. In case it fell through the cracks, my earlier review comments for Bloom BRIN indexes regarding minor details don't seem to have been addressed in this version. I'll point to earlier discussion for convenience: https://www.postgresql.org/message-id/CACPNZCt%3Dx-fOL0CUJbjR3BFXKgcd9HMPaRUVY9cwRe58hmd8Xg%40mail.gmail.com https://www.postgresql.org/message-id/CACPNZCuqpkCGt8%3DcywAk1kPu0OoV_TjPXeV-J639ABQWyViyug%40mail.gmail.com > The improvements are fairly minor: > > 1) Rejecting bloom filters that are clearly too large (larger than page) > early. This is imperfect, as it works for individual index keys, not the > whole row. But per discussion it seems useful. I think this is good enough. > So based on this I'm tempted to just use the version with two hashes, as > implemented in 0005. It's much simpler than the partitioning scheme, > does not need any of the logic to generate primes etc. Sounds like the best engineering decision. Circling back to multi-minmax build times, I ran a couple quick tests on bigger hardware, and found that not only is multi-minmax slower than minmax, which is to be expected, but also slower than btree. (unlogged table ~12GB in size, maintenance_work_mem = 1GB, median of three runs) btree 38.3s minmax 26.2s multi-minmax 110s Since btree indexes are much larger, I imagine something algorithmic is involved. Is it worth digging further to see if some code path is taking more time than we would expect? -- John Naylor EDB: http://www.enterprisedb.com
outdated references to replication timeout
Hi, The parameter replication_timeout was retired in commit 6f60fdd701 back in 2012, but some comments and error messages seem to refer to that old setting instead of wal_sender_timeout or wal_receiver_timeout. The attached patch replaces the old language with more specific references. -- John Naylor EDB: http://www.enterprisedb.com stray-repl-timeout.patch Description: Binary data
Re: outdated references to replication timeout
On Tue, Jan 12, 2021 at 9:37 PM Fujii Masao wrote: > > Thanks for the patch! I think this change makes sense. > > - (errmsg("terminating walsender process > due to replication timeout"))); > + (errmsg("terminating walsender process > due to WAL sender timeout"))); > > Isn't it a bit strange to include different expressions "walsender" and > "WAL sender" for the same thing in one message? It is strange, now that I think about it. My thinking was that the former wording of "replication timeout" was a less literal reference to the replication_timeout parameter, so I did the same for wal_sender_timeout. A quick look shows we are not consistent in the documentation as far as walsender vs. WAL sender. For the purpose of the patch I agree it should be consistent within a single message. Maybe the parameter should be spelled exactly as is, with underscores? I'll take a broader look and send an updated patch. -- John Naylor EDB: http://www.enterprisedb.com
Re: outdated references to replication timeout
On Thu, Jan 14, 2021 at 1:55 AM Michael Paquier wrote: > > On Wed, Jan 13, 2021 at 11:28:55PM +0900, Fujii Masao wrote: > > On Wed, Jan 13, 2021 at 10:51 PM John Naylor < john.nay...@enterprisedb.com> wrote: > >> It is strange, now that I think about it. My thinking was that the > >> former wording of "replication timeout" was a less literal > >> reference to the replication_timeout parameter, so I did the same > >> for wal_sender_timeout. A quick look shows we are not consistent > >> in the documentation as far as walsender vs. WAL sender. For the > >> purpose of the patch I agree it should be consistent within a > >> single message. Maybe the parameter should be spelled exactly as > >> is, with underscores? > > > > I'm ok with that. But there seems no other timeout messages using > > the parameter name. > > Could it be that nothing needs to change here because this refers to > timeouts with the replication protocol? The current state of things > on HEAD does not sound completely incorrect to me either. It was off enough to cause brief confusion during a customer emergency. To show a bit more context around one of the proposed corrections: timeout = TimestampTzPlusMilliseconds(last_reply_timestamp, wal_sender_timeout); if (wal_sender_timeout > 0 && last_processing >= timeout) { /* * Since typically expiration of replication timeout means * communication problem, we don't send the error message to the * standby. */ ereport(COMMERROR, (errmsg("terminating walsender process due to replication timeout"))); My reading is, this is a case where the message didn't keep up with the change in parameter name. -- John Naylor EDB: http://www.enterprisedb.com
Re: truncating timestamps on arbitrary intervals
On Mon, Nov 23, 2020 at 1:44 PM John Naylor wrote: > > On Thu, Nov 12, 2020 at 9:56 AM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > > - After reading the discussion a few times, I'm not so sure anymore > > whether making this a cousin of date_trunc is the right way to go. As > > you mentioned, there are some behaviors specific to date_trunc that > > don't really make sense in date_trunc_interval, and maybe we'll have > > more of those. For v10, I simplified the behavior by decoupling it from date_trunc() and putting in some restrictions as discussed earlier. It's much simpler now. It could be argued that it goes too far in that direction, but it's easy to reason about and we can put back some features as we see fit. > > Also, date_trunc_interval isn't exactly a handy name. > > Maybe something to think about. It's obviously fairly straightforward > > to change it. > > Effectively, it puts timestamps into bins, so maybe date_bin() or something like that? For v10 I went with date_bin() so we can see how that looks. > > - There were various issues with the stride interval having months and > > years. I'm not sure we even need that. It could be omitted unless you > > are confident that your implementation is now sufficient. > > Months and years were a bit tricky, so I'd be happy to leave that out if there is not much demand for it. date_trunc() already has quarters, decades, centuries, and millennia. I removed months and years for this version, but that can be reconsidered of course. The logic is really simple now. > > - Also, negative intervals could be prohibited, but I suppose that > > matters less. I didn't go this far, but probably should before long. > > - Then again, I'm thinking that maybe we should make the origin > > mandatory. Otherwise, the default answers when having strides larger > > than a day are entirely arbitrary, e.g., I've tried this and like the resulting simplification. > > - I'm wondering whether we need the date_trunc_interval(interval, > > timestamptz, timezone) variant. Isn't that the same as > > date_trunc_interval(foo AT ZONE xyz, value)? > > I based this on 600b04d6b5ef6 for date_trunc(), whose message states: > > date_trunc(field, timestamptz, zone_name) > > is the same as > > date_trunc(field, timestamptz at time zone zone_name) at time zone zone_name > > so without the shorthand, you need to specify the timezone twice, once for the calculation, and once for the output. In light of making the origin mandatory, it no longer makes sense to have a time zone parameter, since we can specify the time zone on the origin, and, if desired, on the output as well. -- John Naylor EDB: http://www.enterprisedb.com v10-datetrunc-interval.patch Description: Binary data
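[Editor's note: to illustrate the "really simple" logic referred to above, here is a minimal sketch of the binning arithmetic -- my own simplification, not the patch itself -- assuming the stride has no month/year component, origin <= timestamp, and everything is expressed in microseconds:]

    #include <stdint.h>

    /*
     * Truncate "timestamp" down to the start of the stride-sized bin that
     * contains it, counting bins from "origin".  All values in microseconds;
     * assumes origin <= timestamp and stride_usecs > 0.
     */
    static int64_t
    timestamp_bin(int64_t stride_usecs, int64_t timestamp, int64_t origin)
    {
        int64_t     delta = timestamp - origin;

        return timestamp - (delta % stride_usecs);
    }

[With a mandatory origin and no month/year strides, the whole operation reduces to this single modulo step.]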
Re: WIP: BRIN multi-range indexes
On Tue, Jan 12, 2021 at 1:42 PM Tomas Vondra wrote: > I suspect it'd due to minmax having to decide which "ranges" to merge, > which requires repeated sorting, etc. I certainly don't dare to claim > the current algorithm is perfect. I wouldn't have expected such big > difference, though - so definitely worth investigating. It seems that monotonically increasing (or decreasing) values in a table are a worst case scenario for multi-minmax indexes, or basically, unique values within a range. I'm guessing it's because it requires many passes to fit all the values into a limited number of ranges. I tried using smaller pages_per_range numbers, 32 and 8, and that didn't help. Now, with a different data distribution, using only 10 values that repeat over and over, the results are much more sympathetic to multi-minmax: insert into iot (num, create_dt) select random(), '2020-01-01 0:00'::timestamptz + (x % 10 || ' seconds')::interval from generate_series(1,5*365*24*60*60) x; create index cd_single on iot using brin(create_dt); 27.2s create index cd_multi on iot using brin(create_dt timestamptz_minmax_multi_ops); 30.4s create index cd_bt on iot using btree(create_dt); 61.8s Circling back to the monotonic case, I tried running a simple perf record on a backend creating a multi-minmax index on a timestamptz column and these were the highest non-kernel calls: + 21.98% 21.91% postgres postgres [.] FunctionCall2Coll + 9.31% 9.29% postgres postgres [.] compare_combine_ranges + 8.60% 8.58% postgres postgres [.] qsort_arg + 5.68% 5.66% postgres postgres [.] brin_minmax_multi_add_value + 5.63% 5.60% postgres postgres [.] timestamp_lt + 4.73% 4.71% postgres postgres [.] reduce_combine_ranges + 3.80% 0.00% postgres [unknown] [.] 0x032001680004 + 3.51% 3.50% postgres postgres [.] timestamp_eq There's no one place that's pathological enough to explain the 4x slowness over traditional BRIN and nearly 3x slowness over btree when using a large number of unique values per range, so making progress here would have to involve a more holistic approach. -- John Naylor EDB: http://www.enterprisedb.com
Re: WIP: BRIN multi-range indexes
On Thu, Jan 21, 2021 at 9:06 PM Tomas Vondra wrote: > [wip optimizations] > The pathological case (monotonic-asc) is now 4x faster, roughly equal to > regular minmax index build. The opposite (monotonic-desc) is about 33% > faster, roughly in line with btree. Those numbers look good. I get similar results, shown below. I've read 0007-8 briefly but not in depth. > There are a couple cases where it's actually a bit slower - those are > the cases with very few distinct values per range. I believe this > happens because in the batch mode the code does not check if the summary > already contains this value, adds it to the buffer and the last step > ends up being more expensive than this. I think if it's worst case a bit faster than btree and best case a bit slower than traditional minmax, that's acceptable. > I believe there's some "compromise" between those two extremes, i.e. we > should use buffer that is too small or too large, but something in > between, so that the reduction happens once in a while but not too often > (as with the original aggressive approach). This sounds good also. > FWIW, none of this is likely to be an issue in practice, because (a) > tables usually don't have such strictly monotonic patterns, (b) people > should stick to plain minmax for cases that do. Still, it would be great if multi-minmax can be a drop-in replacement. I know there was a sticking point of a distance function not being available on all types, but I wonder if that can be remedied or worked around somehow. > And (c) regular tables > tend to have much wider rows, so there are fewer values per range (so > that other stuff is likely more expensive than building BRIN). True. I'm still puzzled that it didn't help to use 8 pages per range, but it's moot now. Here are some numbers (median of 3) with a similar scenario as before, repeated here with some added details. I didn't bother with what you call "unpatched": btree minmax multi monotonic-asc 44.4 26.5 27.8 mono-del-ins 38.7 24.6 30.4 mono-10-asc 61.8 25.6 33.5 create unlogged table iot ( id bigint generated by default as identity primary key, num double precision not null, create_dt timestamptz not null, stuff text generated always as (md5(id::text)) stored ) with (fillfactor = 95); -- monotonic-asc: insert into iot (num, create_dt) select random(), x from generate_series( '2020-01-01 0:00'::timestamptz, '2020-01-01 0:00'::timestamptz +'5 years'::interval, '1 second'::interval) x; -- mono-del-ins: -- Here I deleted a few values from (likely) each page in the above table, and reinserted values that shouldn't be in existing ranges: delete from iot1 where num < 0.05 or num > 0.95; vacuum iot1; insert into iot (num, create_dt) select random(), x from generate_series( '2020-01-01 0:00'::timestamptz, '2020-02-01 23:59'::timestamptz, '1 second'::interval) x; -- mono-10-asc truncate table iot; insert into iot (num, create_dt) select random(), '2020-01-01 0:00'::timestamptz + (x % 10 || ' seconds')::interval from generate_series(1,5*365*24*60*60) x; -- John Naylor EDB: http://www.enterprisedb.com
Re: WIP: BRIN multi-range indexes
Hi Tomas, I took another look through the Bloom opclass portion (0004) with sorted_mode omitted, and it looks good to me code-wise. I think this part is close to commit-ready. I also did one more proofreading pass for minor details. + rows per block). The default values is -0.1, and + greater than 0.0 and smaller than 1.0. The default values is 0.01, s/values/value/ + * bloom filter, we can easily and cheaply test wheter it contains values s/wheter/whether/ + * XXX We assume the bloom filters have the same parameters fow now. In the s/fow/for/ + * or null if it does not exists. s/exists/exist/ + * We do expect the bloom filter to eventually switch to hashing mode, + * and it's bound to be almost perfectly random, so not compressible. Leftover from when it started out in sorted mode. + if ((m/8) > BLCKSZ) It seems we need something more safe, to account for page header and tuple header at least. As the comment before says, the filter will eventually not be compressible. I remember we can't be exact here, with the possibility of multiple columns, but we can leave a little slack space. -- John Naylor EDB: http://www.enterprisedb.com
Re: WIP: BRIN multi-range indexes
On Fri, Jan 22, 2021 at 10:59 PM Tomas Vondra wrote: > > > On 1/23/21 12:27 AM, John Naylor wrote: > > Still, it would be great if multi-minmax can be a drop in replacement. I > > know there was a sticking point of a distance function not being > > available on all types, but I wonder if that can be remedied or worked > > around somehow. > > > > Hmm. I think Alvaro also mentioned he'd like to use this as a drop-in > replacement for minmax (essentially, using these opclasses as the > default ones, with the option to switch back to plain minmax). I'm not > convinced we should do that - though. Imagine you have minmax indexes in > your existing DB, it's working perfectly fine, and then we come and just > silently change that during dump/restore. Is there some past example > when we did something similar and it turned it to be OK? I was assuming pg_dump can be taught to insert explicit opclasses for minmax indexes, so that upgrade would not cause surprises. If that's true, only new indexes would have the different default opclass. > As for the distance functions, I'm pretty sure there are data types > without "natural" distance - like most strings, for example. We could > probably invent something, but the question is how much we can rely on > it working well enough in practice. > > Of course, is minmax even the right index type for such data types? > Strings are usually "labels" and not queried using range queries, > although sometimes people encode stuff as strings (but then it's very > unlikely we'll define the distance definition well). So maybe for those > types a hash / bloom would be a better fit anyway. Right. > But I do have an idea - maybe we can do without distances, in those > cases. Essentially, the primary issue of minmax indexes are outliers, so > what if we simply sort the values, keep one range in the middle and as > many single points on each tail? That's an interesting idea. I think it would be a nice bonus to try to do something along these lines. On the other hand, I'm not the one volunteering to do the work, and the patch is useful as is. -- John Naylor EDB: http://www.enterprisedb.com
Re: Perform COPY FROM encoding conversions in larger chunks
Hi Heikki, 0001 through 0003 are straightforward, and I think they can be committed now if you like. 0004 is also pretty straightforward. The check you proposed upthread for pg_upgrade seems like the best solution to make that workable. I'll take a look at 0005 soon. I measured the conversions that were rewritten in 0003, and there is indeed a noticeable speedup: Big5 to EUC-TW: head 196ms 0001-3 152ms EUC-TW to Big5: head 190ms 0001-3 144ms I've attached the driver function for reference. Example use: select drive_conversion( 1000, 'euc_tw'::name, 'big5'::name, convert('a few kB of utf8 text here', 'utf8', 'euc_tw') ); I took a look at the test suite also, and the only thing to note is a couple places where the comment doesn't match the code: + -- JIS X 0201: 2-byte encoded chars starting with 0x8e (SS2) + byte1 = hex('0e'); + for byte2 in hex('a1')..hex('df') loop +return next b(byte1, byte2); + end loop; + + -- JIS X 0212: 3-byte encoded chars, starting with 0x8f (SS3) + byte1 = hex('0f'); + for byte2 in hex('a1')..hex('fe') loop +for byte3 in hex('a1')..hex('fe') loop + return next b(byte1, byte2, byte3); +end loop; + end loop; Not sure if it matters, but thought I'd mention it anyway. -- John Naylor EDB: http://www.enterprisedb.com drive_conversion.c Description: Binary data
Re: Perform COPY FROM encoding conversions in larger chunks
On Thu, Jan 28, 2021 at 7:36 AM Heikki Linnakangas wrote: > > Even more surprising was that the second patch > (0002-Replace-pg_utf8_verifystr-with-a-faster-implementati.patch) > actually made things worse again. I thought it would give a modest gain, > but nope. Hmm, that surprised me too. > Based on these results, I'm going to commit the first patch, but not the > second one. There are much faster UTF-8 verification routines out there, > using SIMD instructions and whatnot, and we should consider adopting one > of those, but that's future work. I have something in mind for that. I took a look at v2, and for the first encoding I tried, it fails to report the error for invalid input: create database euctest WITH ENCODING 'EUC_CN' LC_COLLATE='zh_CN.eucCN' LC_CTYPE='zh_CN.eucCN' TEMPLATE=template0; \c euctest create table foo (a text); master: euctest=# copy foo from stdin; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> ä >> \. ERROR: character with byte sequence 0xc3 0xa4 in encoding "UTF8" has no equivalent in encoding "EUC_CN" CONTEXT: COPY foo, line 1 patch: euctest=# copy foo from stdin; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> ä >> \. COPY 0 euctest=# I believe the problem is in UtfToLocal(). I've attached a fix formatted as a text file to avoid confusing the cfbot. The fix keeps the debugging ereport() in case you find it useful. Some additional test coverage might be good here, but not sure how much work that would be. I didn't check any other conversions yet. v2-0002 seems fine to me, I just have cosmetic comments here: + * the same, no conversion is required by we must still validate that the s/by/but/ This comment in copyfrom_internal.h above the *StateData struct is the same as the corresponding one in copyto.c: * Multi-byte encodings: all supported client-side encodings encode multi-byte * characters by having the first byte's high bit set. Subsequent bytes of the * character can have the high bit not set. When scanning data in such an * encoding to look for a match to a single-byte (ie ASCII) character, we must * use the full pg_encoding_mblen() machinery to skip over multibyte * characters, else we might find a false match to a trailing byte. In * supported server encodings, there is no possibility of a false match, and * it's faster to make useless comparisons to trailing bytes than it is to * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true * when we have to do it the hard way. The references to pg_encoding_mblen() and encoding_embeds_ascii, are out of date for copy-from. I'm not sure the rest is relevant to copy-from anymore, either. Can you confirm? This comment inside the struct is now out of date as well: * Similarly, line_buf holds the whole input line being processed. The * input cycle is first to read the whole line into line_buf, convert it * to server encoding there, and then extract the individual attribute HEAD has this macro already: /* Shorthand for number of unconsumed bytes available in raw_buf */ #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index) It might make sense to create a CONVERSION_BUF_BYTES equivalent since the patch calculates cstate->conversion_buf_len - cstate->conversion_buf_index in a couple places. 
-- John Naylor EDB: http://www.enterprisedb.com diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 60dfebb0bd..1681efc7a0 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -397,6 +397,9 @@ CopyConvertBuf(CopyFromState cstate) (unsigned char *) cstate->raw_buf + cstate->raw_buf_len, dstlen, true); + +ereport(NOTICE, errmsg_internal("convertedbytes: %d", convertedbytes)); + if (convertedbytes == 0) { /* diff --git a/src/backend/utils/mb/conv.c b/src/backend/utils/mb/conv.c index b83358bc7a..26600e68ee 100644 --- a/src/backend/utils/mb/conv.c +++ b/src/backend/utils/mb/conv.c @@ -497,7 +497,7 @@ UtfToLocal(const unsigned char *utf, int len, int l; const pg_utf_to_local_combined *cp; const unsigned char *start = utf; - const unsigned char *cur = utf; + unsigned char *cur = unconstify(unsigned char *, utf); if (!PG_VALID_ENCODING(encoding)) ereport(ERROR, @@ -5
Re: WIP: BRIN multi-range indexes
On Tue, Jan 26, 2021 at 6:59 PM Tomas Vondra wrote: > > > > On 1/26/21 7:52 PM, John Naylor wrote: > > On Fri, Jan 22, 2021 at 10:59 PM Tomas Vondra > > mailto:tomas.von...@enterprisedb.com>> > > wrote: > > > Hmm. I think Alvaro also mentioned he'd like to use this as a drop-in > > > replacement for minmax (essentially, using these opclasses as the > > > default ones, with the option to switch back to plain minmax). I'm not > > > convinced we should do that - though. Imagine you have minmax indexes in > > > your existing DB, it's working perfectly fine, and then we come and just > > > silently change that during dump/restore. Is there some past example > > > when we did something similar and it turned it to be OK? > > > > I was assuming pg_dump can be taught to insert explicit opclasses for > > minmax indexes, so that upgrade would not cause surprises. If that's > > true, only new indexes would have the different default opclass. > > > > Maybe, I suppose we could do that. But I always found such changes > happening silently in the background a bit suspicious, because it may be > quite confusing. I certainly wouldn't expect such difference between > creating a new index and index created by dump/restore. Did we do such > changes in the past? That might be a precedent, but I don't recall any > example ... I couldn't think of a comparable example either. It comes down to evaluating risk. On the one hand it's nice if users get an enhancement without having to know about it, on the other hand if there is some kind of noticeable regression, that's bad. -- John Naylor EDB: http://www.enterprisedb.com
[POC] verifying UTF-8 using SIMD instructions
Hi, As of b80e10638e3, there is a new API for validating the encoding of strings, and one of the side effects is that we have a wider choice of algorithms. For UTF-8, it has been demonstrated that SIMD is much faster at decoding [1] and validation [2] than the standard approach we use. It makes sense to start with the ascii subset of UTF-8 for a couple reasons. First, ascii is very widespread in database content, particularly in bulk loads. Second, ascii can be validated using the simple SSE2 intrinsics that come with (I believe) any x86-64 chip, and I'm guessing we can detect that at compile time and not mess with runtime checks. The examples above using SSE for the general case are much more complicated and involve SSE 4.2 or AVX. Here are some numbers on my laptop (MacOS/clang 10 -- if the concept is okay, I'll do Linux/gcc and add more inputs). The test is the same as Heikki shared in [3], but I added a case with >95% Chinese characters just to show how that compares to the mixed ascii/multibyte case. master: chinese | mixed | ascii -+---+--- 1081 | 761 | 366 patch: chinese | mixed | ascii -+---+--- 1103 | 498 | 51 The speedup in the pure ascii case is nice. In the attached POC, I just have a pro forma portability stub, and left full portability detection for later. The fast path is inlined inside pg_utf8_verifystr(). I imagine the ascii fast path could be abstracted into a separate function to which is passed a function pointer for full encoding validation. That would allow other encodings with strict ascii subsets to use this as well, but coding that abstraction might be a little messy, and b80e10638e3 already gives a performance boost over PG13. I also gave a shot at doing full UTF-8 recognition using a DFA, but so far that has made performance worse. If I ever have more success with that, I'll add that in the mix. [1] https://woboq.com/blog/utf-8-processing-using-simd.html [2] https://lemire.me/blog/2020/10/20/ridiculously-fast-unicode-utf-8-validation/ [3] https://www.postgresql.org/message-id/06d45421-61b8-86dd-e765-f1ce527a5...@iki.fi -- John Naylor EDB: http://www.enterprisedb.com diff --git a/src/common/wchar.c b/src/common/wchar.c index 6e7d731e02..12b3a5e1a2 100644 --- a/src/common/wchar.c +++ b/src/common/wchar.c @@ -13,6 +13,10 @@ #include "c.h" #include "mb/pg_wchar.h" +#include "port/pg_bitutils.h" + +/* FIXME -- should go in src/include/port */ +#include /* @@ -1762,6 +1766,80 @@ pg_utf8_verifystr(const unsigned char *s, int len) { const unsigned char *start = s; +#ifdef __x86_64__ + + + const __m128i zero = _mm_setzero_si128(); + __m128i chunk, + cmp; + + const int chunk_size = sizeof(__m128i); + int zero_mask, + highbit_mask, + ascii_count, + remainder; + + while (len >= chunk_size) + { + /* load next chunk */ + chunk = _mm_loadu_si128((const __m128i *) s); + + /* first detect any zero bytes */ + cmp = _mm_cmpeq_epi8(chunk, zero); + zero_mask = _mm_movemask_epi8(cmp); + + /* if there is a zero byte, let the slow path encounter it */ + if (zero_mask) + break; + + /* now check for non-ascii bytes */ + highbit_mask = _mm_movemask_epi8(chunk); + + if (!highbit_mask) + { + /* all ascii, so advance to the next chunk */ + s += chunk_size; + len -= chunk_size; + continue; + } + + /* + * if not all ascii, maybe there is a solid block of ascii + * at the beginning of the chunk.
if so, skip it + */ + ascii_count = pg_rightmost_one_pos32(highbit_mask); + + s += ascii_count; + len -= ascii_count; + remainder = chunk_size - ascii_count; + + /* found non-ascii, so handle the remainder in the normal way */ + while (remainder > 0) + { + int l; + + /* + * fast path for ASCII-subset characters + * we already know they're non-zero + */ + if (!IS_HIGHBIT_SET(*s)) +l = 1; + else + { +l = pg_utf8_verifychar(s, len); +if (l == -1) + goto finish; + } + s += l; + len -= l; + remainder -= l; + + } + } + +#endif /* __x86_64__ */ + + /* handle last few bytes */ while (len > 0) { int l; @@ -1770,19 +1848,20 @@ pg_utf8_verifystr(const unsigned char *s, int len) if (!IS_HIGHBIT_SET(*s)) { if (*s == '\0') -break; +goto finish; l = 1; } else { l = pg_utf8_verifychar(s, len); if (l == -1) -break; +goto finish; } s += l; len -= l; } +finish: return s - start; }
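For what it's worth, a minimal sketch of the compile-time gating mentioned above (USE_SSE2 is an illustrative name, not something the tree defines): SSE2 is part of the x86-64 baseline, so a compiler-defined target macro is enough and no runtime CPU check is needed.

/* Sketch only: decide at compile time whether the SSE2 fast path can exist. */
#if defined(__x86_64__) || defined(_M_AMD64)
/* SSE2 intrinsics are always available on x86-64 targets */
#include <emmintrin.h>
#define USE_SSE2 1
#endif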
Re: Perform COPY FROM encoding conversions in larger chunks
On Mon, Feb 1, 2021 at 12:15 PM Heikki Linnakangas wrote: > Thanks. I fixed it slightly differently, and also changed LocalToUtf() > to follow the same pattern, even though LocalToUtf() did not have the > same bug. Looks good to me. > I added a bunch of tests for various built-in conversions. Nice! I would like to have utf8 tests for every category of invalid byte (overlong, surrogate, 5 bytes, etc), but it's not necessary for this patch. > I spent some time refactoring and adding comments all around the patch, > hopefully making it all more clear. One notable difference is that I > renamed 'raw_buf' (which exists in master too) to 'input_buf', and > renamed 'conversion_buf' to 'raw_buf'. I'm going to read through this > patch again another day with fresh eyes, and also try to add some tests > for the corner cases at buffer boundaries. The comments and renaming are really helpful in understanding that file! Although a new patch is likely forthcoming, I did take a brief look and found the following: In copyfromparse.c, this is now out of date: * Read the next input line and stash it in line_buf, with conversion to * server encoding. One of your FIXME comments seems to allude to this, but if we really need a difference here, maybe it should be explained: +#define INPUT_BUF_SIZE 65536 /* we palloc INPUT_BUF_SIZE+1 bytes */ +#define RAW_BUF_SIZE 65536 /* allocated size of the buffer */ Lastly, it looks like pg_do_encoding_conversion_buf() ended up in 0003 accidentally? -- John Naylor EDB: http://www.enterprisedb.com
Re: Bug in COPY FROM backslash escaping multi-byte chars
On Wed, Feb 3, 2021 at 8:08 AM Heikki Linnakangas wrote:
> Hi,
>
> While playing with COPY FROM refactorings in another thread, I noticed a corner case where I think backslash escaping doesn't work correctly. Consider the following input:
>
> \么.foo

I've seen multibyte delimiters in the wild, so it's not as outlandish as it seems. The fix is simple enough, so +1.

-- John Naylor EDB: http://www.enterprisedb.com
get rid of <foreignphrase> tags in the docs?
Hi, While looking at the proposed removal of the v2 protocol, I noticed that we italicize some, but not all, instances of 'per se', 'pro forma', and 'ad hoc'. I'd say these are widespread enough in formal registers of English that they hardly need to be called out as foreign, so I propose removing the tags for those words. Alternatively, we could just add tags to make existing usage consistent, but I have little reason to think it will stay that way. It's also impractical to go and search for other possible words that should have been italicized but weren't. The other case is 'voilà', found in rules.sgml. The case for italics here is stronger, but looking at that file, I actually think a more generic-sounding phrase here would be preferable. Other opinions? -- John Naylor EDB: http://www.enterprisedb.com
Re: [POC] verifying UTF-8 using SIMD instructions
On Mon, Feb 1, 2021 at 2:01 PM Heikki Linnakangas wrote:
> On 01/02/2021 19:32, John Naylor wrote:
> > It makes sense to start with the ascii subset of UTF-8 for a couple of reasons. First, ascii is very widespread in database content, particularly in bulk loads. Second, ascii can be validated using the simple SSE2 intrinsics that come with (I believe) any x86-64 chip, and I'm guessing we can detect that at compile time and not mess with runtime checks. The examples above using SSE for the general case are much more complicated and involve SSE 4.2 or AVX.
>
> I wonder how using SSE compares with dealing with 64 or 32-bit words at a time, using regular instructions? That would be more portable.

I gave that a shot, and it's actually pretty good. According to this paper [1], 16 bytes was best and gives a good apples-to-apples comparison to SSE registers, so I tried both 16 and 8 bytes.

> All supported encodings are ASCII subsets. Might be best to put the ASCII-check into a static inline function and use it in all the verify functions. I presume it's only a few instructions, and these functions can be pretty performance sensitive.

I tried both the static inline function and also putting the whole optimized utf-8 loop in a separate function to which the caller passes a pointer to the appropriate pg_*_verifychar(). In the table below, "inline" refers to coding directly inside pg_utf8_verifystr(). Both C and SSE are in the same patch, with an #ifdef. I didn't bother splitting them out because for other encodings, we want one of the other approaches above. For those, "C retail" refers to a static inline function to code the contents of the inner loop, if I understood your suggestion correctly. This needs more boilerplate in each function, so I don't prefer this. "C func pointer" refers to the pointer approach I just mentioned. That is the cleanest-looking way to generalize it, so I only tested that version with different strides -- 8 and 16 bytes. This is the same test I used earlier, which is the test in [2] but adding an almost-pure multibyte Chinese text of about the same size.

x86-64 Linux gcc 8.4.0:

      build       | chinese | mixed | ascii
------------------+---------+-------+-------
 master           |    1480 |   848 |   428
 inline SSE       |    1617 |   634 |    63
 inline C         |    1481 |   843 |    50
 C retail         |    1493 |   838 |    49
 C func pointer   |    1467 |   851 |    49
 C func pointer 8 |    1518 |   757 |    56

x86-64 MacOS clang 10.0.0:

      build       | chinese | mixed | ascii
------------------+---------+-------+-------
 master           |    1086 |   760 |   374
 inline SSE       |    1081 |   529 |    70
 inline C         |    1093 |   649 |    49
 C retail         |    1132 |   695 |   152
 C func pointer   |    1085 |   609 |    59
 C func pointer 8 |    1099 |   571 |    71

PowerPC-LE Linux gcc 4.8.5:

      build       | chinese | mixed | ascii
------------------+---------+-------+-------
 master           |    2961 |  1525 |   871
 inline SSE       |   (n/a) | (n/a) | (n/a)
 inline C         |    2911 |  1329 |    80
 C retail         |    2838 |  1311 |   102
 C func pointer   |    2828 |  1314 |    80
 C func pointer 8 |    3143 |  1249 |   133

Looking at the results, the main advantage of SSE here is that it's more robust for mixed inputs. If a 16-byte chunk is not ascii-only but contains a block of ascii at the front, we can skip those with a single CPU instruction, but in C, we have to verify the whole chunk using the slow path. The "C func pointer" approach seems to win out over the "C retail" approach (static inline function). Using an 8-byte stride is slightly better for mixed inputs on all platforms tested, but regresses on pure ascii and also seems to regress on pure multibyte. The difference in the multibyte case is small enough that it could be random, but it happens on two platforms, so I'd say it's real. 
On the other hand, pure multibyte is not as common as mixed text. Overall, I think the function pointer approach with an 8-byte stride is the best balance. If that's agreeable, next I plan to test with short inputs, because I think we'll want a guard if-statement to only loop through the fast path if the string is long enough to justify that. > > I also gave a shot at doing full UTF-8 recognition using a DFA, but so > > far that has made performance worse. If I ever have more success with > > that, I'll add that in the mix. > > That's disappointing. Perhaps the SIMD algorithms have higher startup > costs, so that you need longer inputs to benefit? In that case, it might > make sense to check the length of the input and only use the SIMD > algorithm if the input is long enough. I changed topics a
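As a concrete illustration of the "C func pointer" approach with an 8-byte stride, here is a minimal sketch. The names are illustrative, not the actual patch; it assumes the usual c.h definitions (uint64, UINT64CONST, IS_HIGHBIT_SET), and it punts on embedded NUL bytes, which are discussed later in the thread.

#include <string.h>

typedef int (*mb_verifychar) (const unsigned char *s, int len);

static int
verifystr_ascii_fastpath(const unsigned char *s, int len,
						 mb_verifychar verifychar)
{
	const unsigned char *start = s;

	while (len > 0)
	{
		int			l = 1;

		if (len >= (int) sizeof(uint64))
		{
			uint64		chunk;

			/* alignment-safe load; compilers turn this into one instruction */
			memcpy(&chunk, s, sizeof(chunk));

			if ((chunk & UINT64CONST(0x8080808080808080)) == 0)
			{
				/* all eight bytes are ASCII: skip them in one step */
				s += sizeof(chunk);
				len -= sizeof(chunk);
				continue;
			}
		}

		/* non-ASCII nearby (or a short tail): verify one character */
		if (IS_HIGHBIT_SET(*s))
		{
			l = verifychar(s, len);
			if (l < 0)
				break;			/* invalid; report bytes verified so far */
		}
		s += l;
		len -= l;
	}

	return s - start;
}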
Re: [POC] verifying UTF-8 using SIMD instructions
Here is a more polished version of the function pointer approach, now adapted to all multibyte encodings. Using the not-yet-committed tests from [1], I found a thinko bug that caused the test for nul bytes to be not only wrong, but probably also elided by the compiler. Doing it correctly is noticeably slower on pure ascii, but still several times faster than before, so the conclusions haven't changed. I'll run full measurements later this week, but I'll share the patch now for review.

[1] https://www.postgresql.org/message-id/11d39e63-b80a-5f8d-8043-fff04201f...@iki.fi

-- John Naylor EDB: http://www.enterprisedb.com

v1-0001-Add-an-ASCII-fast-path-to-multibyte-encoding-veri.patch Description: Binary data
Re: [POC] verifying UTF-8 using SIMD instructions
On Mon, Feb 8, 2021 at 6:17 AM Heikki Linnakangas wrote:
> As a quick test, I hacked up pg_utf8_verifystr() to use Lemire's algorithm from the simdjson library [1], see attached patch. I microbenchmarked it using the same test I used before [2].

I've been looking at various iterations of Lemire's utf8 code, and trying it out was next on my list, so thanks for doing that!

> These results are with "gcc -O2" using "gcc (Debian 10.2.1-6) 10.2.1 20210110"
>
> unpatched master:
>
> postgres=# \i mbverifystr-speed.sql
> CREATE FUNCTION
>
>  mixed | ascii
> -------+-------
>    728 |   393
> (1 row)
>
> v1-0001-Add-an-ASCII-fast-path-to-multibyte-encoding-veri.patch:
>
>  mixed | ascii
> -------+-------
>    759 |    98
> (1 row)

Hmm, the mixed case got worse -- I haven't seen that in any of my tests.

> simdjson-utf8-hack.patch:
>
>  mixed | ascii
> -------+-------
>     53 |    31
> (1 row)
>
> So clearly that algorithm is fast. Not sure if it has a high startup cost, or large code size, or other tradeoffs that we don't want.

The simdjson lib uses everything up through AVX512 depending on what hardware is available. I seem to remember reading that high start-up cost is more relevant to floating point than to integer ops, but I could be wrong. Just the utf8 portion is surely tiny also.

> At least it depends on SIMD instructions, so it requires more code for the architecture-specific implementations and autoconf logic and all that.

One of his earlier demos [1] (in simdutf8check.h) had a version that used mostly SSE2 with just three intrinsics from SSSE3. That's widely available by now. He measured that at 0.7 cycles per byte, which is still good compared to AVX2's 0.45 cycles per byte [2]. Testing for three SSSE3 intrinsics in autoconf is pretty easy. I would assume that if that check (and the corresponding runtime check) passes, we can assume SSE2. That code has three licenses to choose from -- Apache 2, Boost, and MIT. Something like that might be straightforward to start from. I think the only obstacles to worry about are license and getting it to fit into our codebase. Adding more than zero high-level comments with a good description of how it works in detail is also a bit of a challenge.

> I also tested the fallback implementation from the simdjson library (included in the patch, if you uncomment it in simdjson-glue.c):
>
>  mixed | ascii
> -------+-------
>    447 |    46
> (1 row)
>
> I think we should at least try to adopt that. At a high level, it looks pretty similar to your patch: you load the data 8 bytes at a time, check if they are all ASCII. If there are any non-ASCII chars, you check the bytes one by one, otherwise you load the next 8 bytes. Your patch should be able to achieve the same performance, if done right. I don't think the simdjson code forbids \0 bytes, so that will add a few cycles, but still.

Okay, I'll look into that.

> PS. Your patch as it stands isn't safe on systems with strict alignment, the string passed to the verify function isn't guaranteed to be 8-byte aligned. Use memcpy to fetch the next 8-byte chunk to fix.

Will do.

[1] https://github.com/lemire/fastvalidate-utf-8/tree/master/include
[2] https://lemire.me/blog/2018/10/19/validating-utf-8-bytes-using-only-0-45-cycles-per-byte-avx-edition/

-- John Naylor EDB: http://www.enterprisedb.com
Re: Perform COPY FROM encoding conversions in larger chunks
On Sun, Feb 7, 2021 at 2:13 PM Heikki Linnakangas wrote: > > On 02/02/2021 23:42, John Naylor wrote: > > > > In copyfromparse.c, this is now out of date: > > > > * Read the next input line and stash it in line_buf, with conversion to > > * server encoding. This comment for CopyReadLine() is still there. Conversion already happened by now, so I think this comment is outdated. Other than that, I think this is ready for commit. -- John Naylor EDB: http://www.enterprisedb.com
Re: WIP: BRIN multi-range indexes
On Wed, Feb 3, 2021 at 7:54 PM Tomas Vondra wrote: > > [v-20210203] Hi Tomas, I have some random comments from reading the patch, but haven't gone into detail in the newer aspects. I'll do so in the near future. The cfbot seems to crash on this patch during make check, but it doesn't crash for me. I'm not even sure what date that cfbot status is from. > BLOOM > - Looks good, but make sure you change the commit message -- it still refers to sorted mode. + * not entirely clear how to distrubute the space between those columns. s/distrubute/distribute/ > MINMAX-MULTI > > c) 0007 - A hybrid approach, using a buffer that is multiple of the > user-specified value, with some safety min/max limits. IMO this is what > we should use, although perhaps with some tuning of the exact limits. That seems like a good approach. +#include "access/hash.h" /* XXX strange that it fails because of BRIN_AM_OID without this */ I think you want #include "catalog/pg_am.h" here. > Attached is a spreadsheet with benchmark results for each of those three > approaches, on different data types (byval/byref), data set types, index > parameters (pages/values per range) etc. I think 0007 is a reasonable > compromise overall, with performance somewhere in betwen 0005 and 0006. > Of course, there are cases where it's somewhat slow, e.g. for data types > with expensive comparisons and data sets forcing frequent compactions, > in which case it's ~10x slower compared to regular minmax (in most cases > it's ~1.5x). Compared to btree, it's usually much faster - ~2-3x as fast > (except for some extreme cases, of course). > > > As for the opclasses for indexes without "natural" distance function, > implemented in 0008, I think we should drop it. In theory it works, but Sounds reasonable. > The other thing we were considering was using the new multi-minmax > opclasses as default ones, replacing the existing minmax ones. IMHO we > shouldn't do that either. For existing minmax indexes that's useless > (the opclass seems to be working, otherwise the index would be dropped). > But even for new indexes I'm not sure it's the right thing, so I don't > plan to change this. Okay. -- John Naylor EDB: http://www.enterprisedb.com
Re: [POC] verifying UTF-8 using SIMD instructions
On Mon, Feb 8, 2021 at 6:17 AM Heikki Linnakangas wrote:
> I also tested the fallback implementation from the simdjson library (included in the patch, if you uncomment it in simdjson-glue.c):
>
>  mixed | ascii
> -------+-------
>    447 |    46
> (1 row)
>
> I think we should at least try to adopt that. At a high level, it looks pretty similar to your patch: you load the data 8 bytes at a time, check if they are all ASCII. If there are any non-ASCII chars, you check the bytes one by one, otherwise you load the next 8 bytes. Your patch should be able to achieve the same performance, if done right. I don't think the simdjson code forbids \0 bytes, so that will add a few cycles, but still.

That fallback is very similar to my "inline C" case upthread, and they both actually check 16 bytes at a time (the comment is wrong in the patch you shared). I can work back and show how the performance changes with each difference (just MacOS, clang 10 here):

master

 mixed | ascii
-------+-------
   757 |   366

v1, but using memcpy()

 mixed | ascii
-------+-------
   601 |   129

remove zero-byte check:

 mixed | ascii
-------+-------
   588 |    93

inline ascii fastpath into pg_utf8_verifystr()

 mixed | ascii
-------+-------
   595 |    71

use 16-byte stride

 mixed | ascii
-------+-------
   652 |    49

With this cpu/compiler, v1 is fastest on the mixed input, all else being equal. Maybe there's a smarter way to check for zeros in C. Or maybe be more careful about cache -- running memchr() on the whole input first might not be the best thing to do.

-- John Naylor EDB: http://www.enterprisedb.com
Re: [POC] verifying UTF-8 using SIMD instructions
I wrote:
> One of his earlier demos [1] (in simdutf8check.h) had a version that used mostly SSE2 with just three intrinsics from SSSE3. That's widely available by now. He measured that at 0.7 cycles per byte, which is still good compared to AVX2's 0.45 cycles per byte [2].
>
> Testing for three SSSE3 intrinsics in autoconf is pretty easy. I would assume that if that check (and the corresponding runtime check) passes, we can assume SSE2. That code has three licenses to choose from -- Apache 2, Boost, and MIT. Something like that might be straightforward to start from. I think the only obstacles to worry about are license and getting it to fit into our codebase. Adding more than zero high-level comments with a good description of how it works in detail is also a bit of a challenge.

I double-checked, and it's actually two SSSE3 intrinsics and one SSE4.1, but the 4.1 one can be emulated with a few SSE2 intrinsics. But we could probably fold all three into the SSE4.2 CRC check and have a single symbol to save on boilerplate. I hacked that demo [1] into wchar.c (very ugly patch attached), and got the following:

master:

 mixed | ascii
-------+-------
   757 |   366

Lemire demo:

 mixed | ascii
-------+-------
   172 |   168

This one lacks an ascii fast path, but the AVX2 version in the same file has one that could probably be easily adapted. With that, I think this would be worth adapting to our codebase and license. Thoughts? The advantage of this demo is that it's not buried in a mountain of modern C++. Simdjson can use AVX -- do you happen to know which target it got compiled to? AVX vectors are 256 bits wide and that requires OS support. The OSes we care most about were updated 8-12 years ago, but that would still be something to check, in addition to more configure checks.

[1] https://github.com/lemire/fastvalidate-utf-8/tree/master/include

-- John Naylor EDB: http://www.enterprisedb.com

utf-sse42-demo.patch Description: Binary data
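As a rough sketch of what the runtime side of such a check could look like on gcc/clang (this is not the project's actual configure machinery, and real support would also need a configure-time probe plus an MSVC path):

#include <stdbool.h>

/*
 * Hypothetical runtime test for the intrinsics discussed above, using
 * __builtin_cpu_supports(), which gcc 4.8+ and recent clang provide.
 */
static bool
cpu_has_ssse3_and_sse41(void)
{
#if defined(__GNUC__) || defined(__clang__)
	return __builtin_cpu_supports("ssse3") && __builtin_cpu_supports("sse4.1");
#else
	return false;
#endif
}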
Re: [POC] verifying UTF-8 using SIMD instructions
On Tue, Feb 9, 2021 at 4:22 PM Heikki Linnakangas wrote:
> On 09/02/2021 22:08, John Naylor wrote:
> > Maybe there's a smarter way to check for zeros in C. Or maybe be more careful about cache -- running memchr() on the whole input first might not be the best thing to do.
>
> The usual trick is the haszero() macro here: https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord. That's how memchr() is typically implemented, too.

Thanks for that. Checking with that macro each loop iteration gives a small boost:

v1, but using memcpy()

 mixed | ascii
-------+-------
   601 |   129

with haszero()

 mixed | ascii
-------+-------
   583 |   105

remove zero-byte check:

 mixed | ascii
-------+-------
   588 |    93

-- John Naylor EDB: http://www.enterprisedb.com
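For reference, a self-contained version of that zero-byte test applied to an 8-byte chunk, per the bithacks page cited above (a sketch, not the patch itself):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if any of the eight bytes loaded from 'p' is 0x00. */
static inline bool
chunk_has_zero_byte(const unsigned char *p)
{
	uint64_t	v;

	memcpy(&v, p, sizeof(v));	/* alignment-safe load */

	/*
	 * The subtraction can only generate a borrow at a zero byte, and bytes
	 * whose own high bit is set are masked out by ~v, so the expression is
	 * nonzero exactly when some byte in v is zero.
	 */
	return ((v - UINT64_C(0x0101010101010101)) & ~v &
			UINT64_C(0x8080808080808080)) != 0;
}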
Re: cutting down the TODO list thread
It's been a while, but here are a few more suggested removals/edits/additions to the TODO list. Any objections or new information, let me know: - Auto-fill the free space map by scanning the buffer cache or by checking pages written by the background writer http://archives.postgresql.org/pgsql-hackers/2006-02/msg01125.php https://www.postgresql.org/message-id/200603011716.16984.pete...@gmx.net Both these threads are from 2006, so have nothing to do with the current FSM. - Allow concurrent inserts to use recently created pages rather than creating new ones http://archives.postgresql.org/pgsql-hackers/2010-05/msg00853.php Skimming the first few messages, I believe this has been covered by commit 719c84c1b? (Extend relations multiple blocks at a time to improve scalability.) - Allow VACUUM FULL and CLUSTER to update the visibility map This topic has a current CF entry which seems to have stalled, so that newer thread would be better to list here than the one from 2013. - Bias FSM towards returning free space near the beginning of the heap file, in hopes that empty pages at the end can be truncated by VACUUM http://archives.postgresql.org/pgsql-hackers/2009-09/msg01124.php https://www.postgresql.org/message-id/20150424190403.gp4...@alvh.no-ip.org I'm not sure what to think of this, but independently of that, the second thread is actually talking about bringing back something like the pre-9.0 vacuum full, so maybe it should be its own entry? - Consider a more compact data representation for dead tuple locations within VACUUM http://archives.postgresql.org/pgsql-patches/2007-05/msg00143.php Great, but let's link to this more recent thread instead: https://www.postgresql.org/message-id/flat/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com - Improve autovacuum tuning http://www.postgresql.org/message-id/5078ad6b.8060...@agliodbs.com http://www.postgresql.org/message-id/20130124215715.ge4...@alvh.no-ip.org I'm kind of on the fence about these. The title is way too broad, and I doubt we are going to forget to keep improving this area. It seems the first thread is really about auto-analyze thresholds, so maybe it should be in a separate entry if we want to do anything mentioned in the thread? The second thread is really about autovacuum launcher scheduling. Probably still relevant, but the thread is very long and doesn't seem terribly helpful to someone trying to get up to speed on the issues that are still relevant. I don't see any more recent discussion, either. Thoughts? -- John Naylor EDB: http://www.enterprisedb.com
Re: speed up verifying UTF-8
It occurred to me that the DFA + ascii quick check approach could also be adapted to speed up some cases where we currently walk a string counting characters, like this snippet in text_position_get_match_pos():

	/* Convert the byte position to char position. */
	while (state->refpoint < state->last_match)
	{
		state->refpoint += pg_mblen(state->refpoint);
		state->refpos++;
	}

This coding changed in 9556aa01c69 (Use single-byte Boyer-Moore-Horspool search even with multibyte encodings), in which I found the majority of cases were faster, but some were slower. It would be nice to regain the speed lost and do even better. In the case of UTF-8, we could just run it through the DFA, incrementing a count of the states found. The number of END states should be the number of characters. The ascii quick check would still be applicable as well. I think all that is needed is to export some symbols and add the counting function. That wouldn't materially affect the current patch for input verification, and would be separate, but it would be nice to get the symbol visibility right up front. I've set this to waiting on author while I experiment with that.

-- John Naylor EDB: http://www.enterprisedb.com
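To make the idea concrete, a sketch of such a counting function, assuming the DFA table and state constants from the verification patch (Utf8Transition, BGN, END) were made visible -- which is exactly the symbol-visibility question above:

/*
 * Count the characters in already-validated UTF-8 input by running the bytes
 * through the shift-based DFA and counting how often it lands in END.
 */
static int
utf8_count_chars_dfa(const unsigned char *s, const unsigned char *end)
{
	uint32		dfa_state = BGN;	/* start state of the verifier's DFA */
	int			nchars = 0;

	while (s < end)
	{
		dfa_state = (Utf8Transition[*s++] >> dfa_state) & 31;
		nchars += (dfa_state == END);
	}

	return nchars;
}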
do only critical work during single-user vacuum?
When a user must shut down and restart in single-user mode to run vacuum on an entire database, that does a lot of work that's unnecessary for getting the system online again, even without index_cleanup. We had a recent case where a single-user vacuum took around 3 days to complete. Now that we have a concept of a fail-safe vacuum, maybe it would be beneficial to skip a vacuum in single-user mode if the fail-safe criteria were not met at the beginning of vacuuming a relation. This is not without risk, of course, but it should be much faster than today and once up and running the admin would have a chance to get a handle on things. Thoughts? -- John Naylor EDB: http://www.enterprisedb.com
Re: do only critical work during single-user vacuum?
On Thu, Dec 9, 2021 at 5:13 PM Peter Geoghegan wrote: > Oh, I think I misunderstood. Your concern is for the case where the > DBA runs a simple "VACUUM" in single-user mode; you want to skip over > tables that don't really need to advance relfrozenxid, automatically. Right. > I can see an argument for something like that, but I think that it > should be a variant of VACUUM. Or maybe it could be addressed with a > better user interface; On Thu, Dec 9, 2021 at 6:08 PM Andres Freund wrote: > What if the user tried to reclaim space by vacuuming (via truncation)? Or is > working around some corruption or such? I think this is too much magic. > > That said, having a VACUUM "selector" that selects the oldest tables could be > quite useful. And address this usecase both for single-user and normal > operation. All good points. [Peter again] > single-user mode should prompt the user about > what exact VACUUM command they ought to run to get things going. The current message is particularly bad in its vagueness because some users immediately reach for VACUUM FULL, which quite logically seems like the most complete thing to do. -- John Naylor EDB: http://www.enterprisedb.com
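To make "selects the oldest tables" concrete, roughly what such a selector would target can already be seen with a plain catalog query (just an illustration, not a proposed interface):

-- List the relations with the oldest relfrozenxid, i.e. the ones a
-- minimal single-user vacuum would actually need to touch first.
SELECT c.oid::regclass AS relation,
       age(c.relfrozenxid) AS xid_age
FROM pg_class AS c
WHERE c.relkind IN ('r', 'm', 't')
ORDER BY age(c.relfrozenxid) DESC
LIMIT 20;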
Re: speed up verifying UTF-8
> + * In a shift-based DFA, the input byte is an index into array of integers
> + * whose bit pattern encodes the state transitions. To compute the current
> + * state, we simply right-shift the integer by the current state and apply a
> + * mask. In this scheme, the address of the transition only depends on the
> + * input byte, so there is better pipelining.
>
> Should be "To compute the *next* state, ...", I think.

Fixed.

> The way the state transition table works is pretty inscrutable. That's understandable, because the values were found by an SMT solver, so I'm not sure if anything can be done about it.

Do you mean in general, or just the state values? Like any state machine, the code is simple, and the complexity is hidden in the data. Hopefully the first link I included in the comment is helpful. The SMT solver was only needed to allow 32-bit (instead of 64-bit) entries in the transition table, so it's not strictly necessary. A lookup table that fits in 1kB is nice from a cache perspective, however. With 64-bit, the state values are less weird-looking but they're still just arbitrary numbers. As long as ERR = 0 and the largest is at most 9, it doesn't matter what they are, so I'm not sure it's much less mysterious. You can see the difference between 32-bit and 64-bit in [1].

--

In addition to Heikki's review points, I've made a couple of small additional changes from v24. I rewrote this part, so we don't need these macros anymore:

-	if (!IS_HIGHBIT_SET(*s) ||
-		IS_UTF8_2B_LEAD(*s) ||
-		IS_UTF8_3B_LEAD(*s) ||
-		IS_UTF8_4B_LEAD(*s))
+	if (!IS_HIGHBIT_SET(*s) || pg_utf_mblen(s) > 1)

And I moved is_valid_ascii() to pg_wchar.h so it can be used elsewhere. I'm not sure there's a better place to put it. I tried using this for text_position(), for which I'll start a new thread.

[1] https://www.postgresql.org/message-id/attachment/125672/v22-addendum-32-bit-transitions.txt

-- John Naylor EDB: http://www.enterprisedb.com

From 0c20e37ac47a5759e6ec674bb4184d835c562af8 Mon Sep 17 00:00:00 2001
From: John Naylor
Date: Tue, 19 Oct 2021 16:43:14 -0400
Subject: [PATCH v25] Add fast path for validating UTF-8 text

Our previous validator used a traditional algorithm that performed comparison and branching one byte at a time. It's useful in that we always know exactly how many bytes we have validated, but that precision comes at a cost. Input validation can show up prominently in profiles of COPY FROM, and future improvements to COPY FROM such as parallelism or faster line parsing will put more pressure on input validation. Hence, add fast paths for both ASCII and multibyte UTF-8: Use bitwise operations to check 16 bytes at a time for ASCII. If that fails, use a "shift-based" DFA on those bytes to handle the general case, including multibyte. These paths are relatively free of branches and thus robust against all kinds of byte patterns. With these algorithms, UTF-8 validation is several times faster, depending on platform and the input byte distribution. The previous coding in pg_utf8_verifystr() is retained for short strings and for when the fast path returns an error. 
Review, performance testing, and additional hacking by: Heikki Linnakangas, Vladimir Sitnikov, Amit Khandekar, Thomas Munro, and Greg Stark
Discussion: https://www.postgresql.org/message-id/CAFBsxsEV_SzH%2BOLyCiyon%3DiwggSyMh_eF6A3LU2tiWf3Cy2ZQg%40mail.gmail.com
---
 src/common/wchar.c                       | 215 +++
 src/include/mb/pg_wchar.h                |  53 ++
 src/test/regress/expected/conversion.out | 169 ++
 src/test/regress/sql/conversion.sql      | 133 ++
 4 files changed, 570 insertions(+)

diff --git a/src/common/wchar.c b/src/common/wchar.c
index a6bffd0642..be931c5e92 100644
--- a/src/common/wchar.c
+++ b/src/common/wchar.c
@@ -1750,11 +1750,226 @@ pg_utf8_verifychar(const unsigned char *s, int len)
 	return l;
 }
 
+/*
+ * The fast path of the UTF-8 verifier uses a deterministic finite automaton
+ * (DFA) for multibyte characters. In a traditional table-driven DFA, the
+ * input byte and current state are used to compute an index into an array of
+ * state transitions. Since the address of the next transition is dependent
+ * on this computation, there is latency in executing the load instruction,
+ * and the CPU is not kept busy.
+ *
+ * Instead, we use a "shift-based" DFA as described by Per Vognsen:
+ *
+ * https://gist.github.com/pervognsen/218ea17743e1442e59bb60d29b1aa725
+ *
+ * In a shift-based DFA, the input byte is an index into array of integers
+ * whose bit pattern encodes the state transitions. To compute the n
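To spell out the transition scheme that comment describes (an illustration, not the patch's code): in a classic table-driven DFA each step loads from an address that depends on the previous step's result, while in the shift-based form the load address depends only on the input byte, and the current state merely selects a bit field within the loaded word.

#include <stdint.h>

/* Classic table-driven step: the load address depends on the prior state. */
static inline uint32_t
dfa_step_table(const uint8_t table[][256], uint32_t state, uint8_t byte)
{
	return table[state][byte];
}

/*
 * Shift-based step: the load depends only on the byte; the current state
 * picks a 5-bit field out of the loaded word (mask 31 for 32-bit entries,
 * as in the patch).
 */
static inline uint32_t
dfa_step_shift(const uint32_t table[256], uint32_t state, uint8_t byte)
{
	return (table[byte] >> state) & 31;
}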
speed up text_position() for utf-8
Hi, Commit 9556aa01c69 (Use single-byte Boyer-Moore-Horspool search even with multibyte encodings) was a speed improvement for the majority of cases, but when the match was toward the end of the string, the slowdown in text_position_get_match_pos() was noticeable. It was found that there was a lot of overhead in pg_mblen(). [1]

The attached exploratory PoC improves this for utf-8. It applies on top of v25 of my utf-8 verification patch in [2], since one approach relies on the DFA from it. The other three approaches are:

- a version of pg_utf_mblen() that uses a lookup table [3]
- an inlined copy of pg_utf_mblen()
- an ascii fast path with a fallback to the inlined copy of pg_utf_mblen()

The test is attached and the test function is part of the patch. It's based on the test used in the commit above. The test searches for a string that's at the end of a ~1 million byte string. This is on gcc 11 with 2-3 runs to ensure repeatability, but I didn't bother with statistics because the differences are pretty big:

                  patch                  | no match | ascii | multibyte
-----------------------------------------+----------+-------+-----------
 PG11                                    |     1120 |  1100 |       900
 master                                  |      381 |  2350 |      1900
 DFA                                     |      386 |  1640 |      1640
 branchless utf mblen                    |      387 |  4100 |      2600
 inline pg_utf_mblen()                   |      380 |  1080 |       920
 inline pg_utf_mblen() + ascii fast path |      382 |   470 |       918

Neither of the branchless approaches worked well. The DFA can't work as well here as in verification because it must do additional work. Inlining pg_utf_mblen() restores worst-case performance to PG11 levels. The ascii fast path is a nice improvement on top of that. A similar approach could work for pg_mbstrlen() as well, but I haven't looked into that yet. There are other callers of pg_mblen(), but I haven't looked into whether they are performance-sensitive. A more general application would be preferable to a targeted one.

[1] https://www.postgresql.org/message-id/b65df3d8-1f59-3bd7-ebbe-68b81d5a76a4%40iki.fi
[2] https://www.postgresql.org/message-id/CAFBsxsHG%3Dg6W8Mie%2B_NO8dV6O0pO2stxrnS%3Dme5ZmGqk--fd5g%40mail.gmail.com
[3] https://github.com/skeeto/branchless-utf8/blob/master/utf8.h

-- John Naylor EDB: http://www.enterprisedb.com

 src/backend/utils/adt/varlena.c | 112
 src/common/wchar.c              |  90 ++--
 src/include/mb/pg_wchar.h       |  53 ---
 src/test/regress/regress.c      |  18 +++
 4 files changed, 133 insertions(+), 140 deletions(-)

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index bd3091bbfb..cc93818007 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -26,6 +26,7 @@
 #include "common/unicode_norm.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
+#include "mb/pg_utf8.h"
 #include "miscadmin.h"
 #include "nodes/execnodes.h"
 #include "parser/scansup.h"
@@ -1468,6 +1469,116 @@ text_position_get_match_pos(TextPositionState *state)
 {
 	if (!state->is_multibyte)
 		return state->last_match - state->str1 + 1;
+	else if (GetDatabaseEncoding() == PG_UTF8)
+	{
+#if 0							/* dfa */
+
+		int			utf8_state_count[UTF8_LAST_STATE + 1] = {0};
+		uint32		dfa_state = BGN;
+
+		/*
+		 * See pg_utf8.h for an explanation of the state machine. Unlike in
+		 * pg_wchar.c, we must calculate the exact state so we can use it as
+		 * an array index.
+		 */
+		while (state->refpoint < state->last_match)
+		{
+			const unsigned char *s = (const unsigned char *) state->refpoint++;
+
+			dfa_state = (Utf8Transition[*s] >> dfa_state) & 31;
+			utf8_state_count[dfa_state]++;
+		}
+		Assert(state->refpoint == state->last_match);
+		state->refpos += utf8_state_count[END];
+		return state->refpos + 1;
+
+#endif
+#if 0							/* like pg_utf_mblen(), but different algorithm: */
+
+		static const char lengths[] = {
+			1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+			1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 1
+		};
+
+		while (state->refpoint < state->last_match)
+		{
+			const unsigned char *s = (const unsigned char *) state->refpoint;
+
+			state->refpoint += lengths[s[0] >> 3];
+			state->refpos++;
+		}
+
+		Assert(state->refpoint == state->last_match);
+		return state->refpos + 1;
+#endif
+#if 0							/* inline pg_utf_mblen() */
+
+		int			len = 0;
+
+		while (state->refpoint < state->last_match)
+		{
+			const unsigned char *s = (const unsigned char *) state->refpoint;
+
+			if ((*s & 0x80) == 0)
+				len = 1;
+			else if ((*s & 0xe0) == 0xc0)
+				len = 2;
+			else if ((*s & 0xf0
Re: cutting down the TODO list thread
On Wed, Dec 8, 2021 at 1:40 PM John Naylor wrote: > > It's been a while, but here are a few more suggested > removals/edits/additions to the TODO list. Any objections or new > information, let me know: > > - Auto-fill the free space map by scanning the buffer cache or by > checking pages written by the background writer > http://archives.postgresql.org/pgsql-hackers/2006-02/msg01125.php > https://www.postgresql.org/message-id/200603011716.16984.pete...@gmx.net > > Both these threads are from 2006, so have nothing to do with the current FSM. Moved to the Not Worth Doing list. > - Allow concurrent inserts to use recently created pages rather than > creating new ones > http://archives.postgresql.org/pgsql-hackers/2010-05/msg00853.php > > Skimming the first few messages, I believe this has been covered by > commit 719c84c1b? (Extend relations multiple blocks at a time to > improve scalability.) Removed. > - Allow VACUUM FULL and CLUSTER to update the visibility map > > This topic has a current CF entry which seems to have stalled, so that > newer thread would be better to list here than the one from 2013. Added. > - Bias FSM towards returning free space near the beginning of the heap > file, in hopes that empty pages at the end can be truncated by VACUUM > http://archives.postgresql.org/pgsql-hackers/2009-09/msg01124.php > https://www.postgresql.org/message-id/20150424190403.gp4...@alvh.no-ip.org > > I'm not sure what to think of this, but independently of that, the > second thread is actually talking about bringing back something like > the pre-9.0 vacuum full, so maybe it should be its own entry? Done. > - Consider a more compact data representation for dead tuple locations > within VACUUM > http://archives.postgresql.org/pgsql-patches/2007-05/msg00143.php > > Great, but let's link to this more recent thread instead: > https://www.postgresql.org/message-id/flat/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com Done. > > The second thread is really about autovacuum launcher scheduling. > > Probably still relevant, but the thread is very long and doesn't seem > > terribly helpful to someone trying to get up to speed on the issues > > that are still relevant. I don't see any more recent discussion, > > either. Thoughts? Split into two entries. On Wed, Dec 8, 2021 at 8:12 PM Masahiko Sawada wrote: > There is another discussion on autovacuum scheduling in 2018 here: > > https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1F8A4DC6%40G01JPEXMBYT05 > > Some algorithms were proposed there and I implemented a PoC patch: > > https://www.postgresql.org/message-id/CAD21AoBUaSRBypA6pd9ZD%3DU-2TJCHtbyZRmrS91Nq0eVQ0B3BA%40mail.gmail.com Added, thanks! -- John Naylor EDB: http://www.enterprisedb.com
Re: speed up text_position() for utf-8
I wrote:
> The test is attached and the test function is part of the patch. It's based on the test used in the commit above. The test searches for a string that's at the end of a ~1 million byte string. This is on gcc 11 with 2-3 runs to ensure repeatability, but I didn't bother with statistics because the differences are pretty big:
>
>                   patch                  | no match | ascii | multibyte
> -----------------------------------------+----------+-------+-----------
>  PG11                                    |     1120 |  1100 |       900
>  master                                  |      381 |  2350 |      1900
>  DFA                                     |      386 |  1640 |      1640
>  branchless utf mblen                    |      387 |  4100 |      2600
>  inline pg_utf_mblen()                   |      380 |  1080 |       920
>  inline pg_utf_mblen() + ascii fast path |      382 |   470 |       918

I failed to mention that the above numbers are milliseconds, so smaller is better.

-- John Naylor EDB: http://www.enterprisedb.com
Re: speed up verifying UTF-8
I plan to push v25 early next week, unless there are further comments. -- John Naylor EDB: http://www.enterprisedb.com