small cleanup in unicode_norm.c
We've had get_canonical_class() for a while as a backend-only function. There is some ad-hoc code elsewhere that implements the same logic in a couple places, so it makes sense for all sites to use this function instead, as in the attached. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company use-canonical-class-func.patch Description: Binary data
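For readers without the patch at hand, the helper being consolidated looks roughly like this (a sketch reconstructed from memory; the struct and field names may not match the tree exactly):

	/*
	 * Get the combining class of a code point by looking it up in the
	 * generated Unicode decomposition table.  Codepoints with no entry have
	 * combining class 0.  (Sketch only; see unicode_norm.c for the real
	 * definition.)
	 */
	static uint8
	get_canonical_class(pg_wchar code)
	{
		const pg_unicode_decomposition *entry = get_code_entry(code);

		if (!entry)
			return 0;
		return entry->comb_class;
	}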
Re: small cleanup in unicode_norm.c
On Tue, Dec 8, 2020 at 5:45 AM Michael Paquier wrote: > > On Mon, Dec 07, 2020 at 03:24:56PM -0400, John Naylor wrote: > > We've had get_canonical_class() for a while as a backend-only function. > > There is some ad-hoc code elsewhere that implements the same logic in a > > couple places, so it makes sense for all sites to use this function > > instead, as in the attached. > > Thanks John for caring about that. This is a nice simplification, and > it looks fine to me. > > -static uint8 > -get_canonical_class(pg_wchar ch) > -{ > Two nits here. I would use "code" for the name of the argument for > consistency with get_code_entry(), and add a description at the top of > this helper routine (say a simple "get the combining class of given > code"). Anything else you can think of? Thanks for taking a look. Sounds good, I've made those adjustments and wrote a commit message. I took another look and didn't see anything else to address. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company v2-0001-Simplify-ad-hoc-code-for-getting-a-Unicode-codepo.patch Description: Binary data
Re: cutting down the TODO list thread
Hi,

Continuing with TODO list maintenance, first a couple of things to clean up:

- Allow ALTER INDEX ... RENAME concurrently
This was in the wrong section, but that's now irrelevant: the lock level was lowered in commit 1b5d797cd4f, so I went ahead and removed this already.

- Add CORRESPONDING BY to UNION/INTERSECT/EXCEPT
The link titled "how not to write this patch" points to a web archive of the author's description of how he implemented the rejected patch. That doesn't seem useful, since it was...rejected. I propose to replace that with the -hackers thread, where there is discussion of the design problem:
https://www.postgresql.org/message-id/flat/CAJZSWkWN3YwQ01C3%2Bcq0%2BeyZ1DmK%3D69_6vryrsVGMC%3D%2BfWrSZA%40mail.gmail.com

Now, for the proposed items to move to "Not Worth Doing". As before, let me know of any objections. I plan to move these early next week:

*Views and Rules

- Allow VIEW/RULE recompilation when the underlying tables change
The entry itself says "This is both difficult and controversial." and the linked threads confirm that.

- Make it possible to use RETURNING together with conditional DO INSTEAD rules, such as for partitioning setups
This was from before we got native partitioning, so the stated rationale is outdated.

*SQL Commands
(this is a huge section, for now just doing the miscellany at the top before the various subsections)

- Add a GUC variable to warn about non-standard SQL usage in queries
I don't see the reason for this, and it sounds difficult anyway.

- Add NOVICE output level for helpful messages
This would only be useful if turned on, so it is going to be least used where it might help the most. It also sounds like a lot of slow menial work to implement.

- Allow DISTINCT to work in multiple-argument aggregate calls
Tom suggested this in 2006 for the sake of orthogonality. Given the amount of time that has passed, it seems not very important.

- Allow DELETE and UPDATE to be used with LIMIT and ORDER BY
Some use cases were mentioned, but nearly all have some kind of workaround already.

-- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
On Thu, Dec 10, 2020 at 3:29 PM John Naylor wrote: > > *Views and Rules > *SQL Commands Hearing no objections, the items mentioned have been moved over. -- John Naylor EDB: http://www.enterprisedb.com
Re: Perform COPY FROM encoding conversions in larger chunks
On Wed, Dec 16, 2020 at 8:18 AM Heikki Linnakangas wrote: > > Currently, COPY FROM parses the input one line at a time. Each line is > converted to the database encoding separately, or if the file encoding > matches the database encoding, we just check that the input is valid for > the encoding. It would be more efficient to do the encoding > conversion/verification in larger chunks. At least potentially; the > current conversion/verification implementations work one byte a time so > it doesn't matter too much, but there are faster algorithms out there > that use SIMD instructions or lookup tables that benefit from larger inputs. Hi Heikki, This is great news. I've seen examples of such algorithms and that'd be nice to have. I haven't studied the patch in detail, but it looks fine on the whole. In 0004, it seems you have some doubts about upgrade compatibility. Is that because user-defined conversions would no longer have the right signature? -- John Naylor EDB: http://www.enterprisedb.com
Re: Perform COPY FROM encoding conversions in larger chunks
On Wed, Dec 23, 2020 at 3:41 AM Heikki Linnakangas wrote: > > I'm not sure it's worth the trouble, though. Custom conversions are very > rare. And I don't think any other object can depend on a conversion, so > you can always drop the conversion before upgrade, and re-create it with > the new function signature afterwards. A note in the release notes and a > check in pg_upgrade, with instructions to drop and recreate the > conversion, are probably enough. > That was my thought as well. -- John Naylor EDB: http://www.enterprisedb.com
Re: [POC] verifying UTF-8 using SIMD instructions
On Mon, Feb 8, 2021 at 6:17 AM Heikki Linnakangas wrote:
>
> I also tested the fallback implementation from the simdjson library
> (included in the patch, if you uncomment it in simdjson-glue.c):
>
>  mixed | ascii
> -------+-------
>    447 |    46
> (1 row)
>
> I think we should at least try to adopt that. At a high level, it looks
> pretty similar to your patch: you load the data 8 bytes at a time, check if
> they are all ASCII. If there are any non-ASCII chars, you check the
> bytes one by one, otherwise you load the next 8 bytes. Your patch should
> be able to achieve the same performance, if done right. I don't think
> the simdjson code forbids \0 bytes, so that will add a few cycles, but
> still.

Attached is a patch that does roughly what the simdjson fallback did, except I use straight tests on the bytes and only calculate code points in assertion builds.

In the course of doing this, I found that my earlier concerns about putting the ascii check in a static inline function were due to my suboptimal loop implementation. I had assumed that if the chunked ascii check failed, it had to check all those bytes one at a time. As it turns out, that's a waste of the branch predictor. In the v2 patch, we do the chunked ascii check every time we loop. With that, I can also confirm the claim in the Lemire paper that it's better to do the check on 16-byte chunks:

(MacOS, Clang 10)

master:
 chinese | mixed | ascii
---------+-------+-------
    1081 |   761 |   366

v2 patch, with 16-byte stride:
 chinese | mixed | ascii
---------+-------+-------
     806 |   474 |    83

patch but with 8-byte stride:
 chinese | mixed | ascii
---------+-------+-------
     792 |   490 |   105

I also included the fast path in all other multibyte encodings, and that is also pretty good performance-wise. It regresses from master on pure multibyte input, but that case is still faster than PG13, which I simulated by reverting 6c5576075b0f9 and b80e10638e3:

~PG13:
 chinese | mixed | ascii
---------+-------+-------
    1565 |   848 |   365

ascii fast-path plus pg_*_verifychar():
 chinese | mixed | ascii
---------+-------+-------
    1279 |   656 |    94

v2 has a rough start to having multiple implementations in src/backend/port. Next steps are:

1. Add more tests for utf-8 coverage (in addition to the ones to be added by the noError argument patch)
2. Add SSE4 validator -- it turns out the demo I referred to earlier doesn't match the algorithm in the paper. I plan to only copy the lookup tables from simdjson verbatim, but the code will basically be written from scratch, using simdjson as a hint.
3. Adjust configure.ac

-- John Naylor EDB: http://www.enterprisedb.com

v2-add-portability-stub-and-new-fallback.patch Description: Binary data
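A minimal sketch of the loop shape described above (illustrative only, not the attached patch; check_ascii() stands for the patch's chunked check, and verify_one_char() is a placeholder for the per-character verifier):

	/*
	 * Sketch of the fallback loop: try the chunked ASCII check on every
	 * iteration, and fall back to verifying a single multibyte character
	 * only when the chunk is not all-ASCII.  Returns the number of valid
	 * bytes found.  Illustration of the approach, not the patch itself.
	 */
	static int
	validate_fallback_sketch(const unsigned char *s, int len)
	{
		const unsigned char *start = s;

		while (len > 0)
		{
			int			l = check_ascii(s, len);	/* 16 if all-ASCII, else 0 */

			if (l == 0)
			{
				l = verify_one_char(s, len);		/* length of one valid char, or 0 */
				if (l == 0)
					break;							/* invalid byte sequence */
			}
			s += l;
			len -= l;
		}

		return (int) (s - start);	/* count of valid bytes */
	}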
Re: Preventing free space from being reused
On Fri, Feb 12, 2021 at 6:21 PM Noah Bergbauer wrote: > > A btree index on the same column is 700x the size of BRIN, or 10% of relation itself. It does not perform significantly better than BRIN. The issue here is twofold: not only does slotting these tuples into older pages significantly reduce the effectiveness of BRIN, it also causes fragmentation on disk. Ultimately, this is why CLUSTER exists. One way to look at this situation is that my data is inserted exactly in index order, but Postgres keeps un-clustering it for reasons that are valid in general (don't waste disk space) but don't apply at all in this case (the file system uses compression, no space is wasted). > > Any alternative ideas would of course be much appreciated! But at the moment HEAP_INSERT_SKIP_FSM seems like the most practical solution to me. I would suggest to take a look at the BRIN opclass multi-minmax currently in development. It's designed to address that exact situation, and more review would be welcome: https://commitfest.postgresql.org/32/2523/ -- John Naylor EDB: http://www.enterprisedb.com
Re: [POC] verifying UTF-8 using SIMD instructions
dows support, but it's not developed. - I had to add a large number of casts to get rid of warnings in the magic constants macros. That needs some polish. I also attached a C file that visually demonstrates every step of the algorithm following the example found in Table 9 in the paper. That contains the skeleton coding I started with and got abandoned early, so it might differ from the actual patch. -- John Naylor EDB: http://www.enterprisedb.com v3-SSE4-with-autoconf-support.patch Description: Binary data test-utf8.c Description: Binary data
Re: [POC] verifying UTF-8 using SIMD instructions
I wrote: > [v3] > - It's not smart enough to stop at the last valid character boundary -- it's either all-valid or it must start over with the fallback. That will have to change in order to work with the proposed noError conversions. It shouldn't be very hard, but needs thought as to the clearest and safest way to code it. In v4, it should be able to return an accurate count of valid bytes even when the end crosses a character boundary. > - This is my first time hacking autoconf, and it still seems slightly broken, yet functional on my machine at least. It was actually completely broken if you tried to pass the special flags to configure. I redesigned this part and it seems to work now. -- John Naylor EDB: http://www.enterprisedb.com v4-SSE4-with-autoconf-support.patch Description: Binary data
Re: [POC] verifying UTF-8 using SIMD instructions
On Mon, Feb 15, 2021 at 9:32 PM John Naylor wrote:
>
> On Mon, Feb 15, 2021 at 9:18 AM Heikki Linnakangas wrote:
> >
> > I'm guessing that's because the unaligned access in check_ascii() is
> > expensive on this platform.
> Some possible remedies:
> 3) #ifdef out the ascii check for 32-bit platforms.
> 4) Same as the non-UTF8 case -- only check for ascii 8 bytes at a time. I'll probably try this first.

I've attached a couple patches to try on top of v4; maybe they'll help the Arm32 regression. 01 reduces the stride to 8 bytes, and 02 applies on top of v1 to disable the fallback fast path entirely on 32-bit platforms. A bit of a heavy hammer, but it'll confirm (or not) your theory about unaligned loads.

Also, I've included patches to explain more fully how I modeled non-UTF-8 performance while still using the UTF-8 tests. I think it was a useful thing to do, and I have a theory that might predict how a non-UTF8 encoding will perform with the fast path. 03A and 03B are independent of each other and conflict, but both apply on top of v4 (don't need 02). Both replace the v4 fallback with the ascii fastpath + pg_utf8_verifychar() in the loop, similar to utf-8 on master. 03A has a local static copy of pg_utf8_islegal(), and 03B uses the existing global function. (On x86, you can disable SSE4 by passing USE_FALLBACK_UTF8=1 to configure.)

While Clang 10 regressed for me on pure multibyte in a similar test upthread, on Linux gcc 8.4 there isn't a regression at all. IIRC, gcc wasn't as good as Clang when the API changed a few weeks ago, so its regression from v4 is still faster than master. Clang only regressed with my changes because it somehow handled master much better to begin with.

x86-64 Linux gcc 8.4

master:
 chinese | mixed | ascii
---------+-------+-------
    1453 |   857 |   428

v4 (fallback verifier written as a single function):
 chinese | mixed | ascii
---------+-------+-------
     815 |   514 |    82

v4 plus addendum 03A -- emulate non-utf-8 using a copy of pg_utf8_islegal() as a static function:
 chinese | mixed | ascii
---------+-------+-------
    1115 |   547 |    87

v4 plus addendum 03B -- emulate non-utf-8 using pg_utf8_islegal() as a global function:
 chinese | mixed | ascii
---------+-------+-------
    1279 |   604 |    82

(I also tried the same on ppc64le Linux, gcc 4.8.5, and while not great, it never got worse than master either on pure multibyte.)

This is supposed to model the performance of a non-utf8 encoding, where we don't have a bespoke function written from scratch. Here's my theory: If an encoding has pg_*_mblen(), a global function, inside pg_*_verifychar(), it seems it won't benefit as much from an ascii fast path as one whose pg_*_verifychar() has no function calls. I'm not sure whether a compiler can inline a global function's body into call sites in the unit where it's defined. (I haven't looked at the assembly.) But recall that you didn't commit 0002 from the earlier encoding change, because it wasn't performing. I looked at that patch again, and while it inlined the pg_utf8_verifychar() call, it still called the global function pg_utf8_islegal(). If the above is anything to go by, on gcc at least, I don't think we need to worry about a regression when adding an ascii fast path to non-utf-8 multibyte encodings.

Regarding SSE, I've added an ascii fast path in my local branch, but it's not going to be as big a difference because 1) the check is more expensive in terms of branches than the C case, and 2) because the general case is so fast already, it's hard to improve upon. I just need to do some testing and cleanup on the whole thing, and that'll be ready to share.
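As an aside, the HAS_ZERO() test used by check_ascii() in the diff below is assumed to be the standard word-wise zero-byte trick; a sketch of that macro (the actual definition in the patch may be written differently):

	/*
	 * Nonzero if any byte in the 64-bit chunk is 0x00.  Subtracting 0x01
	 * from every byte sets a byte's high bit only if that byte was 0x00 or
	 * at least 0x81; masking with ~chunk removes the latter, leaving high
	 * bits set only for bytes that were zero.
	 */
	#define HAS_ZERO(chunk) \
		(((chunk) - UINT64CONST(0x0101010101010101)) & \
		 ~(chunk) & \
		 UINT64CONST(0x8080808080808080))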
-- John Naylor EDB: http://www.enterprisedb.com diff --git a/src/include/port/pg_utf8.h b/src/include/port/pg_utf8.h index dac2afc130..a0c94dd4f3 100644 --- a/src/include/port/pg_utf8.h +++ b/src/include/port/pg_utf8.h @@ -53,26 +53,25 @@ extern int pg_validate_utf8_fallback(const unsigned char *s, int len); static inline int check_ascii(const unsigned char *s, int len) { - uint64 half1, half2, + uint64 chunk, highbit_mask; - if (len >= 2 * sizeof(uint64)) + if (len >= sizeof(uint64)) { - memcpy(&half1, s, sizeof(uint64)); - memcpy(&half2, s + sizeof(uint64), sizeof(uint64)); + memcpy(&chunk, s, sizeof(uint64)); /* * If there are any zero bytes, bail and let the slow * path handle it. */ - if (HAS_ZERO(half1) || HAS_ZERO(half2)) + if (HAS_ZERO(chunk)) return 0; /* Check if any bytes in this chunk have the high bit set. */ - highbit_mask = ((half1 | half2) & UINT64CONST(0x8080808080808080)); + highbit_mask = (chunk & UINT64CONST(0x8080808080808080)); if (!highbit_mask) - return 2 * sizeof(uint64); + return sizeof(uint64); else return 0; } diff -
Re: WIP: BRIN multi-range indexes
On Mon, Feb 15, 2021 at 10:37 AM Tomas Vondra wrote: > [v20210215] Hi Tomas, This time I only looked at the cumulative changes in the multiminmax opclass, since I'm pretty sure all the bloom issues have been addressed. * XXX CombineRange name seems a bit weird. Consider renaming, perhaps to * something ExpandedRange or so. The time to do it is now, if you like, or remove the XXX. I agree with the comment, FWIW. In has_matching_range(): * So we know it's in the general min/max, the question is whether it * falls in one of the ranges or gaps. We'll use a binary search on * the ranges. * * it's in the general range, but is it actually covered by any * of the ranges? Repeat the check for each range. * * XXX We simply walk the ranges sequentially, but maybe we could * further leverage the ordering and non-overlap and use bsearch to * speed this up a bit. It looks to me like you already implemented binary search and the last part is out of date, or am I missing something? Same in range_contains_value(): * XXX This might benefit from the fact that both the intervals and exact * values are sorted - we might do bsearch or something. Currently that * does not make much difference (there are only ~32 intervals), but if * this gets increased and/or the comparator function is more expensive, * it might be a huge win. Below that it does binary search if the number of elements > 16. In merge_combine_ranges(): There are a couple assert-related TODOs. In brin_minmax_multi_distance_timetz(): * XXX Does this need to consider the time zones? I wouldn't think so, because the stored values are in UTC -- the time zone calculation only happens during storage and retrieval, and they've already been stored, IIUC. Also, I think you need to copy this part from brin_minmax_multi_distance_timestamp() here as well: if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2)) PG_RETURN_FLOAT8(0); At this point, I think it's pretty close to commit-ready. I thought maybe I would create a small index with every type, and make sure it looks sane in page_inspect, but that's about it. -- John Naylor EDB: http://www.enterprisedb.com
Re: non-HOT update not looking at FSM for large tuple update
On Wed, Feb 24, 2021 at 10:44 AM Floris Van Nee wrote:
> The problem here is two-folded: for any non-HOT update of a tuple that’s larger than the size of the fillfactor, the fsm will not be used, but instead a new page will be chosen.

I confirmed this for not only non-HOT updates, but also regular inserts, which are the same thing in this context.

> This seems to rely on the false assumption that every existing page has at least one tuple on it.

Yep.

> Secondly, and this is a bit trickier.. Even if we would ask the FSM to come up with a free page with a free size of “MaxHeapTupleSize”, it wouldn’t find anything… This is, because the FSM tracks free space excluding any unused line pointers.
> There are 7 line pointers on this page, consuming 28 bytes. Plus the 24 byte header, that means that lower=52. However, all line pointers are unused, so the page really is empty. The FSM does not see the page as empty though, as it only looks at “upper-lower”.
>
> When asking the FSM for slightly less space (MaxHeapTupleSize – 50 for example), it does find the free pages. I’ve confirmed that with such a hack the table is not growing indefinitely anymore. However, this number 50 is rather arbitrary obviously, as it depends on the number of unused line items on a page, so that’s not a proper way to fix things.
>
> In any case, the behavior feels like a bug to me, but I don’t know what the best way would be to fix it. Thoughts?

One idea is to take your -50 idea and make it more general and safe, by scaling the fudge factor based on fillfactor, such that if fillfactor is less than 100, the requested freespace is a bit smaller than the max. It's still a bit of a hack, though. I've attached a draft of this idea.

-- John Naylor EDB: http://www.enterprisedb.com

allow-inserting-tuples-into-almost-empty-pages.patch Description: Binary data
Re: non-HOT update not looking at FSM for large tuple update
On Wed, Feb 24, 2021 at 4:52 PM Floris Van Nee wrote: > I also understand the temptation to define it based on the relation's fill factor, as you did in the patch. However, upon some further thought I wonder if that's useful? A relation with a higher fill factor will have a lower 'saveFreeSpace' variable, so it's less likely to run into issues in finding a fresh page, except if the tuple you're inserting/updating is even larger. However, if that case happens, you'll still be wanting to look for a page that's completely empty (except for the line items). So the only proper metric is 'how many unused line items do we expect on empty pages' and the fillfactor doesn't say much about this. Since this is probably difficult to estimate at all, we may be better off just defining it off MaxHeapTupleSize completely? > For example, we expect 1.5% of the page could be line items, then: > > targetFreeSpace = MaxHeapTupleSize * 0.985 That makes sense, although the exact number seems precisely tailored to your use case. 2% gives 164 bytes of slack and doesn't seem too large. Updated patch attached. -- John Naylor EDB: http://www.enterprisedb.com v2-allow-inserting-tuples-into-almost-empty-pages.patch Description: Binary data
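A minimal sketch of the approach as described above (illustrative; the helper name and exact slack follow the discussion, not necessarily the final patch):

	/*
	 * Sketch of the v2 idea: compute the free-space request for a new tuple,
	 * capping it slightly below MaxHeapTupleSize so that pages containing
	 * nothing but unused line pointers can still satisfy the request.  The
	 * 2% slack accounts for the line pointer array on otherwise-empty pages.
	 */
	static Size
	target_free_space(Size len, Size saveFreeSpace)
	{
		const Size	maxPaddedFsmRequest = MaxHeapTupleSize - (MaxHeapTupleSize * 2 / 100);

		if (len + saveFreeSpace > MaxHeapTupleSize)
			return Max(len, maxPaddedFsmRequest);
		else
			return len + saveFreeSpace;
	}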
Re: Removing support for COPY FROM STDIN in protocol version 2
On Thu, Feb 4, 2021 at 11:47 AM Heikki Linnakangas wrote: > > On 04/02/2021 17:35, Tom Lane wrote: > > Alvaro Herrera writes: > >> Yeah, the changes I was thinking about are all in libpq-int.h so that's > >> not really a problem. But one enum in libpq-fe.h renumbers values, and > >> I think it's better to keep the old value labelled as "unused" to avoid > >> any changes. > > > > Oh, yeah, can't do that. libpq-fe.h probably shouldn't change at all; > > but certainly we can't renumber existing enum values there. > > Ah, right, there's even a comment above the enum that says that's a no > no. But yeah, fixing that, I see no need for .so version bump. I was able to build libpq and psql on 7.3 with the tooling found on RHEL 7 (the rest of the tree refused to build, but that's not relevant here) and got the expected message when trying to connect: master: Welcome to psql 7.3.21, the PostgreSQL interactive terminal. patch: psql: FATAL: unsupported frontend protocol 2.0: server supports 3.0 to 3.0 I couldn't find any traces of version 2 in the tree with the patch applied. The enum mentioned above seems the only issue that needs to be fixed before commit. -- John Naylor EDB: http://www.enterprisedb.com
Re: non-HOT update not looking at FSM for large tuple update
On Wed, Feb 24, 2021 at 6:29 PM Floris Van Nee wrote: > > > > That makes sense, although the exact number seems precisely tailored to your use case. 2% gives 164 bytes of slack and doesn't seem too large. Updated patch attached. > > Yeah, I tried picking it as conservative as I could, but 2% is obviously great too. :-) I can't think of any large drawbacks either of having a slightly larger value. > Thanks for posting the patch! I've added this to the commitfest as a bug fix and added you as an author. -- John Naylor EDB: http://www.enterprisedb.com
our use of popcount
While reviewing the patch to speed up GiST indexes with tsvectors [1] by smarter use of popcount, I was reminded that for hardware popcount, we do an indirect function call to execute a single CPU instruction, one word at a time. I went ahead and did some microbenchmarks to see how much that buys us, and if we could do better. For the tests below I used the attached C file compiled into a shared library with gcc 8.4, measuring popcount on an 8kB buffer, median of 3 runs:

select drive_popcount64(100, 1024);
select drive_popcount(100, 1024);

Note: pg_popcount64() is passed one 8-byte word, and pg_popcount() is passed a buffer, which, if properly aligned, allows calling the appropriate word-at-a-time popcount function.

master -- indirect call to pg_popcount64_asm(), where available. Note that passing a buffer is faster:

pg_popcount64()  2680ms
pg_popcount()    1640ms

0001 ignores assembly and uses direct calls to popcount64_slow(). It is quite a bit slower, but notice that passing a buffer to pg_popcount() with the slow implementation is not all that much slower than calling the indirect assembly one word at a time (2680ms vs 3190ms):

pg_popcount64()  4930ms
pg_popcount()    3190ms

It's also worth noting that currently all platforms use an indirect function call, regardless of instruction support, so the direct call here is a small win for non-x86-64 platforms.

0002 restores indirection through a function pointer, but chooses the pass-a-buffer function rather than the retail function, and if assembly is available, calls inlined popcount64_asm(). This in itself is not much faster than master (1640ms vs 1430ms):

pg_popcount()    1430ms

However, 0003 takes the advice of [2] and [3] and uses hand-written unrolled assembly, and now we actually have some real improvement, over 3x faster than master:

pg_popcount()    494ms

0004 shows how it would look to use the buffer-passing version for bitmapsets and the visibility map. Bitmapsets would benefit from this even on master, going by the test results above. If we went with something like 0003, bitmapsets would benefit further automatically. However, counting of visibility map bits is a bit more tricky, since we have to mask off the visible bits and frozen bits to count them separately. 0004 has a simple unoptimized function to demonstrate. As I recall, the VM code did not benefit much from the popcount infrastructure to begin with, so a small regression might not be noticeable. If it is, it's just a SMOP to offer an assembly version here also.

The motivation for benchmarking this was to have some concrete numbers in mind while reviewing gist index creation. If the gist patch can benefit from any of the above, it's worth considering, as a more holistic approach. Since it affects other parts of the system, I wanted to share this in a new thread first.

[1] https://commitfest.postgresql.org/32/3023/
[2] https://danluu.com/assembly-intrinsics/
[3] https://stackoverflow.com/questions/25078285/replacing-a-32-bit-loop-counter-with-64-bit-introduces-crazy-performance-deviati

-- John Naylor EDB: http://www.enterprisedb.com

v1-0001-Use-direct-function-calls-for-pg_popcount-32-64.patch Description: Binary data
v1-0002-Use-platform-specific-implementations-of-pg_popco.patch Description: Binary data
v1-0003-Manually-unroll-the-loop-with-hand-written-assemb.patch Description: Binary data
v1-0004-Use-pg_popcount-rather-than-pg_popcount-32-64-whe.patch Description: Binary data
drive_popcount.c Description: Binary data
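For readers unfamiliar with the pass-a-buffer interface being benchmarked above, here is a minimal sketch of the idea (an illustration, not the attached patches):

	/*
	 * Count set bits over a buffer 8 bytes at a time, falling back to a byte
	 * lookup table for any remainder.  pg_popcount64() and
	 * pg_number_of_ones[] are the existing primitives in pg_bitutils; the
	 * rest is an illustrative sketch only.
	 */
	static uint64
	popcount_buffer(const char *buf, int bytes)
	{
		uint64		popcnt = 0;

		while (bytes >= (int) sizeof(uint64))
		{
			uint64		word;

			memcpy(&word, buf, sizeof(uint64)); /* avoids alignment assumptions */
			popcnt += pg_popcount64(word);
			buf += sizeof(uint64);
			bytes -= sizeof(uint64);
		}

		/* remaining bytes, one at a time */
		while (bytes-- > 0)
			popcnt += pg_number_of_ones[(unsigned char) *buf++];

		return popcnt;
	}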
Re: Speeding up GIST index creation for tsvectors
On Wed, Jan 27, 2021 at 11:07 AM Amit Khandekar wrote: Hi Amit, Your performance numbers look like this is a fruitful area to improve. I have not yet tested performance, but I will do so at a later date. I did some microbenchmarking of our popcount implementation, since I wasn't quite sure it's optimal, and indeed, there is room for improvement there [1]. I'd be curious to hear your thoughts on those findings. I think it'd be worth it to test a version of this patch using those idioms here as well, so at some point I plan to post something. Now for the patches: 0001: + /* + * We can process 64-bit chunks only if both are mis-aligned by the same + * number of bytes. + */ + if (b_aligned - b == a_aligned - a) The obvious question here is: how often are they identically misaligned? You don't indicate that your measurements differ in a bimodal fashion, so does that mean they happen to be always (mis)aligned? Is that an accident of the gist coding and could change unexpectedly? And how hard would it be to allocate those values upstream so that the pointers are always aligned on 8-byte boundaries? (I imagine pretty hard, since there are multiple callers, and such tight coupling is not good style) + /* For smaller lengths, do simple byte-by-byte traversal */ + if (bytes <= 32) You noted upthread: > Since for the gist index creation of some of these types the default > value for siglen is small (8-20), I tested with small siglens. For > siglens <= 20, particularly for values that are not multiples of 8 > (e.g. 10, 13, etc), I see a 1-7 % reduction in speed of index > creation. It's probably because of > an extra function call for pg_xorcount(); and also might be due to the > extra logic in pg_xorcount() which becomes prominent for shorter > traversals. So for siglen less than 32, I kept the existing method > using byte-by-byte traversal. I wonder if that can be addressed directly, while cleaning up the loops and alignment checks in pg_xorcount_long() a little. For example, have a look at pg_crc32c_armv8.c -- it might be worth testing a similar approach. Also, pardon my ignorance, but where can I find the default siglen for various types? +/* Count the number of 1-bits in the result of xor operation */ +extern uint64 pg_xorcount_long(const unsigned char *a, const unsigned char *b, + int bytes); +static inline uint64 pg_xorcount(const unsigned char *a, const unsigned char *b, + int bytes) I don't think it makes sense to have a static inline function call a global function. -static int +static inline int hemdistsign(BITVECP a, BITVECP b, int siglen) Not sure what the reason is for inlining all these callers. Come to think of it, I don't see a reason to have hemdistsign() at all anymore. All it does is call pg_xorcount(). I suspect that's what Andrey Borodin meant when he said upthread: > > > Meanwhile there are at least 4 incarnation of hemdistsign() functions that are quite similar. I'd propose to refactor them somehow... 0002: I'm not really happy with this patch. I like the idea of keeping indirect calls out of non-x86 platforms, but I believe it could be done more simply. For one, I don't see a need to invent a third category of retail function. Second, there's no reason to put "nonasm" in the header as a static inline, and then call from there to the new now-global function "slow". Especially since the supposed static inline is still needed as a possible value as a function pointer on x86, so the compiler is not going to inline it on x86 anyway. That just confuses things. 
(I did make sure to remove indirect calls from the retail functions in [1], in case we want to go that route). [1] https://www.postgresql.org/message-id/CAFBsxsFCWys_yfPe4PoF3%3D2_oxU5fFR2H%2BmtM6njUA8nBiCYug%40mail.gmail.com -- John Naylor EDB: http://www.enterprisedb.com
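For context, the shared helper being discussed boils down to an XOR-and-popcount loop over the two signatures; a hedged sketch (not the patch under review):

	/*
	 * Hamming distance between two signatures: XOR corresponding words and
	 * count the set bits.  A sketch of the idea only; the patch under review
	 * differs in naming and in how it handles alignment and short lengths.
	 */
	static uint64
	xor_popcount(const unsigned char *a, const unsigned char *b, int bytes)
	{
		uint64		distance = 0;

		while (bytes >= (int) sizeof(uint64))
		{
			uint64		wa,
						wb;

			memcpy(&wa, a, sizeof(uint64)); /* memcpy sidesteps alignment issues */
			memcpy(&wb, b, sizeof(uint64));
			distance += pg_popcount64(wa ^ wb);
			a += sizeof(uint64);
			b += sizeof(uint64);
			bytes -= sizeof(uint64);
		}

		while (bytes-- > 0)
			distance += pg_number_of_ones[*a++ ^ *b++];

		return distance;
	}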
Re: Force lookahead in COPY FROM parsing
On Thu, Mar 4, 2021 at 5:13 AM Heikki Linnakangas wrote: > > I posted this earlier at > https://www.postgresql.org/message-id/9ec25819-0a8a-d51a-17dc-4150bb3cc...@iki.fi , > and that led to removing FE/BE protocol version 2 support. That's been > committed now, so here's COPY FROM patch again, rebased. Looks good to me. Just a couple minor things: + * Look at the next character. If we're at EOF, c2 will wind + * up as '\0' because of the guaranteed pad of raw_buf. */ - IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0); - - /* get next char */ c = copy_raw_buf[raw_buf_ptr]; The new comment seems copy-pasted from the c2 statements further down. - if (raw_buf_ptr >= copy_buf_len || need_data) +#define COPY_READ_LINE_LOOKAHEAD 4 + if (raw_buf_ptr + COPY_READ_LINE_LOOKAHEAD >= copy_buf_len) Is this #define deliberately put down here rather than at the top of the file? - * of the buffer and then we load more data after that. This case occurs only - * when a multibyte character crosses a bufferload boundary. + * of the buffer and then we load more data after that. Is the removed comment really invalidated by this patch? I figured it was something not affected until the patch to do the encoding conversion in larger chunks. -- John Naylor EDB: http://www.enterprisedb.com
Re: WIP: BRIN multi-range indexes
On Tue, Mar 2, 2021 at 7:32 PM Tomas Vondra wrote: > Ummm, in brin_minmax_multi_distance_timetz()? I don't think timetz can > be infinite, no? I think brin_minmax_multi_distance_date you meant? In timestamp.c there are plenty of checks for TIMESTAMP_NOT_FINITE for TimestampTz types and I assumed that was applicable here. -- John Naylor EDB: http://www.enterprisedb.com
Re: [POC] verifying UTF-8 using SIMD instructions
On Tue, Mar 9, 2021 at 5:00 AM Amit Khandekar wrote: > > Hi, > > Just a quick question before I move on to review the patch ... The > improvement looks like it is only meant for x86 platforms. Actually it's meant to be faster for all platforms, since the C fallback is quite a bit different from HEAD. I've found it to be faster on ppc64le. An earlier version of the patch was a loser on 32-bit Arm because of alignment issues, but if you could run the test script attached to [1] on 64-bit Arm, I'd be curious to see how it does on 0002, and whether 0003 and 0004 make things better or worse. If there is trouble building on non-x86 platforms, I'd want to fix that also. (Note: 0001 is not my patch, and I just include it for the tests) > Can this be > done in a portable way by arranging for auto-vectorization ? Something > like commit 88709176236caf. This way it would benefit other platforms > as well. I'm fairly certain that the author of a compiler capable of doing that in this case would be eligible for some kind of AI prize. :-) [1] https://www.postgresql.org/message-id/06d45421-61b8-86dd-e765-f1ce527a5...@iki.fi -- John Naylor EDB: http://www.enterprisedb.com
Re: non-HOT update not looking at FSM for large tuple update
On Tue, Mar 9, 2021 at 9:40 AM Floris Van Nee wrote: > > Hi, > > > > > This patch fails to consider that len may be bigger than MaxHeapTupleSize * > > 0.98, which in this case triggers a reproducable > > PANIC: > > Good catch! I've adapted the patch with your suggested fix. Thank you both for testing and for the updated patch. It seems we should add a regression test, but it's not clear which file it belongs in. Possibly insert.sql? > > One different question I have, though, is why we can't "just" teach vacuum > > to clean up trailing unused line pointers. As in, can't we trim the line pointer > > array when vacuum detects that the trailing line pointers on the page are all > > unused? That seems like the proper fix, and I see you've started a thread for that. I don't think that change in behavior would be backpatchable, but patch here might have a chance at that. -- John Naylor EDB: http://www.enterprisedb.com
Re: non-HOT update not looking at FSM for large tuple update
I wrote: > That seems like the proper fix, and I see you've started a thread for that. I don't think that change in behavior would be backpatchable, but patch here might have a chance at that. I remembered after the fact that truncating line pointers would only allow for omitting the 2% slack logic (and has other benefits), but the rest of this patch would be needed regardless. -- John Naylor EDB: http://www.enterprisedb.com
Re: WIP: BRIN multi-range indexes
On Sun, Mar 7, 2021 at 8:53 PM Tomas Vondra wrote: [v20210308b] I managed to trap an assertion that somehow doesn't happen during the regression tests. The callers of fill_expanded_ranges() do math like this: /* both ranges and points are expanded into a separate element */ neranges = ranges->nranges + ranges->nvalues; but inside fill_expanded_ranges() we have this assertion: /* Check that the output array has the right size. */ Assert(neranges == (2 * ranges->nranges + ranges->nvalues)); In the regression test data, a debugging elog() shows that nranges is most often zero, so in that case, the math happens to be right either way. I can reliably get nranges above zero by running update brintest_multi set int8col = int8col - 1; a few times, at which point I get the crash. Aside from that, the new changes look good. Just a couple small things: +allowed parameters. Only the bloom operator class +allows specifying parameters: minmax-multi aren't mentioned here, but are mentioned further down. + * Addresses from different families are consider to be in maximum (comment above brin_minmax_multi_distance_inet) s/consider/considered/ > > 2) moving minmax/inclusion changes from 0002 to a separate patch 0003 > > > > I think we should either ditch the 0003 (i.e. keep the existing > > opclasses unchanged) or commit 0003 (in which case I'd vote to just stop > > supporting the old signature of the consistent function). > > > > Still not sure what do to about this. I'm leaning towards keeping 0003 > and just removing the "old" signature entirely, to keep the API cleaner. > It might cause some breakage in out-of-core BRIN opclasses, but that > seems like a reasonable price. Moreover, the opclasses may need some > updating anyway, because of the changes in handling NULL scan keys (0004 > moves that from the opclass to the bringetbitmap function). Keeping 0003 seems reasonable, given the above. > > The remaining part that didn't get much review is the very last patch, > > adding an option to ignore correlation for some BRIN opclases. This is > > needed as the regular BRIN costing is quite sensitive to correlation, > > and the cost gets way too high for poorly correlated data, making it > > unlikely the index will be used. But handling such data sets efficiently > > is the main point of those new opclasses. Any opinions on this? > > > > Not sure about this. I hadn't given it much thought (nor tested), but I just took a look at brincostestimate(). If the table is badly correlated, I'm thinking the number of ranges that need to be scanned will increase regardless. Maybe rather than ignoring correlation, we could clamp it or otherwise tweak it. Not sure about the details, though, that would require some testing. -- John Naylor EDB: http://www.enterprisedb.com
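To spell out the mismatch: the two formulas agree only when nranges is zero. A small illustration with hypothetical values (not code from the patch):

	/*
	 * With nranges = 2 and nvalues = 3, the callers size the output array as
	 * nranges + nvalues = 5, while the assertion in fill_expanded_ranges()
	 * expects 2 * nranges + nvalues = 7.  The two coincide only when
	 * nranges == 0, which is why the regression test data (where nranges is
	 * usually zero) never trips the assertion.
	 */
	int			nranges = 2;
	int			nvalues = 3;
	int			from_callers = nranges + nvalues;		/* 5 */
	int			from_assertion = 2 * nranges + nvalues; /* 7 */

	Assert(from_callers == from_assertion); /* fails whenever nranges > 0 */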
Re: get rid of tags in the docs?
On Thu, Feb 4, 2021 at 11:31 AM Tom Lane wrote: > > John Naylor writes: > > While looking at the proposed removal of the v2 protocol, I noticed that we > > italicize some, but not all, instances of 'per se', 'pro forma', and 'ad > > hoc'. I'd say these are widespread enough in formal registers of English > > that they hardly need to be called out as foreign, so I propose removing > > the tags for those words. > > +1, nobody italicizes those in normal usage. Now that protocol v2 is gone, here's a patch to remove those tags. > > The other case is 'voilà', found in rules.sgml. The case for italics here > > is stronger, but looking at that file, I actually think a more > > generic-sounding phrase here would be preferable. > > Yeah, seeing that we only use that in one place, I think we could do > without it. Looks like something as pedestrian as "The results are:" > would do fine. Done that way. -- John Naylor EDB: http://www.enterprisedb.com v1-0001-Get-rid-of-foreignphrase-tags-in-the-docs.patch Description: Binary data
Re: Speeding up GIST index creation for tsvectors
On Mon, Mar 8, 2021 at 8:43 AM Amit Khandekar wrote: > > On Wed, 3 Mar 2021 at 23:32, John Naylor wrote: > > 0001: > > > > + /* > > + * We can process 64-bit chunks only if both are mis-aligned by the same > > + * number of bytes. > > + */ > > + if (b_aligned - b == a_aligned - a) > > > > The obvious question here is: how often are they identically misaligned? You > > don't indicate that your measurements differ in a bimodal fashion, so does > > that mean they happen to be always (mis)aligned? > > I ran CREATE INDEX on tsvector columns using the tsearch.data that I > had attached upthread, with some instrumentation; here are the > proportions : > 1. In 15% of the cases, only one among a and b was aligned. The other > was offset from the 8-byte boundary by 4 bytes. > 2. 6% of the cases, both were offset by 4 bytes, i.e. identically misaligned. > 3. Rest of the cases, both were aligned. That's somewhat strange. I would have assumed always aligned, or usually not. It sounds like we could require them both to be aligned and not bother with the byte-by-byte prologue. I also wonder if it's worth it to memcpy to a new buffer if the passed pointer is not aligned. > > + /* For smaller lengths, do simple byte-by-byte traversal */ > > + if (bytes <= 32) > > > > You noted upthread: > > > > > Since for the gist index creation of some of these types the default > > > value for siglen is small (8-20), I tested with small siglens. For > > > siglens <= 20, particularly for values that are not multiples of 8 > > > (e.g. 10, 13, etc), I see a 1-7 % reduction in speed of index > > > creation. It's probably because of > > > an extra function call for pg_xorcount(); and also might be due to the > > > extra logic in pg_xorcount() which becomes prominent for shorter > > > traversals. So for siglen less than 32, I kept the existing method > > > using byte-by-byte traversal. > > > > I wonder if that can be addressed directly, while cleaning up the loops and > > alignment checks in pg_xorcount_long() a little. For example, have a look at > > pg_crc32c_armv8.c -- it might be worth testing a similar approach. > > Yeah, we can put the bytes <= 32 condition inside pg_xorcount_long(). > I avoided that to not hamper the <= 32 scenarios. Details explained > below for "why inline pg_xorcount is calling global function" > > > Also, pardon my ignorance, but where can I find the default siglen for various types? > Check SIGLEN_DEFAULT. Okay, so it's hard-coded in various functions in contrib modules. In that case, let's just keep the existing coding for those. In fact, the comments that got removed by your patch specifically say: /* Using the popcount functions here isn't likely to win */ ...which your testing confirmed. The new function should be used only for Gist and possibly Intarray, whose default is 124. That means we can't get rid of hemdistsign(), but that's fine. Alternatively, we could say that a small regression is justifiable reason to refactor all call sites, but I'm not proposing that. > > (I did make sure to remove indirect calls from the retail functions > > in [1], in case we want to go that route). > > Yeah, I quickly had a look at that. I am still going over that thread. > Thanks for the exhaustive analysis there. I'll post a patch soon that builds on that, so you can see what I mean. -- John Naylor EDB: http://www.enterprisedb.com
Re: Speeding up GIST index creation for tsvectors
I wrote: > I'll post a patch soon that builds on that, so you can see what I mean. I've attached where I was imagining this heading, as a text file to avoid distracting the cfbot. Here are the numbers I got with your test on the attached, as well as your 0001, on x86-64 Clang 10, default siglen: master: 739ms v3-0001 692ms attached POC 665ms The small additional speed up is not worth it, given the code churn and complexity, so I don't want to go this route after all. I think the way to go is a simplified version of your 0001 (not 0002), with only a single function, for gist and intarray only, and a style that better matches the surrounding code. If you look at my xor functions in the attached text file, you'll get an idea of what it should look like. Note that it got the above performance without ever trying to massage the pointer alignment. I'm a bit uncomfortable with the fact that we can't rely on alignment, but maybe there's a simple fix somewhere in the gist code. -- John Naylor EDB: http://www.enterprisedb.com src/backend/access/heap/visibilitymap.c | 13 +- src/backend/nodes/bitmapset.c | 23 +-- src/backend/utils/adt/tsgistidx.c | 14 +- src/include/port/pg_bitutils.h | 12 +- src/port/pg_bitutils.c | 240 ++-- 5 files changed, 214 insertions(+), 88 deletions(-) diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c index e198df65d8..d910eb2875 100644 --- a/src/backend/access/heap/visibilitymap.c +++ b/src/backend/access/heap/visibilitymap.c @@ -388,7 +388,6 @@ visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_fro { Buffer mapBuffer; uint64 *map; - int i; /* * Read till we fall off the end of the map. We assume that any extra @@ -409,17 +408,11 @@ visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_fro StaticAssertStmt(MAPSIZE % sizeof(uint64) == 0, "unsupported MAPSIZE"); if (all_frozen == NULL) - { - for (i = 0; i < MAPSIZE / sizeof(uint64); i++) - nvisible += pg_popcount64(map[i] & VISIBLE_MASK64); - } + nvisible = pg_popcount_mask64(map, MAPSIZE, VISIBLE_MASK64); else { - for (i = 0; i < MAPSIZE / sizeof(uint64); i++) - { - nvisible += pg_popcount64(map[i] & VISIBLE_MASK64); - nfrozen += pg_popcount64(map[i] & FROZEN_MASK64); - } + nvisible = pg_popcount_mask64(map, MAPSIZE, VISIBLE_MASK64); + nfrozen = pg_popcount_mask64(map, MAPSIZE, FROZEN_MASK64); } ReleaseBuffer(mapBuffer); diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c index 649478b0d4..5368c72255 100644 --- a/src/backend/nodes/bitmapset.c +++ b/src/backend/nodes/bitmapset.c @@ -452,7 +452,6 @@ bms_is_member(int x, const Bitmapset *a) int bms_member_index(Bitmapset *a, int x) { - int i; int bitnum; int wordnum; int result = 0; @@ -466,14 +465,8 @@ bms_member_index(Bitmapset *a, int x) bitnum = BITNUM(x); /* count bits in preceding words */ - for (i = 0; i < wordnum; i++) - { - bitmapword w = a->words[i]; - - /* No need to count the bits in a zero word */ - if (w != 0) - result += bmw_popcount(w); - } + result = pg_popcount((const char *) a->words, +wordnum * BITS_PER_BITMAPWORD / BITS_PER_BYTE); /* * Now add bits of the last word, but only those before the item. 
We can @@ -645,22 +638,14 @@ bms_get_singleton_member(const Bitmapset *a, int *member) int bms_num_members(const Bitmapset *a) { - int result = 0; int nwords; - int wordnum; if (a == NULL) return 0; nwords = a->nwords; - for (wordnum = 0; wordnum < nwords; wordnum++) - { - bitmapword w = a->words[wordnum]; - /* No need to count the bits in a zero word */ - if (w != 0) - result += bmw_popcount(w); - } - return result; + return pg_popcount((const char *) a->words, + nwords * BITS_PER_BITMAPWORD / BITS_PER_
Re: non-HOT update not looking at FSM for large tuple update
On Thu, Mar 11, 2021 at 9:46 AM Matthias van de Meent <boekewurm+postg...@gmail.com> wrote:
> Regarding the 2% slack logic, could we change it to use increments of
> line pointers instead? That makes it more clear what problem this
> solution is trying to work around; that is to say line pointers not
> (all) being truncated away. The currently subtracted value accounts

That makes sense.

> ...
> - if (len + saveFreeSpace > MaxHeapTupleSize)
> + if (len + saveFreeSpace > maxPaddedFsmRequest)
> ...
> - targetFreeSpace = Max(len, MaxHeapTupleSize - (MaxHeapTupleSize * 2 / 100));
> + targetFreeSpace = Max(len, maxPaddedFsmRequest);
> ...

If we have that convenient constant, it seems equivalent (I think) and a bit more clear to write it this way, but I'm not wedded to it:

	if (len + saveFreeSpace > MaxHeapTupleSize && len <= maxPaddedFsmRequest)
	{
		...
		targetFreeSpace = maxPaddedFsmRequest;
	}
	else
		targetFreeSpace = len + saveFreeSpace;

Also, should I write a regression test for it? The test case is already available, just no obvious place to put it.

-- John Naylor EDB: http://www.enterprisedb.com
Re: [POC] verifying UTF-8 using SIMD instructions
On Fri, Mar 12, 2021 at 9:14 AM Amit Khandekar wrote: > > On my Arm64 VM : > > HEAD : > mixed | ascii > ---+--- > 1091 | 628 > (1 row) > > PATCHED : > mixed | ascii > ---+--- >681 | 119 Thanks for testing! Good, the speedup is about as much as I can hope for using plain C. In the next patch I'll go ahead and squash in the ascii fast path, using 16-byte stride, unless there are objections. I claim we can live with the regression Heikki found on an old 32-bit Arm platform since it doesn't seem to be true of Arm in general. > I guess, if at all we use the equivalent Arm NEON intrinsics, the > "mixed" figures will be close to the "ascii" figures, going by your > figures on x86. I would assume so. > I was not thinking about auto-vectorizing the code in > pg_validate_utf8_sse42(). Rather, I was considering auto-vectorization > inside the individual helper functions that you wrote, such as > _mm_setr_epi8(), shift_right(), bitwise_and(), prev1(), splat(), If the PhD holders who came up with this algorithm thought it possible to do it that way, I'm sure they would have. In reality, simdjson has different files for SSE4, AVX, AVX512, NEON, and Altivec. We can incorporate any of those as needed. That's a PG15 project, though, and I'm not volunteering. -- John Naylor EDB: http://www.enterprisedb.com
Re: Move catalog toast table and index declarations
On Tue, Oct 27, 2020 at 7:43 AM Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
> On 2020-10-24 15:23, John Naylor wrote:
> > Style: In genbki.h, "extern int no_such_variable" is now out of place.
> > Also, the old comments like "The macro definitions are just to keep the
> > C compiler from spitting up." are now redundant in their new setting.
>
> Hmm, I don't really see what's wrong there. Do you mean the macro
> definitions should be different, or the comments are wrong, or something
> else?

There's nothing wrong; it's just a minor point of consistency. For the first part, I mean defined symbols in this file that are invisible to the C compiler are written

#define SOMETHING()

If some are written

#define SOMETHING() extern int no_such_variable

I imagine some future reader will wonder why there's a difference.

As for the comments, the entire file is for things meant for scripts to read, but have to be put in macro form to be invisible to the compiler. The header comment has "genbki.h defines CATALOG(), BKI_BOOTSTRAP and related macros * so that the catalog header files can be read by the C compiler." I'm just saying we don't need to carry over the comments I mentioned from the toasting/indexing headers that were specially for those macros.

-- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
duplicate function oid symbols
Hi, I noticed that the table AM abstraction introduced the symbol HEAP_TABLE_AM_HANDLER_OID, although we already have a convention for defining symbols automatically for builtin functions, which in this case is (currently unused) F_HEAP_TABLEAM_HANDLER. It seems a wart to have two symbols for the same function (seemingly accidentally). I suppose it's unacceptable to remove the non-standard symbol since it's been referred to in code for a while now. We could remove the unused (in core anyway) standard one by arranging fmgroids.h to use explicit symbols from pg_proc.dat where they exist, as well as prevent such symbols from being emitted into pg_proc_d.h. But then again there is no guarantee the standard symbol is not being used elsewhere. Thoughts? -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
cutting down the TODO list thread
As I mentioned in [1], I've volunteered to clear out the TODO list of items that appear too difficult, too controversial, or otherwise not worthwhile enough to warrant being listed there. I'll be working a few sections at a time, and every so often I'll have a list of proposed items for removal. If I don't hear objections, I'll remove the items after a few days while going through the next set.

Where there's an email thread, I've skimmed a few messages to get a sense of the community's thoughts on it. Where easily determined, I've taken age into account, insofar as something from 2017 is going to get much more benefit of the doubt than something from 2008. I've added after each item a phrase that sums up the reason I believe it doesn't belong anymore. Feedback welcome, of course, although I suspect there won't be much.

**Administration

- Have custom variables be transaction-safe
Old and found to be difficult after attempting

- Allow custom variables to appear in pg_settings()
Old and controversial

- Implement the SQL-standard mechanism whereby REVOKE ROLE revokes only the privilege granted by the invoking role, and not those granted by other roles
Old and difficult

- Prevent query cancel packets from being replayed by an attacker, especially when using SSL
Old and difficult

*Configuration files

- Consider normalizing fractions in postgresql.conf, perhaps using '%'
At the time (2007), some gucs used an actual percentage.

- Add external tool to auto-tune some postgresql.conf parameters
There are already out-of-core tools that try to do this.

- Create utility to compute accurate random_page_cost value
Seems outdated: in the current age of SSDs and cloud environments, it's often just set to 1.1, and there hasn't been a demand to be more accurate than that.

- Allow synchronous_standby_names to be disabled after communication failure with all synchronous standby servers exceeds some timeout
Controversial

- Adjust rounding behavior for numeric GUC values
Controversial

*Tablespaces

- Allow WAL replay of CREATE TABLESPACE to work when the directory structure on the recovery computer is different from the original
Thread quote: "part of the difficult, perhaps-not-worth doing impossible problems"

- Allow per-tablespace quotas
This seems to point to the larger problem space of disk space monitoring, and should probably be phrased thusly, and is a much bigger project or set of projects.

- Allow tablespaces on RAM-based partitions for temporary objects
In the thread, what's desired is the ability to have some amount of durability on a RAM-disk without WAL logging.

- Close race in DROP TABLESPACE on Windows
This refers to buildfarm failures from 2014.

*Statistics Collector

- Track number of WAL files ready to be archived in pg_stat_archiver
Thread quote: "pg_stat_archiver already has a column for last_archived_wal and last_failed_wal, so you can already work out how many files there must be between then and now"

*Point-In-Time Recovery

- Allow archive_mode to be changed without server restart
Controversial and old

*Standby server mode

- Allow pg_xlogfile_name() to be used in recovery mode
Controversial and old

- Change walsender so that it applies per-role settings
Old and possibly obsolete

--
[1] https://www.postgresql.org/message-id/CAFBsxsHbqMzDoGB3eAGmpcpB%2B7uae%2BLLi_G%2Bo8HMEECM9CbQcQ%40mail.gmail.com

-- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
On Tue, Oct 27, 2020 at 4:00 PM Thomas Munro wrote: > On Wed, Oct 28, 2020 at 8:36 AM Andres Freund wrote: > > On 2020-10-27 15:24:35 -0400, John Naylor wrote: > > > - Allow WAL replay of CREATE TABLESPACE to work when the directory > > > structure on the recovery computer is different from the original > > > Thread quote: "part of the difficult, perhaps-not-worth doing > impossible > > > problems" > > > > I think we ought to do something here. Mostly because the current > > situation makes it impossible to test many things on a single > > system. And we have a partial solution with the tablespace mapping > > files. > > +1, we need to get something like this working so that we can write > decent replication tests. FWIW there was another little thread on the > topic, not listed there: > > > https://www.postgresql.org/message-id/flat/CALfoeisEF92F5nJ-aAcuWTvF_Aogxq_1bHLem_kVfM_tHc2mfg%40mail.gmail.com > Thanks, I've added this thread to the entry. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
On Tue, Oct 27, 2020 at 3:52 PM Bruce Momjian wrote: > > Do any of these limitations need to be documented before removing them > from the TODO list? > I see two areas that might use a mention: - pg_settings not displaying custom variables - SQL standard difference with REVOKE ROLE (I haven't looked further into this) -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: duplicate function oid symbols
On Tue, Oct 27, 2020 at 9:51 AM Tom Lane wrote: > John Naylor writes: > > I noticed that the table AM abstraction introduced the symbol > > HEAP_TABLE_AM_HANDLER_OID, although we already have a convention for > > defining symbols automatically for builtin functions, which in this case > is > > (currently unused) F_HEAP_TABLEAM_HANDLER. > > Yeah, that seems wrong. I'd just remove HEAP_TABLE_AM_HANDLER_OID. > As long as we're not back-patching the change, it seems like a very > minor thing to fix, if anyone outside core is referencing the old name. > Ok, here is a patch to fix that, and also throw an error if pg_proc.dat has an explicitly defined symbol. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company v1-0001-Use-the-standard-symbol-for-the-builtin-function-.patch Description: Binary data
Re: cutting down the TODO list thread
On Tue, Oct 27, 2020 at 6:05 PM Bruce Momjian wrote: > On Tue, Oct 27, 2020 at 04:54:24PM -0400, John Naylor wrote: > > > > > > On Tue, Oct 27, 2020 at 3:52 PM Bruce Momjian wrote: > > > > > > Do any of these limitations need to be documented before removing > them > > from the TODO list? > > > > > > I see two areas that might use a mention: > > > > - pg_settings not displaying custom variables > > - SQL standard difference with REVOKE ROLE (I haven't looked further > into this) > > OK, thanks. Do you want to work on a doc patch or should I? Having it > the docs at least warns our users. > I'll work on that. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
On Wed, Oct 28, 2020 at 6:52 AM Magnus Hagander wrote: > On Wed, Oct 28, 2020 at 11:15 AM Julien Rouhaud > wrote: > > > > On Wed, 28 Oct 2020, 17:55 Oleksandr Shulgin < > oleksandr.shul...@zalando.de wrote: > >> I'm totally on board with cleaning the list up, but how about marking > as "won't fix" (or similar) instead of actually removing the items? That > should help to prevent the same exact items from appearing on the list > again, which they eventually would, I believe. > > > > > > +1 > > A small technical detail on the topic but if doing that, let's not > mark them as that inline -- create a separate page with those items on > it. > How about a section on the same page at the bottom, near "features we don't want"? -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: duplicate function oid symbols
I wrote: > Ok, here is a patch to fix that, and also throw an error if pg_proc.dat > has an explicitly defined symbol. > It occurred to me I neglected to explain the error with a comment, which I've added in v2. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company v2-0001-Use-the-standard-symbol-for-the-builtin-function-.patch Description: Binary data
Re: duplicate function oid symbols
On Wed, Oct 28, 2020 at 12:25 PM Tom Lane wrote: > I wondered about introducing a similar prohibition for pg_type. > That might be worth doing, since some of the grandfathered macros are clustered together, which could lead to more cases creeping in as people match new types to examples nearby. > The only existing oid_symbol in pg_type that I think has enough > grandfather status to be tough to change is CASHOID for "money". > But we could imagine special-casing that with a handmade macro > > #define CASHOID MONEYOID > > and then getting rid of the oid_symbol entries. (Or perhaps we > could just up and nuke CASHOID too? It's somewhat dubious that > any outside code is really using that macro.) > Yeah, grepping shows that some of those aren't even used in core code. On the other hand, the difference from the heap_am_handler case is the standard macros don't already exist for these pg_type entries. The handmade macro idea could be used for all eight just as easily as for one. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
document pg_settings view doesn't display custom options
Starting separate threads to keep from cluttering the TODO list thread. Here's a patch for the subject, as mentioned in https://www.postgresql.org/message-id/20201027220555.GS4951%40momjian.us -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company doc-pg-settings-no-custom.patch Description: Binary data
Re: document pg_settings view doesn't display custom options
On Wed, Oct 28, 2020 at 2:15 PM John Naylor wrote: > Starting separate threads to keep from cluttering the TODO list thread. > > Here's a patch for the subject, as mentioned in > https://www.postgresql.org/message-id/20201027220555.GS4951%40momjian.us > I just realized I introduced a typo, so here's v2. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company v2-doc-pg-settings-no-custom.patch Description: Binary data
Re: duplicate function oid symbols
On Wed, Oct 28, 2020 at 3:24 PM Tom Lane wrote: > and then the negotiation here is only about whether to make this list > longer. We don't need to complicate genbki.pl with a new facility. > Agreed, and reformat_dat_files.pl must also know about these special attributes. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: duplicate function oid symbols
On Wed, Oct 28, 2020 at 3:24 PM Tom Lane wrote: > > Nah. What I'm imagining is just that pg_type.h contains > > #ifdef EXPOSE_TO_CLIENT_CODE > > /* > * Backwards compatibility for ancient random spellings of OID macros. > * Don't use these macros in new code. > */ > #define CASHOID MONEYOID > #define LSNOID PG_LSNOID > > #endif > Here is a quick patch implementing this much. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company forbid-custom-pg-type-symbols.patch Description: Binary data
Re: document pg_settings view doesn't display custom options
On Wed, Oct 28, 2020 at 11:38 PM Fujii Masao wrote: > > > On 2020/10/29 3:45, John Naylor wrote: > > On Wed, Oct 28, 2020 at 2:15 PM John Naylor < > john.nay...@enterprisedb.com <mailto:john.nay...@enterprisedb.com>> wrote: > > > > Starting separate threads to keep from cluttering the TODO list > thread. > > > > Here's a patch for the subject, as mentioned in > > > https://www.postgresql.org/message-id/20201027220555.GS4951%40momjian.us > > > > > > I just realized I introduced a typo, so here's v2. > > + The pg_settings view does not display > + customized options. > > This is true until the module that defines the customized options is > loaded, > but not after that. No? For example, pg_settings displays > pg_stat_statements.max after pg_stat_statements is loaded. > True, how about this: The pg_settings does not display customized options that have been set before the relevant extension module has been loaded. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
On Wed, Oct 28, 2020 at 1:57 PM Andres Freund wrote: > On 2020-10-28 16:20:03 +0100, Magnus Hagander wrote: > > I would personally prefer a completely seprate page > > Same. > Ok, that's two votes for a separate page, and one for a new section on the same page, so it looks like it's a new page. That being the case, I would think it logical to move "features we don't want" there. As for the name, we should probably encompass both "won't fix" bugs and features not wanted. Maybe "past development ideas" or "not worth doing", but I'm open to better ideas. Once that's agreed upon, I'll make a new page and migrate the items over, minus the two that were mentioned upthread. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: document pg_settings view doesn't display custom options
On Thu, Oct 29, 2020 at 11:51 PM Fujii Masao wrote: > > > On 2020/10/29 21:54, John Naylor wrote: > > > The pg_settings does not display > > customized options > > that have been set before the relevant extension module has been > loaded. > > I guess that someone can misread this as > > customized options that have been set before the relevant extension > module has been loaded are not displayed even after the module is > loaded. > > So what about the following, instead? > > The pg_settings does not display customized options until the > extension > module that defines them has been loaded. > > Also I think this note should be in the different paragraph from the > paragraph > of "The pg_settings view cannot be inserted into or deleted from" > because > they are different topics. Thought? > Agreed on both points. In a separate paragraph, I think it's awkward to start two consecutive sentences with "The pg_settings view". If we put it in the previous paragraph we could phrase it like this: "See Section 20.1 for more information about the various ways to change these parameters. Customized options are not displayed until the extension module that defines them has been loaded. The pg_settings view..." -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: document pg_settings view doesn't display custom options
On Fri, Oct 30, 2020 at 12:07 PM Tom Lane wrote: > John Naylor writes: > > On Thu, Oct 29, 2020 at 11:51 PM Fujii Masao < > masao.fu...@oss.nttdata.com> > > wrote: > >> Also I think this note should be in the different paragraph from the > >> paragraph > >> of "The pg_settings view cannot be inserted into or deleted from" > >> because > >> they are different topics. Thought? > > > Agreed on both points. In a separate paragraph, I think it's awkward to > > start two consecutive sentences with "The pg_settings view". If we put it > > in the previous paragraph we could phrase it like this: > > > "See Section 20.1 for more information about the various ways to change > > these parameters. Customized options are not displayed until the > > extension module that defines them has been loaded. > > That just moves the subject-inconsistency to a different para :-( > I think this item should be its own new para. > > As for the repetitiveness, we could just say "This view ...", in one or > even both paras. > Okay, along those lines here's a patch using "this view" in a new paragraph for simplicity. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company v3-doc-pg-settings-no-custom.patch Description: Binary data
Re: document pg_settings view doesn't display custom options
On Fri, Oct 30, 2020 at 12:48 PM Tom Lane wrote: > John Naylor writes: > > Okay, along those lines here's a patch using "this view" in a new > paragraph > > for simplicity. > > Basically OK with me, but ... > > > It seems fairly weird to use a nonspecific reference first and then a > specific one. That is, I'd expect to read "The pg_settings view ..." > and then "This view ...", not the other way around. So we could > put this para second, or put it first but make this para say > "The pg_settings view ..." while the existing text gets reduced to > "This view ...". > > Or just make them both say "This view ..." so we don't have to have > this discussion again the next time somebody wants to add a para here. > > Okay, how's this? -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company v4-doc-pg-settings-no-custom.patch Description: Binary data
document deviation from standard on REVOKE ROLE
This is the other doc fix as suggested in https://www.postgresql.org/message-id/20201027220555.GS4951%40momjian.us There is already a compatibility section, so I put it there. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company v1-doc-fix-revoke-role.patch Description: Binary data
Re: [proposal] de-TOAST'ing using a iterator
On Mon, Nov 2, 2020 at 1:30 PM Alvaro Herrera wrote: > On 2020-Nov-02, Anastasia Lubennikova wrote: > > > Status update for a commitfest entry. > > > > This entry was inactive for a very long time. > > John, are you going to continue working on this? > Not in the near future. For background, this was a 2019 GSoC project where I was reviewer of record, and the patch is mostly good, but there is some architectural awkwardness. I have tried to address that, but have not had success. > > The last message mentions some open issues, namely backend crashes, so I > move it to "Waiting on author". > > As I understand, the patch he posted is fine -- it only crashes when he > tried a change I suggested. That's my recollection as well. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
I wrote: > > Ok, that's two votes for a separate page, and one for a new section on the > same page, so it looks like it's a new page. That being the case, I would > think it logical to move "features we don't want" there. As for the name, > we should probably encompass both "won't fix" bugs and features not wanted. > Maybe "past development ideas" or "not worth doing", but I'm open to better > ideas. Once that's agreed upon, I'll make a new page and migrate the items > over, minus the two that were mentioned upthread. > Hearing no preference, I've created https://wiki.postgresql.org/wiki/Not_Worth_Doing ...with links between the two. I've moved over the items I suggested upthread, minus the two where I heard feedback otherwise (prevent replay of query cancel packets and improve WAL replay of CREATE TABLESPACE) I have patches for documenting some behavior we won't fix in [1][2]. I was thinking of not having the next updates during commitfest, but it could also be argued this is a type of review, and the things here will be returned with feedback or rejected, in a way. Ultimately, it comes down to "when time permits". [1] https://www.postgresql.org/message-id/flat/CAFBsxsGsBZsG%3DcLM0Op5HFb2Ks6SzJrOc_eRO_jcKSNuqFRKnQ%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAFBsxsEmg=kqrekxrlygy0ujcfyck4vgxzkalrwh_olfj8o...@mail.gmail.com -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Move catalog toast table and index declarations
On Thu, Nov 5, 2020 at 4:24 AM Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 2020-10-27 13:12, John Naylor wrote: > > There's nothing wrong; it's just a minor point of consistency. For the > > first part, I mean defined symbols in this file that are invisible to > > the C compiler are written > > > > #define SOMETHING() > > > > If some are written > > > > #define SOMETHING() extern int no_such_variable > > > > I imagine some future reader will wonder why there's a difference. > > The difference is that CATALOG() is followed in actual use by something > like > > { ... } FormData_pg_attribute; > > so it becomes a valid C statement. For DECLARE_INDEX() etc., we need to > do something else to make it valid. I guess this could be explained in > more detail (as I'm attempting in this email), but this isn't materially > changed by this patch. > I think we're talking past each other. Here's a concrete example: #define BKI_ROWTYPE_OID(oid,oidmacro) #define DECLARE_TOAST(name,toastoid,indexoid) extern int no_such_variable I understand these to be functionally equivalent as far as what the C compiler sees. If not, I'd be curious to know what the difference is. I was thinking this is just a random style difference, and if so, they should be the same now that they're in the same file together: #define BKI_ROWTYPE_OID(oid,oidmacro) #define DECLARE_TOAST(name,toastoid,indexoid) And yes, this doesn't materially change the patch, it's just nitpicking :-). Materially, I believe it's fine. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Move catalog toast table and index declarations
On Thu, Nov 5, 2020 at 2:20 PM Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 2020-11-05 12:59, John Naylor wrote: > > I think we're talking past each other. Here's a concrete example: > > > > #define BKI_ROWTYPE_OID(oid,oidmacro) > > #define DECLARE_TOAST(name,toastoid,indexoid) extern int no_such_variable > > > > I understand these to be functionally equivalent as far as what the C > > compiler sees. > > The issue is that you can't have a bare semicolon at the top level of a > C compilation unit, at least on some compilers. So doing > > #define FOO(stuff) /*empty*/ > > and then > > FOO(123); > > won't work. You need to fill the definition of FOO with some stuff to > make it valid. > > BKI_ROWTYPE_OID on the other hand is not used at the top level like > this, so it can be defined to empty. > Thank you for the explanation. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
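[Editor's note: to spell out the point above with a compilable sketch -- the macro names here are made up, not the ones genbki actually uses -- a macro that expands to nothing leaves a bare semicolon at file scope when invoked like a statement, while expanding to a throwaway declaration keeps the trailing semicolon legal:]

    /* Hypothetical stand-ins for the real catalog-declaration macros. */
    #define EMPTY_MACRO(x)                          /* invisible to the compiler */
    #define DECLARE_SOMETHING(x)    extern int no_such_variable

    /*
     * EMPTY_MACRO(123);
     *
     * The line above, if uncommented, would leave a bare ";" at file scope,
     * which ISO C disallows and some compilers reject or warn about.
     */

    DECLARE_SOMETHING(123);     /* expands to "extern int no_such_variable;" */

    int
    main(void)
    {
        return 0;
    }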
Re: speed up unicode decomposition and recomposition
There is a latent bug in the way code pairs for recomposition are sorted due to a copy-pasto on my part. Makes no difference now, but it could in the future. While looking, it seems pg_bswap.h should actually be backend-only. Both fixed in the attached. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company minor-fixes-in-unicode-norm.patch Description: Binary data
Re: document pg_settings view doesn't display custom options
On Mon, Nov 9, 2020 at 2:12 AM Fujii Masao wrote: > Pushed. Thanks! > Thank you! -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: WIP: BRIN multi-range indexes
The reference for this /* upper bound of number of primes below limit */ /* WIP: reference for this number */ int numprimes = 1.26 * limit / log(limit); is Rosser, J. Barkley; Schoenfeld, Lowell (1962). "Approximate formulas for some functions of prime numbers". Illinois J. Math. 6: 64–94. doi:10.1215/ijm/1255631807 More precisely, it's 30*log(113)/113 rounded up. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
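[Editor's note: for anyone wanting to check the arithmetic, here is a small sketch (not the patch's code) of the bound being cited. Rosser and Schoenfeld show that pi(x) < 1.25506 * x / ln(x) for x > 1, where the constant is 30 * ln(113) / 113 (there are exactly 30 primes up to and including 113), and the code rounds that constant up to 1.26:]

    #include <math.h>

    /*
     * Upper bound on the number of primes below "limit", per Rosser &
     * Schoenfeld (1962): pi(x) < 1.25506 * x / ln(x) for x > 1.  The
     * constant 1.25506... is 30 * ln(113) / 113, rounded up to 1.26 here.
     */
    static int
    max_primes_below(int limit)
    {
        return (int) ceil(1.26 * limit / log(limit));
    }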
Re: WIP: BRIN multi-range indexes
On Mon, Nov 9, 2020 at 1:39 PM Tomas Vondra wrote: > > > While investigating the failures, I've tried increasing the values a > lot, without observing any measurable increase in runtime. IIRC I've > even used (10 * target_partlen) or something like that. That tells me > it's not very sensitive part of the code, so I'd suggest to simply use > something that we know is large enough to be safe. Okay, then it's not worth being clever. > For example, the largest bloom filter we can have is 32kB, i.e. 262kb, > at which point the largest gap is less than 95 (per the gap table). And > we may use up to BLOOM_MAX_NUM_PARTITIONS, so let's just use > BLOOM_MAX_NUM_PARTITIONS * 100 Sure. > FWIW I wonder if we should do something about bloom filters that we know > can get larger than page size. In the example I used, we know that > nbits=575104 is larger than page, so as the filter gets more full (and > thus more random and less compressible) it won't possibly fit. Maybe we > should reject that right away, instead of "delaying it" until later, on > the basis that it's easier to fix at CREATE INDEX time (compared to when > inserts/updates start failing at a random time). Yeah, I'd be inclined to reject that right away. > The problem with this is of course that if the index is multi-column, > this may not be strict enough (i.e. each filter would fit independently, > but the whole index row is too large). But it's probably better to do at > least something, and maybe improve that later with some whole-row check. A whole-row check would be nice, but I don't know how hard that would be. As a Devil's advocate proposal, how awful would it be to not allow multicolumn brin-bloom indexes? -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
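[Editor's note: a rough sketch of the kind of early rejection being discussed, using the textbook Bloom sizing formula rather than the patch's actual parameters. Given the expected number of distinct values per range and the target false positive rate, the filter needs about m = -n * ln(p) / (ln 2)^2 bits; if that already exceeds what fits on a page, the CREATE INDEX could be rejected up front:]

    #include <math.h>
    #include <stdbool.h>

    #define BLCKSZ 8192             /* default PostgreSQL page size in bytes */

    /*
     * Returns false if a bloom filter sized for "ndistinct" values per range
     * at the given false positive rate cannot possibly fit on one index page.
     * Illustrative only; a real check would need slack for page and tuple
     * headers and for other index keys in multi-column indexes.
     */
    static bool
    bloom_filter_can_fit(double ndistinct, double false_positive_rate)
    {
        double      nbits;

        nbits = ceil(-ndistinct * log(false_positive_rate) / (log(2) * log(2)));

        return (nbits / 8.0) < BLCKSZ;
    }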
Re: Clean up optional rules in grammar
On Wed, Nov 11, 2020 at 5:13 AM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > It's obviously confusing to have multiple different styles to do the > same thing. And these extra rules (including the empty ones) also end > up in the output, so they create more work down the line. > > The attached patch cleans this up to make them all look like the first > style. > +1 for standardizing in this area. It's worth noting that Bison 3.0 introduced %empty for this situation, which is self-documenting. Until we get there, this is a good step. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
On Tue, Nov 10, 2020 at 7:08 PM Bruce Momjian wrote: > On Tue, Nov 3, 2020 at 02:06:13PM -0400, John Naylor wrote: > > I was thinking of not having the next updates during commitfest, but it > could > > also be argued this is a type of review, and the things here will be > returned > > with feedback or rejected, in a way. Ultimately, it comes down to "when > time > > permits". > > I don't understand what this is referring to. Thanks for the rest of > the work. > This was awkwardly phrased, but I was concerned future proposals for removal would be easy to miss during commitfest. At this point, I'm thinking it isn't an issue. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
Here is the next section on data types, proposed to be moved to the "not worth doing" page. As before, if there are any objections, do speak up. I'll make the move in a few days. *Datatypes - Fix data types where equality comparison is not intuitive, e.g. box There is likely no way to do this without breaking applications. We already have a big warning about this in the docs. - Add IMMUTABLE column attribute Controversial *Domains (this entire section would go) - Allow functions defined as casts to domains to be called during casting - Allow values to be cast to domain types - Make domains work better with polymorphic functions Old and difficult *Date/Time - Allow TIMESTAMP WITH TIME ZONE to store the original timezone information - Have timestamp subtraction not call justify_hours() Very old - Allow a comma to denote fractional seconds in ISO-8601-compliant times (and timestamps) Apparent lack of interest *Arrays - Add function to detect if an array is empty - Improve handling of NULLs in arrays Lack of interest *Money (this entire section would go) - Add locale-aware MONEY type, and support multiple currencies - MONEY dumps in a locale-specific format making it difficult to restore to a system with a different locale The money datatype seems kind of obsolete anyway, and there doesn't seem to be demand to improve it. *Text Search - Allow dictionaries to change the token that is passed on to later dictionaries - Consider a function-based API for '@@' searches Very old - Improve text search error messages One of the gripes has been fixed already; in any case it's very old - tsearch and tsdicts regression tests fail in Turkish locale on glibc Bug report that refers to locale behavior from 2009 - Improve handling of dash and plus signs in email address user names, and perhaps improve URL parsing Difficult *XML (difficult section -- plenty of bugs which should be fixed, but also old and low interest) - Allow XML arrays to be cast to other data types Very old - Allow reliable XML operation in non-UTF8 server encodings Difficult - Move XSLT from contrib/xml2 to a more reasonable location Lack of consensus - Improve the XSLT parameter passing API Lack of consensus - xpath_table needs to be implemented/implementable to get rid of contrib/xml2 - xpath_table is pretty broken anyway Unclear path forward - better handling of XPath data types - Improve handling of PIs and DTDs in xmlconcat() Zero interest - Restructure XML and /contrib/xml2 functionality As discussed in the thread, it's an unrealistically large project -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
On Wed, Nov 11, 2020 at 4:45 PM John Naylor wrote: > Here is the next section on data types, proposed to be moved to the "not > worth doing" page. As before, if there are any objections, do speak up. > I'll make the move in a few days. > Hearing no objection, these have been moved over. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
Here are the next couple of sections with items proposed to be moved to the "not worth doing" page. As before, if there are any objections, let me know. I'll make the move in a few days. Also, since 13 has been released, I'll change the explanation of Done items to "will appear in the PostgreSQL 14 release." Also, since that wasn't updated, it's not immediately clear to me which release the [D] marking for "improve setting of visibility map bits for read-only and insert-only workloads" refers to. Does anyone know which commit that is? *Functions - Enforce typmod for function inputs, function results and parameters for spi_prepare'd statements called from PLs Lack of consensus - Reduce memory usage of aggregates in set returning functions The issue and proposed patch are likely no longer an important thing to improve in this area nowadays. - Fix /contrib/ltree operator Bug from 2007 with zero followup - Fix /contrib/btree_gist's implementation of inet indexing Bug from 2010 and apparent lack of interest *Character Formatting (this entire section would be removed) - Allow to_date() and to_timestamp() to accept localized month names The following attempts to pick this up from the TODO list in 2008 didn't go anywhere: https://www.postgresql.org/message-id/flat/010401c86788%246f1ddb60%240a01a8c0%40gevmus https://www.postgresql.org/message-id/flat/CA%2BheTbrDQ6b0Am_mk0dJEcwNxwQz%2Br%3Daz_%3DzHTva%2B5BDnfOKjA%40mail.gmail.com - Add missing parameter handling in to_char() Very old - Throw an error from to_char() instead of printing a string of "#" when a number doesn't fit in the desired output format. Lack of consensus - Fix to_number() handling for values not matching the format string Large amount of work for questionable benefit *Multi-Language Support - Add a cares-about-collation column to pg_proc, so that unresolved-collation errors can be thrown at parse time Proposed while listing open items during 9.1. Doesn't seem to justify the amount of work it would take. - Add octet_length_server() and octet_length_client() - Make octet_length_client() the same as octet_length()? Very old - Improve UTF8 combined character handling? Too vague - Fix problems with wrong runtime encoding conversion for NLS message files What problems? There is no discussion thread. - More sensible support for Unicode combining characters, normal forms We have normalization as of PG13, so I propose to mark this Done rather than move it. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
On Wed, Nov 18, 2020 at 2:42 PM Bruce Momjian wrote: > On Wed, Nov 18, 2020 at 02:26:46PM -0400, John Naylor wrote: > > Here are the next couple of sections with items proposed to be moved to > the > > "not worth doing" page. As before, if there are any objections, let me > know. > > I'll make the move in a few days. > > > > Also, since 13 has been released, I'll change the explanation of Done > items to > > "will appear in the PostgreSQL 14 release." Also, since that wasn't > updated, > > Yes, please do that. I didn't go through this for PG 13 since you were > already working in this area. > OK, I'll do that and remove items done during or before the PG13 cycle. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
On Wed, Nov 18, 2020 at 3:05 PM Tom Lane wrote: > John Naylor writes: > > Here are the next couple of sections with items proposed to be moved to > the > > "not worth doing" page. As before, if there are any objections, let me > > know. I'll make the move in a few days. > > > - Fix /contrib/ltree operator > > Bug from 2007 with zero followup > > Actually, I believe this was fixed by 70dc4c509; at least, the > case shown in the bug report now yields "false" as expected. > I'll remove this item. > > - Fix /contrib/btree_gist's implementation of inet indexing > > Bug from 2010 and apparent lack of interest > > This one's pretty clearly a bug. Lack of interest or no, we > should keep it around. > Okay. > > - Allow to_date() and to_timestamp() to accept localized month names > > This is done too, see d67755049. > I'll remove this too. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Should we document IS [NOT] OF?
On Thu, Nov 19, 2020 at 6:43 PM Tom Lane wrote: > After digging a bit more I noticed that we'd discussed removing > IS OF in the 2007 thread, but forebore because there wasn't an easy > replacement. pg_typeof() was added a year later (b8fab2411), so we > could have done this at any point since then. > > Pushed. > Documenting or improving IS OF was a TODO, so I've removed that entry. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: cutting down the TODO list thread
With the exception of "Fix /contrib/btree_gist's implementation of inet indexing", all items above have been either moved over, or removed if they were done already by PG13. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: truncating timestamps on arbitrary intervals
On Thu, Nov 12, 2020 at 9:56 AM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > > On 2020-06-30 06:34, John Naylor wrote: > > In v9, I've simplified the patch somewhat to make it easier for future > > work to build on. > > > > - When truncating on month-or-greater intervals, require the origin to > > align on month. This removes the need to handle weird corner cases > > that have no straightforward behavior. > > - Remove hackish and possibly broken code to allow origin to be after > > the input timestamp. The default origin is Jan 1, 1 AD, so only AD > > dates will behave correctly by default. This is not enforced for now, > > since it may be desirable to find a way to get this to work in a nicer > > way. > > - Rebase docs over PG13 formatting changes. > > This looks pretty solid now. Are there any more corner cases and other > areas with unclear behavior that you are aware of? Hi Peter, Thanks for taking a look! I believe there are no known corner cases aside from not throwing an error if origin > input, but I'll revisit that when we are more firm on what features we want to support. > A couple of thoughts: > > - After reading the discussion a few times, I'm not so sure anymore > whether making this a cousin of date_trunc is the right way to go. As > you mentioned, there are some behaviors specific to date_trunc that > don't really make sense in date_trunc_interval, and maybe we'll have > more of those. As far as the behaviors, I'm not sure exactly what you were thinking of, but here are two issues off the top of my head: - If the new functions are considered variants of date_trunc(), there is the expectation that the options work the same way, in particular the timezone parameter. You asked specifically about that below, so I'll address that separately. - In the "week" case, the boundary position depends on the origin, since a week-long interval is just 7 days. > Also, date_trunc_interval isn't exactly a handy name. > Maybe something to think about. It's obviously fairly straightforward > to change it. Effectively, it puts timestamps into bins, so maybe date_bin() or something like that? > - There were various issues with the stride interval having months and > years. I'm not sure we even need that. It could be omitted unless you > are confident that your implementation is now sufficient. Months and years were a bit tricky, so I'd be happy to leave that out if there is not much demand for it. date_trunc() already has quarters, decades, centuries, and millennia. > - Also, negative intervals could be prohibited, but I suppose that > matters less. Good for the sake of completeness. I think they happen to work in v9 by accident, but it would be better not to expose that. > - I'm curious about the origin being set to 0001-01-01. This seems to > work correctly in that it sets the origin to a Monday, which is what we > wanted, but according to Google that day was a Saturday. Something to > do with Julian vs. Gregorian calendar? Right, working backwards from our calendar today, it's Monday, but at the time it would theoretically be Saturday, barring leap year miscalculations. > Maybe we should choose a date > that is a bit more recent and easier to reason with. 2001-01-01 would also be a Monday aligned with centuries and millennia, so that would be my next suggestion. If we don't care to match with date_trunc() on those larger units, we could also use 1900-01-01, so the vast majority of dates in databases are after the origin.
> - Then again, I'm thinking that maybe we should make the origin > mandatory. Otherwise, the default answers when having strides larger > than a day are entirely arbitrary, e.g., > > => select date_trunc_interval('10 year', '0196-05-20 BC'::timestamp); > 0190-01-01 00:00:00 BC > > => select date_trunc_interval('10 year', '0196-05-20 AD'::timestamp); > 0191-01-01 00:00:00 Right. In the first case, the default origin is also after the input, and crosses the AD/BC boundary. Tricky to get right. > Perhaps the origin could be defaulted if the interval is less than a day > or something like that. If we didn't allow months and years to be units, it seems the default would always make sense? > - I'm wondering whether we need the date_trunc_interval(interval, > timestamptz, timezone) variant. Isn't that the same as > date_trunc_interval(foo AT ZONE xyz, value)? I based this on 600b04d6b5ef6 for date_trunc(), whose message states: date_trunc(field, timestamptz, zone_name) is the same as date_trunc(field, timestamptz at time zone zone_name) at time zone zone_name so without the shorthand, you need to specify the timezone twice, once for the calculation, and once for the output. -- John Naylor EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: WIP: BRIN multi-range indexes
On Sat, Dec 19, 2020 at 8:15 PM Tomas Vondra wrote: > [12-20 version] Hi Tomas, The measurements look good. In case it fell through the cracks, my earlier review comments for Bloom BRIN indexes regarding minor details don't seem to have been addressed in this version. I'll point to earlier discussion for convenience: https://www.postgresql.org/message-id/CACPNZCt%3Dx-fOL0CUJbjR3BFXKgcd9HMPaRUVY9cwRe58hmd8Xg%40mail.gmail.com https://www.postgresql.org/message-id/CACPNZCuqpkCGt8%3DcywAk1kPu0OoV_TjPXeV-J639ABQWyViyug%40mail.gmail.com > The improvements are fairly minor: > > 1) Rejecting bloom filters that are clearly too large (larger than page) > early. This is imperfect, as it works for individual index keys, not the > whole row. But per discussion it seems useful. I think this is good enough. > So based on this I'm tempted to just use the version with two hashes, as > implemented in 0005. It's much simpler than the partitioning scheme, > does not need any of the logic to generate primes etc. Sounds like the best engineering decision. Circling back to multi-minmax build times, I ran a couple quick tests on bigger hardware, and found that not only is multi-minmax slower than minmax, which is to be expected, but also slower than btree. (unlogged table ~12GB in size, maintenance_work_mem = 1GB, median of three runs) btree 38.3s minmax 26.2s multi-minmax 110s Since btree indexes are much larger, I imagine something algorithmic is involved. Is it worth digging further to see if some code path is taking more time than we would expect? -- John Naylor EDB: http://www.enterprisedb.com
outdated references to replication timeout
Hi, The parameter replication_timeout was retired in commit 6f60fdd701 back in 2012, but some comments and error messages seem to refer to that old setting instead of wal_sender_timeout or wal_receiver_timeout. The attached patch replaces the old language with more specific references. -- John Naylor EDB: http://www.enterprisedb.com stray-repl-timeout.patch Description: Binary data
Re: outdated references to replication timeout
On Tue, Jan 12, 2021 at 9:37 PM Fujii Masao wrote: > > Thanks for the patch! I think this change makes sense. > > - (errmsg("terminating walsender process > due to replication timeout"))); > + (errmsg("terminating walsender process > due to WAL sender timeout"))); > > Isn't it a bit strange to include different expressions "walsender" and > "WAL sender" for the same thing in one message? It is strange, now that I think about it. My thinking was that the former wording of "replication timeout" was a less literal reference to the replication_timeout parameter, so I did the same for wal_sender_timeout. A quick look shows we are not consistent in the documentation as far as walsender vs. WAL sender. For the purpose of the patch I agree it should be consistent within a single message. Maybe the parameter should be spelled exactly as is, with underscores? I'll take a broader look and send an updated patch. -- John Naylor EDB: http://www.enterprisedb.com
Re: outdated references to replication timeout
On Thu, Jan 14, 2021 at 1:55 AM Michael Paquier wrote: > > On Wed, Jan 13, 2021 at 11:28:55PM +0900, Fujii Masao wrote: > > On Wed, Jan 13, 2021 at 10:51 PM John Naylor < john.nay...@enterprisedb.com> wrote: > >> It is strange, now that I think about it. My thinking was that the > >> former wording of "replication timeout" was a less literal > >> reference to the replication_timeout parameter, so I did the same > >> for wal_sender_timeout. A quick look shows we are not consistent > >> in the documentation as far as walsender vs. WAL sender. For the > >> purpose of the patch I agree it should be consistent within a > >> single message. Maybe the parameter should be spelled exactly as > >> is, with underscores? > > > > I'm ok with that. But there seems no other timeout messages using > > the parameter name. > > Could it be that nothing needs to change here because this refers to > timeouts with the replication protocol? The current state of things > on HEAD does not sound completely incorrect to me either. It was off enough to cause brief confusion during a customer emergency. To show a bit more context around one of the proposed corrections: timeout = TimestampTzPlusMilliseconds(last_reply_timestamp, wal_sender_timeout); if (wal_sender_timeout > 0 && last_processing >= timeout) { /* * Since typically expiration of replication timeout means * communication problem, we don't send the error message to the * standby. */ ereport(COMMERROR, (errmsg("terminating walsender process due to replication timeout"))); My reading is, this is a case where the message didn't keep up with the change in parameter name. -- John Naylor EDB: http://www.enterprisedb.com
Re: truncating timestamps on arbitrary intervals
On Mon, Nov 23, 2020 at 1:44 PM John Naylor wrote: > > On Thu, Nov 12, 2020 at 9:56 AM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > > - After reading the discussion a few times, I'm not so sure anymore > > whether making this a cousin of date_trunc is the right way to go. As > > you mentioned, there are some behaviors specific to date_trunc that > > don't really make sense in date_trunc_interval, and maybe we'll have > > more of those. For v10, I simplified the behavior by decoupling it from date_trunc() and putting in some restrictions as discussed earlier. It's much simpler now. It could be argued that it goes too far in that direction, but it's easy to reason about and we can put back some features as we see fit. > > Also, date_trunc_interval isn't exactly a handy name. > > Maybe something to think about. It's obviously fairly straightforward > > to change it. > > Effectively, it puts timestamps into bins, so maybe date_bin() or something like that? For v10 I went with date_bin() so we can see how that looks. > > - There were various issues with the stride interval having months and > > years. I'm not sure we even need that. It could be omitted unless you > > are confident that your implementation is now sufficient. > > Months and years were a bit tricky, so I'd be happy to leave that out if there is not much demand for it. date_trunc() already has quarters, decades, centuries, and millennia. I removed months and years for this version, but that can be reconsidered of course. The logic is really simple now. > > - Also, negative intervals could be prohibited, but I suppose that > > matters less. I didn't go this far, but probably should before long. > > - Then again, I'm thinking that maybe we should make the origin > > mandatory. Otherwise, the default answers when having strides larger > > than a day are entirely arbitrary, e.g., I've tried this and like the resulting simplification. > > - I'm wondering whether we need the date_trunc_interval(interval, > > timestamptz, timezone) variant. Isn't that the same as > > date_trunc_interval(foo AT ZONE xyz, value)? > > I based this on 600b04d6b5ef6 for date_trunc(), whose message states: > > date_trunc(field, timestamptz, zone_name) > > is the same as > > date_trunc(field, timestamptz at time zone zone_name) at time zone zone_name > > so without the shorthand, you need to specify the timezone twice, once for the calculation, and once for the output. In light of making the origin mandatory, it no longer makes sense to have a time zone parameter, since we can specify the time zone on the origin, and, if desired, on the output as well. -- John Naylor EDB: http://www.enterprisedb.com v10-datetrunc-interval.patch Description: Binary data
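[Editor's note: to illustrate the "really simple" logic referred to above, here is a minimal sketch of the binning arithmetic -- my own simplification, not the patch itself -- assuming the stride has no month/year component, origin <= timestamp, and everything is expressed in microseconds:]

    #include <stdint.h>

    /*
     * Truncate "timestamp" down to the start of the stride-sized bin that
     * contains it, counting bins from "origin".  All values in microseconds;
     * assumes origin <= timestamp and stride_usecs > 0.
     */
    static int64_t
    timestamp_bin(int64_t stride_usecs, int64_t timestamp, int64_t origin)
    {
        int64_t     delta = timestamp - origin;

        return timestamp - (delta % stride_usecs);
    }

[With a mandatory origin and no month/year strides, the whole operation reduces to this single modulo step.]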
Re: WIP: BRIN multi-range indexes
On Tue, Jan 12, 2021 at 1:42 PM Tomas Vondra wrote: > I suspect it'd due to minmax having to decide which "ranges" to merge, > which requires repeated sorting, etc. I certainly don't dare to claim > the current algorithm is perfect. I wouldn't have expected such big > difference, though - so definitely worth investigating. It seems that monotonically increasing (or decreasing) values in a table are a worst case scenario for multi-minmax indexes, or basically, unique values within a range. I'm guessing it's because it requires many passes to fit all the values into a limited number of ranges. I tried using smaller pages_per_range numbers, 32 and 8, and that didn't help. Now, with a different data distribution, using only 10 values that repeat over and over, the results are much more sympathetic to multi-minmax: insert into iot (num, create_dt) select random(), '2020-01-01 0:00'::timestamptz + (x % 10 || ' seconds')::interval from generate_series(1,5*365*24*60*60) x; create index cd_single on iot using brin(create_dt); 27.2s create index cd_multi on iot using brin(create_dt timestamptz_minmax_multi_ops); 30.4s create index cd_bt on iot using btree(create_dt); 61.8s Circling back to the monotonic case, I tried running a simple perf record on a backend creating a multi-minmax index on a timestamptz column and these were the highest non-kernel calls: + 21.98% 21.91% postgres postgres [.] FunctionCall2Coll + 9.31% 9.29% postgres postgres [.] compare_combine_ranges + 8.60% 8.58% postgres postgres [.] qsort_arg + 5.68% 5.66% postgres postgres [.] brin_minmax_multi_add_value + 5.63% 5.60% postgres postgres [.] timestamp_lt + 4.73% 4.71% postgres postgres [.] reduce_combine_ranges + 3.80% 0.00% postgres [unknown] [.] 0x032001680004 + 3.51% 3.50% postgres postgres [.] timestamp_eq There's no one place that's pathological enough to explain the 4x slowness over traditional BRIN and nearly 3x slowness over btree when using a large number of unique values per range, so making progress here would have to involve a more holistic approach. -- John Naylor EDB: http://www.enterprisedb.com
Re: WIP: BRIN multi-range indexes
On Thu, Jan 21, 2021 at 9:06 PM Tomas Vondra wrote: > [wip optimizations] > The pathological case (monotonic-asc) is now 4x faster, roughly equal to > regular minmax index build. The opposite (monotonic-desc) is about 33% > faster, roughly in line with btree. Those numbers look good. I get similar results, shown below. I've read 0007-8 briefly but not in depth. > There are a couple cases where it's actually a bit slower - those are > the cases with very few distinct values per range. I believe this > happens because in the batch mode the code does not check if the summary > already contains this value, adds it to the buffer and the last step > ends up being more expensive than this. I think if it's worst case a bit faster than btree and best case a bit slower than traditional minmax, that's acceptable. > I believe there's some "compromise" between those two extremes, i.e. we > should use buffer that is too small or too large, but something in > between, so that the reduction happens once in a while but not too often > (as with the original aggressive approach). This sounds good also. > FWIW, none of this is likely to be an issue in practice, because (a) > tables usually don't have such strictly monotonic patterns, (b) people > should stick to plain minmax for cases that do. Still, it would be great if multi-minmax can be a drop-in replacement. I know there was a sticking point of a distance function not being available on all types, but I wonder if that can be remedied or worked around somehow. > And (c) regular tables > tend to have much wider rows, so there are fewer values per range (so > that other stuff is likely more expensive than building BRIN). True. I'm still puzzled that it didn't help to use 8 pages per range, but it's moot now. Here are some numbers (median of 3) with a similar scenario as before, repeated here with some added details. I didn't bother with what you call "unpatched": btree minmax multi monotonic-asc 44.4 26.5 27.8 mono-del-ins 38.7 24.6 30.4 mono-10-asc 61.8 25.6 33.5 create unlogged table iot ( id bigint generated by default as identity primary key, num double precision not null, create_dt timestamptz not null, stuff text generated always as (md5(id::text)) stored ) with (fillfactor = 95); -- monotonic-asc: insert into iot (num, create_dt) select random(), x from generate_series( '2020-01-01 0:00'::timestamptz, '2020-01-01 0:00'::timestamptz +'5 years'::interval, '1 second'::interval) x; -- mono-del-ins: -- Here I deleted a few values from (likely) each page in the above table, and reinserted values that shouldn't be in existing ranges: delete from iot1 where num < 0.05 or num > 0.95; vacuum iot1; insert into iot (num, create_dt) select random(), x from generate_series( '2020-01-01 0:00'::timestamptz, '2020-02-01 23:59'::timestamptz, '1 second'::interval) x; -- mono-10-asc truncate table iot; insert into iot (num, create_dt) select random(), '2020-01-01 0:00'::timestamptz + (x % 10 || ' seconds')::interval from generate_series(1,5*365*24*60*60) x; -- John Naylor EDB: http://www.enterprisedb.com
Re: WIP: BRIN multi-range indexes
Hi Tomas, I took another look through the Bloom opclass portion (0004) with sorted_mode omitted, and it looks good to me code-wise. I think this part is close to commit-ready. I also did one more proofreading pass for minor details. + rows per block). The default values is -0.1, and + greater than 0.0 and smaller than 1.0. The default values is 0.01, s/values/value/ + * bloom filter, we can easily and cheaply test wheter it contains values s/wheter/whether/ + * XXX We assume the bloom filters have the same parameters fow now. In the s/fow/for/ + * or null if it does not exists. s/exists/exist/ + * We do expect the bloom filter to eventually switch to hashing mode, + * and it's bound to be almost perfectly random, so not compressible. Leftover from when it started out in sorted mode. + if ((m/8) > BLCKSZ) It seems we need something more safe, to account for page header and tuple header at least. As the comment before says, the filter will eventually not be compressible. I remember we can't be exact here, with the possibility of multiple columns, but we can leave a little slack space. -- John Naylor EDB: http://www.enterprisedb.com
Re: WIP: BRIN multi-range indexes
On Fri, Jan 22, 2021 at 10:59 PM Tomas Vondra wrote: > > > On 1/23/21 12:27 AM, John Naylor wrote: > > Still, it would be great if multi-minmax can be a drop in replacement. I > > know there was a sticking point of a distance function not being > > available on all types, but I wonder if that can be remedied or worked > > around somehow. > > > > Hmm. I think Alvaro also mentioned he'd like to use this as a drop-in > replacement for minmax (essentially, using these opclasses as the > default ones, with the option to switch back to plain minmax). I'm not > convinced we should do that - though. Imagine you have minmax indexes in > your existing DB, it's working perfectly fine, and then we come and just > silently change that during dump/restore. Is there some past example > when we did something similar and it turned it to be OK? I was assuming pg_dump can be taught to insert explicit opclasses for minmax indexes, so that upgrade would not cause surprises. If that's true, only new indexes would have the different default opclass. > As for the distance functions, I'm pretty sure there are data types > without "natural" distance - like most strings, for example. We could > probably invent something, but the question is how much we can rely on > it working well enough in practice. > > Of course, is minmax even the right index type for such data types? > Strings are usually "labels" and not queried using range queries, > although sometimes people encode stuff as strings (but then it's very > unlikely we'll define the distance definition well). So maybe for those > types a hash / bloom would be a better fit anyway. Right. > But I do have an idea - maybe we can do without distances, in those > cases. Essentially, the primary issue of minmax indexes are outliers, so > what if we simply sort the values, keep one range in the middle and as > many single points on each tail? That's an interesting idea. I think it would be a nice bonus to try to do something along these lines. On the other hand, I'm not the one volunteering to do the work, and the patch is useful as is. -- John Naylor EDB: http://www.enterprisedb.com
Re: Perform COPY FROM encoding conversions in larger chunks
Hi Heikki, 0001 through 0003 are straightforward, and I think they can be committed now if you like. 0004 is also pretty straightforward. The check you proposed upthread for pg_upgrade seems like the best solution to make that workable. I'll take a look at 0005 soon. I measured the conversions that were rewritten in 0003, and there is indeed a noticeable speedup: Big5 to EUC-TW: head 196ms 0001-3 152ms EUC-TW to Big5: head 190ms 0001-3 144ms I've attached the driver function for reference. Example use: select drive_conversion( 1000, 'euc_tw'::name, 'big5'::name, convert('a few kB of utf8 text here', 'utf8', 'euc_tw') ); I took a look at the test suite also, and the only thing to note is a couple places where the comment doesn't match the code: + -- JIS X 0201: 2-byte encoded chars starting with 0x8e (SS2) + byte1 = hex('0e'); + for byte2 in hex('a1')..hex('df') loop +return next b(byte1, byte2); + end loop; + + -- JIS X 0212: 3-byte encoded chars, starting with 0x8f (SS3) + byte1 = hex('0f'); + for byte2 in hex('a1')..hex('fe') loop +for byte3 in hex('a1')..hex('fe') loop + return next b(byte1, byte2, byte3); +end loop; + end loop; Not sure if it matters, but thought I'd mention it anyway. -- John Naylor EDB: http://www.enterprisedb.com drive_conversion.c Description: Binary data
Re: Perform COPY FROM encoding conversions in larger chunks
On Thu, Jan 28, 2021 at 7:36 AM Heikki Linnakangas wrote: > > Even more surprising was that the second patch > (0002-Replace-pg_utf8_verifystr-with-a-faster-implementati.patch) > actually made things worse again. I thought it would give a modest gain, > but nope. Hmm, that surprised me too. > Based on these results, I'm going to commit the first patch, but not the > second one. There are much faster UTF-8 verification routines out there, > using SIMD instructions and whatnot, and we should consider adopting one > of those, but that's future work. I have something in mind for that. I took a look at v2, and for the first encoding I tried, it fails to report the error for invalid input: create database euctest WITH ENCODING 'EUC_CN' LC_COLLATE='zh_CN.eucCN' LC_CTYPE='zh_CN.eucCN' TEMPLATE=template0; \c euctest create table foo (a text); master: euctest=# copy foo from stdin; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> ä >> \. ERROR: character with byte sequence 0xc3 0xa4 in encoding "UTF8" has no equivalent in encoding "EUC_CN" CONTEXT: COPY foo, line 1 patch: euctest=# copy foo from stdin; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> ä >> \. COPY 0 euctest=# I believe the problem is in UtfToLocal(). I've attached a fix formatted as a text file to avoid confusing the cfbot. The fix keeps the debugging ereport() in case you find it useful. Some additional test coverage might be good here, but not sure how much work that would be. I didn't check any other conversions yet. v2-0002 seems fine to me, I just have cosmetic comments here: + * the same, no conversion is required by we must still validate that the s/by/but/ This comment in copyfrom_internal.h above the *StateData struct is the same as the corresponding one in copyto.c: * Multi-byte encodings: all supported client-side encodings encode multi-byte * characters by having the first byte's high bit set. Subsequent bytes of the * character can have the high bit not set. When scanning data in such an * encoding to look for a match to a single-byte (ie ASCII) character, we must * use the full pg_encoding_mblen() machinery to skip over multibyte * characters, else we might find a false match to a trailing byte. In * supported server encodings, there is no possibility of a false match, and * it's faster to make useless comparisons to trailing bytes than it is to * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true * when we have to do it the hard way. The references to pg_encoding_mblen() and encoding_embeds_ascii, are out of date for copy-from. I'm not sure the rest is relevant to copy-from anymore, either. Can you confirm? This comment inside the struct is now out of date as well: * Similarly, line_buf holds the whole input line being processed. The * input cycle is first to read the whole line into line_buf, convert it * to server encoding there, and then extract the individual attribute HEAD has this macro already: /* Shorthand for number of unconsumed bytes available in raw_buf */ #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index) It might make sense to create a CONVERSION_BUF_BYTES equivalent since the patch calculates cstate->conversion_buf_len - cstate->conversion_buf_index in a couple places. 
-- John Naylor EDB: http://www.enterprisedb.com diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 60dfebb0bd..1681efc7a0 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -397,6 +397,9 @@ CopyConvertBuf(CopyFromState cstate) (unsigned char *) cstate->raw_buf + cstate->raw_buf_len, dstlen, true); + +ereport(NOTICE, errmsg_internal("convertedbytes: %d", convertedbytes)); + if (convertedbytes == 0) { /* diff --git a/src/backend/utils/mb/conv.c b/src/backend/utils/mb/conv.c index b83358bc7a..26600e68ee 100644 --- a/src/backend/utils/mb/conv.c +++ b/src/backend/utils/mb/conv.c @@ -497,7 +497,7 @@ UtfToLocal(const unsigned char *utf, int len, int l; const pg_utf_to_local_combined *cp; const unsigned char *start = utf; - const unsigned char *cur = utf; + unsigned char *cur = unconstify(unsigned char *, utf); if (!PG_VALID_ENCODING(encoding)) ereport(ERROR, @@ -5
Re: WIP: BRIN multi-range indexes
On Tue, Jan 26, 2021 at 6:59 PM Tomas Vondra wrote: > > > > On 1/26/21 7:52 PM, John Naylor wrote: > > On Fri, Jan 22, 2021 at 10:59 PM Tomas Vondra > > mailto:tomas.von...@enterprisedb.com>> > > wrote: > > > Hmm. I think Alvaro also mentioned he'd like to use this as a drop-in > > > replacement for minmax (essentially, using these opclasses as the > > > default ones, with the option to switch back to plain minmax). I'm not > > > convinced we should do that - though. Imagine you have minmax indexes in > > > your existing DB, it's working perfectly fine, and then we come and just > > > silently change that during dump/restore. Is there some past example > > > when we did something similar and it turned it to be OK? > > > > I was assuming pg_dump can be taught to insert explicit opclasses for > > minmax indexes, so that upgrade would not cause surprises. If that's > > true, only new indexes would have the different default opclass. > > > > Maybe, I suppose we could do that. But I always found such changes > happening silently in the background a bit suspicious, because it may be > quite confusing. I certainly wouldn't expect such difference between > creating a new index and index created by dump/restore. Did we do such > changes in the past? That might be a precedent, but I don't recall any > example ... I couldn't think of a comparable example either. It comes down to evaluating risk. On the one hand it's nice if users get an enhancement without having to know about it, on the other hand if there is some kind of noticeable regression, that's bad. -- John Naylor EDB: http://www.enterprisedb.com
[POC] verifying UTF-8 using SIMD instructions
Hi, As of b80e10638e3, there is a new API for validating the encoding of strings, and one of the side effects is that we have a wider choice of algorithms. For UTF-8, it has been demonstrated that SIMD is much faster at decoding [1] and validation [2] than the standard approach we use. It makes sense to start with the ascii subset of UTF-8 for a couple reasons. First, ascii is very widespread in database content, particularly in bulk loads. Second, ascii can be validated using the simple SSE2 intrinsics that come with (I believe) any x86-64 chip, and I'm guessing we can detect that at compile time and not mess with runtime checks. The examples above using SSE for the general case are much more complicated and involve SSE 4.2 or AVX. Here are some numbers on my laptop (MacOS/clang 10 -- if the concept is okay, I'll do Linux/gcc and add more inputs). The test is the same as Heikki shared in [3], but I added a case with >95% Chinese characters just to show how that compares to the mixed ascii/multibyte case. master: chinese | mixed | ascii -+---+--- 1081 | 761 | 366 patch: chinese | mixed | ascii -+---+--- 1103 | 498 | 51 The speedup in the pure ascii case is nice. In the attached POC, I just have a pro forma portability stub, and left full portability detection for later. The fast path is inlined inside pg_utf8_verifystr(). I imagine the ascii fast path could be abstracted into a separate function to which is passed a function pointer for full encoding validation. That would allow other encodings with strict ascii subsets to use this as well, but coding that abstraction might be a little messy, and b80e10638e3 already gives a performance boost over PG13. I also gave a shot at doing full UTF-8 recognition using a DFA, but so far that has made performance worse. If I ever have more success with that, I'll add that in the mix. [1] https://woboq.com/blog/utf-8-processing-using-simd.html [2] https://lemire.me/blog/2020/10/20/ridiculously-fast-unicode-utf-8-validation/ [3] https://www.postgresql.org/message-id/06d45421-61b8-86dd-e765-f1ce527a5...@iki.fi -- John Naylor EDB: http://www.enterprisedb.com diff --git a/src/common/wchar.c b/src/common/wchar.c index 6e7d731e02..12b3a5e1a2 100644 --- a/src/common/wchar.c +++ b/src/common/wchar.c @@ -13,6 +13,10 @@ #include "c.h" #include "mb/pg_wchar.h" +#include "port/pg_bitutils.h" + +/* FIXME -- should go in src/include/port */ +#include /* @@ -1762,6 +1766,80 @@ pg_utf8_verifystr(const unsigned char *s, int len) { const unsigned char *start = s; +#ifdef __x86_64__ + + + const __m128i zero = _mm_setzero_si128(); + __m128i chunk, + cmp; + + const int chunk_size = sizeof(__m128i); + int zero_mask, + highbit_mask, + ascii_count, + remainder; + + while (len >= chunk_size) + { + /* load next chunk */ + chunk = _mm_loadu_si128((const __m128i *) s); + + /* first detect any zero bytes */ + cmp = _mm_cmpeq_epi8(chunk, zero); + zero_mask = _mm_movemask_epi8(cmp); + + /* if there is a zero byte, let the slow path encounter it */ + if (zero_mask) + break; + + /* now check for non-ascii bytes */ + highbit_mask = _mm_movemask_epi8(chunk); + + if (!highbit_mask) + { + /* all ascii, so advance to the next chunk */ + s += chunk_size; + len -= chunk_size; + continue; + } + + /* + * if not all ascii, maybe there is a solid block of ascii + * at the beginning of the chunk.
if so, skip it + */ + ascii_count = pg_rightmost_one_pos32(highbit_mask); + + s += ascii_count; + len -= ascii_count; + remainder = chunk_size - ascii_count; + + /* found non-ascii, so handle the remainder in the normal way */ + while (remainder > 0) + { + int l; + + /* + * fast path for ASCII-subset characters + * we already know they're non-zero + */ + if (!IS_HIGHBIT_SET(*s)) +l = 1; + else + { +l = pg_utf8_verifychar(s, len); +if (l == -1) + goto finish; + } + s += l; + len -= l; + remainder -= l; + + } + } + +#endif /* __x86_64__ */ + + /* handle last few bytes */ while (len > 0) { int l; @@ -1770,19 +1848,20 @@ pg_utf8_verifystr(const unsigned char *s, int len) if (!IS_HIGHBIT_SET(*s)) { if (*s == '\0') -break; +goto finish; l = 1; } else { l = pg_utf8_verifychar(s, len); if (l == -1) -break; +goto finish; } s += l; len -= l; } +finish: return s - start; }
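For what it's worth, a minimal sketch of the compile-time gating mentioned above (USE_SSE2 is an illustrative name, not something the tree defines): SSE2 is part of the x86-64 baseline, so a compiler-defined target macro is enough and no runtime CPU check is needed.

/* Sketch only: decide at compile time whether the SSE2 fast path can exist. */
#if defined(__x86_64__) || defined(_M_AMD64)
/* SSE2 intrinsics are always available on x86-64 targets */
#include <emmintrin.h>
#define USE_SSE2 1
#endif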
Re: Perform COPY FROM encoding conversions in larger chunks
On Mon, Feb 1, 2021 at 12:15 PM Heikki Linnakangas wrote: > Thanks. I fixed it slightly differently, and also changed LocalToUtf() > to follow the same pattern, even though LocalToUtf() did not have the > same bug. Looks good to me. > I added a bunch of tests for various built-in conversions. Nice! I would like to have utf8 tests for every category of invalid byte (overlong, surrogate, 5 bytes, etc), but it's not necessary for this patch. > I spent some time refactoring and adding comments all around the patch, > hopefully making it all more clear. One notable difference is that I > renamed 'raw_buf' (which exists in master too) to 'input_buf', and > renamed 'conversion_buf' to 'raw_buf'. I'm going to read through this > patch again another day with fresh eyes, and also try to add some tests > for the corner cases at buffer boundaries. The comments and renaming are really helpful in understanding that file! Although a new patch is likely forthcoming, I did take a brief look and found the following: In copyfromparse.c, this is now out of date: * Read the next input line and stash it in line_buf, with conversion to * server encoding. One of your FIXME comments seems to allude to this, but if we really need a difference here, maybe it should be explained: +#define INPUT_BUF_SIZE 65536 /* we palloc INPUT_BUF_SIZE+1 bytes */ +#define RAW_BUF_SIZE 65536 /* allocated size of the buffer */ Lastly, it looks like pg_do_encoding_conversion_buf() ended up in 0003 accidentally? -- John Naylor EDB: http://www.enterprisedb.com
Re: Bug in COPY FROM backslash escaping multi-byte chars
On Wed, Feb 3, 2021 at 8:08 AM Heikki Linnakangas wrote:
> Hi,
>
> While playing with COPY FROM refactorings in another thread, I noticed a corner case where I think backslash escaping doesn't work correctly. Consider the following input:
>
> \么.foo

I've seen multibyte delimiters in the wild, so it's not as outlandish as it seems. The fix is simple enough, so +1.

-- John Naylor EDB: http://www.enterprisedb.com
get rid of <foreignphrase> tags in the docs?
Hi, While looking at the proposed removal of the v2 protocol, I noticed that we italicize some, but not all, instances of 'per se', 'pro forma', and 'ad hoc'. I'd say these are widespread enough in formal registers of English that they hardly need to be called out as foreign, so I propose removing the tags for those words. Alternatively, we could just add tags to make existing usage consistent, but I have little reason to think it will stay that way. It's also impractical to go and search for other possible words that should have been italicized but weren't. The other case is 'voilà', found in rules.sgml. The case for italics here is stronger, but looking at that file, I actually think a more generic-sounding phrase here would be preferable. Other opinions? -- John Naylor EDB: http://www.enterprisedb.com
Re: [POC] verifying UTF-8 using SIMD instructions
On Mon, Feb 1, 2021 at 2:01 PM Heikki Linnakangas wrote:
> On 01/02/2021 19:32, John Naylor wrote:
> > It makes sense to start with the ascii subset of UTF-8 for a couple of reasons. First, ascii is very widespread in database content, particularly in bulk loads. Second, ascii can be validated using the simple SSE2 intrinsics that come with (I believe) any x86-64 chip, and I'm guessing we can detect that at compile time and not mess with runtime checks. The examples above using SSE for the general case are much more complicated and involve SSE 4.2 or AVX.
>
> I wonder how using SSE compares with dealing with 64 or 32-bit words at a time, using regular instructions? That would be more portable.

I gave that a shot, and it's actually pretty good. According to this paper [1], 16 bytes was best and gives a good apples-to-apples comparison to SSE registers, so I tried both 16 and 8 bytes.

> All supported encodings are ASCII subsets. Might be best to put the ASCII-check into a static inline function and use it in all the verify functions. I presume it's only a few instructions, and these functions can be pretty performance sensitive.

I tried both the static inline function and also putting the whole optimized utf-8 loop in a separate function to which the caller passes a pointer to the appropriate pg_*_verifychar(). In the table below, "inline" refers to coding directly inside pg_utf8_verifystr(). Both C and SSE are in the same patch, with an #ifdef. I didn't bother splitting them out because for other encodings, we want one of the other approaches above. For those, "C retail" refers to a static inline function to code the contents of the inner loop, if I understood your suggestion correctly. This needs more boilerplate in each function, so I don't prefer this. "C func pointer" refers to the pointer approach I just mentioned. That is the cleanest-looking way to generalize it, so I only tested that version with different strides -- 8 and 16 bytes. This is the same test I used earlier, which is the test in [2] but adding an almost-pure multibyte Chinese text of about the same size.

x86-64 Linux gcc 8.4.0:

      build       | chinese | mixed | ascii
------------------+---------+-------+-------
 master           |    1480 |   848 |   428
 inline SSE       |    1617 |   634 |    63
 inline C         |    1481 |   843 |    50
 C retail         |    1493 |   838 |    49
 C func pointer   |    1467 |   851 |    49
 C func pointer 8 |    1518 |   757 |    56

x86-64 MacOS clang 10.0.0:

      build       | chinese | mixed | ascii
------------------+---------+-------+-------
 master           |    1086 |   760 |   374
 inline SSE       |    1081 |   529 |    70
 inline C         |    1093 |   649 |    49
 C retail         |    1132 |   695 |   152
 C func pointer   |    1085 |   609 |    59
 C func pointer 8 |    1099 |   571 |    71

PowerPC-LE Linux gcc 4.8.5:

      build       | chinese | mixed | ascii
------------------+---------+-------+-------
 master           |    2961 |  1525 |   871
 inline SSE       |   (n/a) | (n/a) | (n/a)
 inline C         |    2911 |  1329 |    80
 C retail         |    2838 |  1311 |   102
 C func pointer   |    2828 |  1314 |    80
 C func pointer 8 |    3143 |  1249 |   133

Looking at the results, the main advantage of SSE here is that it's more robust for mixed inputs. If a 16-byte chunk is not ascii-only but contains a block of ascii at the front, we can skip those with a single CPU instruction, but in C, we have to verify the whole chunk using the slow path. The "C func pointer" approach seems to win out over the "C retail" approach (static inline function). Using an 8-byte stride is slightly better for mixed inputs on all platforms tested, but regresses on pure ascii and also seems to regress on pure multibyte. The difference in the multibyte case is small enough that it could be random, but it happens on two platforms, so I'd say it's real. 
On the other hand, pure multibyte is not as common as mixed text. Overall, I think the function pointer approach with an 8-byte stride is the best balance. If that's agreeable, next I plan to test with short inputs, because I think we'll want a guard if-statement to only loop through the fast path if the string is long enough to justify that. > > I also gave a shot at doing full UTF-8 recognition using a DFA, but so > > far that has made performance worse. If I ever have more success with > > that, I'll add that in the mix. > > That's disappointing. Perhaps the SIMD algorithms have higher startup > costs, so that you need longer inputs to benefit? In that case, it might > make sense to check the length of the input and only use the SIMD > algorithm if the input is long enough. I changed topics a
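As a concrete illustration of the "C func pointer" approach with an 8-byte stride, here is a minimal sketch. The names are illustrative, not the actual patch; it assumes the usual c.h definitions (uint64, UINT64CONST, IS_HIGHBIT_SET), and it punts on embedded NUL bytes, which are discussed later in the thread.

#include <string.h>

typedef int (*mb_verifychar) (const unsigned char *s, int len);

static int
verifystr_ascii_fastpath(const unsigned char *s, int len,
						 mb_verifychar verifychar)
{
	const unsigned char *start = s;

	while (len > 0)
	{
		int			l = 1;

		if (len >= (int) sizeof(uint64))
		{
			uint64		chunk;

			/* alignment-safe load; compilers turn this into one instruction */
			memcpy(&chunk, s, sizeof(chunk));

			if ((chunk & UINT64CONST(0x8080808080808080)) == 0)
			{
				/* all eight bytes are ASCII: skip them in one step */
				s += sizeof(chunk);
				len -= sizeof(chunk);
				continue;
			}
		}

		/* non-ASCII nearby (or a short tail): verify one character */
		if (IS_HIGHBIT_SET(*s))
		{
			l = verifychar(s, len);
			if (l < 0)
				break;			/* invalid; report bytes verified so far */
		}
		s += l;
		len -= l;
	}

	return s - start;
}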
Re: [POC] verifying UTF-8 using SIMD instructions
Here is a more polished version of the function pointer approach, now adapted to all multibyte encodings. Using the not-yet-committed tests from [1], I found a thinko bug that caused the test for nul bytes to be not only wrong, but probably also elided by the compiler. Doing it correctly is noticeably slower on pure ascii, but still several times faster than before, so the conclusions haven't changed. I'll run full measurements later this week, but I'll share the patch now for review.

[1] https://www.postgresql.org/message-id/11d39e63-b80a-5f8d-8043-fff04201f...@iki.fi

-- John Naylor EDB: http://www.enterprisedb.com

v1-0001-Add-an-ASCII-fast-path-to-multibyte-encoding-veri.patch Description: Binary data
Re: [POC] verifying UTF-8 using SIMD instructions
On Mon, Feb 8, 2021 at 6:17 AM Heikki Linnakangas wrote:
> As a quick test, I hacked up pg_utf8_verifystr() to use Lemire's algorithm from the simdjson library [1], see attached patch. I microbenchmarked it using the same test I used before [2].

I've been looking at various iterations of Lemire's utf8 code, and trying it out was next on my list, so thanks for doing that!

> These results are with "gcc -O2" using "gcc (Debian 10.2.1-6) 10.2.1 20210110"
>
> unpatched master:
>
> postgres=# \i mbverifystr-speed.sql
> CREATE FUNCTION
>
>  mixed | ascii
> -------+-------
>    728 |   393
> (1 row)
>
> v1-0001-Add-an-ASCII-fast-path-to-multibyte-encoding-veri.patch:
>
>  mixed | ascii
> -------+-------
>    759 |    98
> (1 row)

Hmm, the mixed case got worse -- I haven't seen that in any of my tests.

> simdjson-utf8-hack.patch:
>
>  mixed | ascii
> -------+-------
>     53 |    31
> (1 row)
>
> So clearly that algorithm is fast. Not sure if it has a high startup cost, or large code size, or other tradeoffs that we don't want.

The simdjson lib uses everything up through AVX512 depending on what hardware is available. I seem to remember reading that high start-up cost is more relevant to floating point than to integer ops, but I could be wrong. Just the utf8 portion is surely tiny also.

> At least it depends on SIMD instructions, so it requires more code for the architecture-specific implementations and autoconf logic and all that.

One of his earlier demos [1] (in simdutf8check.h) had a version that used mostly SSE2 with just three intrinsics from SSSE3. That's widely available by now. He measured that at 0.7 cycles per byte, which is still good compared to AVX2's 0.45 cycles per byte [2]. Testing for three SSSE3 intrinsics in autoconf is pretty easy. I would assume that if that check (and the corresponding runtime check) passes, we can assume SSE2. That code has three licenses to choose from -- Apache 2, Boost, and MIT. Something like that might be straightforward to start from. I think the only obstacles to worry about are license and getting it to fit into our codebase. Adding more than zero high-level comments with a good description of how it works in detail is also a bit of a challenge.

> I also tested the fallback implementation from the simdjson library (included in the patch, if you uncomment it in simdjson-glue.c):
>
>  mixed | ascii
> -------+-------
>    447 |    46
> (1 row)
>
> I think we should at least try to adopt that. At a high level, it looks pretty similar to your patch: you load the data 8 bytes at a time, check if they are all ASCII. If there are any non-ASCII chars, you check the bytes one by one, otherwise you load the next 8 bytes. Your patch should be able to achieve the same performance, if done right. I don't think the simdjson code forbids \0 bytes, so that will add a few cycles, but still.

Okay, I'll look into that.

> PS. Your patch as it stands isn't safe on systems with strict alignment, the string passed to the verify function isn't guaranteed to be 8-byte aligned. Use memcpy to fetch the next 8-byte chunk to fix.

Will do.

[1] https://github.com/lemire/fastvalidate-utf-8/tree/master/include
[2] https://lemire.me/blog/2018/10/19/validating-utf-8-bytes-using-only-0-45-cycles-per-byte-avx-edition/

-- John Naylor EDB: http://www.enterprisedb.com
Re: Perform COPY FROM encoding conversions in larger chunks
On Sun, Feb 7, 2021 at 2:13 PM Heikki Linnakangas wrote: > > On 02/02/2021 23:42, John Naylor wrote: > > > > In copyfromparse.c, this is now out of date: > > > > * Read the next input line and stash it in line_buf, with conversion to > > * server encoding. This comment for CopyReadLine() is still there. Conversion already happened by now, so I think this comment is outdated. Other than that, I think this is ready for commit. -- John Naylor EDB: http://www.enterprisedb.com
Re: WIP: BRIN multi-range indexes
On Wed, Feb 3, 2021 at 7:54 PM Tomas Vondra wrote: > > [v-20210203] Hi Tomas, I have some random comments from reading the patch, but haven't gone into detail in the newer aspects. I'll do so in the near future. The cfbot seems to crash on this patch during make check, but it doesn't crash for me. I'm not even sure what date that cfbot status is from. > BLOOM > - Looks good, but make sure you change the commit message -- it still refers to sorted mode. + * not entirely clear how to distrubute the space between those columns. s/distrubute/distribute/ > MINMAX-MULTI > > c) 0007 - A hybrid approach, using a buffer that is multiple of the > user-specified value, with some safety min/max limits. IMO this is what > we should use, although perhaps with some tuning of the exact limits. That seems like a good approach. +#include "access/hash.h" /* XXX strange that it fails because of BRIN_AM_OID without this */ I think you want #include "catalog/pg_am.h" here. > Attached is a spreadsheet with benchmark results for each of those three > approaches, on different data types (byval/byref), data set types, index > parameters (pages/values per range) etc. I think 0007 is a reasonable > compromise overall, with performance somewhere in betwen 0005 and 0006. > Of course, there are cases where it's somewhat slow, e.g. for data types > with expensive comparisons and data sets forcing frequent compactions, > in which case it's ~10x slower compared to regular minmax (in most cases > it's ~1.5x). Compared to btree, it's usually much faster - ~2-3x as fast > (except for some extreme cases, of course). > > > As for the opclasses for indexes without "natural" distance function, > implemented in 0008, I think we should drop it. In theory it works, but Sounds reasonable. > The other thing we were considering was using the new multi-minmax > opclasses as default ones, replacing the existing minmax ones. IMHO we > shouldn't do that either. For existing minmax indexes that's useless > (the opclass seems to be working, otherwise the index would be dropped). > But even for new indexes I'm not sure it's the right thing, so I don't > plan to change this. Okay. -- John Naylor EDB: http://www.enterprisedb.com
Re: [POC] verifying UTF-8 using SIMD instructions
On Mon, Feb 8, 2021 at 6:17 AM Heikki Linnakangas wrote:
> I also tested the fallback implementation from the simdjson library (included in the patch, if you uncomment it in simdjson-glue.c):
>
>  mixed | ascii
> -------+-------
>    447 |    46
> (1 row)
>
> I think we should at least try to adopt that. At a high level, it looks pretty similar to your patch: you load the data 8 bytes at a time, check if they are all ASCII. If there are any non-ASCII chars, you check the bytes one by one, otherwise you load the next 8 bytes. Your patch should be able to achieve the same performance, if done right. I don't think the simdjson code forbids \0 bytes, so that will add a few cycles, but still.

That fallback is very similar to my "inline C" case upthread, and they both actually check 16 bytes at a time (the comment is wrong in the patch you shared). I can work back and show how the performance changes with each difference (just MacOS, clang 10 here):

master

 mixed | ascii
-------+-------
   757 |   366

v1, but using memcpy()

 mixed | ascii
-------+-------
   601 |   129

remove zero-byte check:

 mixed | ascii
-------+-------
   588 |    93

inline ascii fastpath into pg_utf8_verifystr()

 mixed | ascii
-------+-------
   595 |    71

use 16-byte stride

 mixed | ascii
-------+-------
   652 |    49

With this cpu/compiler, v1 is fastest on the mixed input, all else being equal. Maybe there's a smarter way to check for zeros in C. Or maybe be more careful about cache -- running memchr() on the whole input first might not be the best thing to do.

-- John Naylor EDB: http://www.enterprisedb.com
Re: [POC] verifying UTF-8 using SIMD instructions
I wrote:
> One of his earlier demos [1] (in simdutf8check.h) had a version that used mostly SSE2 with just three intrinsics from SSSE3. That's widely available by now. He measured that at 0.7 cycles per byte, which is still good compared to AVX2's 0.45 cycles per byte [2].
>
> Testing for three SSSE3 intrinsics in autoconf is pretty easy. I would assume that if that check (and the corresponding runtime check) passes, we can assume SSE2. That code has three licenses to choose from -- Apache 2, Boost, and MIT. Something like that might be straightforward to start from. I think the only obstacles to worry about are license and getting it to fit into our codebase. Adding more than zero high-level comments with a good description of how it works in detail is also a bit of a challenge.

I double-checked, and it's actually two SSSE3 intrinsics and one SSE4.1, but the 4.1 one can be emulated with a few SSE2 intrinsics. But we could probably fold all three into the SSE4.2 CRC check and have a single symbol to save on boilerplate. I hacked that demo [1] into wchar.c (very ugly patch attached), and got the following:

master:

 mixed | ascii
-------+-------
   757 |   366

Lemire demo:

 mixed | ascii
-------+-------
   172 |   168

This one lacks an ascii fast path, but the AVX2 version in the same file has one that could probably be easily adapted. With that, I think this would be worth adapting to our codebase and license. Thoughts? The advantage of this demo is that it's not buried in a mountain of modern C++. Simdjson can use AVX -- do you happen to know which target it got compiled to? AVX vectors are 256 bits wide and that requires OS support. The OSes we care most about were updated 8-12 years ago, but that would still be something to check, in addition to more configure checks.

[1] https://github.com/lemire/fastvalidate-utf-8/tree/master/include

-- John Naylor EDB: http://www.enterprisedb.com

utf-sse42-demo.patch Description: Binary data
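As a rough sketch of what the runtime side of such a check could look like on gcc/clang (this is not the project's actual configure machinery, and real support would also need a configure-time probe plus an MSVC path):

#include <stdbool.h>

/*
 * Hypothetical runtime test for the intrinsics discussed above, using
 * __builtin_cpu_supports(), which gcc 4.8+ and recent clang provide.
 */
static bool
cpu_has_ssse3_and_sse41(void)
{
#if defined(__GNUC__) || defined(__clang__)
	return __builtin_cpu_supports("ssse3") && __builtin_cpu_supports("sse4.1");
#else
	return false;
#endif
}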
Re: [POC] verifying UTF-8 using SIMD instructions
On Tue, Feb 9, 2021 at 4:22 PM Heikki Linnakangas wrote:
> On 09/02/2021 22:08, John Naylor wrote:
> > Maybe there's a smarter way to check for zeros in C. Or maybe be more careful about cache -- running memchr() on the whole input first might not be the best thing to do.
>
> The usual trick is the haszero() macro here: https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord. That's how memchr() is typically implemented, too.

Thanks for that. Checking with that macro each loop iteration gives a small boost:

v1, but using memcpy()

 mixed | ascii
-------+-------
   601 |   129

with haszero()

 mixed | ascii
-------+-------
   583 |   105

remove zero-byte check:

 mixed | ascii
-------+-------
   588 |    93

-- John Naylor EDB: http://www.enterprisedb.com
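For reference, a self-contained version of that zero-byte test applied to an 8-byte chunk, per the bithacks page cited above (a sketch, not the patch itself):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if any of the eight bytes loaded from 'p' is 0x00. */
static inline bool
chunk_has_zero_byte(const unsigned char *p)
{
	uint64_t	v;

	memcpy(&v, p, sizeof(v));	/* alignment-safe load */

	/*
	 * The subtraction can only generate a borrow at a zero byte, and bytes
	 * whose own high bit is set are masked out by ~v, so the expression is
	 * nonzero exactly when some byte in v is zero.
	 */
	return ((v - UINT64_C(0x0101010101010101)) & ~v &
			UINT64_C(0x8080808080808080)) != 0;
}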
Re: cutting down the TODO list thread
It's been a while, but here are a few more suggested removals/edits/additions to the TODO list. Any objections or new information, let me know: - Auto-fill the free space map by scanning the buffer cache or by checking pages written by the background writer http://archives.postgresql.org/pgsql-hackers/2006-02/msg01125.php https://www.postgresql.org/message-id/200603011716.16984.pete...@gmx.net Both these threads are from 2006, so have nothing to do with the current FSM. - Allow concurrent inserts to use recently created pages rather than creating new ones http://archives.postgresql.org/pgsql-hackers/2010-05/msg00853.php Skimming the first few messages, I believe this has been covered by commit 719c84c1b? (Extend relations multiple blocks at a time to improve scalability.) - Allow VACUUM FULL and CLUSTER to update the visibility map This topic has a current CF entry which seems to have stalled, so that newer thread would be better to list here than the one from 2013. - Bias FSM towards returning free space near the beginning of the heap file, in hopes that empty pages at the end can be truncated by VACUUM http://archives.postgresql.org/pgsql-hackers/2009-09/msg01124.php https://www.postgresql.org/message-id/20150424190403.gp4...@alvh.no-ip.org I'm not sure what to think of this, but independently of that, the second thread is actually talking about bringing back something like the pre-9.0 vacuum full, so maybe it should be its own entry? - Consider a more compact data representation for dead tuple locations within VACUUM http://archives.postgresql.org/pgsql-patches/2007-05/msg00143.php Great, but let's link to this more recent thread instead: https://www.postgresql.org/message-id/flat/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com - Improve autovacuum tuning http://www.postgresql.org/message-id/5078ad6b.8060...@agliodbs.com http://www.postgresql.org/message-id/20130124215715.ge4...@alvh.no-ip.org I'm kind of on the fence about these. The title is way too broad, and I doubt we are going to forget to keep improving this area. It seems the first thread is really about auto-analyze thresholds, so maybe it should be in a separate entry if we want to do anything mentioned in the thread? The second thread is really about autovacuum launcher scheduling. Probably still relevant, but the thread is very long and doesn't seem terribly helpful to someone trying to get up to speed on the issues that are still relevant. I don't see any more recent discussion, either. Thoughts? -- John Naylor EDB: http://www.enterprisedb.com
Re: speed up verifying UTF-8
It occurred to me that the DFA + ascii quick check approach could also be adapted to speed up some cases where we currently walk a string counting characters, like this snippet in text_position_get_match_pos():

	/* Convert the byte position to char position. */
	while (state->refpoint < state->last_match)
	{
		state->refpoint += pg_mblen(state->refpoint);
		state->refpos++;
	}

This coding changed in 9556aa01c69 (Use single-byte Boyer-Moore-Horspool search even with multibyte encodings), in which I found the majority of cases were faster, but some were slower. It would be nice to regain the speed lost and do even better. In the case of UTF-8, we could just run it through the DFA, incrementing a count of the states found. The number of END states should be the number of characters. The ascii quick check would still be applicable as well. I think all that is needed is to export some symbols and add the counting function. That wouldn't materially affect the current patch for input verification, and would be separate, but it would be nice to get the symbol visibility right up front. I've set this to waiting on author while I experiment with that.

-- John Naylor EDB: http://www.enterprisedb.com
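To make the idea concrete, a sketch of such a counting function, assuming the DFA table and state constants from the verification patch (Utf8Transition, BGN, END) were made visible -- which is exactly the symbol-visibility question above:

/*
 * Count the characters in already-validated UTF-8 input by running the bytes
 * through the shift-based DFA and counting how often it lands in END.
 */
static int
utf8_count_chars_dfa(const unsigned char *s, const unsigned char *end)
{
	uint32		dfa_state = BGN;	/* start state of the verifier's DFA */
	int			nchars = 0;

	while (s < end)
	{
		dfa_state = (Utf8Transition[*s++] >> dfa_state) & 31;
		nchars += (dfa_state == END);
	}

	return nchars;
}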
do only critical work during single-user vacuum?
When a user must shut down and restart in single-user mode to run vacuum on an entire database, that does a lot of work that's unnecessary for getting the system online again, even without index_cleanup. We had a recent case where a single-user vacuum took around 3 days to complete. Now that we have a concept of a fail-safe vacuum, maybe it would be beneficial to skip a vacuum in single-user mode if the fail-safe criteria were not met at the beginning of vacuuming a relation. This is not without risk, of course, but it should be much faster than today and once up and running the admin would have a chance to get a handle on things. Thoughts? -- John Naylor EDB: http://www.enterprisedb.com
Re: do only critical work during single-user vacuum?
On Thu, Dec 9, 2021 at 5:13 PM Peter Geoghegan wrote: > Oh, I think I misunderstood. Your concern is for the case where the > DBA runs a simple "VACUUM" in single-user mode; you want to skip over > tables that don't really need to advance relfrozenxid, automatically. Right. > I can see an argument for something like that, but I think that it > should be a variant of VACUUM. Or maybe it could be addressed with a > better user interface; On Thu, Dec 9, 2021 at 6:08 PM Andres Freund wrote: > What if the user tried to reclaim space by vacuuming (via truncation)? Or is > working around some corruption or such? I think this is too much magic. > > That said, having a VACUUM "selector" that selects the oldest tables could be > quite useful. And address this usecase both for single-user and normal > operation. All good points. [Peter again] > single-user mode should prompt the user about > what exact VACUUM command they ought to run to get things going. The current message is particularly bad in its vagueness because some users immediately reach for VACUUM FULL, which quite logically seems like the most complete thing to do. -- John Naylor EDB: http://www.enterprisedb.com
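To make "selects the oldest tables" concrete, roughly what such a selector would target can already be seen with a plain catalog query (just an illustration, not a proposed interface):

-- List the relations with the oldest relfrozenxid, i.e. the ones a
-- minimal single-user vacuum would actually need to touch first.
SELECT c.oid::regclass AS relation,
       age(c.relfrozenxid) AS xid_age
FROM pg_class AS c
WHERE c.relkind IN ('r', 'm', 't')
ORDER BY age(c.relfrozenxid) DESC
LIMIT 20;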
Re: speed up verifying UTF-8
> + * In a shift-based DFA, the input byte is an index into array of integers
> + * whose bit pattern encodes the state transitions. To compute the current
> + * state, we simply right-shift the integer by the current state and apply a
> + * mask. In this scheme, the address of the transition only depends on the
> + * input byte, so there is better pipelining.
>
> Should be "To compute the *next* state, ...", I think.

Fixed.

> The way the state transition table works is pretty inscrutable. That's understandable, because the values were found by an SMT solver, so I'm not sure if anything can be done about it.

Do you mean in general, or just the state values? Like any state machine, the code is simple, and the complexity is hidden in the data. Hopefully the first link I included in the comment is helpful. The SMT solver was only needed to allow 32-bit (instead of 64-bit) entries in the transition table, so it's not strictly necessary. A lookup table that fits in 1kB is nice from a cache perspective, however. With 64-bit, the state values are less weird-looking but they're still just arbitrary numbers. As long as ERR = 0 and the largest is at most 9, it doesn't matter what they are, so I'm not sure it's much less mysterious. You can see the difference between 32-bit and 64-bit in [1].

--

In addition to Heikki's review points, I've made a couple of small additional changes from v24. I rewrote this part, so we don't need these macros anymore:

-	if (!IS_HIGHBIT_SET(*s) ||
-		IS_UTF8_2B_LEAD(*s) ||
-		IS_UTF8_3B_LEAD(*s) ||
-		IS_UTF8_4B_LEAD(*s))
+	if (!IS_HIGHBIT_SET(*s) || pg_utf_mblen(s) > 1)

And I moved is_valid_ascii() to pg_wchar.h so it can be used elsewhere. I'm not sure there's a better place to put it. I tried using this for text_position(), for which I'll start a new thread.

[1] https://www.postgresql.org/message-id/attachment/125672/v22-addendum-32-bit-transitions.txt

-- John Naylor EDB: http://www.enterprisedb.com

From 0c20e37ac47a5759e6ec674bb4184d835c562af8 Mon Sep 17 00:00:00 2001
From: John Naylor
Date: Tue, 19 Oct 2021 16:43:14 -0400
Subject: [PATCH v25] Add fast path for validating UTF-8 text

Our previous validator used a traditional algorithm that performed comparison and branching one byte at a time. It's useful in that we always know exactly how many bytes we have validated, but that precision comes at a cost. Input validation can show up prominently in profiles of COPY FROM, and future improvements to COPY FROM such as parallelism or faster line parsing will put more pressure on input validation. Hence, add fast paths for both ASCII and multibyte UTF-8: Use bitwise operations to check 16 bytes at a time for ASCII. If that fails, use a "shift-based" DFA on those bytes to handle the general case, including multibyte. These paths are relatively free of branches and thus robust against all kinds of byte patterns. With these algorithms, UTF-8 validation is several times faster, depending on platform and the input byte distribution. The previous coding in pg_utf8_verifystr() is retained for short strings and for when the fast path returns an error. 
Review, performance testing, and additional hacking by: Heikki Linnakangas, Vladimir Sitnikov, Amit Khandekar, Thomas Munro, and Greg Stark
Discussion: https://www.postgresql.org/message-id/CAFBsxsEV_SzH%2BOLyCiyon%3DiwggSyMh_eF6A3LU2tiWf3Cy2ZQg%40mail.gmail.com
---
 src/common/wchar.c                       | 215 +++
 src/include/mb/pg_wchar.h                |  53 ++
 src/test/regress/expected/conversion.out | 169 ++
 src/test/regress/sql/conversion.sql      | 133 ++
 4 files changed, 570 insertions(+)

diff --git a/src/common/wchar.c b/src/common/wchar.c
index a6bffd0642..be931c5e92 100644
--- a/src/common/wchar.c
+++ b/src/common/wchar.c
@@ -1750,11 +1750,226 @@ pg_utf8_verifychar(const unsigned char *s, int len)
 	return l;
 }
 
+/*
+ * The fast path of the UTF-8 verifier uses a deterministic finite automaton
+ * (DFA) for multibyte characters. In a traditional table-driven DFA, the
+ * input byte and current state are used to compute an index into an array of
+ * state transitions. Since the address of the next transition is dependent
+ * on this computation, there is latency in executing the load instruction,
+ * and the CPU is not kept busy.
+ *
+ * Instead, we use a "shift-based" DFA as described by Per Vognsen:
+ *
+ * https://gist.github.com/pervognsen/218ea17743e1442e59bb60d29b1aa725
+ *
+ * In a shift-based DFA, the input byte is an index into array of integers
+ * whose bit pattern encodes the state transitions. To compute the n
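To spell out the transition scheme that comment describes (an illustration, not the patch's code): in a classic table-driven DFA each step loads from an address that depends on the previous step's result, while in the shift-based form the load address depends only on the input byte, and the current state merely selects a bit field within the loaded word.

#include <stdint.h>

/* Classic table-driven step: the load address depends on the prior state. */
static inline uint32_t
dfa_step_table(const uint8_t table[][256], uint32_t state, uint8_t byte)
{
	return table[state][byte];
}

/*
 * Shift-based step: the load depends only on the byte; the current state
 * picks a 5-bit field out of the loaded word (mask 31 for 32-bit entries,
 * as in the patch).
 */
static inline uint32_t
dfa_step_shift(const uint32_t table[256], uint32_t state, uint8_t byte)
{
	return (table[byte] >> state) & 31;
}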
speed up text_position() for utf-8
Hi, Commit 9556aa01c69 (Use single-byte Boyer-Moore-Horspool search even with multibyte encodings) was a speed improvement for the majority of cases, but when the match was toward the end of the string, the slowdown in text_position_get_match_pos() was noticeable. It was found that there was a lot of overhead in pg_mblen(). [1]

The attached exploratory PoC improves this for utf-8. It applies on top of v25 of my utf-8 verification patch in [2], since one approach relies on the DFA from it. The other three approaches are:

- a version of pg_utf_mblen() that uses a lookup table [3]
- an inlined copy of pg_utf_mblen()
- an ascii fast path with a fallback to the inlined copy of pg_utf_mblen()

The test is attached and the test function is part of the patch. It's based on the test used in the commit above. The test searches for a string that's at the end of a ~1 million byte string. This is on gcc 11 with 2-3 runs to ensure repeatability, but I didn't bother with statistics because the differences are pretty big:

                  patch                  | no match | ascii | multibyte
-----------------------------------------+----------+-------+-----------
 PG11                                    |     1120 |  1100 |       900
 master                                  |      381 |  2350 |      1900
 DFA                                     |      386 |  1640 |      1640
 branchless utf mblen                    |      387 |  4100 |      2600
 inline pg_utf_mblen()                   |      380 |  1080 |       920
 inline pg_utf_mblen() + ascii fast path |      382 |   470 |       918

Neither of the branchless approaches worked well. The DFA can't work as well here as in verification because it must do additional work. Inlining pg_utf_mblen() restores worst-case performance to PG11 levels. The ascii fast path is a nice improvement on top of that. A similar approach could work for pg_mbstrlen() as well, but I haven't looked into that yet. There are other callers of pg_mblen(), but I haven't looked into whether they are performance-sensitive. A more general application would be preferable to a targeted one.

[1] https://www.postgresql.org/message-id/b65df3d8-1f59-3bd7-ebbe-68b81d5a76a4%40iki.fi
[2] https://www.postgresql.org/message-id/CAFBsxsHG%3Dg6W8Mie%2B_NO8dV6O0pO2stxrnS%3Dme5ZmGqk--fd5g%40mail.gmail.com
[3] https://github.com/skeeto/branchless-utf8/blob/master/utf8.h

-- John Naylor EDB: http://www.enterprisedb.com

 src/backend/utils/adt/varlena.c | 112
 src/common/wchar.c              |  90 ++--
 src/include/mb/pg_wchar.h       |  53 ---
 src/test/regress/regress.c      |  18 +++
 4 files changed, 133 insertions(+), 140 deletions(-)

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index bd3091bbfb..cc93818007 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -26,6 +26,7 @@
 #include "common/unicode_norm.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
+#include "mb/pg_utf8.h"
 #include "miscadmin.h"
 #include "nodes/execnodes.h"
 #include "parser/scansup.h"
@@ -1468,6 +1469,116 @@ text_position_get_match_pos(TextPositionState *state)
 {
 	if (!state->is_multibyte)
 		return state->last_match - state->str1 + 1;
+	else if (GetDatabaseEncoding() == PG_UTF8)
+	{
+#if 0							/* dfa */
+
+		int			utf8_state_count[UTF8_LAST_STATE + 1] = {0};
+		uint32		dfa_state = BGN;
+
+		/*
+		 * See pg_utf8.h for an explanation of the state machine. Unlike in
+		 * pg_wchar.c, we must calculate the exact state so we can use it as
+		 * an array index.
+		 */
+		while (state->refpoint < state->last_match)
+		{
+			const unsigned char *s = (const unsigned char *) state->refpoint++;
+
+			dfa_state = (Utf8Transition[*s] >> dfa_state) & 31;
+			utf8_state_count[dfa_state]++;
+		}
+		Assert(state->refpoint == state->last_match);
+		state->refpos += utf8_state_count[END];
+		return state->refpos + 1;
+
+#endif
+#if 0							/* like pg_utf_mblen(), but different algorithm: */
+
+		static const char lengths[] = {
+			1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+			1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 1
+		};
+
+		while (state->refpoint < state->last_match)
+		{
+			const unsigned char *s = (const unsigned char *) state->refpoint;
+
+			state->refpoint += lengths[s[0] >> 3];
+			state->refpos++;
+		}
+
+		Assert(state->refpoint == state->last_match);
+		return state->refpos + 1;
+#endif
+#if 0							/* inline pg_utf_mblen() */
+
+		int			len = 0;
+
+		while (state->refpoint < state->last_match)
+		{
+			const unsigned char *s = (const unsigned char *) state->refpoint;
+
+			if ((*s & 0x80) == 0)
+				len = 1;
+			else if ((*s & 0xe0) == 0xc0)
+				len = 2;
+			else if ((*s & 0xf0
Re: cutting down the TODO list thread
On Wed, Dec 8, 2021 at 1:40 PM John Naylor wrote: > > It's been a while, but here are a few more suggested > removals/edits/additions to the TODO list. Any objections or new > information, let me know: > > - Auto-fill the free space map by scanning the buffer cache or by > checking pages written by the background writer > http://archives.postgresql.org/pgsql-hackers/2006-02/msg01125.php > https://www.postgresql.org/message-id/200603011716.16984.pete...@gmx.net > > Both these threads are from 2006, so have nothing to do with the current FSM. Moved to the Not Worth Doing list. > - Allow concurrent inserts to use recently created pages rather than > creating new ones > http://archives.postgresql.org/pgsql-hackers/2010-05/msg00853.php > > Skimming the first few messages, I believe this has been covered by > commit 719c84c1b? (Extend relations multiple blocks at a time to > improve scalability.) Removed. > - Allow VACUUM FULL and CLUSTER to update the visibility map > > This topic has a current CF entry which seems to have stalled, so that > newer thread would be better to list here than the one from 2013. Added. > - Bias FSM towards returning free space near the beginning of the heap > file, in hopes that empty pages at the end can be truncated by VACUUM > http://archives.postgresql.org/pgsql-hackers/2009-09/msg01124.php > https://www.postgresql.org/message-id/20150424190403.gp4...@alvh.no-ip.org > > I'm not sure what to think of this, but independently of that, the > second thread is actually talking about bringing back something like > the pre-9.0 vacuum full, so maybe it should be its own entry? Done. > - Consider a more compact data representation for dead tuple locations > within VACUUM > http://archives.postgresql.org/pgsql-patches/2007-05/msg00143.php > > Great, but let's link to this more recent thread instead: > https://www.postgresql.org/message-id/flat/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com Done. > > The second thread is really about autovacuum launcher scheduling. > > Probably still relevant, but the thread is very long and doesn't seem > > terribly helpful to someone trying to get up to speed on the issues > > that are still relevant. I don't see any more recent discussion, > > either. Thoughts? Split into two entries. On Wed, Dec 8, 2021 at 8:12 PM Masahiko Sawada wrote: > There is another discussion on autovacuum scheduling in 2018 here: > > https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1F8A4DC6%40G01JPEXMBYT05 > > Some algorithms were proposed there and I implemented a PoC patch: > > https://www.postgresql.org/message-id/CAD21AoBUaSRBypA6pd9ZD%3DU-2TJCHtbyZRmrS91Nq0eVQ0B3BA%40mail.gmail.com Added, thanks! -- John Naylor EDB: http://www.enterprisedb.com
Re: speed up text_position() for utf-8
I wrote:
> The test is attached and the test function is part of the patch. It's based on the test used in the commit above. The test searches for a string that's at the end of a ~1 million byte string. This is on gcc 11 with 2-3 runs to ensure repeatability, but I didn't bother with statistics because the differences are pretty big:
>
>                   patch                  | no match | ascii | multibyte
> -----------------------------------------+----------+-------+-----------
>  PG11                                    |     1120 |  1100 |       900
>  master                                  |      381 |  2350 |      1900
>  DFA                                     |      386 |  1640 |      1640
>  branchless utf mblen                    |      387 |  4100 |      2600
>  inline pg_utf_mblen()                   |      380 |  1080 |       920
>  inline pg_utf_mblen() + ascii fast path |      382 |   470 |       918

I failed to mention that the above numbers are milliseconds, so smaller is better.

-- John Naylor EDB: http://www.enterprisedb.com
Re: speed up verifying UTF-8
I plan to push v25 early next week, unless there are further comments. -- John Naylor EDB: http://www.enterprisedb.com