Re: Joins on TID

2018-12-22 Thread Simon Riggs
On Sat, 22 Dec 2018 at 04:31, Tom Lane  wrote:

> BTW, if we're to start taking joins on TID seriously, we should also
> add the missing hash opclass for TID, so that you can do hash joins
> when dealing with a lot of rows.
>
> (In principle this also enables things like hash aggregation, though
> I'm not very clear on a use-case for grouping by TID.)
>

I don't think we are trying to do TID joins more seriously, just to fix a
special case.

The case cited requires the batches of work to be small, so a nested loop
works fine.

It looks to me like Edmund is trying to solve the same problem. If so, this is
the best solution.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [PATCH] Improve tab completion for CREATE TABLE

2018-12-22 Thread Dagfinn Ilmari Mannsåker
Michael Paquier  writes:

> On Fri, Dec 21, 2018 at 03:14:36PM +, Dagfinn Ilmari Mannsåker wrote:
>> Here's a patch that does this (and in passing alphabetises the list of
>> options).
>
> Cool, thanks.  The position of the option list is fine.  However,
> list_TABLEOPTIONS is not a name consistent with the surroundings.  Could
> we just use table_level_option?

The CREATE and ALTER TABLE documentation calls them storage parameters,
so I've gone for table_storage_parameters in the attached v2 patch.

> Reordering them is a good idea, log_autovacuum_min_duration being the
> bad entry.

toast_tuple_target was also on the wrong side of the toast.* options.

- ilmari
-- 
- Twitter seems more influential [than blogs] in the 'gets reported in
  the mainstream press' sense at least.   - Matt McLeod
- That'd be because the content of a tweet is easier to condense down
  to a mainstream media article.  - Calle Dybedahl

From 59f30579b72106cf14338b890acfd0c6f0b6009a Mon Sep 17 00:00:00 2001
From: Dagfinn Ilmari Mannsåker 
Date: Fri, 21 Dec 2018 15:03:19 +
Subject: [PATCH v2] Add completion for storage parameters after CREATE TABLE
 … WITH …
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Move the list of parameters from the "ALTER TABLE  SET|RESET ("
block to the top-level, and reuse it after "CREATE TABLE  ( … )
WITH (".

In passing, rename the variable to table_storage_parameters and alphabetise
the list properly.
---
 src/bin/psql/tab-complete.c | 76 ++---
 1 file changed, 38 insertions(+), 38 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5ba6ffba8c..b12ffab312 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1005,6 +1005,40 @@ static const pgsql_thing_t words_after_create[] = {
 	{NULL}		/* end of list */
 };
 
+static const char *const table_storage_parameters[] = {
+	"autovacuum_analyze_scale_factor",
+	"autovacuum_analyze_threshold",
+	"autovacuum_enabled",
+	"autovacuum_freeze_max_age",
+	"autovacuum_freeze_min_age",
+	"autovacuum_freeze_table_age",
+	"autovacuum_multixact_freeze_max_age",
+	"autovacuum_multixact_freeze_min_age",
+	"autovacuum_multixact_freeze_table_age",
+	"autovacuum_vacuum_cost_delay",
+	"autovacuum_vacuum_cost_limit",
+	"autovacuum_vacuum_scale_factor",
+	"autovacuum_vacuum_threshold",
+	"fillfactor",
+	"log_autovacuum_min_duration",
+	"parallel_workers",
+	"toast.autovacuum_enabled",
+	"toast.autovacuum_freeze_max_age",
+	"toast.autovacuum_freeze_min_age",
+	"toast.autovacuum_freeze_table_age",
+	"toast.autovacuum_multixact_freeze_max_age",
+	"toast.autovacuum_multixact_freeze_min_age",
+	"toast.autovacuum_multixact_freeze_table_age",
+	"toast.autovacuum_vacuum_cost_delay",
+	"toast.autovacuum_vacuum_cost_limit",
+	"toast.autovacuum_vacuum_scale_factor",
+	"toast.autovacuum_vacuum_threshold",
+	"toast.log_autovacuum_min_duration",
+	"toast_tuple_target",
+	"user_catalog_table",
+	NULL
+};
+
 
 /* Forward declaration of functions */
 static char **psql_completion(const char *text, int start, int end);
@@ -1904,44 +1938,7 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH("(");
 	/* ALTER TABLE  SET|RESET ( */
 	else if (Matches("ALTER", "TABLE", MatchAny, "SET|RESET", "("))
-	{
-		static const char *const list_TABLEOPTIONS[] =
-		{
-			"autovacuum_analyze_scale_factor",
-			"autovacuum_analyze_threshold",
-			"autovacuum_enabled",
-			"autovacuum_freeze_max_age",
-			"autovacuum_freeze_min_age",
-			"autovacuum_freeze_table_age",
-			"autovacuum_multixact_freeze_max_age",
-			"autovacuum_multixact_freeze_min_age",
-			"autovacuum_multixact_freeze_table_age",
-			"autovacuum_vacuum_cost_delay",
-			"autovacuum_vacuum_cost_limit",
-			"autovacuum_vacuum_scale_factor",
-			"autovacuum_vacuum_threshold",
-			"fillfactor",
-			"parallel_workers",
-			"log_autovacuum_min_duration",
-			"toast_tuple_target",
-			"toast.autovacuum_enabled",
-			"toast.autovacuum_freeze_max_age",
-			"toast.autovacuum_freeze_min_age",
-			"toast.autovacuum_freeze_table_age",
-			"toast.autovacuum_multixact_freeze_max_age",
-			"toast.autovacuum_multixact_freeze_min_age",
-			"toast.autovacuum_multixact_freeze_table_age",
-			"toast.autovacuum_vacuum_cost_delay",
-			"toast.autovacuum_vacuum_cost_limit",
-			"toast.autovacuum_vacuum_scale_factor",
-			"toast.autovacuum_vacuum_threshold",
-			"toast.log_autovacuum_min_duration",
-			"user_catalog_table",
-			NULL
-		};
-
-		COMPLETE_WITH_LIST(list_TABLEOPTIONS);
-	}
+		COMPLETE_WITH_LIST(table_storage_parameters);
 	else if (Matches("ALTER", "TABLE", MatchAny, "REPLICA", "IDENTITY", "USING", "INDEX"))
 	{
 		completion_info_charp = prev5_wd;
@@ -2439,6 +2436,9 @@ psql_completion(const char *text, int start, int end)
 	else if (TailMatches("CREATE", "TEMP|TEMPORARY", "TABLE", MatchAny, "(*)"))

Re: Offline enabling/disabling of data checksums

2018-12-22 Thread Michael Banck
Hi,

I have added it to the commitfest now:

https://commitfest.postgresql.org/21/1944/

On Sat, Dec 22, 2018 at 08:28:34AM +0900, Michael Paquier wrote:
> On Fri, Dec 21, 2018 at 09:16:16PM +0100, Michael Banck wrote:
> > It adds a (now mandatory) --action parameter that takes either verify,
> > enable or disable as its argument.
> 
> There are two discussion points which deserve attention here:
> 1) Do we want to rename pg_verify_checksums to something else, like
> pg_checksums?  I would like it a lot if we did a simple renaming of the
> tool, which should be the first step taken.

I am for it, but don't mind whether it's before or afterwards, your
call.

> 2) Which kind of interface do we want to use?  When I did my own
> flavor of pg_checksums, I used an --action switch able to use the
> following values:
> - enable
> - disable
> - verify
> The switch cannot be specified twice (perhaps we could enforce taking
> the last value, as other binaries in the tree do; I'm not sure if that's
> appropriate here).  A second type of interface is to use one switch per
> action.  With both interfaces, if no action is specified then the tool
> fails.  Vote is open.

Even though my fork has the separate switches, I like the --action one.
On the other hand, it is a bit more typing, as you always have to spell
out the action (is there precedent for also accepting incomplete option
arguments like 'v', 'e', 'd'?).
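
For illustration, here is a self-contained sketch (not from either of
our patches; all names are illustrative) of parsing such a mandatory
--action switch, rejecting both a repeated switch and an unknown action:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>

typedef enum
{
	ACTION_NONE,
	ACTION_VERIFY,
	ACTION_ENABLE,
	ACTION_DISABLE
} ChecksumAction;

int
main(int argc, char *argv[])
{
	static const struct option long_options[] = {
		{"action", required_argument, NULL, 'A'},
		{NULL, 0, NULL, 0}
	};
	ChecksumAction action = ACTION_NONE;
	int			c;

	while ((c = getopt_long(argc, argv, "A:", long_options, NULL)) != -1)
	{
		if (c != 'A')
			exit(1);			/* unknown switch */
		if (action != ACTION_NONE)
		{
			fprintf(stderr, "--action specified more than once\n");
			exit(1);
		}
		if (strcmp(optarg, "verify") == 0)
			action = ACTION_VERIFY;
		else if (strcmp(optarg, "enable") == 0)
			action = ACTION_ENABLE;
		else if (strcmp(optarg, "disable") == 0)
			action = ACTION_DISABLE;
		else
		{
			fprintf(stderr, "invalid action \"%s\"\n", optarg);
			exit(1);
		}
	}
	if (action == ACTION_NONE)
	{
		fprintf(stderr, "no action specified\n");
		exit(1);
	}
	return 0;
}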

> > This is basically meant as a stop-gap measure in case online activation
> > of checksums won't make it for v12, but maybe it is independently
> > useful?
> 
> I think that this is independently useful.  I have made this part of an
> upgrade workflow where the user is ready to accept some extra one-time
> offline time so that checksums get enabled.

OK; we have also used that at clients' sites - if the instance is smaller
than a few dozen GB, enabling checksums during a routine minor upgrade
restart does not delay things much.
 
> > 2. Rename the scan_* functions (Michael renamed them to operate_file and
> > operate_directory, but I am not sure it is worth it).
> 
> The renaming makes sense, as scan implies only reading while enabling
> checksums causes a write.

Ok, will do in the next version.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz



Re: Referential Integrity Checks with Statement-level Triggers

2018-12-22 Thread Emre Hasegeli
> IMO it is far from a premature optimization; it is super useful and something
> I was hoping would happen ever since I heard about transition tables being
> worked on.

Me too.  Never-ending DELETEs are a common pain point, especially for
people who migrated from MySQL, which creates indexes for foreign keys
automatically.



Re: Joins on TID

2018-12-22 Thread Tom Lane
Simon Riggs  writes:
> On Sat, 22 Dec 2018 at 04:31, Tom Lane  wrote:
>> BTW, if we're to start taking joins on TID seriously, we should also
>> add the missing hash opclass for TID, so that you can do hash joins
>> when dealing with a lot of rows.

> I don't think we are trying to do TID joins more seriously, just to fix a
> special case.
> The case cited requires the batches of work to be small, so a nested loop
> works fine.
> It looks to me like Edmund is trying to solve the same problem. If so, this is
> the best solution.

No, I think what Edmund is on about is unrelated, except that it touches
some of the same code.  He's interested in problems like "find the last
few tuples in this table".  You can solve that today, with e.g.
"SELECT ... WHERE ctid >= '(n,1)'", but you get a stupidly inefficient
plan.  If we think that's a use-case worth supporting then it'd be
reasonable to provide less inefficient implementation(s).

What I'm thinking about in this thread is joins on TID, which we have only
very weak support for today --- you'll basically always wind up with a
mergejoin, which requires a full-table scan and sort of its inputs.  Still,
that's better than a naive nestloop, and for years we've been figuring
that that was good enough.  Several people in the other thread that
I cited felt that that isn't good enough.  But if we think it's worth
taking seriously, then IMO we need to add both parameterized scans (for
nestloop-with-inner-fetch-by-tid) and hash join, because each of those
can dominate depending on how many tuples you're joining.
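
To make the hash-join side concrete: a TID hash opclass needs a support
function that hashes a (block number, offset) pair.  Here is a
self-contained sketch of the idea (not PostgreSQL source; the struct
layout and mixing constant are illustrative):

#include <stdint.h>
#include <stdio.h>

typedef struct
{
	uint32_t	blkno;			/* block number */
	uint16_t	offnum;			/* tuple offset within the block */
} Tid;

static uint32_t
hash_tid(Tid tid)
{
	/* mix both fields; multiplicative hashing, purely for illustration */
	uint64_t	x = ((uint64_t) tid.blkno << 16) | tid.offnum;

	return (uint32_t) ((x * UINT64_C(0x9e3779b97f4a7c15)) >> 32);
}

int
main(void)
{
	Tid			t = {42, 7};

	printf("%u\n", hash_tid(t));
	return 0;
}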

regards, tom lane



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-22 Thread Tom Lane
John Naylor  writes:
> Using a single file also gave me another idea: Take value and category
> out of ScanKeyword, and replace them with an index into another array
> containing those, which will only be accessed in the event of a hit.
> That would shrink ScanKeyword to 4 bytes (offset, index), further
> increasing locality of reference. Might not be worth it, but I can try
> it after moving on to the core scanner.

I like that idea a *lot*, actually, because it offers the opportunity
to decouple this mechanism from all assumptions about what the
auxiliary data for a keyword is.  Basically, we'd redefine
ScanKeywordLookup as having the API "given a string, return a
keyword index if it is a keyword, -1 if it isn't"; then the caller
would use the keyword index to look up the auxiliary data in a table
that it owns and that ScanKeywordLookup doesn't know about at all.

So that leads to a design like this: the master data is in a header
that's just like kwlist.h is today, except now we are thinking of
PG_KEYWORD as an N-argument macro not necessarily exactly 3 arguments.
The Perl script reads that, paying attention only to the first argument
of the macro calls, and outputs a file containing, say,

static const uint16 kw_offsets[] = { 0, 6, 15, ... };

static const char kw_strings[] =
"abort\0"
"absolute\0"
...
;

(it'd be a good idea to have a switch that allows specifying the
prefix of these constant names).  Then ScanKeywordLookup has the
signature

int ScanKeywordLookup(const char *string_to_lookup,
  const char *kw_strings,
  const uint16 *kw_offsets,
  int num_keywords);

and a file using this stuff looks something like

/* Payload data for keywords */
typedef struct MyKeyword
{
    int16       value;
    int16       category;
} MyKeyword;

#define PG_KEYWORD(kwname, value, category) {value, category},

static const MyKeyword MyKeywords[] = {
#include "kwlist.h"
};

/* String lookup table for keywords */
#include "kwlist_d.h"

/* Lookup code looks about like this: */
kwnum = ScanKeywordLookup(str,
  kw_strings,
  kw_offsets,
  lengthof(kw_offsets));
if (kwnum >= 0)
   ... look into MyKeywords[kwnum] for info ...

Aside from being arguably better from the locality-of-reference
standpoint, this gets us out of the weird ifdef'ing you've got in
the v2 patch.  The kwlist_d.h headers can be very ordinary headers.
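
For the archives, a minimal self-contained sketch of how the lookup
function itself might search the offset table (assuming a plain binary
search over pre-downcased keyword strings; the real thing would also
have to case-fold the input):

#include <string.h>
#include <stdint.h>

typedef uint16_t uint16;

int
ScanKeywordLookup(const char *str,
                  const char *kw_strings,
                  const uint16 *kw_offsets,
                  int num_keywords)
{
    int         low = 0;
    int         high = num_keywords - 1;

    while (low <= high)
    {
        int         mid = (low + high) / 2;
        int         cmp = strcmp(str, kw_strings + kw_offsets[mid]);

        if (cmp == 0)
            return mid;         /* keyword index */
        if (cmp < 0)
            high = mid - 1;
        else
            low = mid + 1;
    }
    return -1;                  /* not a keyword */
}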

regards, tom lane



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-22 Thread Andres Freund
Hi,

On 2018-12-22 12:20:00 -0500, Tom Lane wrote:
> John Naylor  writes:
> > Using a single file also gave me another idea: Take value and category
> > out of ScanKeyword, and replace them with an index into another array
> > containing those, which will only be accessed in the event of a hit.
> > That would shrink ScanKeyword to 4 bytes (offset, index), further
> > increasing locality of reference. Might not be worth it, but I can try
> > it after moving on to the core scanner.
> 
> I like that idea a *lot*, actually, because it offers the opportunity
> to decouple this mechanism from all assumptions about what the
> auxiliary data for a keyword is.

OTOH, it doubles or triples the number of cachelines accessed when
encountering a keyword. The ratio of keywords to non-keywords in SQL
makes me wonder whether that is a good deal.

Greetings,

Andres Freund



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-22 Thread Tom Lane
Andres Freund  writes:
> On 2018-12-22 12:20:00 -0500, Tom Lane wrote:
>> I like that idea a *lot*, actually, because it offers the opportunity
>> to decouple this mechanism from all assumptions about what the
>> auxiliary data for a keyword is.

> OTOH, it doubles or triples the number of cachelines accessed when
> encountering a keyword.

Compared to what?  The current situation in that regard is a mess.

Also, AFAICS this proposal involves the least amount of data touched
during the lookup phase of anything we've discussed, so I do not even
accept that your criticism is correct.  One extra cacheline fetch
to get the aux data for a particular keyword after the search is not
going to tip the scales away from this being a win.

regards, tom lane



Re: Joins on TID

2018-12-22 Thread Simon Riggs
On Sat, 22 Dec 2018 at 16:31, Tom Lane  wrote:


> What I'm thinking about in this thread is joins on TID, which we have only
> very weak support for today --- you'll basically always wind up with a
> mergejoin, which requires a full-table scan and sort of its inputs.  Still,
> that's better than a naive nestloop, and for years we've been figuring
> that that was good enough.  Several people in the other thread that
> I cited felt that that isn't good enough.  But if we think it's worth
> taking seriously, then IMO we need to add both parameterized scans (for
> nestloop-with-inner-fetch-by-tid) and hash join, because each of those
> can dominate depending on how many tuples you're joining.
>

That would certainly help if you are building a column store or other new
index types.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Speeding up text_position_next with multibyte encodings

2018-12-22 Thread Heikki Linnakangas

On 14/12/2018 20:20, John Naylor wrote:
> I signed up to be a reviewer, and I will be busy next month, so I went
> ahead and fixed the typo in the patch that broke assert-enabled
> builds. While at it, I standardized on the spelling "start_ptr" in a
> few places to match the rest of the file. It's a bit concerning that
> it wouldn't compile with asserts, but the patch was written by a
> committer and seems to work.

Thanks for fixing that!

> On 10/19/18, Heikki Linnakangas  wrote:
>> This is a win in most cases. One case is slower: calling position() with
>> a large haystack input, where the match is near the end of the string.
>
> I wanted to test this case in detail, so I ran the attached script, ...

I'm afraid that script doesn't work as a performance test. The
position() function is immutable, so the result gets cached in the plan
cache. All you're measuring is the speed to get the constant from the
plan cache :-(.

I rewrote the same tests with a little C helper function, attached, to
fix that, and to eliminate any PL/pgSQL overhead. And I adjusted the
loop counts so that the runtimes are reasonable, now that it actually
does some work.


UTF-8

length  master  patch
short   2.7s    10.9s
med     2.6s    11.4s
long    3.9s    9.9s

EUC_KR

length  master  patch
short   2.2s    7.5s
med     2.2s    3.5s
long    3.4s    3.6s

This doesn't look very flattering for the patch. Then again, this is
deliberately testing the worst case.


You chose interesting characters for the UTF-8 test. The haystack is a 
repeating pattern of byte sequence EC 99 84, and the needle is a 
repeating pattern of EC 84 B1. In the 'long' test, the lengths in the 
skip table are '2', '1' and '250'. But the search bounces between the 
'2' and '1' cases, and never hits the byte that would allow it to skip 
250 bytes. Interesting case, I had not realized that that can happen. 
But I don't think we need to put much weight on that, you could come up 
with other scenarios where the current code has skip table collisions, too.
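
To illustrate what a skip table collision is, here is a self-contained
sketch (mirroring the concept, not the actual PostgreSQL code) of
building a Boyer-Moore-Horspool skip table with masked indexing.  Two
distinct bytes whose low bits coincide share a slot, so the smaller
skip distance wins - which is safe, just slower:

#include <stdio.h>

#define SKIPTABLE_MASK 15		/* deliberately small to force collisions */

static int	skiptable[SKIPTABLE_MASK + 1];

static void
build_skip_table(const unsigned char *needle, int len)
{
	int			i;

	/* a byte that's not in the needle lets us skip the whole needle */
	for (i = 0; i <= SKIPTABLE_MASK; i++)
		skiptable[i] = len;

	/* bytes in the needle (except the last one) get shorter skips */
	for (i = 0; i < len - 1; i++)
		skiptable[needle[i] & SKIPTABLE_MASK] = len - 1 - i;
}

int
main(void)
{
	const unsigned char needle[] = "abq";

	build_skip_table(needle, 3);

	/*
	 * 'a' (0x61) and 'q' (0x71) collide under mask 15: both map to slot
	 * 1, so 'q' inherits the skip distance of 'a' instead of the full 3.
	 */
	printf("skip['a'] = %d, skip['q'] = %d\n",
		   skiptable['a' & SKIPTABLE_MASK],
		   skiptable['q' & SKIPTABLE_MASK]);
	return 0;
}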


So overall, I think it's still a worthwhile tradeoff, given that that is 
a worst case scenario. If you choose less pathological UTF-8 codepoints, 
or there is no match or the match is closer to the beginning of the 
string, the patch wins.


- Heikki
commit baebdb21cf6ebbe710bde189dd3d300610fb40c8
Author: Heikki Linnakangas 
Date:   Sun Dec 23 00:11:52 2018 +0200

benchmarking code for textpos.

diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index a2e57768d40..59957db586f 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -863,3 +863,21 @@ test_fdw_handler(PG_FUNCTION_ARGS)
 	elog(ERROR, "test_fdw_handler is not implemented");
 	PG_RETURN_NULL();
 }
+
+
+PG_FUNCTION_INFO_V1(benchmark_textpos);
+Datum
+benchmark_textpos(PG_FUNCTION_ARGS)
+{
+	text	   *t1 = PG_GETARG_TEXT_PP(0);
+	text	   *t2 = PG_GETARG_TEXT_PP(1);
+	int			loops = PG_GETARG_INT32(2);
+	int			i;
+	Datum		result = Int32GetDatum(0);
+
+	for (i = 0; i < loops; i++)
+	{
+		result = DirectFunctionCall2Coll(textpos, PG_GET_COLLATION(), PointerGetDatum(t1), PointerGetDatum(t2));
+	}
+	return result;
+}


single-byte-bmh-benchmark.sql
Description: application/sql


Re: Speeding up text_position_next with multibyte encodings

2018-12-22 Thread Heikki Linnakangas

On 14/12/2018 23:40, John Naylor wrote:
> I just noticed that the contrib/citext test fails. I've set the status
> to waiting on author.

Hmm, it works for me. What failure did you see?

- Heikki



Re: Speeding up text_position_next with multibyte encodings

2018-12-22 Thread Heikki Linnakangas

On 23/12/2018 02:28, Heikki Linnakangas wrote:
> On 14/12/2018 23:40, John Naylor wrote:
>> I just noticed that the contrib/citext test fails. I've set the status
>> to waiting on author.
>
> Hmm, it works for me. What failure did you see?

Never mind, I'm seeing it now, with assertions enabled. Thanks, I'll
investigate!


- Heikki



Re: [PATCH] Improve tab completion for CREATE TABLE

2018-12-22 Thread Michael Paquier
On Sat, Dec 22, 2018 at 01:33:23PM +, Dagfinn Ilmari Mannsåker wrote:
> The CREATE and ALTER TABLE documentation calls them storage parameters,
> so I've gone for table_storage_parameters in the attached v2 patch.

Sold.  And committed.

>> Reordering them is a good idea, log_autovacuum_min_duration being the
>> bad entry.
> 
> toast_tuple_target was also on the wrong side of the toast.*
> options.

Indeed.
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2018-12-22 Thread Michael Paquier
On Sat, Dec 22, 2018 at 02:42:55PM +0100, Michael Banck wrote:
> On Sat, Dec 22, 2018 at 08:28:34AM +0900, Michael Paquier wrote:
>> There are two discussion points which deserve attention here:
>> 1) Do we want to rename pg_verify_checksums to something else, like
>> pg_checksums?  I would like it a lot if we did a simple renaming of the
>> tool, which should be the first step taken.
> 
> I am for it, but don't mind whether it's before or afterwards, your
> call.

Doing the renaming afterwards would be a bit weird logically, as we would
end up with a point in time in the tree where pg_verify_checksums is
able to do something other than just verify checksums.

> Even though my fork has the separate switches, I like the --action one.
> On the other hand, it is a bit more typing, as you always have to spell
> out the action (is there precedent for also accepting incomplete option
> arguments like 'v', 'e', 'd'?).

Yes, there is a bit of that in psql, for example for formats.  Not sure
that we should take this road for a checksumming tool though.  If a
new option were added that starts with the same letter, we would have
incompatibility issues.  That's unlikely to happen, but it still feels
uneasy.
--
Michael


signature.asc
Description: PGP signature


Re: Speeding up text_position_next with multibyte encodings

2018-12-22 Thread Heikki Linnakangas

On 23/12/2018 02:32, Heikki Linnakangas wrote:
> On 23/12/2018 02:28, Heikki Linnakangas wrote:
>> On 14/12/2018 23:40, John Naylor wrote:
>>> I just noticed that the contrib/citext test fails. I've set the status
>>> to waiting on author.
>>
>> Hmm, it works for me. What failure did you see?
>
> Never mind, I'm seeing it now, with assertions enabled. Thanks, I'll
> investigate!

The bug was in handling empty inputs. text_position_setup assumed and
asserted that neither the needle nor the haystack is empty, expecting the
callers to have handled those special cases already, but not all callers
did. Here is a fixed version.


- Heikki
From 051069c56176e9fa5c4a0f25709bed4d59da4317 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Sun, 23 Dec 2018 02:40:48 +0200
Subject: [PATCH] Use single-byte Boyer-Moore-Horspool search even with
 multibyte encodings.

The old implementation first converted the input strings to arrays of
wchars, and performed the search on those. However, the conversion is
expensive, and for a large input string, consumes a lot of memory.

Avoid the conversion, and instead use the single-byte algorithm even with
multibyte encodings. That has a couple of problems, though. Firstly, we
might get fooled if the needle string's byte sequence appears embedded
inside a different string. We might incorrectly return a match in the
middle of a multi-byte character. The second problem is that the
text_position function needs to return the position of the match, counted
in characters, rather than bytes. We can work around both of those problems
by an extra step after finding a match. Walk the string one character at a
time, starting from the beginning, until we reach the position of the match.
We can then check if the match was found at a valid character boundary,
which solves the first problem, and we can count the characters, so that we
can return the character offset. Walking the characters is faster than the
wchar conversion, especially in the case that there are no matches and we
can avoid it altogether. Also, avoiding the large allocation allows these
functions to work with inputs larger than 256 MB, even with multibyte
encodings.

For UTF-8, we can even skip the step to verify that the match lands on a
character boundary, because in UTF-8, the byte sequence of one character
cannot appear in the middle of a different character. I think many other
encodings have the same property, but I wasn't sure, so I only enabled
that optimization for UTF-8.

Most of the callers of the text_position_setup/next functions were actually
not interested in the position of the match, counted in characters. To
cater for them, refactor the text_position_next() interface into two parts:
searching for the next match (text_position_next()), and returning the
current match's position as a pointer (text_position_get_match_ptr()) or
as a character offset (text_position_get_match_pos()). Most callers of
text_position_setup/next are not interested in the character offsets, so
getting the pointer to the match within the original string is a more
convenient API for them. It also allows skipping the character counting
with UTF-8 encoding altogether.
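
To illustrate the verification step described above, here is a minimal
self-contained sketch (not the patch code; char_len() stands in for
pg_mblen() and assumes it is called at a character start):

#include <stddef.h>

/* UTF-8 sequence length from the first byte; illustrative only */
static int
char_len(const char *s)
{
	unsigned char c = (unsigned char) *s;

	if (c < 0x80)
		return 1;
	if (c < 0xE0)
		return 2;
	if (c < 0xF0)
		return 3;
	return 4;
}

/*
 * Walk the haystack from the start up to the byte offset of a raw match,
 * counting characters.  Returns the character offset of the match, or -1
 * if the byte-level match starts in the middle of a character.
 */
static int
match_char_offset(const char *haystack, size_t match_byte_off)
{
	size_t		off = 0;
	int			charpos = 0;

	while (off < match_byte_off)
	{
		off += char_len(haystack + off);
		charpos++;
	}
	return (off == match_byte_off) ? charpos : -1;
}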

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 50a157df9fb..8b7c7f039ee 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -43,18 +43,33 @@ int			bytea_output = BYTEA_OUTPUT_HEX;
 typedef struct varlena unknown;
 typedef struct varlena VarString;
 
+/*
+ * State for text_position_* functions.
+ */
 typedef struct
 {
-	bool		use_wchar;		/* T if multibyte encoding */
-	char	   *str1;			/* use these if not use_wchar */
-	char	   *str2;			/* note: these point to original texts */
-	pg_wchar   *wstr1;			/* use these if use_wchar */
-	pg_wchar   *wstr2;			/* note: these are palloc'd */
-	int			len1;			/* string lengths in logical characters */
+	bool		is_multibyte;	/* T if multibyte encoding */
+	bool		is_multibyte_char_in_char;
+
+	char	   *str1;			/* haystack string */
+	char	   *str2;			/* needle string */
+	int			len1;			/* string lengths in bytes */
 	int			len2;
+
 	/* Skip table for Boyer-Moore-Horspool search algorithm: */
 	int			skiptablemask;	/* mask for ANDing with skiptable subscripts */
 	int			skiptable[256]; /* skip distance for given mismatched char */
+
+	char	   *last_match;		/* pointer to last match in 'str1' */
+
+	/*
+	 * Sometimes we need to convert byte positions to character positions, or
+	 * the other way round. These store the last position that was converted,
+	 * so that on the next call, we can continue from that point, rather than
+	 * count characters from the very beginning.
+	 */
+	char	   *refpoint;		/* pointer within original haystack string */
+	int			refpos;			/* 0-based character offset of the same point */
 } TextPositionState;
 
 typedef struct
@@ -109,7 +124,10 @@ static text *text_substring(Datum str,
 static text *text_overlay(text *t1, text *t2, int sp, int sl);
 static int	text_position(

Make relcache init write errors not be fatal

2018-12-22 Thread Jeff Janes
After running a testing server out of storage, I tried to track down why it
was so hard to get it back up again (rather than doing what I usually do,
which is just throw it away and make the test smaller).

I couldn't start a backend because it couldn't write the relcache init file.

I found this comment, but it did not carry its sentiment to completion:

/*
 * We used to consider this a fatal error, but we might as well
 * continue with backend startup ...
 */

With the attached patch applied, I could at least get a backend going so I
could drop some tables/indexes and free up space.

I'm not enamoured with the implementation of passing a flag down
to write_item, but it seemed better than making write_item return an error
code and then checking the return status in a dozen places.  Maybe we could
turn write_item into a macro, so the macro can implement the "return" from
the outer function directly?
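
Something like this self-contained sketch of the macro idea (WRITE_ITEM
and write_init_file are illustrative names, not the actual relcache
code):

#include <stdio.h>
#include <stddef.h>

/*
 * On a write error, warn and return from the *enclosing* function, so
 * that each call site doesn't need its own error check.
 */
#define WRITE_ITEM(data, len, fp) \
	do { \
		if (fwrite(&(len), sizeof(len), 1, (fp)) != 1 || \
			((len) > 0 && fwrite((data), (len), 1, (fp)) != 1)) \
		{ \
			fprintf(stderr, "WARNING: could not write init file\n"); \
			return;				/* give up on the init file, not fatal */ \
		} \
	} while (0)

static void
write_init_file(FILE *fp)
{
	size_t		len = 5;

	WRITE_ITEM("hello", len, fp);	/* any call site can bail out */
	/* ... more items ... */
}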

Cheers,

Jeff


relcache_init_v1.patch
Description: Binary data


Re: Make relcache init write errors not be fatal

2018-12-22 Thread Andres Freund
Hi,

On 2018-12-22 20:49:58 -0500, Jeff Janes wrote:
> After running a testing server out of storage, I tried to track down why it
> was so hard to get it back up again (rather than doing what I usually do,
> which is just throw it away and make the test smaller).
> 
> I couldn't start a backend because it couldn't write the relcache init file.
> 
> I found this comment, but it did not carry its sentiment to completion:
> 
> /*
>  * We used to consider this a fatal error, but we might as well
>  * continue with backend startup ...
>  */
> 
> With the attached patch applied, I could at least get a backend going so I
> could drop some tables/indexes and free up space.

Why is this a good idea?  It'll just cause hard-to-debug performance
issues imo.

Greetings,

Andres Freund



Re: Make relcache init write errors not be fatal

2018-12-22 Thread Jeff Janes
On Sat, Dec 22, 2018 at 8:54 PM Andres Freund  wrote:

> Hi,
>
> On 2018-12-22 20:49:58 -0500, Jeff Janes wrote:
> > After running a testing server out of storage, I tried to track down why it
> > was so hard to get it back up again (rather than doing what I usually do,
> > which is just throw it away and make the test smaller).
> >
> > I couldn't start a backend because it couldn't write the relcache init file.
> >
> > I found this comment, but it did not carry its sentiment to completion:
> >
> > /*
> >  * We used to consider this a fatal error, but we might as well
> >  * continue with backend startup ...
> >  */
> >
> > With the attached patch applied, I could at least get a backend going so I
> > could drop some tables/indexes and free up space.
>
> Why is this a good idea?  It'll just cause hard-to-debug performance
> issues imo.
>
You get lots of WARNINGs, so it shouldn't be too hard to debug.  And once
you drop a table or an index, the init will succeed and you won't have
the performance issues anymore.

The alternative, barring finding removable extraneous data on the same
partition, seems to be indefinite downtime until you can locate a larger
hard drive and move everything to it, or resorting to dangerous hacks.

Cheers,

Jeff