Re: PITR on DROP DATABASE: deletion of the database directory despite recovery_target_time being set before the drop.
Hello, On Tue, Nov 19, 2019 at 16:15, Nicolas Lutic wrote: > > We are aware that this part is tricky and will have little effects on > normal operations, as best practices are to use xid_target or lsn_target. > > I'm working with Nicolas and we did some further testing. If we use an xid target with inclusive set to false at the next xid after the insert, we end up with the same DELETE/DROP-directory behaviour, which is quite confusing. One has to choose the xid-1 value with inclusive behaviour to make it work. I assume the right first step is to document this behaviour and give some examples. Maybe we could add some documentation to the xlog explanation, and a warning for recovery_target_time and recovery_target_xid in the GUC docs? If there are better places in the docs, let us know. Thanks -- Jean-Christophe Arnu
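For concreteness, a minimal sketch of the recovery settings being discussed (PostgreSQL 12+ style, with a recovery.signal file present). The xids and the archive path are hypothetical: 1234 stands for the last transaction to keep, 1235 for the one containing the DROP DATABASE.

  # postgresql.conf
  restore_command = 'cp /path/to/archive/%f %p'
  recovery_target_xid = '1234'
  recovery_target_inclusive = on      # replay up to and including 1234
  recovery_target_action = 'promote'

The seemingly equivalent alternative, recovery_target_xid = '1235' with recovery_target_inclusive = off, still ends up replaying the directory removal, which is exactly the confusing behaviour described above.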
wal_dump output on CREATE DATABASE
Dear hackers, This is my first attempt to post to this list, to propose a workaround for an issue I noticed while using pg_waldump. Each time an object is referenced by oids, the following output template is used: tablespace oid/database oid/relfilenode oid. That template seems to be used for every operation except the *copy dir* operation. The latter is emitted (at least) when a CREATE DATABASE statement is executed, and uses the template: database oid/tablespace oid. Example on CREATE DATABASE (without defining a template database): rmgr: Database len (rec/tot): 42/42, tx:568, lsn: 0/01865790, prev 0/01865720, desc: CREATE copy dir 1/1663 to 16384/1663 It seems to me it would be more consistent to use the same template as the other operations in pg_waldump, so I propose to swap the parameter positions in the copy dir operation output. You'll find attached a patch which does the switching job: rmgr: Database len (rec/tot): 42/42, tx:568, lsn: 0/01865790, prev 0/01865720, desc: CREATE copy dir 1663/1 to 1663/16384 - Project name: PostgreSQL - File: copy_dir_message_switch_parameters_v0.patch - Description: Swaps copy dir output parameters - This patch is not WIP (but may be discussed) - Patch applies against master branch - Compiles and tests successfully - No platform-specific code - No need for a regression test - Not a new feature - No performance impact Any comments or advice are more than welcome. If I didn't do things the right way, I would appreciate some help from a kind mentor. I'll submit the patch to the next commitfest. Thank you, -- Jean-Christophe Arnu diff --git a/src/backend/access/rmgrdesc/dbasedesc.c b/src/backend/access/rmgrdesc/dbasedesc.c index 39e26d7ed4..7ee120f1d9 100644 --- a/src/backend/access/rmgrdesc/dbasedesc.c +++ b/src/backend/access/rmgrdesc/dbasedesc.c @@ -29,15 +29,15 @@ dbase_desc(StringInfo buf, XLogReaderState *record) xl_dbase_create_rec *xlrec = (xl_dbase_create_rec *) rec; appendStringInfo(buf, "copy dir %u/%u to %u/%u", - xlrec->src_db_id, xlrec->src_tablespace_id, - xlrec->db_id, xlrec->tablespace_id); + xlrec->src_tablespace_id, xlrec->src_db_id, + xlrec->tablespace_id, xlrec->db_id); } else if (info == XLOG_DBASE_DROP) { xl_dbase_drop_rec *xlrec = (xl_dbase_drop_rec *) rec; appendStringInfo(buf, "dir %u/%u", - xlrec->db_id, xlrec->tablespace_id); + xlrec->tablespace_id, xlrec->db_id); } }
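For reference, one way to produce the record shown above is to point pg_waldump at the WAL directory and filter on the Database resource manager; the path and LSN range below are hypothetical:

  pg_waldump --rmgr=Database -p $PGDATA/pg_wal -s 0/01865000 -e 0/01866000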
Re: wal_dump output on CREATE DATABASE
On Fri, Nov 2, 2018 at 08:37, Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 26/10/2018 15:53, Jean-Christophe Arnu wrote: > > Exemple on CREATE DATABASE (without defining a template database) : > > rmgr: Databaselen (rec/tot): 42/42, tx:568, lsn: > > 0/01865790, prev 0/01865720, desc: CREATE copy dir 1/1663 to 16384/1663 > > > > It comes out (to me) it may be more consistent to use the same template > > than the other operations in pg_waldump. > > I propose to swap the parameters positions for the copy dir operation > > output. > > > > You'll find a patch file included which does the switching job : > > rmgr: Databaselen (rec/tot): 42/42, tx:568, lsn: > > 0/01865790, prev 0/01865720, desc: CREATE copy dir 1663/1 to 1663/16384 > > I see your point. I suspect this was mainly implemented that way > because that's how the fields occur in the underlying > xl_dbase_create_rec structure. (But that structure also has the target > location first, so it's not entirely consistent that way either.) We > could switch the fields around in the output, but I wonder whether we > couldn't make the whole thing a bit more readable like this: > > desc: CREATE copy dir ts=1663 db=1 to ts=1663 db=16384 > > or maybe like this > > desc: CREATE copy dir (ts/db) 1663/1 to 1663/16384 > Thank you Peter for your review and proposal. I like the last one, which gives the fields' semantics in a concise way. Pushing the idea a bit further, we could think of applying that description to any other operation that involves the ts/db/oid fields. What do you think? For example, in nbtdesc.c we could add the "(ts/db/oid)" information to help the DBA identify the objects: case XLOG_BTREE_REUSE_PAGE: { xl_btree_reuse_page *xlrec = (xl_btree_reuse_page *) rec; appendStringInfo(buf, "rel (ts/db/rel) %u/%u/%u; latestRemovedXid %u", xlrec->node.spcNode, xlrec->node.dbNode, xlrec->node.relNode, xlrec->latestRemovedXid); break; } This may help DBAs better identify the objects the messages relate to. Any thoughts/comments/suggestions? ~~~ Side story To be clear, my first proposal was born while I was writing a side script to textually identify which objects were involved in pg_waldump operations (provided those objects still exist at script run time). I'm wondering whether it could be interesting to include such a "textual decoding" in pg_waldump itself (as a command line switch). Anyway, my script would be available as a proof of concept first. We have time to discuss that last point in another thread. ~~~ Thank you >
Re: wal_dump output on CREATE DATABASE
Le dim. 4 nov. 2018 à 18:01, Jean-Christophe Arnu a écrit : > Le ven. 2 nov. 2018 à 08:37, Peter Eisentraut < > peter.eisentr...@2ndquadrant.com> a écrit : > >> On 26/10/2018 15:53, Jean-Christophe Arnu wrote: >> > Exemple on CREATE DATABASE (without defining a template database) : >> > rmgr: Databaselen (rec/tot): 42/42, tx:568, lsn: >> > 0/01865790, prev 0/01865720, desc: CREATE copy dir 1/1663 to 16384/1663 >> > >> > It comes out (to me) it may be more consistent to use the same template >> > than the other operations in pg_waldump. >> > I propose to swap the parameters positions for the copy dir operation >> > output. >> > >> > You'll find a patch file included which does the switching job : >> > rmgr: Databaselen (rec/tot): 42/42, tx:568, lsn: >> > 0/01865790, prev 0/01865720, desc: CREATE copy dir 1663/1 to 1663/16384 >> >> I see your point. I suspect this was mainly implemented that way >> because that's how the fields occur in the underlying >> xl_dbase_create_rec structure. (But that structure also has the target >> location first, so it's not entirely consistent that way either.) We >> could switch the fields around in the output, but I wonder whether we >> couldn't make the whole thing a bit more readable like this: >> >> desc: CREATE copy dir ts=1663 db=1 to ts=1663 db=16384 >> >> or maybe like this >> >> desc: CREATE copy dir (ts/db) 1663/1 to 1663/16384 >> > > > Thank you Peter for your review and proposal . > I like the last one which gives the fields semantics in a concise way. > Pushing the idea a bit farther we could think of applying that description > to any other operation that involves the ts/db/oid fields. What do you > think ? > > Example in nbtdesc.c we could add the "(ts/db/oid)" information to help > the DBA to identify the objects: > case XLOG_BTREE_REUSE_PAGE: > { > xl_btree_reuse_page *xlrec = (xl_btree_reuse_page *) rec; > > appendStringInfo(buf, "rel (ts/db/rel) %u/%u/%u; > latestRemovedXid %u", > xlrec->node.spcNode, xlrec->node.dbNode, > xlrec->node.relNode, > xlrec->latestRemovedXid); > break; > } > > This may help DBAs to better identify the objects related to the messages. > > Any thought/comments/suggestions ? > > ~~~ Side story > Well to be clear, my first proposal is born while i was writting a side > script to textually identify which objects were involved into pg_waldump > operations (if that objects already exists at script run time). I'm > wondering whether it could be interesting to incllde such a "textual > decoding" into pg_waldump or not (as a command line switch). Anyway, my > script would be available as proof of concept first. We have time to > discuss that last point in another thread. > ~~~ > > Thank you > >> I've dug a little more in that way and spotted different locations in the code where such semantics might be useful too. Sometimes just like in XLOG_HEAP_TRUNCATE (heapdesc.c), there's a bunch of rels and descendants that are held by a single db. Adding the database id may be useful in that case. Decoding tablespace for each object may be interesting (but not mandatory IMHO) for each rel. That information does not seem to be included in the structure, but as newcomer I assume there's a convenient way to retrieve that information easily. Another operation that might be eligible to end user message improvements is the one handled by xact_desc_commit() function (xactdesc.c) . 
Each time a COMMIT (XLOG_XACT_COMMIT or XLOG_XACT_COMMIT_PREPARED) occurs, the following snippet is output: desc: COMMIT 2018-11-05 15:11:03.087546 CET; rels: base/16384/16385 base/16384/16388 base/16384/16390; In that case, the file path is output using the relpathperm macro, which ends up calling the GetRelationPath() function. In that last function, we have the dbNode, spcNode (tablespace) and relNode Oid (include/common/relpath.h). The question now is whether it would be interesting to have a consistent way to represent all objects and their hierarchy: should we use db/ts/rel or ts/db/rel? What about the "base/db/rel" form from XLOG_XACT_COMMIT/COMMIT_PREPARED? Why is it different from the other representation (it must serve a purpose, I assume)? How do we represent multiple rels, as in XLOG_HEAP_TRUNCATE? My proposal (db/rels): dboid/ reloid1 reloid2 reloid3 ... reloidN (as TRUNCATE only deals with one DB, but no tablespace is defined, this may be another point?) Well, if you could give me some directions, I would appreciate it! As ever, any thoughts or comments are more than welcome. -- Jean-Christophe Arnu
Re: wal_dump output on CREATE DATABASE
Le lun. 5 nov. 2018 à 15:37, Jean-Christophe Arnu a écrit : > > > Le dim. 4 nov. 2018 à 18:01, Jean-Christophe Arnu a > écrit : > >> Le ven. 2 nov. 2018 à 08:37, Peter Eisentraut < >> peter.eisentr...@2ndquadrant.com> a écrit : >> >>> On 26/10/2018 15:53, Jean-Christophe Arnu wrote: >>> > Exemple on CREATE DATABASE (without defining a template database) : >>> > rmgr: Databaselen (rec/tot): 42/42, tx:568, lsn: >>> > 0/01865790, prev 0/01865720, desc: CREATE copy dir 1/1663 to 16384/1663 >>> > >>> > It comes out (to me) it may be more consistent to use the same template >>> > than the other operations in pg_waldump. >>> > I propose to swap the parameters positions for the copy dir operation >>> > output. >>> > >>> > You'll find a patch file included which does the switching job : >>> > rmgr: Databaselen (rec/tot): 42/42, tx:568, lsn: >>> > 0/01865790, prev 0/01865720, desc: CREATE copy dir 1663/1 to 1663/16384 >>> >>> I see your point. I suspect this was mainly implemented that way >>> because that's how the fields occur in the underlying >>> xl_dbase_create_rec structure. (But that structure also has the target >>> location first, so it's not entirely consistent that way either.) We >>> could switch the fields around in the output, but I wonder whether we >>> couldn't make the whole thing a bit more readable like this: >>> >>> desc: CREATE copy dir ts=1663 db=1 to ts=1663 db=16384 >>> >>> or maybe like this >>> >>> desc: CREATE copy dir (ts/db) 1663/1 to 1663/16384 >>> >> >> >> Thank you Peter for your review and proposal . >> I like the last one which gives the fields semantics in a concise way. >> Pushing the idea a bit farther we could think of applying that >> description to any other operation that involves the ts/db/oid fields. What >> do you think ? >> >> Example in nbtdesc.c we could add the "(ts/db/oid)" information to help >> the DBA to identify the objects: >> case XLOG_BTREE_REUSE_PAGE: >> { >> xl_btree_reuse_page *xlrec = (xl_btree_reuse_page *) rec; >> >> appendStringInfo(buf, "rel (ts/db/rel) %u/%u/%u; >> latestRemovedXid %u", >> xlrec->node.spcNode, xlrec->node.dbNode, >> xlrec->node.relNode, >> xlrec->latestRemovedXid); >> break; >> } >> >> This may help DBAs to better identify the objects related to the >> messages. >> >> Any thought/comments/suggestions ? >> >> ~~~ Side story >> Well to be clear, my first proposal is born while i was writting a side >> script to textually identify which objects were involved into pg_waldump >> operations (if that objects already exists at script run time). I'm >> wondering whether it could be interesting to incllde such a "textual >> decoding" into pg_waldump or not (as a command line switch). Anyway, my >> script would be available as proof of concept first. We have time to >> discuss that last point in another thread. >> ~~~ >> >> Thank you >> >>> > I've dug a little more in that way and spotted different locations in the > code where such semantics might be useful too. > Sometimes just like in XLOG_HEAP_TRUNCATE (heapdesc.c), there's a bunch of > rels and descendants that are held by a single db. Adding the database id > may be useful in that case. Decoding tablespace for each object may be > interesting (but not mandatory IMHO) for each rel. That information does > not seem to be included in the structure, but as newcomer I assume there's > a convenient way to retrieve that information easily. 
> > Another operation that might be eligible to end user message improvements > is the one handled by xact_desc_commit() function (xactdesc.c) . Each time > a COMMIT (XLOG_XACT_COMMIT or XLOG_XACT_COMMIT_PREPARED) occurs the > folllowing snippets is output : > > desc: COMMIT 2018-11-05 15:11:03.087546 CET; rels: base/16384/16385 > base/16384/16388 base/16384/16390; > > in that case, file path is output using relpathperm macro which ends up > callin gthe GetRelationPath() function. In that last function, we can have > the dbNode, spcNode (tablespace) and relNode Oid (include/common/relpath.h ) > > The question is now to know if it would be interesting to have a > consistent w
Re: wal_dump output on CREATE DATABASE
On Thu, Nov 15, 2018 at 19:44, Robert Haas wrote: > On Tue, Nov 13, 2018 at 3:40 PM Tomas Vondra > wrote: > > People reading pg_waldump output quickly learn to read the A/B/C format > > and what those fields mean. Breaking that into ts=A db=B relfilenode=C > > does not make that particularly clearer or easier to read. I'd say it'd > > also makes it harder to parse, and it increases the size of the output > > (both in terms of line length and data size). > > I agree. > First, thank you all for your reviews. I also agree that the A/B/C format is right (and it may be a good thing to document it, maybe by adding some changes to the doc/src/sgml/ref/pg_waldump.sgml file in this patch). To reply to Andres, I agree we should not change things for a target format that does not fit a clearly defined syntax. In that way, I agree with Tomas that people reading pg_waldump output quickly become familiar with the A/B/C notation. My first use case was to decode the ids with a processing script, to identify each id in the A/B/C output of pg_waldump with a "human readable" name. For this, my processing script connects to the cluster and tries to resolve the ids with simple queries (building a local cache along the way). Then it replaces each looked-up id with its corresponding text. In some cases this could be useful for a DBA to find more easily when a specific relation was modified (searching for a DELETE, BTW). But that's only my use case and my little script. Going back to the code: as far as I can tell by crawling the source tree (and discovering it), there are messages with: * the A/B/C notation, which seems to be the one we should adopt (meaning ts/db/relfilenode); some are only * A/B, as in the COPY message we discussed. On the other hand, and I don't know if it's relevant, I've spotted some examples such as XLOG_RELMAP_UPDATE in relmapdesc.c which could benefit from that notation: appendStringInfo(buf, "database %u tablespace %u size %u", xlrec->dbid, xlrec->tsid, xlrec->nbytes); could be written like this: appendStringInfo(buf, "%u/%u size %u", xlrec->tsid, xlrec->dbid, xlrec->nbytes); In that case ts and db should also be switched. The message would then only be B/C, which is confusing, but we have other places where "base/" is used as a prefix. The same transform may also be applied to standby_desc() in standbydesc.c: appendStringInfo(buf, "xid %u db %u rel %u ", xlrec->locks[i].xid, xlrec->locks[i].dbOid, xlrec->locks[i].relOid); may be changed to appendStringInfo(buf, "xid %u (db/rel) %u/%u ", xlrec->locks[i].xid, xlrec->locks[i].dbOid, xlrec->locks[i].relOid); As I said, I don't know whether it's relevant to perform these changes or not. If the A/B/C notation is to be generalized, it would be worth documenting it in the SGML file. If not, the first patch provided should be enough. Regards -- Jean-Christophe Arnu
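For reference, the kind of simple lookup queries such a script can run against the cluster; the oids below are the hypothetical ones from the earlier examples, and the relfilenode lookup has to be run while connected to the database that owns the relation:

  SELECT spcname FROM pg_tablespace WHERE oid = 1663;        -- tablespace
  SELECT datname FROM pg_database WHERE oid = 16384;         -- database
  SELECT relname FROM pg_class WHERE relfilenode = 16385;    -- relation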
Re: wal_dump output on CREATE DATABASE
On Fri, Nov 16, 2018 at 14:32, Tomas Vondra wrote: > > > > appendStringInfo(buf, "xid %u (db/rel) %u/%u ", > > xlrec->locks[i].xid, xlrec->locks[i].dbOid, > > xlrec->locks[i].relOid); > > > > > > As I said, I don't know whether it's relevant to perform these changes > > or not. > > Maybe, I'm not against doing that. But if we do that, I don't think we > need to add the "(db/rel)" bit - we don't do that elsewhere, so why > here? Yes, you spotted a bad copy/paste from me; the "(db/rel)" bit should not be there. Sorry. How did that error get there? I admit I tried some changes on a local git branch of mine with another format helping the user understand what the ids really were (each line containing ids had "(ts/db/rel)" or "(db/rel)" inside; on COMMIT messages there was also a single "(db/rels)" bit). This was just an answer to Peter's proposal of a different format. I wanted to keep the A/B/C-like format and help the DBA understand what it was, using a prefix string like (ts/db/rel). But as the discussion went on, I figured out it might be a bad idea, so I did not submit it. > And if we adopt the same format, should that also include the > tablespace? Although, maybe for locks that doesn't make much sense. > I agree that, in a way, it may be a good idea to use that A/B/C format everywhere we use ids. It would make sense for the DBA to know what kind of ids are involved when we have partial ones (A/B or B/C). I don't know the code well enough to say whether it would be difficult to retrieve the tablespace each time or not. There are also lines with base/db/rel ... so there are several different formats. Anyway, I handle all of them (I hope so) in my processing script. So keeping things just as they are (apart from the initial COPY patch, which in my opinion should be applied) is no problem for me. On the other hand, there's a consensus not to go further than the initial patch. -- Jean-Christophe Arnu
Re: wal_dump output on CREATE DATABASE
On Tue, Nov 20, 2018 at 13:34, Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 16/11/2018 17:28, Jean-Christophe Arnu wrote: > > On the other hand, there's a consensus not to go further than the > > initial patch. > > I have committed your original patch. > > Great, thank you Peter. And thank you all for that interesting discussion. Regards -- Jean-Christophe Arnu
FDW pushdown of non-collated functions
… the collation contains 0 (InvalidOid), so the control flow returns false and the function cannot be pushed down. For the date_bin() function: - the fe variable contains the merged constraints of the sub-expressions/arguments. Here, fe->inputcollid evaluates to 0 (InvalidOid), which skips the else statement and lets the control flow continue in the function. For date_bin(), all arguments are “non-collatable” (timestamp without time zone and interval). So the situation is that date_trunc() is a “non-collatable” function that fails to be pushed down, whereas it may be a good idea to push it down. Maybe we could add another condition to the first if statement in order to allow a “no-collation” function to be pushed down even if it has “collatable” parameters. I’m not sure about possible behaviour regressions from this change, but it seems to work fine with date_trunc() and date_part() (which suffer from the same problem). Here’s the change: /* * If function's input collation is not derived from a foreign * Var, it can't be sent to remote. */ if (fe->inputcollid == InvalidOid || fe->funccollid == InvalidOid) /* OK, inputs are all noncollatable */ ; else if (inner_cxt.state != FDW_COLLATE_SAFE || fe->inputcollid != inner_cxt.collation) return false; I don’t presume this patch is free from side effects or fits all use cases. A (tiny) patch is attached to this email. It applies against master/head at the time of writing. Thank you for any thoughts. -- Jean-Christophe Arnu diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 09d6dd60dd..16b938ebb2 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -568,7 +568,8 @@ foreign_expr_walker(Node *node, * If function's input collation is not derived from a foreign * Var, it can't be sent to remote. */ -if (fe->inputcollid == InvalidOid) +if (fe->inputcollid == InvalidOid || + fe->funccollid == InvalidOid) /* OK, inputs are all noncollatable */ ; else if (inner_cxt.state != FDW_COLLATE_SAFE || fe->inputcollid != inner_cxt.collation)
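A quick way to check the effect of such a change, using the remote.t1 foreign table from the reproduction quoted in the follow-up below, is to look at the Remote SQL line of the verbose plan; with the change applied, the date_trunc() grouping should appear there instead of being evaluated locally:

  EXPLAIN (VERBOSE)
    SELECT date_trunc('day', ts), count(1)
    FROM remote.t1
    GROUP BY 1
    ORDER BY 1;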
Re: FDW pushdown of non-collated functions
Dear Hackers, I figured out this email was sent at release time. The worst time to ask for thoughts on a subject IMHO. Anyway, I hope this email will pop the topic over the stack! Thank you! Le ven. 8 sept. 2023 à 16:41, Jean-Christophe Arnu a écrit : > Dear hackers, > > I recently found a weird behaviour involving FDW (postgres_fdw) and > planning. > > Here’s a simplified use-case: > > Given a remote table (say on server2) with the following definition: > > CREATE TABLE t1( > ts timestamp without time zone, > x bigint, > x2 text > ); > --Then populate t1 table:INSERT INTO t1 >SELECT > current_timestamp - 1000*random()*'1 day'::interval > ,x > ,''||x >FROM > generate_series(1,10) as x; > > > This table is imported in a specific schema on server1 (we do not use > use_remote_estimate) also with t1 name in a specific schema: > > On server1: > > CREATE SERVER server2 >FOREIGN DATA WRAPPER postgres_fdw >OPTIONS ( > host '127.0.0.1', > port '9002', > dbname 'postgres', > use_remote_estimate 'false' >); > CREATE USER MAPPING FOR jc >SERVER server2 >OPTIONS (user 'jc'); > CREATE SCHEMA remote; > > IMPORT FOREIGN SCHEMA public >FROM SERVER server2 >INTO remote ; > > On a classic PostgreSQL 15 version the following query using date_trunc() > is executed and results in the following plan: > > jc=# explain (verbose,analyze) select date_trunc('day',ts), count(1) from > remote.t1 group by date_trunc('day',ts) order by 1; > QUERY PLAN > > --- > Sort (cost=216.14..216.64 rows=200 width=16) (actual time=116.699..116.727 > rows=1001 loops=1) >Output: (date_trunc('day'::text, ts)), (count(1)) >Sort Key: (date_trunc('day'::text, t1.ts)) >Sort Method: quicksort Memory: 79kB >-> HashAggregate (cost=206.00..208.50 rows=200 width=16) (actual > time=116.452..116.532 rows=1001 loops=1) > Output: (date_trunc('day'::text, ts)), count(1) > Group Key: date_trunc('day'::text, t1.ts) > Batches: 1 Memory Usage: 209kB > -> Foreign Scan on remote.t1 (cost=100.00..193.20 rows=2560 > width=8) (actual time=0.384..106.225 rows=10 loops=1) >Output: date_trunc('day'::text, ts) >Remote SQL: SELECT ts FROM public.t1 > Planning Time: 0.077 ms > Execution Time: 117.028 ms > > > Whereas the same query with date_bin() > > jc=# explain (verbose,analyze) select date_bin('1day',ts,'2023-01-01'), > count(1) from remote.t1 group by 1 order by 1; > > QUERY PLAN > > > -- > Foreign Scan (cost=113.44..164.17 rows=200 width=16) (actual > time=11.297..16.312 rows=1001 loops=1) >Output: (date_bin('1 day'::interval, ts, '2023-01-01 00:00:00'::timestamp > without time zone)), (count(1)) >Relations: Aggregate on (remote.t1) >Remote SQL: SELECT date_bin('1 day'::interval, ts, '2023-01-01 > 00:00:00'::timestamp without time zone), count(1) FROM public.t1 GROUP BY 1 > ORDER BY date_bin('1 day'::interval, ts, '2023-01-01 00:00:00'::timestamp > without time zone) ASC NULLS LAST > Planning Time: 0.114 ms > Execution Time: 16.599 ms > > > > With date_bin() the whole expression is pushed down to the remote server, > whereas with date_trunc() it’s not. > > I dived into the code and live debugged. It turns out that decisions to > pushdown or not a whole query depends on many factors like volatility and > collation. In the date_trunc() case, the problem is all about collation ( > date_trunc() on timestamp without time zone). And decision is made in the > foreign_expr_walker() in deparse.c ( > https://git.postgresql.o
Re: FDW pushdown of non-collated functions
Hi Ashutosh, On Fri, Oct 6, 2023 at 14:16, Ashutosh Bapat wrote: > Hi Jean-Christophe, > > On Fri, Sep 8, 2023 at 11:30 PM Jean-Christophe Arnu > wrote: > > > > Maybe we could add another condition to the first if statement in order > to allow a “no-collation” function to be pushed down even if they have > “collatable” parameters. I’m not sure about the possible regressions of > behaviour of this change, but it > seems to work fine with date_trunc() and date_part() (which suffers > the same problem). > > That may not work since the output of the function may be dependent > upon the collation on the inputs. > > There were similar discussions earlier. E.g. > > https://www.postgresql.org/message-id/flat/CACowWR1ARWyRepRxGfijMcsw%2BH84Dj8x2o9N3kvz%3Dz1p%2B6b45Q%40mail.gmail.com > . > > Reading Tom's first reply there you may work around this by declaring > the collation explicitly. > Thanks for your reply. I had not caught these messages in the archive; thanks for spotting them. > Briefly reading Tom's reply, the problem seems to be trusting whether > the default collation locally and on the foreign server respectively > is same or not. May be a simple fix is to declare a foreign server > level option declaring that the default collation on the foreign > server is same as the local server may be a way to move forward. But > given that the problem remains unsolved for 7 years at least, may be > such a simple fix is not enough. > I studied the postgres_fdw source code a bit and the problem is not as easy to solve: one could set a per-"server" option telling that the default remote collation is aligned with the local one, but nothing guarantees that the parameter's collation is known on the "remote" side. > > Another solution would be to attach another attribute to a function > indicating whether the output of that function depends upon the input > collations or not. Doing that just for FDW may not be acceptable > though. > Yes, definitely, I thought about that too. Anyway, you're right: after 7 years this is a really difficult problem to solve, and there's no straightforward solution (to my eyes). Thanks again for your kind explanations. Regards -- Jean-Christophe Arnu
Empty string in lexeme for tsvector
Hello Hackers, This is my second patch proposal, so I hope not to make "rookie" mistakes. This proposal is based on a simple use case: if one creates a table this way CREATE TABLE tst_table AS (SELECT array_to_tsvector(ARRAY['','abc','def'])); the table content is: array_to_tsvector --- '' 'abc' 'def' (1 row) At first it may seem strange to have an empty string as a tsvector lexeme, but anyway, let's get to the real point. Once dumped, the table's dump contains that empty string, which can't be restored: tsvector_parse (./utils/adt/tsvector_parser.c) raises an error. Thus it is not possible for the data to be restored this way. There are two ways to consider this: is it alright to have empty strings in lexemes? * If so, empty strings should be correctly parsed by tsvector_parser. * If not, one should prevent empty strings from being stored into tsvectors. Since an "empty string" does not seem to be a valid lexeme, I undertook to change some functions dealing with tsvectors to check whether string arguments are empty. This might be the wrong path, as I'm not familiar with tsvector usage... (OTOH, I can provide a fix patch for tsvector_parser() if I'm wrong.) This involved changing the way functions like array_to_tsvector(), ts_delete() and setweight() behave. Just as for NULL values, empty string values are checked and an error is raised when one is found. It appears to me that ERRCODE_ZERO_LENGTH_CHARACTER_STRING (2200F) matches this behaviour, but I may be wrong. Since this patch changes the way these functions behave, consider it a simple attempt to overcome a strange situation we noticed in a specific use case. The included patch adds those checks for empty strings in the functions pointed out above. It comes with modified regression tests. The patch applies against head/master and 14_RC1. Comments are more than welcome! Thank you, -- Jean-Christophe Arnu diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c index 9236ebcc8f..5a33e1bb10 100644 --- a/src/backend/utils/adt/tsvector_op.c +++ b/src/backend/utils/adt/tsvector_op.c @@ -331,6 +331,11 @@ tsvector_setweight_by_filter(PG_FUNCTION_ARGS) lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ; lex_pos = tsvector_bsearch(tsout, lex, lex_len); + if (lex_len == 0) + ereport(ERROR, + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING), + errmsg("lexeme array may not contain empty strings"))); + if (lex_pos >= 0 && (j = POSDATALEN(tsout, entry + lex_pos)) != 0) { WordEntryPos *p = POSDATAPTR(tsout, entry + lex_pos); @@ -607,6 +612,11 @@ tsvector_delete_arr(PG_FUNCTION_ARGS) (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), errmsg("lexeme array may not contain nulls"))); + if (VARSIZE(dlexemes[i]) - VARHDRSZ == 0) + ereport(ERROR, + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING), + errmsg("lexeme array may not contain empty strings"))); + lex = VARDATA(dlexemes[i]); lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ; lex_pos = tsvector_bsearch(tsin, lex, lex_len); @@ -761,13 +771,18 @@ array_to_tsvector(PG_FUNCTION_ARGS) deconstruct_array(v, TEXTOID, -1, false, TYPALIGN_INT, &dlexemes, &nulls, &nitems); - /* Reject nulls (maybe we should just ignore them, instead?) */ + /* Reject nulls and zero length strings (maybe we should just ignore them, instead?)
*/ for (i = 0; i < nitems; i++) { if (nulls[i]) ereport(ERROR, (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), errmsg("lexeme array may not contain nulls"))); + + if (VARSIZE(dlexemes[i]) - VARHDRSZ == 0) + ereport(ERROR, + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING), + errmsg("lexeme array may not contain empty strings"))); } /* Sort and de-dup, because this is required for a valid tsvector. */ diff --git a/src/test/regress/expected/tstypes.out b/src/test/regress/expected/tstypes.out index 2601e312df..a258179ef0 100644 --- a/src/test/regress/expected/tstypes.out +++ b/src/test/regress/expected/tstypes.out @@ -1260,6 +1260,8 @@ SELECT ts_delete('base hidden rebel spaceship strike'::tsvector, ARRAY['spaceshi SELECT ts_delete('base hidden rebel spaceship strike'::tsvector, ARRAY['spaceship','leya','rebel', NULL]); ERROR: lexeme array may not contain nulls +SELECT ts_delete('base hidden rebel spaceship strike'::tsvector, ARRAY['spaceship','leya','rebel', '']); +ERROR: lexeme array may not contain empty strings SELECT unnest('base:7 hidden:6 rebel:1 spaceship:2,33A,34B,35C,36D strike:3'::tsvector); unnest - @@ -1330,6 +1332,8 @@ SELECT array_to_tsvector(ARRAY['base','hidden','rebel
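To see the restore-side failure described above on an unpatched server, it is enough to feed the dumped value back through tsvector input (the same parsing path a restore goes through); the literal is the one produced for the example table:

  SELECT $$'' 'abc' 'def'$$::tsvector;
  -- ERROR:  syntax error in tsvector: "'' 'abc' 'def'"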
Re: Empty string in lexeme for tsvector
Le ven. 24 sept. 2021 à 13:03, Ranier Vilela a écrit : > > Comments are more than welcome! >> > 1. Would be better to add this test-and-error before tsvector_bsearch call. > > + if (lex_len == 0) > + ereport(ERROR, > + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING), > + errmsg("lexeme array may not contain empty strings"))); > + > > If lex_len is equal to zero, better get out soon. > > 2. The second test-and-error can use lex_len, just like the first test, > I don't see the point in recalculating the size of lex_len if that's > already done. > > + if (VARSIZE(dlexemes[i]) - VARHDRSZ == 0) > + ereport(ERROR, > + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING), > + errmsg("lexeme array may not contain empty strings"))); > + > Hello Ranier, Thank you for your comments. Here's a new patch file taking your comments into account. I was just wondering if empty string eviction is done in the right place. As you rightfully commented, lex_len is calculated later (once again for a right purpose) and my code checks for empty strings as soon as possible. To me, it seems to be the right thing to do (prevent further processing on lexemes as soon as possible) but I might omit something. Regards Jean-Christophe Arnu diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c index 9236ebcc8f..00f80ffcbc 100644 --- a/src/backend/utils/adt/tsvector_op.c +++ b/src/backend/utils/adt/tsvector_op.c @@ -329,6 +329,12 @@ tsvector_setweight_by_filter(PG_FUNCTION_ARGS) lex = VARDATA(dlexemes[i]); lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ; + + if (lex_len == 0) + ereport(ERROR, + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING), + errmsg("lexeme array may not contain empty strings"))); + lex_pos = tsvector_bsearch(tsout, lex, lex_len); if (lex_pos >= 0 && (j = POSDATALEN(tsout, entry + lex_pos)) != 0) @@ -609,6 +615,12 @@ tsvector_delete_arr(PG_FUNCTION_ARGS) lex = VARDATA(dlexemes[i]); lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ; + + if (lex_len == 0) + ereport(ERROR, + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING), + errmsg("lexeme array may not contain empty strings"))); + lex_pos = tsvector_bsearch(tsin, lex, lex_len); if (lex_pos >= 0) @@ -761,13 +773,18 @@ array_to_tsvector(PG_FUNCTION_ARGS) deconstruct_array(v, TEXTOID, -1, false, TYPALIGN_INT, &dlexemes, &nulls, &nitems); - /* Reject nulls (maybe we should just ignore them, instead?) */ + /* Reject nulls and zero length strings (maybe we should just ignore them, instead?) */ for (i = 0; i < nitems; i++) { if (nulls[i]) ereport(ERROR, (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), errmsg("lexeme array may not contain nulls"))); + + if (VARSIZE(dlexemes[i]) - VARHDRSZ == 0) + ereport(ERROR, + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING), + errmsg("lexeme array may not contain empty strings"))); } /* Sort and de-dup, because this is required for a valid tsvector. 
*/ diff --git a/src/test/regress/expected/tstypes.out b/src/test/regress/expected/tstypes.out index 2601e312df..a258179ef0 100644 --- a/src/test/regress/expected/tstypes.out +++ b/src/test/regress/expected/tstypes.out @@ -1260,6 +1260,8 @@ SELECT ts_delete('base hidden rebel spaceship strike'::tsvector, ARRAY['spaceshi SELECT ts_delete('base hidden rebel spaceship strike'::tsvector, ARRAY['spaceship','leya','rebel', NULL]); ERROR: lexeme array may not contain nulls +SELECT ts_delete('base hidden rebel spaceship strike'::tsvector, ARRAY['spaceship','leya','rebel', '']); +ERROR: lexeme array may not contain empty strings SELECT unnest('base:7 hidden:6 rebel:1 spaceship:2,33A,34B,35C,36D strike:3'::tsvector); unnest - @@ -1330,6 +1332,8 @@ SELECT array_to_tsvector(ARRAY['base','hidden','rebel','spaceship','strike']); SELECT array_to_tsvector(ARRAY['base','hidden','rebel','spaceship', NULL]); ERROR: lexeme array may not contain nulls +SELECT array_to_tsvector(ARRAY['base','hidden','rebel','spaceship', '']); +ERROR: lexeme array may not contain empty strings -- array_to_tsvector must sort and de-dup SELECT array_to_tsvector(ARRAY['foo','bar','baz','bar']); array_to_tsvector @@ -1375,6 +1379,8 @@ SELECT setweight('a asd w:5,6,12B,13A zxc'::tsvector, 'c', '{a,zxc}'); SELECT setweight('a asd w:5,6,12B,13A zxc'::tsvector, 'c', ARRAY['a', 'zxc', NULL]); ERR
Re: Empty string in lexeme for tsvector
Le dim. 26 sept. 2021 à 15:55, Artur Zakirov a écrit : > Nice catch! The patch looks good to me. > Can you also add a more general test case: > > =# SELECT $$'' '1' '2'$$::tsvector; > ERROR: syntax error in tsvector: "'' '1' '2'" > LINE 1: SELECT $$'' '1' '2'$$::tsvector; > > Thank you, Artur for spotting this test. It is now included into this patch. -- Jean-Christophe Arnu diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c index 9236ebcc8f..00f80ffcbc 100644 --- a/src/backend/utils/adt/tsvector_op.c +++ b/src/backend/utils/adt/tsvector_op.c @@ -329,6 +329,12 @@ tsvector_setweight_by_filter(PG_FUNCTION_ARGS) lex = VARDATA(dlexemes[i]); lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ; + + if (lex_len == 0) + ereport(ERROR, + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING), + errmsg("lexeme array may not contain empty strings"))); + lex_pos = tsvector_bsearch(tsout, lex, lex_len); if (lex_pos >= 0 && (j = POSDATALEN(tsout, entry + lex_pos)) != 0) @@ -609,6 +615,12 @@ tsvector_delete_arr(PG_FUNCTION_ARGS) lex = VARDATA(dlexemes[i]); lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ; + + if (lex_len == 0) + ereport(ERROR, + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING), + errmsg("lexeme array may not contain empty strings"))); + lex_pos = tsvector_bsearch(tsin, lex, lex_len); if (lex_pos >= 0) @@ -761,13 +773,18 @@ array_to_tsvector(PG_FUNCTION_ARGS) deconstruct_array(v, TEXTOID, -1, false, TYPALIGN_INT, &dlexemes, &nulls, &nitems); - /* Reject nulls (maybe we should just ignore them, instead?) */ + /* Reject nulls and zero length strings (maybe we should just ignore them, instead?) */ for (i = 0; i < nitems; i++) { if (nulls[i]) ereport(ERROR, (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), errmsg("lexeme array may not contain nulls"))); + + if (VARSIZE(dlexemes[i]) - VARHDRSZ == 0) + ereport(ERROR, + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING), + errmsg("lexeme array may not contain empty strings"))); } /* Sort and de-dup, because this is required for a valid tsvector. 
*/ diff --git a/src/test/regress/expected/tstypes.out b/src/test/regress/expected/tstypes.out index 2601e312df..f8bf9c6051 100644 --- a/src/test/regress/expected/tstypes.out +++ b/src/test/regress/expected/tstypes.out @@ -85,6 +85,10 @@ SELECT 'a:3A b:2a'::tsvector || 'ba:1234 a:1B'; 'a':3A,4B 'b':2A 'ba':1237 (1 row) +SELECT $$'' '1' '2'$$::tsvector; +ERROR: syntax error in tsvector: "'' '1' '2'" +LINE 1: SELECT $$'' '1' '2'$$::tsvector; + ^ --Base tsquery test SELECT '1'::tsquery; tsquery @@ -1260,6 +1264,8 @@ SELECT ts_delete('base hidden rebel spaceship strike'::tsvector, ARRAY['spaceshi SELECT ts_delete('base hidden rebel spaceship strike'::tsvector, ARRAY['spaceship','leya','rebel', NULL]); ERROR: lexeme array may not contain nulls +SELECT ts_delete('base hidden rebel spaceship strike'::tsvector, ARRAY['spaceship','leya','rebel', '']); +ERROR: lexeme array may not contain empty strings SELECT unnest('base:7 hidden:6 rebel:1 spaceship:2,33A,34B,35C,36D strike:3'::tsvector); unnest - @@ -1330,6 +1336,8 @@ SELECT array_to_tsvector(ARRAY['base','hidden','rebel','spaceship','strike']); SELECT array_to_tsvector(ARRAY['base','hidden','rebel','spaceship', NULL]); ERROR: lexeme array may not contain nulls +SELECT array_to_tsvector(ARRAY['base','hidden','rebel','spaceship', '']); +ERROR: lexeme array may not contain empty strings -- array_to_tsvector must sort and de-dup SELECT array_to_tsvector(ARRAY['foo','bar','baz','bar']); array_to_tsvector @@ -1375,6 +1383,8 @@ SELECT setweight('a asd w:5,6,12B,13A zxc'::tsvector, 'c', '{a,zxc}'); SELECT setweight('a asd w:5,6,12B,13A zxc'::tsvector, 'c', ARRAY['a', 'zxc', NULL]); ERROR: lexeme array may not contain nulls +SELECT setweight('a asd w:5,6,12B,13A zxc'::tsvector, 'c', ARRAY['a', 'zxc', '']); +ERROR: lexeme array may not contain empty strings SELECT ts_filter('base:7A empir:17 evil:15 first:11 galact:16 hidden:6A rebel:1A spaceship:2A strike:3A victori:12 won:9'::tsvector,
Re: Empty string in lexeme for tsvector
Le dim. 26 sept. 2021 à 22:41, Jean-Christophe Arnu a écrit : > > > Le dim. 26 sept. 2021 à 15:55, Artur Zakirov a écrit : > >> Nice catch! The patch looks good to me. >> Can you also add a more general test case: >> >> =# SELECT $$'' '1' '2'$$::tsvector; >> ERROR: syntax error in tsvector: "'' '1' '2'" >> LINE 1: SELECT $$'' '1' '2'$$::tsvector; >> >> > Thank you, Artur for spotting this test. > It is now included into this patch. > > > Two more things : * I updated the documentation for array_to_tsvector(), ts_delete() and setweight() functions (so here's a new patch); * I should mention François Ferry from Logilab who first reported the backup/restore problem that led to this patch. I think this should be ok, now the doc is up to date. Kind regards. -- Jean-Christophe Arnu diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 78812b2dbe..01f0b870ca 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -12896,7 +12896,9 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple Converts an array of lexemes to a tsvector. -The given strings are used as-is without further processing. +The given strings are used as-is. Some checks are performed +on array elements. Empty strings and NULL values +will cause an error to be raised. array_to_tsvector('{fat,cat,rat}'::text[]) @@ -13079,6 +13081,9 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple Assigns the specified weight to elements of the vector that are listed in lexemes. +Some checks are performed on lexemes. +Empty strings and NULL values +will cause an error to be raised. setweight('fat:2,4 cat:3 rat:5,6B'::tsvector, 'A', '{cat,rat}') @@ -13256,6 +13261,9 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple Removes any occurrences of the lexemes in lexemes from the vector. +Some checks are performed on lexemes. +Empty strings and NULL values +will cause an error to be raised. ts_delete('fat:2,4 cat:3 rat:5A'::tsvector, ARRAY['fat','rat']) diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c index 9236ebcc8f..00f80ffcbc 100644 --- a/src/backend/utils/adt/tsvector_op.c +++ b/src/backend/utils/adt/tsvector_op.c @@ -329,6 +329,12 @@ tsvector_setweight_by_filter(PG_FUNCTION_ARGS) lex = VARDATA(dlexemes[i]); lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ; + + if (lex_len == 0) + ereport(ERROR, + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING), + errmsg("lexeme array may not contain empty strings"))); + lex_pos = tsvector_bsearch(tsout, lex, lex_len); if (lex_pos >= 0 && (j = POSDATALEN(tsout, entry + lex_pos)) != 0) @@ -609,6 +615,12 @@ tsvector_delete_arr(PG_FUNCTION_ARGS) lex = VARDATA(dlexemes[i]); lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ; + + if (lex_len == 0) + ereport(ERROR, + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING), + errmsg("lexeme array may not contain empty strings"))); + lex_pos = tsvector_bsearch(tsin, lex, lex_len); if (lex_pos >= 0) @@ -761,13 +773,18 @@ array_to_tsvector(PG_FUNCTION_ARGS) deconstruct_array(v, TEXTOID, -1, false, TYPALIGN_INT, &dlexemes, &nulls, &nitems); - /* Reject nulls (maybe we should just ignore them, instead?) */ + /* Reject nulls and zero length strings (maybe we should just ignore them, instead?) 
*/ for (i = 0; i < nitems; i++) { if (nulls[i]) ereport(ERROR, (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), errmsg("lexeme array may not contain nulls"))); + + if (VARSIZE(dlexemes[i]) - VARHDRSZ == 0) + ereport(ERROR, + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING), + errmsg("lexeme array may not contain empty strings"))); } /* Sort and de-dup, because this is required for a valid tsvector. */ diff --git a/src/test/regress/expected/tstypes.out b/src/test/regress/expected/tstypes.out index 2601e312df..f8bf9c6051 100644 --- a/src/test/regress/expected/tstypes.out +++ b/src/test/regress/expected/tstypes.out @@ -85,6 +85,10 @@ SELECT 'a:3A b:2a'::tsvector || 'ba:1234 a:1B'; 'a':3A,4B 'b':2A 'ba':1237 (1 row) +SELECT $$'' '1' '2'$$::tsvector; +ERROR: syntax error in tsvector
Re: Empty string in lexeme for tsvector
Thank you Tom for your review. On Wed, Sep 29, 2021 at 21:36, Tom Lane wrote: > Jean-Christophe Arnu writes: > > [ empty_string_in_tsvector_v4.patch ] > > I looked through this patch a bit. I don't agree with adding > these new error conditions to tsvector_setweight_by_filter and > tsvector_delete_arr. Those don't prevent bad lexemes from being > added to tsvectors, so AFAICS they can have no effect other than > breaking existing applications. In fact, tsvector_delete_arr is > one thing you could use to fix existing bad tsvectors, so making > it throw an error seems actually counterproductive. > Agreed. The new patch included here no longer changes tsvector_setweight_by_filter(). Tests and docs are also updated. This patch is not ready yet. > (By the same token, I think there's a good argument for > tsvector_delete_arr to just ignore nulls, not throw an error. > That's a somewhat orthogonal issue, though.) > Nulls are now ignored in tsvector_delete_arr(). > What I'm wondering about more than that is whether array_to_tsvector > is the only place that can inject an empty lexeme ... don't we have > anything else that can add lexemes without going through the parser? > I crawled the docs [1] in order to check the tsvector output of each function and operator: * The only functions left that may lead to empty strings seem to be json_to_tsvector() and jsonb_to_tsvector(). Both functions use parsetext() (in ts_parse.c), which seems to behave correctly and does not create "empty string" lexemes. * The concatenation operator "||" allows computing a tsvector containing an "empty string" if one of its operands itself contains an empty-string lexeme (an illustrative example follows the patch below). This seems perfectly correct from the operator's point of view... Should we change this behaviour to filter out empty strings? I also wonder whether we should consider changing COPY FROM behaviour on empty-string lexemes. The current version simply errors out on empty-string lexemes. Should we allow them anyway as COPY FROM input (it seems weird not to be able to re-import dumped data), or "skip them" just like array_to_tsvector() does in the patched version (which may lead to database content changes), or finally should we not change COPY behaviour at all? I admit this is a tricky bunch of questions that I'm too much of a rookie to answer. [1] https://www.postgresql.org/docs/14/functions-textsearch.html -- Jean-Christophe Arnu diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 78812b2dbe..74e25db03c 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -12896,7 +12896,9 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple Converts an array of lexemes to a tsvector. -The given strings are used as-is without further processing. +The given strings are used as-is. Some checks are performed +on array elements. Empty strings and NULL values +will cause an error to be raised. array_to_tsvector('{fat,cat,rat}'::text[]) @@ -13079,6 +13081,8 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple Assigns the specified weight to elements of the vector that are listed in lexemes. +Some checks are performed on lexemes. +NULL values will cause an error to be raised. setweight('fat:2,4 cat:3 rat:5,6B'::tsvector, 'A', '{cat,rat}') @@ -13256,6 +13260,8 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple Removes any occurrences of the lexemes in lexemes from the vector. +NULL values in lexemes +will be ignored.
ts_delete('fat:2,4 cat:3 rat:5A'::tsvector, ARRAY['fat','rat']) diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c index 9236ebcc8f..e50d8a84e2 100644 --- a/src/backend/utils/adt/tsvector_op.c +++ b/src/backend/utils/adt/tsvector_op.c @@ -602,10 +602,9 @@ tsvector_delete_arr(PG_FUNCTION_ARGS) int lex_len, lex_pos; + // Ignore null values if (nulls[i]) - ereport(ERROR, - (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), - errmsg("lexeme array may not contain nulls"))); + continue; lex = VARDATA(dlexemes[i]); lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ; @@ -761,13 +760,18 @@ array_to_tsvector(PG_FUNCTION_ARGS) deconstruct_array(v, TEXTOID, -1, false, TYPALIGN_INT, &dlexemes, &nulls, &nitems); - /* Reject nulls (maybe we should just ignore them, instead?) */ + /* Reject nulls and
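As an illustration of the concatenation point above, assuming an unpatched server where array_to_tsvector() still accepts empty strings:

  SELECT array_to_tsvector(ARRAY['', 'cat']) || 'fat'::tsvector;
  -- the result still carries the empty lexeme, even though tsvector input would reject it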
Re: FileFallocate misbehaving on XFS
Hello Mike, We encountered the same problem with a fixed allocsize=262144k. Removing this option seemed to fix the problem. We are now relying on XFS's own allocation-size heuristics. The problem has not shown up since the change was made on Monday, but I'm not totally sure it is really fixed. If we can provide you with more information, please let me know. On Fri, Jan 31, 2025 at 00:53, Michael Harris wrote: > Hello All > > An update on this. > > Earlier in this thread, Jakub had suggested remounting the XFS > filesystems with the mount option allocsize=1m. > I've done that now, on a few systems that have been experiencing this > error multiple times a day, and it does seem to stop the errors from > occurring. > It has only been a few days, but it does look encouraging as a workaround. > > From the xfs man page, it seems that manually setting allocsize turns > off the dynamic preallocation size heuristics that are normally used > by XFS. > > The other piece of info is that we have raised a support ticket with > Redhat. I'm not directly in contact with them (it's another team that > handles that) but I will let you know of any developments. > > Cheers > Mike > > > -- Jean-Christophe Arnu
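For readers hitting the same symptom, the workaround discussed in this thread amounts to pinning the XFS preallocation size at mount time instead of letting it grow dynamically; the device and mount point below are hypothetical (in the thread it was applied via a remount):

  mount -t xfs -o allocsize=1m /dev/mapper/vg-pgdata /var/lib/pgsql/data
  # or persistently, in /etc/fstab:
  # /dev/mapper/vg-pgdata  /var/lib/pgsql/data  xfs  defaults,allocsize=1m  0 0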