Em sex., 24 de set. de 2021 às 05:47, Jean-Christophe Arnu <jca...@gmail.com> escreveu:
> Hello Hackers, > > This is my second proposal for a patch, so I hope not to make "rookie" > mistakes. > > This proposal patch is based on a simple use case : > > If one creates a table this way > CREATE TABLE tst_table AS (SELECT > array_to_tsvector(ARRAY['','abc','def'])); > > the table content is : > array_to_tsvector > ------------------- > '' 'abc' 'def' > (1 row) > > First it can be strange to have an empty string for tsvector lexeme but > anyway, keep going to the real point. > > Once dumped, this table dump contains that empty string that can't be > restored. > tsvector_parse (./utils/adt/tsvector_parser.c) raises an error. > > Thus it is not possible for data to be restored this way. > > There are two ways to consider this : is it alright to have empty strings > in lexemes ? > * If so, empty strings should be correctly parsed by tsvector_parser. > * If not, one should prevent empty strings from being stored into > tsvectors. > > Since "empty strings" seems not to be a valid lexeme, I undertook to > change some functions dealing with tsvector to check whether string > arguments are empty. This might be the wrong path as I'm not familiar with > tsvector usage... (OTOH, I can provide a fix patch for tsvector_parser() if > I'm wrong). > > This involved changing the way functions like array_to_tsvector(), > ts_delete() and setweight() behave. As for NULL values, empty string values > are checked and an error is raised for such a value. It appears to me that > ERRCODE_ZERO_LENGTH_CHARACTER_STRING (2200F) matched this behaviour but I > may be wrong. > > Since this patch changes the way functions behave, consider it as a simple > try to overcome a strange situation we've noticed on a specific use case. > > This included patch manages that checks for empty strings on the pointed > out functions. It comes with modified regression tests. Patch applies along > head/master and 14_RC1. > > Comments are more than welcome! > 1. Would be better to add this test-and-error before tsvector_bsearch call. + if (lex_len == 0) + ereport(ERROR, + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING), + errmsg("lexeme array may not contain empty strings"))); + If lex_len is equal to zero, better get out soon. 2. The second test-and-error can use lex_len, just like the first test, I don't see the point in recalculating the size of lex_len if that's already done. + if (VARSIZE(dlexemes[i]) - VARHDRSZ == 0) + ereport(ERROR, + (errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING), + errmsg("lexeme array may not contain empty strings"))); + regards, Ranier Vilela