Example not producing shown output
The following documentation comment has been logged on the website: Page: https://www.postgresql.org/docs/12/functions-textsearch.html Description: Page: https://www.postgresql.org/docs/12/functions-textsearch.html Examples should have specified 'english'. Instead: to_tsvector('fat cats ate rats') @@ to_tsquery('cat & rat') which returns fale on my own blank installation of Postgre (Serbian locale Windows OS) instead of true: to_tsvector('english', 'fat cats ate rats') @@ to_tsquery('cat & rat')
Re: Add for ALTER TEXT SEARCH CONFIGURATION
Em sex., 15 de nov. de 2019 às 16:10, Jeff Janes escreveu: > > If you alter one of the built-in text search configurations, the > modifications to it will not get dumped by pg_dump, and thus won't get > propagated by pg_upgrade leading to silent behavior changes in the new > cluster (as well in any other type of restoration from pg_dump output) > It was a bad design to allow changes in builtin text search objects. User expects that all modifications should be propagated while dumping a cluster. Since pg_ts_config_map does not have an oid column, it is not easy to guess that a text search configuration was modified (unless we dump the initial table and compare row-by-row). We could disallow changes in builtin text search objects but it could potentially break some scenarios. Even if it breaks a scenario, a workaround with another user-defined text search configuration is possible. > I would say it is bad practise to alter one of those anyway, rather than copy > and alter the copy. But I wasn't aware until now of just how bad of an idea > it was. I think that this needs to be warned about in the docs for ALTER > TEXT SEARCH CONFIGURATION. If I were monkeying around in > $SHAREDLIB/tsearch_data, I'd expect traps like this, but not when executing > things from SQL after reading the docs on the command. > Let's start with a tangible idea: add a big "caution" and also a WARNING while ALTER TEXT SEARCH CONFIGURATION whose schema is pg_catalog. -- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Re: Example not producing shown output
> On 25 Nov 2019, at 13:05, PG Doc comments form wrote: > > The following documentation comment has been logged on the website: > > Page: https://www.postgresql.org/docs/12/functions-textsearch.html > Description: > > Page: https://www.postgresql.org/docs/12/functions-textsearch.html > > Examples should have specified 'english'. > > Instead: > to_tsvector('fat cats ate rats') @@ to_tsquery('cat & rat') > > which returns fale on my own blank installation of Postgre (Serbian locale > Windows OS) instead of true: > to_tsvector('english', 'fat cats ate rats') @@ to_tsquery('cat & rat') I agree with this complaint, mainly since other examples on the page do specify english where required AFAICT, this makes it consistent. The attached diff adds the english configuration directive to @@ and @@@. The examples on chapter 12.1 all have the same issue, but adding 'english' to all of them in order to make them work out of the box everywhere doesn't seem like an improvement to the docs so those are left as is. cheers ./daniel textsearch_english.diff Description: Binary data
Re: Add for ALTER TEXT SEARCH CONFIGURATION
Euler Taveira writes: > Em sex., 15 de nov. de 2019 às 16:10, Jeff Janes > escreveu: >> If you alter one of the built-in text search configurations, the >> modifications to it will not get dumped by pg_dump, and thus won't get >> propagated by pg_upgrade leading to silent behavior changes in the new >> cluster (as well in any other type of restoration from pg_dump output) This isn't really different from what happens if you alter any object defined by initdb. (There are limited exceptions now for GRANT/REVOKE, but not for any other object property.) > It was a bad design to allow changes in builtin text search objects. I disagree. It is reasonable to point out that if you do that, propagating your changes to new databases is your problem. But the fact that you can mess with built-in objects has always been seen as a feature not a bug, and I'm not willing to change that approach, nor to start plastering random man pages with bright yellow cautions against doing so. Having the code itself issue complaints is right out. regards, tom lane
legacy assumptions
The following documentation comment has been logged on the website: Page: https://www.postgresql.org/docs/12/datatype-json.html Description: I'm wondering if this one line of section 8.14 JSON Types (https://www.postgresql.org/docs/current/datatype-json.html) can be edited to remove the word "legacy": "In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys." I'm concerned that with the word "legacy" there, someone might come along eventually and decide the json column type isn't needed anymore because it's "legacy", where in fact there are modern and legitimate uses for a field that allows you to retrieve the data exactly as it was stored and allows JSON queries on that data (even if they are slower). In my application, there's a database table to store encrypted and integrity-protected information. The ciphertext is in a binary data column, the ciphertext metadata is in a jsonb column, and integrity-protected plaintext is in a json column. The integrity protection is implemented as either the AD portion of AES-GCM AEAD or as a separately-computed HMAC; either way the bytes must be read exactly as they were stored or the integrity check will fail. Being able to select records based on the content of that plaintext json data is nice. An alternative would be to store the plaintext as binary data for the integrity check and have a separate jsonb column with a second copy of the same data. Since different applications have different time/space tradeoffs, it's good to have the choice. My suggestion for that sentence: "In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as assumptions about ordering of object keys or the need to retrieve the data exactly as it was stored."
Re: legacy assumptions
Hi, On 11/25/19 12:47 PM, PG Doc comments form wrote: > The following documentation comment has been logged on the website: > > Page: https://www.postgresql.org/docs/12/datatype-json.html > Description: > > I'm wondering if this one line of section 8.14 JSON Types > (https://www.postgresql.org/docs/current/datatype-json.html) can be edited > to remove the word "legacy": > > "In general, most applications should prefer to store JSON data as jsonb, > unless there are quite specialized needs, such as legacy assumptions about > ordering of object keys." > > I'm concerned that with the word "legacy" there, someone might come along > eventually and decide the json column type isn't needed anymore because it's > "legacy", where in fact there are modern and legitimate uses for a field > that allows you to retrieve the data exactly as it was stored and allows > JSON queries on that data (even if they are slower). While I'm certainly sensitive to this need as once upon a time I had a similar requirement, slightly less strict requirement, I made sure to not rely on the PostgreSQL JSON type itself to ensure ordering was preserved (and in my case I was able to rely on a solution external to PostgreSQL). The JSON RFC states that objects should be considered "unordered", and mentions that while different parsing libraries may preserve key ordering, "implementations whose behavior does not depend on member ordering will be interoperable in the sense that they will not be affected by these differences."[1] > An alternative would be to store the > plaintext as binary data for the integrity check and have a separate jsonb > column with a second copy of the same data. Since different applications > have different time/space tradeoffs, it's good to have the choice. Another approach is to leverage PostgreSQL's expression index capabilities, which would allow you to limit the data duplication. For example: CREATE TABLE docs (doc bytea); -- populating some test data INSERT INTO docs SELECT ('{"id": ' || x || ', "data": [1,2,3] }')::bytea FROM generate_series(1, 10) x; -- create an expression index that maps to the operators supported by GIN CREATE INDEX docs_doc_json_idx ON docs USING gin(jsonb(encode(doc, 'escape'))); and in one test run: EXPLAIN SELECT doc FROM docs WHERE encode(doc, 'escape')::jsonb @> '{"id": 567}'; I got a plan similar to: QUERY PLAN Bitmap Heap Scan on docs (cost=28.77..306.00 rows=100 width=31) Recheck Cond: ((encode(doc, 'escape'::text))::jsonb @> '{"id": 567}'::jsonb) -> Bitmap Index Scan on docs_doc_json_idx (cost=0.00..28.75 rows=100 width=0) Index Cond: ((encode(doc, 'escape'::text))::jsonb @> '{"id": 567}'::jsonb) In this way, you can: - Keep the key ordering preserved and perform any integrity checks, etc. that your application requires - Limit your data duplication to that of the index - Still get the benefits of the JSONB lookup functions that work with the indexing - Still perform JSON validation: INSERT INTO docs VALUES ('{]'::bytea); ERROR: invalid input syntax for type json DETAIL: Expected string or "}", but found "]". CONTEXT: JSON data, line 1: {] > My suggestion for that sentence: > > "In general, most applications should prefer to store JSON data as jsonb, > unless there are quite specialized needs, such as assumptions about ordering > of object keys or the need to retrieve the data exactly as it was stored." My preference would be that we guide in the documentation on what to do if one has an application sensitive to ordering. I'm not opposed to the wording, but I'd prefer we encourage people to leverage JSONB for storage & retrieval. Thanks! Jonathan [1] https://tools.ietf.org/html/rfc7159#section-4 signature.asc Description: OpenPGP digital signature