Example not producing shown output

2019-11-25 Thread PG Doc comments form
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/12/functions-textsearch.html
Description:

Page: https://www.postgresql.org/docs/12/functions-textsearch.html

Examples should have specified 'english'.

Instead:
   to_tsvector('fat cats ate rats') @@ to_tsquery('cat & rat')

which returns fale on my own blank installation of Postgre (Serbian locale
Windows OS) instead of true:
   to_tsvector('english', 'fat cats ate rats') @@ to_tsquery('cat & rat')


Re: Add for ALTER TEXT SEARCH CONFIGURATION

2019-11-25 Thread Euler Taveira
Em sex., 15 de nov. de 2019 às 16:10, Jeff Janes
 escreveu:
>
> If you alter one of the built-in text search configurations, the 
> modifications to it will not get dumped by pg_dump, and thus won't get 
> propagated by pg_upgrade leading to silent behavior changes in the new 
> cluster (as well in any other type of restoration from pg_dump output)
>
It was a bad design to allow changes in builtin text search objects.
User expects that all modifications should be propagated while dumping
a cluster.

Since pg_ts_config_map does not have an oid column, it is not easy to
guess that a text search configuration was modified (unless we dump
the initial table and compare row-by-row). We could disallow changes
in builtin text search objects but it could potentially break some
scenarios. Even if it breaks a scenario, a workaround with another
user-defined text search configuration is possible.

> I would say it is bad practise to alter one of those anyway, rather than copy 
> and alter the copy.  But I wasn't aware until now of just how bad of an idea 
> it was.  I think that this needs to be warned about in the docs for ALTER 
> TEXT SEARCH CONFIGURATION.  If I were monkeying around in 
> $SHAREDLIB/tsearch_data, I'd expect traps like this, but not when executing 
> things from SQL after reading the docs on the command.
>
Let's start with a tangible idea: add a big "caution" and also a
WARNING while ALTER TEXT SEARCH CONFIGURATION whose schema is
pg_catalog.


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: Example not producing shown output

2019-11-25 Thread Daniel Gustafsson
> On 25 Nov 2019, at 13:05, PG Doc comments form  wrote:
> 
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/12/functions-textsearch.html
> Description:
> 
> Page: https://www.postgresql.org/docs/12/functions-textsearch.html
> 
> Examples should have specified 'english'.
> 
> Instead:
>   to_tsvector('fat cats ate rats') @@ to_tsquery('cat & rat')
> 
> which returns fale on my own blank installation of Postgre (Serbian locale
> Windows OS) instead of true:
>   to_tsvector('english', 'fat cats ate rats') @@ to_tsquery('cat & rat')


I agree with this complaint, mainly since other examples on the page do specify
english where required AFAICT, this makes it consistent.  The attached diff
adds the english configuration directive to @@ and @@@.

The examples on chapter 12.1 all have the same issue, but adding 'english' to
all of them in order to make them work out of the box everywhere doesn't seem
like an improvement to the docs so those are left as is.

cheers ./daniel



textsearch_english.diff
Description: Binary data



Re: Add for ALTER TEXT SEARCH CONFIGURATION

2019-11-25 Thread Tom Lane
Euler Taveira  writes:
> Em sex., 15 de nov. de 2019 às 16:10, Jeff Janes
>  escreveu:
>> If you alter one of the built-in text search configurations, the 
>> modifications to it will not get dumped by pg_dump, and thus won't get 
>> propagated by pg_upgrade leading to silent behavior changes in the new 
>> cluster (as well in any other type of restoration from pg_dump output)

This isn't really different from what happens if you alter any object
defined by initdb.   (There are limited exceptions now for GRANT/REVOKE,
but not for any other object property.)

> It was a bad design to allow changes in builtin text search objects.

I disagree.  It is reasonable to point out that if you do that,
propagating your changes to new databases is your problem.  But
the fact that you can mess with built-in objects has always been
seen as a feature not a bug, and I'm not willing to change that
approach, nor to start plastering random man pages with bright
yellow cautions against doing so.  Having the code itself issue
complaints is right out.

regards, tom lane




legacy assumptions

2019-11-25 Thread PG Doc comments form
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/12/datatype-json.html
Description:

I'm wondering if this one line of section 8.14 JSON Types
(https://www.postgresql.org/docs/current/datatype-json.html) can be edited
to remove the word "legacy":

"In general, most applications should prefer to store JSON data as jsonb,
unless there are quite specialized needs, such as legacy assumptions about
ordering of object keys."

I'm concerned that with the word "legacy" there, someone might come along
eventually and decide the json column type isn't needed anymore because it's
"legacy", where in fact there are modern and legitimate uses for a field
that allows you to retrieve the data exactly as it was stored and allows
JSON queries on that data (even if they are slower).

In my application, there's a database table to store encrypted and
integrity-protected information. The ciphertext is in a binary data column,
the ciphertext metadata is in a jsonb column, and integrity-protected
plaintext is in a json column. The integrity protection is implemented as
either the AD portion of AES-GCM AEAD or as a separately-computed HMAC;
either way the bytes must be read exactly as they were stored or the
integrity check will fail. Being able to select records based on the content
of that plaintext json data is nice. An alternative would be to store the
plaintext as binary data for the integrity check and have a separate jsonb
column with a second copy of the same data. Since different applications
have different time/space tradeoffs, it's good to have the choice.

My suggestion for that sentence:

"In general, most applications should prefer to store JSON data as jsonb,
unless there are quite specialized needs, such as assumptions about ordering
of object keys or the need to retrieve the data exactly as it was stored."


Re: legacy assumptions

2019-11-25 Thread Jonathan S. Katz
Hi,

On 11/25/19 12:47 PM, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/12/datatype-json.html
> Description:
> 
> I'm wondering if this one line of section 8.14 JSON Types
> (https://www.postgresql.org/docs/current/datatype-json.html) can be edited
> to remove the word "legacy":
> 
> "In general, most applications should prefer to store JSON data as jsonb,
> unless there are quite specialized needs, such as legacy assumptions about
> ordering of object keys."
> 
> I'm concerned that with the word "legacy" there, someone might come along
> eventually and decide the json column type isn't needed anymore because it's
> "legacy", where in fact there are modern and legitimate uses for a field
> that allows you to retrieve the data exactly as it was stored and allows
> JSON queries on that data (even if they are slower).

While I'm certainly sensitive to this need as once upon a time I had a
similar requirement, slightly less strict requirement, I made sure to
not rely on the PostgreSQL JSON type itself to ensure ordering was
preserved (and in my case I was able to rely on a solution external to
PostgreSQL).

The JSON RFC states that objects should be considered "unordered", and
mentions that while different parsing libraries may preserve key
ordering, "implementations whose behavior does not depend on member
ordering will be interoperable in the sense that they will not be
affected by these differences."[1]

> An alternative would be to store the
> plaintext as binary data for the integrity check and have a separate jsonb
> column with a second copy of the same data. Since different applications
> have different time/space tradeoffs, it's good to have the choice.

Another approach is to leverage PostgreSQL's expression index
capabilities, which would allow you to limit the data duplication. For
example:

CREATE TABLE docs (doc bytea);

-- populating some test data
INSERT INTO docs
SELECT ('{"id": ' || x || ', "data": [1,2,3] }')::bytea
FROM generate_series(1, 10) x;

-- create an expression index that maps to the operators supported by GIN
CREATE INDEX docs_doc_json_idx ON docs
USING gin(jsonb(encode(doc, 'escape')));

and in one test run:

EXPLAIN
SELECT doc
FROM docs WHERE encode(doc, 'escape')::jsonb @> '{"id": 567}';

I got a plan similar to:

 QUERY PLAN


 Bitmap Heap Scan on docs  (cost=28.77..306.00 rows=100 width=31)
   Recheck Cond: ((encode(doc, 'escape'::text))::jsonb @> '{"id":
567}'::jsonb)
   ->  Bitmap Index Scan on docs_doc_json_idx  (cost=0.00..28.75
rows=100 width=0)
 Index Cond: ((encode(doc, 'escape'::text))::jsonb @> '{"id":
567}'::jsonb)

In this way, you can:

- Keep the key ordering preserved and perform any integrity checks, etc.
that your application requires
- Limit your data duplication to that of the index
- Still get the benefits of the JSONB lookup functions that work with
the indexing
- Still perform JSON validation:

INSERT INTO docs VALUES ('{]'::bytea);

ERROR:  invalid input syntax for type json
DETAIL:  Expected string or "}", but found "]".
CONTEXT:  JSON data, line 1: {]

> My suggestion for that sentence:
> 
> "In general, most applications should prefer to store JSON data as jsonb,
> unless there are quite specialized needs, such as assumptions about ordering
> of object keys or the need to retrieve the data exactly as it was stored."

My preference would be that we guide in the documentation on what to do
if one has an application sensitive to ordering. I'm not opposed to the
wording, but I'd prefer we encourage people to leverage JSONB for
storage & retrieval.

Thanks!

Jonathan

[1] https://tools.ietf.org/html/rfc7159#section-4



signature.asc
Description: OpenPGP digital signature