Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?

2022-06-19 Thread jian he
On Sat, Jun 18, 2022 at 5:44 AM Bryn Llewellyn  wrote:

>
> /* — START OF SPEC — */
>
> The document's top-level object may use only these keys:
>
> "isbn" — string
> values must be unique across the entire set of documents (in other words,
> it defines the unique business key); values must have this pattern:
>
>   « ^[0-9]{3}-[0-9]{1}-[0-9]{2}-[0-9]{6}-[0-9]{1}$ »
>
> "title" — string
>
> "year" — number
> must be a positive integral value
>
> "authors" — array of objects;
> must be at least one object
>
> "genre" — string
>
> Each object in the "authors" array may use only these keys:
>
> "family name" — string
>
> "given name" — string
>
> String values other than for "isbn" are unconstrained.
>
> Any key other than the seven listed here is illegal. The "genre" and
> "given name" keys are not required. All the other keys are required.
>
> The meaning of *required* is that no extracted value may bring a SQL
> null (so a required key must not have a JSON null value).
>
> And the meaning of *not required* is simply "no information is available
> for this key" (with no nuances). The spec author goes further by adding a
> rule: this meaning must be expressed by the absence of such a key.
>
> /* — END OF SPEC — */
>

create temp table source(
  isbn text primary key,
  book_info_text text,
  book_info jsonb generated always as ( book_info_text::jsonb ) stored
    CONSTRAINT test_jsonb_constraints1 check (book_info_text is json)
    CONSTRAINT test_jsonb_constraints2 check
      (JSON_EXISTS(book_info_text::jsonb, '$.title'))
    CONSTRAINT test_jsonb_constraints3 check
      (JSON_VALUE(book_info_text::jsonb, '$.year' returning int) > 0)
    CONSTRAINT test_jsonb_constraints4 check
      (JSON_EXISTS(book_info_text::jsonb, '$.genre'))
    CONSTRAINT test_jsonb_constraints5 check
      (not JSON_EXISTS(book_info_text::jsonb, '$.not_as_toplevel_key'))
    CONSTRAINT test_jsonb_constraints6 check (
      (JSON_VALUE(book_info_text::jsonb, '$.authors[*]."family name"')
       is not null)
      OR
      (JSON_VALUE(book_info_text::jsonb, '$.authors[*]."given name"')
       is not null)
    )
);

There are some problems that I don't know how to solve. My intuition says that
keeping the isbn attribute inside the json document column and then enforcing
a unique constraint on it would be an anti-pattern, so I put the isbn outside
as a separate column.
Another constraint is that only certain keys may appear in the jsonb. I don't
know the best way to implement that, but I feel it's do-able; perhaps
something like the sketch below.
There are also two columns, one text and one stored generated jsonb column, so
there is a duplication issue.
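
An untested sketch of the "only certain keys" idea, using the jsonb minus
operator and assuming the spec's five top-level keys:

alter table source add constraint test_jsonb_only_known_keys check (
  -- deleting every allowed key must leave an empty object;
  -- anything left over is an illegal top-level key
  book_info - '{isbn,title,year,authors,genre}'::text[] = '{}'::jsonb
);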

So there is an alternative way to do it: normal relational tables. Insert is
done via the json_table construct, which converts json to a table; output can
easily be done with row_to_json.

For example:

> select * from json_table('{
>   "title"   : "Joy Luck Club",
>   "year": 2006,
>   "authors" : [{"given name": "Amy", "family name" : "Tan"}],
>   "genre"   : "Novel"
>   }'::jsonb,
> '$'
> COLUMNS(
> id for ordinality,
> title text path '$.title',
> year int path '$.year',
> genre text path '$.genre',
> nested path '$.authors[*]'
> columns(
> "given name" text path '$."given name"'
> ,"family name" text path '$."family name"'
> )
> )
> );
>
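
And for the reverse direction, an illustrative sketch (with a hypothetical
books table, not the design above):

create temp table books(title text, year int, genre text);
insert into books values ('Joy Luck Club', 2006, 'Novel');
select row_to_json(b) from books b;
-- {"title":"Joy Luck Club","year":2006,"genre":"Novel"}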


-- 
 I recommend David Deutsch's <>

  Jian


Index creation

2022-06-19 Thread Дмитрий Иванов
Good afternoon.
I have a question about the query planner. Suppose there are two kinds of
queries that use an indexed field: one query is limited to that field alone,
while the second also uses a number of fields added to the index by the
INCLUDE directive. Does it make sense to have two indexes, a lightweight one
and one containing the INCLUDE columns? Or will the planner rely on the
nearest suitable index without considering its weight?
--
Regards, Dmitry!


Re: Index creation

2022-06-19 Thread David G. Johnston
On Sun, Jun 19, 2022 at 12:06 PM Дмитрий Иванов 
wrote:

> Good afternoon.
> I have a question about the query planner. Suppose there are two kinds of
> queries that use an indexed field: one query is limited to that field alone,
> while the second also uses a number of fields added to the index by the
> INCLUDE directive. Does it make sense to have two indexes, a lightweight one
> and one containing the INCLUDE columns? Or will the planner rely on the
> nearest suitable index without considering its weight?
>
>
The system should avoid the larger sized index unless it will sufficiently
benefit from the Index Only Scan that such a larger covering index is
supposed to facilitate.
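
A sketch of the trade-off (hypothetical table and columns, not from the
original question):

create table t(k int, a int, b int);
create index t_k     on t (k);                   -- lightweight
create index t_k_cov on t (k) include (a, b);    -- covering

-- "select a, b from t where k = 1" can be satisfied by t_k_cov alone as an
-- Index Only Scan; a query that merely filters on k gains nothing from the
-- extra payload and should prefer the smaller t_k.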

David J.


Outer joins and NULLs (old subject "ERROR: failed to find conversion function from key_vals_nn to record[]")

2022-06-19 Thread Bryn Llewellyn
> b...@yugabyte.com wrote:
> 
>> david.g.johns...@gmail.com  wrote:
>> 
>>> b...@yugabyte.com  wrote:
>>> 
>>> Can anybody show me an implementation of a realistic use case that follows 
>>> proper practice — like "every table must have a primary key", "a foreign key 
>>> must refer to a primary key", and "joins may be made only 'on' columns one 
>>> of which has a PK constraint and the other of which has a FK constraint" — 
>>> where using a not nullable data type brings a problem that wouldn't occur 
>>> if the column were defined with a nullable data type and an explicit "not 
>>> null" constraint?
>> 
>> Nothing obvious comes to mind. But frankly:
>> 
>> proper practice includes trying to write idiomatic code for the language you 
>> are using so others familiar with the language can learn your code more 
>> easily. You are violating this to an extreme degree.
>> 
>> I do not think it to be a good trade-off. SQL writers are practical people 
>> and the idioms largely avoid any downsides that arise from SQL not being 
>> some paragon of language design.
>> 
>> -- "\d genres" shows "gk" with a "not null" constraint, whether I write it
>> -- or not. And convention seems to say "don't clutter you code by writing 
>> it".
>> create table genres(
>>   gk  int   primary key,
>>   gv  text  not null
>>   );
>> 
>> "Primary Key" is defined to be the application of both UNIQUE and NOT NULL 
>> constraints...
> 
> Yes, I know what "primary key" implies. I meant only to emphasize that the 
> source column for what the "outer join" projects has a not null constraint, 
> that it doesn't apply to the projection of that column, that this is 
> perfectly understandable, and that this isn't a problem. Never mind.

Peter, your email:

www.postgresql.org/message-id/20220618064453.wtz4hxyeptwdh37z%40hjp.at

ended up in my junk folder. (This happens often, but randomly, with emails sent 
to pgsql-general—and I can't control it.) I read it quickly. And then when I 
returned to read it more carefully I managed to delete it—irrevocably.

The discussion has wandered so far from the original topic that it seemed just 
as well to start a new thread with this.

> hjp-pg...@hjp.at wrote:
> 
> But it would be a problem if there was an actual type which wouldn't include 
> NULL.
> 
> The NOT NULL attribute is an attribute of the column, not the type. When you 
> use the primary key (or any other column marked as NOT NULL) the type of the 
> result is just the type of that column, the NOT NULL is dropped.

It seems to me that my new testcase reveals one clear bug and one outcome that 
seems to me to be a bug. Others might argue that neither is a bug. Of course, 
I'll be happy to be corrected—especially if I made some typos in my SQL or 
misinterpreted what I saw.



Thank you for that example. It stimulated me to think harder than I had before 
and to design a brand new testcase. I didn't need to refer to your example when 
I wrote what follows.

I'm using the term "view" here as a shorthand for « the results from a "select" 
». And I used real views in my testcase to make the programming easier.

I copied my self-contained testcase below so that you can simply run it "as 
is". (You'll have to add the "drop" statements that you need.)

Here's my interpretation of the testcase output:

Self-evidently, a view does *not* inherit constraints from the columns of its 
base table(s).
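
(To illustrate just this first claim with a trivial sketch, separate from my
testcase and with made-up names:

create table t(v text not null);
create view t_v as select v from t;

"\d t_v" then shows column "v" with no not-null marking, even though t.v has
one.)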

A view on a single table doesn't necessarily inherit the data types of its base 
table's columns. Rather, the view compilation's analysis is *sometimes* clever 
enough to notice when a projected column might have a NULL even when the base 
column doesn't allow NULLs. In this case, if the base column's data type is (in 
my example) the domain "text_nn", then the corresponding column in the view is 
given the data type plain "text". My test that uses a single table shows this.

However, the compilation's analysis for a view on a join (at least when it's a 
two-table "outer join") slavishly inherits the data types from all of the 
referenced columns—even when the human can easily predict that some projected 
columns might have NULLs.

It seems to me that this optimistic design choice was unfortunate—and that a 
pessimistic choice would have been better:

— when the analysis cannot predict the outcome, replace the data type of 
*every* column that has a "not null domain" data type with the domain's base 
data type.

But I accept that this cannot be changed now. Might it be possible (in some 
future PG release) to make the analysis clever enough to deal with the issue at 
hand (as it already does in my single-table example)?

With "insert-select", you (self-evidently) define the data types and 
constraints of the target table's columns explicitly, reflecting your analysis 
of what you expect. Of course, then, the "insert-select" m

Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?

2022-06-19 Thread Bryn Llewellyn
> jian.universal...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> /* — START OF SPEC — */
>> 
>> The document's top-level object may use only these keys:
>> 
>> ...
>> 
>> And the meaning of *not required* is simply "no information is available for 
>> this key" (with no nuances). The spec author goes further by adding a rule: 
>> this meaning must be expressed by the absence of such a key.
>> 
>> 
>> /* — END OF SPEC — */
> 
> create temp table source(
>   ...
>   CONSTRAINT test_jsonb_constraints1 check (book_info_text is json)
>   CONSTRAINT test_jsonb_constraints2 check 
> (JSON_EXISTS(book_info_text::jsonb,'$.title') )
>   ...
>   )
> );
> 
> There are some problems that I don't know how to solve. My intuition says 
> that keeping the isbn attribute inside the json document column and then 
> enforcing a unique constraint on it would be an anti-pattern, so I put the 
> isbn outside as a separate column. Another constraint is that only certain 
> keys may appear in the jsonb. I don't know the best way to implement that, 
> but I feel it's do-able.
> There are also two columns, one text and one stored generated jsonb, so 
> there is a duplication issue...
> 
> So there is an alternative way to do it...

Thank you very much for your suggestions, Jian. I'll try them and think 
carefully about how everything then looks over the next week or so.

Meanwhile, I hope that it's clear to all that I have nothing more than 
prototype code on my own laptop. I can change any aspect of it in no time when 
I come to see better alternatives. (In particular, I can easily heave out my 
use of "domains with not null constraints".)

My code, as it stands, does meet the goals that I set for it (esp. by meeting 
the "JSON → relational → JSON" idempotency requirement). I also handle the 
"required", or "not required", status of the objects' attributes, and the 
"only specified keys present" rule, quite easily. But I don't see any sign of 
"only specified keys present" in your code.

I haven't tried my prototype with large volumes of synthetic data, nor have I 
considered performance at all. That can come later. (But with my current 
implementation, my tiny end-to-end test kit completes crazily quickly.)

I probably didn't say out loud that the kinds of updates and queries that the 
four-table Codd-and-Date representation suggests can be supported with fairly 
ordinary SQL and not too much thought. However (at least as it feels to me), 
the corresponding operations on the native "jsonb" representation would be 
harder to design and write. Moreover, *any* change implies updating all of the 
indexes and re-checking all of the constraints.

In other words, my aim here is to treat JSON in the way that first motivated it 
(as a data transport format) and *not* in the way that it's often used (as a 
bucket for a set of noticeably heterogeneous documents). This is where the 
"JSON → relational → JSON" idempotency requirement comes from.

Sorry if I didn't make this clear.



Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?

2022-06-19 Thread Thomas Kellerer

Bryn Llewellyn wrote on 04.06.2022 at 03:41:

Am I missing a use case where an object with a key-value pair with a
JSON null value is meaningfully different from one where the key is
simply absent?

It seems the JSON specification doesn't actually define equality.
But the JSON patch RFC 6902[1] defines the equality of two objects
as:

  objects: are considered equal if they contain the same number of
  members, and if each member can be considered equal to a member in
  the other object, by comparing their keys (as strings) and their
  values (using this list of type-specific rules).

As {"x": 42, "y": null} and {"x": 42} do not contain the same number
of members, I think Postgres' behaviour is correct.
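
A quick illustration of that rule in Postgres (my own check, not from the
RFC):

select '{"x": 42, "y": null}'::jsonb = '{"x": 42}'::jsonb;            -- false
select '{"x": 42, "y": null}'::jsonb = '{"y": null, "x": 42}'::jsonb; -- true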

Thomas


[1] https://www.rfc-editor.org/rfc/rfc6902#section-4.6




Re: Outer joins and NULLs (old subject "ERROR: failed to find conversion function from key_vals_nn to record[]")

2022-06-19 Thread Tom Lane
Bryn Llewellyn  writes:
> Self-evidently, a view does *not* inherit constraints from the columns of its 
> base table(s).

Check.

> A view on a single table doesn't necessarily inherit the data types of its 
> base table's columns. Rather, the view compilation's analysis is *sometimes* 
> clever enough to notice when a projected column might have a NULL even when 
> the base column doesn't allow NULLs. In this case, if the base column's data 
> type is (in my example) the domain "text_nn", then the corresponding column 
> in the view is given the data type plain "text". My test that uses a single 
> table shows this.

This is nonsense.  The parser does not account for domain constraints in
that way.  It would be incorrect to do so, because then the view's data
types could need to change as a consequence of adding/dropping domain
constraints.  I think that your result is actually just an illustration
of the rules in
https://www.postgresql.org/docs/current/typeconv-union-case.html
about how the output type of a CASE expression is determined ---
specifically, that domains are smashed to base types as soon as
the CASE arms are discovered to not be all of the same type.
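
(A minimal sketch of that rule, with an assumed domain name:

create domain text_nn as text not null;
create table demo(v text_nn);
insert into demo values ('a');
select pg_typeof(case when true then v else 'b'::text end) from demo;

The arms are text_nn and text, not all the same type, so the domain is
smashed to its base type and pg_typeof reports plain text.)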

regards, tom lane




Re: Outer joins and NULLs (old subject "ERROR: failed to find conversion function from key_vals_nn to record[]")

2022-06-19 Thread David G. Johnston
On Sun, Jun 19, 2022 at 2:31 PM Bryn Llewellyn  wrote:

> It would be foolish, therefore, to define the target table for
> "insert-select" using "CTAS where false".
>

SQL is a strongly typed language where the structure of the query output is
determined without any consideration of whether said query returns zero,
one, or many rows.  Because of this property it is entirely consistent that
a CTAS produces a table even if the query execution produces zero rows.

That CTAS chooses to not try and produce constraints on the newly created
table by inferring them from the underlying query seems like a reasonable
trade-off between functionality and effort.  It simply stops at "data
types" and doesn't care for how any given one is implemented.  That domains
have constraints is incidental to the entire design.
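
A sketch of that behavior (assumed names, untested):

create domain text_nn as text not null;
create table src(v text_nn, w text not null);
create table tgt as select v, w from src where false;

-- tgt.v keeps the declared column type, text_nn, so any constraint living
-- inside the domain travels with the type; the explicit not-null on src.w,
-- by contrast, is a table-level detail and is not copied to tgt.w.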

Allowing domains to be defined as not null at this point is simply
something that we (IMO, the documentation is not this strongly worded)
don't support but don't error out upon in the interest of backward
compatibility.  It, as you note, has some corner-case bugs.  You can avoid
those bugs by simply not using a non-null constraint as suggested.

As for the "same source" optimization: the documentation reads - "For
example, this can happen in an outer-join query,...", the bug-fix here is
to simply add this situation as a second example.  However, it is
reasonably considered correct that a record you just read from a table
should be able to be written back to said table unchanged.  The "error" is
that we allowed the record to exist in the first place, but we absolved
ourselves of responsibility with the caveats on the CREATE DOMAIN page.
Subsequent consequences of that mis-use are likewise attributed to said
mis-use and are on the user's head for allowing their code to produce the
problematic behavior.

At most we should probably go from saying "Best practice therefore..." to
"We no longer support setting a not null constraint on a domain but will
not error in the interest of not breaking existing uses that are careful to
avoid the problematic corner-cases".

David J.


Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?

2022-06-19 Thread Bryn Llewellyn
> sham...@gmx.net wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> Am I missing a use case where an object with a key-value pair with a JSON 
>> null value is meaningfully different from one where the key is simply absent?
> 
> It seems the JSON specification doesn't actually define equality. But the 
> JSON patch RFC 6902:
> 
> www.rfc-editor.org/rfc/rfc6902#section-4.6
> 
> defines the equality of two objects as:
> 
> «
> objects: are considered equal if they contain the same number of members, and 
> if each member can be considered equal to a member in the other object, by 
> comparing their keys (as strings) and their values (using this list of 
> type-specific rules)
> »
> 
> As {"x": 42, "y": null} and {"x": 42} do not contain the same number of 
> members, I think Postgres' behaviour is correct.

Thanks. I believe that I addressed this in a separate branch in this thread. 
The distinction (between « "some key": null » and "some key" absent), which by 
all means matters in some contexts, does not matter w.r.t. the requirements of 
my (demo) app. And I've now proved (to myself, at least) that I can meet my 
goal easily with existing Postgres features—especially "jsonb_strip_nulls()".
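
For instance, this one sanity check (mine, not quoted from the thread)
captures the point:

select jsonb_strip_nulls('{"x": 42, "y": null}'::jsonb) = '{"x": 42}'::jsonb;  -- true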

I hope that this means that we're all happy.



An error happened when cloning the git repository

2022-06-19 Thread Wen Yi
After I reinstalled my system, I tried to clone the postgresql repository, but 
something went wrong.
The console printed this:

[beginnerc@fedora Research]$ git clone 
https://git.postgresql.org/git/postgresql.git
Cloning into 'postgresql'...
remote: Enumerating objects: 30747, done.
remote: Counting objects: 100% (30747/30747), done.
remote: Compressing objects: 100% (13431/13431), done.
error: RPC failed; curl 92 HTTP/2 stream 3 was not closed cleanly before end of 
the underlying stream
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output

I did not clone the repository from GitHub; I cloned directly from 
git.postgresql.org.
I tried again and again, but it failed every time.
Eventually I had to download the source from the official website.
My system is Fedora 36 and my git version is 2.36.1.
I checked the download speed; it is about 220 kB/s.

I have asked about this problem in the Slack group, and now I am reporting it 
to the mailing list. Please check it; thanks very much!



Re: An error happened when cloning the git repository

2022-06-19 Thread Adrian Klaver

On 6/19/22 20:33, Wen Yi wrote:
> After I reinstalled my system, I tried to clone the postgresql repository, 
> but something went wrong.
>
> The console printed this:
>
> [beginnerc@fedora Research]$ git clone
> https://git.postgresql.org/git/postgresql.git
> Cloning into 'postgresql'...
> remote: Enumerating objects: 30747, done.
> remote: Counting objects: 100% (30747/30747), done.
> remote: Compressing objects: 100% (13431/13431), done.
> error: RPC failed; curl 92 HTTP/2 stream 3 was not closed cleanly
> before end of the underlying stream
> fetch-pack: unexpected disconnect while reading sideband packet
> fatal: early EOF
> fatal: fetch-pack: invalid index-pack output
>
> I did not clone the repository from GitHub; I cloned directly from
> git.postgresql.org.
> I tried again and again, but it failed every time.
> Eventually I had to download the source from the official website.
> My system is Fedora 36 and my git version is 2.36.1.
> I checked the download speed; it is about 220 kB/s.


I tried it on two different machines with Git 2.30.2 and 2.25.1 and it 
worked.


I have to believe it is related to the bleeding-edge OS Fedora 36 in 
combination with the latest Git 2.36.1.

> I have asked about this problem in the Slack group, and now I am reporting 
> it to the mailing list. Please check it; thanks very much!


--
Adrian Klaver
adrian.kla...@aklaver.com




Re: An error happened when cloning the git repository

2022-06-19 Thread Tom Lane
Adrian Klaver  writes:
> On 6/19/22 20:33, Wen Yi wrote:
>> [beginnerc@fedora Research]$ git clone 
>> https://git.postgresql.org/git/postgresql.git
>> Cloning into 'postgresql'...
>> remote: Enumerating objects: 30747, done.
>> remote: Counting objects: 100% (30747/30747), done.
>> remote: Compressing objects: 100% (13431/13431), done.
>> error: RPC failed; curl 92 HTTP/2 stream 3 was not closed cleanly 
>> before end of the underlying stream

> I have to believe it is related to bleeding edge OS Fedora 36 in 
> combination with latest Git 2.36.1.

No, I think it's more about this:

>> I checked the download speed; it is about 220 kB/s.

I've seen this failure multiple times on very slow machines.
I think there's some sort of connection timeout somewhere in
the git.postgresql.org infrastructure, causing a "git clone"
that takes more than a couple of minutes to fail.  I've
complained about it before, but we've not isolated the cause.

regards, tom lane




Re: Index creation

2022-06-19 Thread Дмитрий Иванов
Your statement seems obvious to me, but what I see doesn't look like a
conscious choice. It turns out that it is better to have a lighter
general-purpose index than to strive to create a targeted covering index for
a certain kind of operation.
DROP INDEX IF EXISTS bpd.index_class_prop_id_prop_inherit;
CREATE INDEX IF NOT EXISTS index_class_prop_id_prop_inherit
ON bpd.class_prop USING btree
(id_prop_inherit ASC NULLS LAST, timestamp_class_inherit ASC NULLS LAST)
TABLESPACE pg_default;



DROP INDEX IF EXISTS bpd.index_class_prop_id_prop_inherit_covering;
CREATE INDEX IF NOT EXISTS index_class_prop_id_prop_inherit_covering
ON bpd.class_prop USING btree
(id_prop_inherit ASC NULLS LAST, timestamp_class_inherit ASC NULLS LAST)
INCLUDE(id, id_class, inheritance)
TABLESPACE pg_default;

Uchet=# SELECT relid, indexrelid, schemaname, relname, indexrelname,
Uchet-#        idx_scan, idx_tup_read, idx_tup_fetch
Uchet-# FROM bpd.cfg_v_stat_all_indexes
Uchet-# WHERE indexrelname LIKE 'index_class_prop_id_prop_inherit%';
 relid | indexrelid | schemaname |  relname   |               indexrelname                | idx_scan | idx_tup_read | idx_tup_fetch
-------+------------+------------+------------+-------------------------------------------+----------+--------------+---------------
 17572 |      40036 | bpd        | class_prop | index_class_prop_id_prop_inherit          |        0 |            0 |             0
 17572 |      40037 | bpd        | class_prop | index_class_prop_id_prop_inherit_covering |     7026 |         7026 |             0
(2 rows)


DROP INDEX IF EXISTS bpd.index_class_prop_id_prop_inherit_covering;

CREATE INDEX IF NOT EXISTS index_class_prop_id_prop_inherit_covering
ON bpd.class_prop USING btree
(id_prop_inherit ASC NULLS LAST, timestamp_class_inherit ASC NULLS LAST)
INCLUDE(id, id_class, inheritance)
TABLESPACE pg_default;

DROP INDEX IF EXISTS bpd.index_class_prop_id_prop_inherit;

CREATE INDEX IF NOT EXISTS index_class_prop_id_prop_inherit
ON bpd.class_prop USING btree
(id_prop_inherit ASC NULLS LAST, timestamp_class_inherit ASC NULLS LAST)
TABLESPACE pg_default;

Uchet=# SELECT relid, indexrelid, schemaname, relname, indexrelname,
Uchet-#        idx_scan, idx_tup_read, idx_tup_fetch
Uchet-# FROM bpd.cfg_v_stat_all_indexes
Uchet-# WHERE indexrelname LIKE 'index_class_prop_id_prop_inherit%';
 relid | indexrelid | schemaname |  relname   |               indexrelname                | idx_scan | idx_tup_read | idx_tup_fetch
-------+------------+------------+------------+-------------------------------------------+----------+--------------+---------------
 17572 |      40049 | bpd        | class_prop | index_class_prop_id_prop_inherit          |     6356 |         6356 |             0
 17572 |      40048 | bpd        | class_prop | index_class_prop_id_prop_inherit_covering |        0 |            0 |             0
(2 rows)
--
Regards, Dmitry!


On Mon, Jun 20, 2022 at 12:08 AM David G. Johnston  wrote:

> On Sun, Jun 19, 2022 at 12:06 PM Дмитрий Иванов 
> wrote:
>
>> Good afternoon.
>> I have a question about the query planner. Suppose there are two kinds of
>> queries that use an indexed field: one query is limited to that field
>> alone, while the second also uses a number of fields added to the index by
>> the INCLUDE directive. Does it make sense to have two indexes, a
>> lightweight one and one containing the INCLUDE columns? Or will the
>> planner rely on the nearest suitable index without considering its weight?
>>
>>
> The system should avoid the larger sized index unless it will sufficiently
> benefit from the Index Only Scan that such a larger covering index is
> supposed to facilitate.
>
> David J.
>