> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> This is what the doc promises. But how can you see it as anything but a bug? 
>> The subquery evaluates to "null", and only then is the attempt made to 
>> create a new row which self-evidently violates the domain's constraint. How 
>> is it any different from this:
> 
> Because NULL is doing double-duty. The absence of any possible value is 
> represented by null, as is knowing that a value exists but not knowing what 
> that value is. The most obvious example of this problem is...
> 
> ...The resultant value of "b.lbl" is null in both queries, but not for the 
> same reason (left join missing semantics pre-empt null type value semantics).
> 
> So, yes, queries can produce NULL even for domains that are defined not null.
> 
> If you want protection from null values in your database tables you must 
> define your columns to not accept nulls. It becomes a habit and can be 
> readily checked for in the catalogs (I suggest adding a column comment for 
> why columns defined as null are allowed to be null, then you can query the 
> column contents to exclude those columns where null was intended - or go with 
> your idea and just report every column as non-conforming. COMMENT ON 
> table.column IS '@NULLABLE - optional information the customer might not 
> provide').

First off, thank you very much for the tip to use "comment on" together with 
catalog queries to police a practice policy. Meanwhile, I've noted the 
unsurprising fact that you can write a "not null" constraint on a column in the 
"create table" statement even when the column's data type is a domain that 
already has such a constraint.
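
To make the policing concrete, here is the sort of catalog query I have in mind 
(a sketch only; the schema name and the '@NULLABLE' tag are just your suggested 
convention):

select c.relname as "table", a.attname as "column"
from
  pg_class c
  inner join pg_namespace n on n.oid = c.relnamespace
  inner join pg_attribute a on a.attrelid = c.oid
where n.nspname = 'public'               -- or whatever schemas are in scope
and   c.relkind = 'r'                    -- ordinary tables only
and   a.attnum > 0                       -- skip system columns
and   not a.attisdropped
and   not a.attnotnull                   -- nullable columns...
and   coalesce(col_description(c.oid, a.attnum), '')
        not like '@NULLABLE%';           -- ...that lack the "intended" tag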

Maybe a slightly more lightweight practice could be to use domains like 
"text_nn" and "text_uc" (for unconstrained text) and then in "create table" 
statements and similar add the explicit constraint for the "_nn" case. Using 
"text_uc" and similar would support searching the catalog, too. And both 
approaches are vulnerable to ordinary human error that would be very hard to 
detect. Typing an explanation in any kind of comment, including external prose 
doc, always has this character.
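
Sketched, one reading of what I mean (whether "text_nn" itself carries the 
constraint or merely records the intent is a matter of taste; here the domains 
only record the intent, and "customers" is just an illustrative table name):

create domain text_nn as text;  -- the name records "must not be null"
create domain text_uc as text;  -- the name records "deliberately unconstrained"

create table customers(
  k     int      primary key,
  name  text_nn  not null,      -- the real constraint is written here
  note  text_uc                 -- nullable by design, and searchably so
  );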

Back to NULLs...

Your code examples ran without error and produced the results that you 
described. I do understand the fact that, on its face, the NULLs in the two 
cases arise for different reasons. But this (still) seems to me to be a 
distinction without a difference. It rather reminds me of my earlier discussion 
with you (all) about the distinction (in the world of JSON using "jsonb") 
between the presence of an object key "k" with the value "JSON null" and the 
absence of key "k".
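
For anybody who didn't follow that thread, this is the jsonb distinction in a 
nutshell (just an illustration, not code from that thread):

select
  ('{"k": null}'::jsonb) -> 'k'  as "key present, value is JSON null",
  ('{}'::jsonb)          -> 'k'  as "key absent";

-- The first column is the jsonb value 'null' (a genuine value); the second is
-- SQL NULL, i.e. no information about "k" at all.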

The semantic proposition behind the "outer join", as it seems to me, is 
inextricably bound up with the notion that, in the resulting rows, one table 
might not have a partner row with the other. (It doesn't matter here which 
table lacks the partner or if you decide to spell your query so that "right" is 
the appropriate choice or "left" is—as long as you spell the whole thing 
correctly to express your intention.) And the "outer join" semantics bring the 
notion that you simply have no information about the facts that, were it 
present, the missing row might have given you. Whoever it was on the Committee, 
back in the day, decided in concert to represent this "no information" outcome, 
in the relation that results from an "outer join", as an "emergent" SQL NULL.

I've talked endlessly about NULL, over the years and face-to-face, with 
colleagues whose reasoning ability and lucidity I hugely respect. They are 
unwavering in how they explain NULL. It says simply: "I have absolutely no 
information about the value that I sought to interrogate." And these experts 
argue that there are no flavors of the bare fact of having no information. They 
argue, too, that to say "the value of this variable (or row-column 
intersection) is NULL" is an oxymoron because the absence of information is not 
a value—in the purist sense.

I copied a reasonably realistic example, below, where the "outer join" is 
between two physical tables, "genres" and "books". They have properly defined 
PKs. And there's an FK that points in the direction that the table names 
suggest. Here's the requirement:

«
Each book may be labelled by exactly one (known) genre.
Each genre may label one or several books.
»

It's the canonical list-of-values use case. It's perfectly OK to have a book 
for whose genre there is simply (as yet) no information available. And it's 
also perfectly OK to have a genre that, as yet, labels no book.

The PK columns inevitably end up as "not null". And the FK column is 
deliberately nullable.

I avoided domains altogether, using explicit "not null" constraints, or not, as 
appropriate to the requirements set out above.

Notwithstanding the proper PK and FK declarations, I cannot stop the "genres" 
table having a row that has no "books" partner. And this is exactly what I want.

My example creates three "genres" rows. And it creates five "books" rows. Three 
are labelled by 'genre-1'; one is labelled by 'genre-2'; and one isn't (as yet) 
labelled at all.

Here's the result from the obvious query to characterize books—defined for easy 
re-use as the view "book_genre_facts":

 ISBN | Book title |   Genre   
------+------------+-----------
   10 | book-1     | genre-1
   20 | book-2     | genre-1
   30 | book-3     | genre-1
   40 | book-4     | genre-2
   50 | book-5     | «UNKNOWN»

And here's the result from the obvious query to characterize genres—defined for 
easy re-use as the view "genres_labelling_books":

 Genre # |  Genre  |    Book title    
---------+---------+------------------
       1 | genre-1 | book-1
       1 | genre-1 | book-2
       1 | genre-1 | book-3
       2 | genre-2 | book-4
       3 | genre-3 | «Labels no book»

I can easily produce «UNKNOWN» simply from the fact that the "genres" FK is 
NULL. And I can produce «Labels no book» because, following proper practice, 
I'm joining to the primary key of the "books" table. This means that David's 
"which kind of NULL is it?" isn't the question. Rather, understanding the 
general semantics of "outer join", I need only to detect whether a book's PK, in 
the join result, is NULL. (A table without a primary key violates the rule of 
proper practice. This is why I found David's terse example too unrealistic to 
illustrate the issue at hand here.)

I said all this to emphasize that, in a realistic implementation of a realistic 
use case, it all falls out with no dilemmas.

Finally, this is what I get when I show the data type of a book's PK in the 
"outer join" result:

 Book PK | Data type of Book PK 
---------+----------------------
      10 | integer
      20 | integer
      30 | integer
      40 | integer
      ~~ | integer

I know that "Book PK" is the projection of a value from a column that has a PK 
constraint and therefore an implied "not null" constraint. But I'm seeing its 
projection in the result of an "outer join" that might miss the potential 
source row altogether. Ergo, I have no information. And it doesn't seem wrong 
that this is reflected by NULL *in the projection*, even though the source of 
what is projected is not nullable.

I can emphasize what I said by defining a domain "Not nullable integer" and 
using this as the data type of the PK of each of my tables. This is done in a 
heartbeat by making a small change to the code below. This is the result:

 Book PK |  Data type of Book PK  
---------+------------------------
      10 | "Not nullable integer"
      20 | "Not nullable integer"
      30 | "Not nullable integer"
      40 | "Not nullable integer"
      ~~ | "Not nullable integer"

But I can't see that this changes anything about the fact that I'm seeing the 
projection of a column in an "outer join" rather than the column itself. I 
might get a problem if I try to write the output of an "outer join" into a table 
whose columns use the data types that "pg_typeof()" reports for the join's 
result. But I'd see this as a failure to understand the semantics of 
"outer join", and a failure to think clearly about NULL, rather than a danger 
that flows from using, in this example, the domain "Not nullable integer" as 
the data type of a table column.
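
If anybody wants to run that test, something along these lines should do it once 
the PK columns use the domain ("join_copy" is just a throwaway name); whether the 
'genre-3' row, whose "book_pk" is NULL, is stored without complaint or provokes a 
domain constraint error is exactly the behavior at issue:

create table join_copy as
select
  g.gk  as genre_pk,   -- never NULL here: every genre has its own row
  b.bk  as book_pk     -- NULL for 'genre-3', which labels no book
from
  genres g
  left outer join
  books b
  on b.gk = g.gk;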

Can anybody show me an implementation of a realistic use case that follows 
proper practice — like "every table must have a primary key", "a foreign key must 
refer to a primary key", and "joins may be made only "on" columns one of which 
has a PK constraint and the other of which has a FK constraint" — where using a 
not nullable data type brings a problem that wouldn't occur if the column were 
defined with a nullable data type and an explicit "not null" constraint?

————————————————————

\pset null '~~'

-- "\d genres" shows "gk" with a "not null" constraint, whether I write it
-- or not. And convention seems to say "don't clutter your code by writing it".
create table genres(
  gk  int   primary key,
  gv  text  not null
  );

insert into genres(gk, gv) values
  (1, 'genre-1'),
  (2, 'genre-2'),
  (3, 'genre-3');

create table books(
  bk  int   primary key,
  bv  text  not null,
  gk  int   references genres(gk) /* NULLABLE by design! */
  );

insert into books(bk, bv, gk) values
  (10, 'book-1', 1),
  (20, 'book-2', 1),
  (30, 'book-3', 1),
  (40, 'book-4', 2),
  (50, 'book-5', NULL)
  ;

create view book_genre_facts("ISBN", "Book title", "Genre") as
select
  b.bk,
  b.bv,
  case
    when (g.gv is not null) then g.gv
    else                         '«UNKNOWN»'
  end
from
  books b
  left outer join
  genres g
  on b.gk = g.gk;

select *
from book_genre_facts
order by 1;

create view genres_labelling_books("Genre #", "Genre", "Book title") as
select
  g.gk,
  g.gv,
  case
    when (b.bk is null) then '«Labels no book»'
    else                     b.bv
  end
from
  genres g
  left outer join
  books b
  on b.gk = g.gk;

select *
from genres_labelling_books
order by 1, 2;

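-- "bk" carries a "not null" constraint in "books" (implied by its PK
-- constraint). But its projection in this outer-join result is NULL on the
-- row for 'genre-3', which labels no book.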
select
  b.bk as "Book PK",
  pg_typeof(b.bk) as "Data type of Book PK"
from
  genres g
  left outer join
  books b
  on b.gk = g.gk
order by 1;
