On Wed, May 1, 2024 at 9:47 PM Tom Lane <t...@sss.pgh.pa.us> wrote:

> David Rowley <dgrowle...@gmail.com> writes:
> > Let's bash it into shape a bit more before going any further on actual
> wording.
>
> FWIW, I want to push back on the idea of making it a tutorial section.
> I too considered that, but in the end I think it's a better idea to
> put it into the "main" docs, for two reasons:
>
>
Version 2 attached.  Still a draft, focused on topic picking and overall
structure.  Examples and links planned plus the usual semantic markup stuff.

I chose to add a new sect1 in the user guide (The SQL Language) chapter,
"Data".  Don't tell Robert.

The "Data Basics" sub-section lets us readily slide this Chapter into the
main flow and here the NULL discussion feels like a natural fit.  In
hindsight, the lack of a Data chapter in a Database manual seems like an
oversight.  One easily made because we assume if you are here you "know"
what data is, but there is still stuff to be discussed, if nothing else to
establish a common understanding between us and our users.

David J.
From 7798121992154edab4768d7eab5a89be04730b2f Mon Sep 17 00:00:00 2001
From: "David G. Johnston" <david.g.johns...@gmail.com>
Date: Wed, 1 May 2024 07:45:48 -0700
Subject: [PATCH] Document NULL

---
 doc/src/sgml/data.sgml     | 169 +++++++++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml |   1 +
 doc/src/sgml/postgres.sgml |   1 +
 3 files changed, 171 insertions(+)
 create mode 100644 doc/src/sgml/data.sgml

diff --git a/doc/src/sgml/data.sgml b/doc/src/sgml/data.sgml
new file mode 100644
index 0000000000..2b09382494
--- /dev/null
+++ b/doc/src/sgml/data.sgml
@@ -0,0 +1,169 @@
+<chapter id="data">
+ <title>Data</title>
+
+ <para>
+  This chapter provides definitions for, and an overview of, data.
+  It discusses the basic design of the metadata related to values
+  and then goes on to describe the special value NULL which typically
+  represents &quot;unknown&quot;
+ </para>
+
+ <sect1 id="data-basics">
+  <title>Data Basics</title>
+  <para>
+   All literals, columns, variables, and expression results in PostgreSQL
+   are typed, which are listed in the next chapter.  Literals and columns
+   must only use one of the concrete types while variables can use
+   either a concrete type or a pseudo-type.  Expression results
+   are limited to concrete types and the pseudo-type record described below.
+  </para>
+  <para>
+   The pseudo-types prefixed with &quot;any&quot; implement polymorphism
+   in PostgreSQL.  Polymorphism allows a single function specification
+   to act on multiple concrete types.  At runtime, the function body
+   associates concrete types to all polymorphic types based upon the
+   conrete argument of its inputs. See ... for more details.
+  </para>
+  <para>
+   The record pseudo-type is also polymorphic in nature but allows
+   the caller of the function to specify the row-like structure of
+   output within the containing query. See ... for more details.
+   The ROW(...) expression (see ...) will also produce a record
+   result comprised of the named columns.
+  </para>
+ </sect1>
+
+  <sect1 id="data-null">
+   <title>Unknown Values (NULL)</title>
+   <para>
+    This section first introduces the meaning of NULL and then goes
+    on to explain how different parts of the system behave when faced
+    with NULL input.
+   </para>
+
+  <sect2 id="data-null-model">
+   <title>NULL in Data Models</title>
+   <para>
+    Generally NULL is assumed to mean "unknown".  However,
+    in practice meaning comes from context and so a model design may state that
+    NULL is to be used to represent "not applicable" - i.e., that a value is not
+    even possible.  SQL has only the single value NULL while there are multiple
+    concepts that people have chosen to apply it to.  In any case the behavior
+    of the system when dealing with NULL is the same regardless of the meaning
+    the given to it in the surrounding context.
+   </para>
+   <para>
+    NULL also takes on a literal meaning of "not found" when produced as the
+    result of an outer join.
+   </para>
+  </sect2>
+
+  <sect2 id="data-null-usage">
+   <title>NULL Usage</title>
+   <para>
+    As NULL is treated as a data value it, like all values, must have
+    a data type.  NULL is a valid value for all data types.
+   </para>
+
+   <para>
+    A NULL literal is written as unquoted NULL.  Its type is unknown but
+    can be cast to any concrete data type.  The [type 'string'] syntax
+    however will not work as there is no way to express NULL using single
+    quotes and unlike.
+   </para>
+
+   <para>
+    The presence of NULL in the system results in three-valued logic.
+    In binary logic every outcome is either true or false.  In
+    three-valued logic unknown, represented using a NULL value, is
+    also an outcome.  Aspects of the system that branch based upon
+    whether a condition variable is true or false thus must also
+    decide how to behave when then condition is NULL.  The remaining
+    sub-sections summarize these decisions.
+   </para>
+  </sect2>
+
+  <sect2 id="data-null-cardinalrule">
+    <title>The Cardinal Rule of NULL</title>
+   <para>
+    The cardinal rule, NULL is never equal or unequal to any non-null
+    value (or itself).  [NULL = anything yields NULL].
+    Checking for NULL has an explicit test documented
+    [here] and additionally there are distinctness tests that treat NULL like
+    a value equal to itself and unequal to any other value.
+   </para>
+   <para>
+    There is also a cardinal warning: when dealing with composite types
+    [see ...] the expressions; composite IS NULL and composite IS NOT NULL
+    are not the opposites of each other in the case where some of the
+    composite's fields are NULL.  Write NOT(composite IS NULL) instead.
+   </para>
+  </sect2>
+
+  <sect2 id="data-null-opertions">
+   <title>NULLs in Operations</title>
+   <para>
+    As a general expectation, operator invocation expressions where one of inputs
+    will result in a NULL output.
+    [add: 1 + NULL yields NULL]
+    [concatenate: 'text' || NULL yields NULL]
+    Operators that behave otherwise should document their deviation from this norm.
+   </para>
+  </sect2>
+
+  <sect2 id="data-null-functions">
+   <title>NULLs in Normal Function Calls</title>
+   <para>
+    Function specifications has a "strictness" attribute that, when set to "strict"
+    (a.k.a. "null on null input") will tell the executor to return NULL for any
+    function call having at least one NULL input value, without executing the
+    function.  Most SQL standard functions behave strictly, and in many cases
+    non-standard functions may exist that do not.
+   </para>
+  </sect2>
+
+  <sect2 id="data-null-aggregates">
+   <title>NULLs in Aggregate and Window Processing</title>
+   <para>  (needs research)
+    When executing an aggregate or window function the state tracking
+    component will remain unchanged even if the underlying processing
+    function returns NULL, whether from being defined strict (see above)
+    or simply returns a NULL value upon execution.
+   </para>
+  </sect2>
+
+  <sect2 id="data-null-filters">
+   <title>NULLs in Filters</title>
+   <para>
+    A WHERE clause that evaluates to NULL for a given row will exclude that row.
+   </para>
+  </sect2>
+
+  <sect2 id="data-null-constraints">
+   <title>NULLs in Constraints</title>
+   <para>
+    It is possible to define validation expressions on
+    tables that ensure only values passing those expressions are inserted.  While
+    this seems like it would behave the same as a filter, the choice here,
+    when an expression evaulates to NULL, is allow the row to be inserted.  See
+    [check constraints] in create table for details.
+   </para>
+   <para>
+    The NOT NULL column constraint is largely syntax sugar for the corresponding
+    column IS NOT NULL check constraint, though there are metadata differences
+    described in create table.
+   </para>
+  </sect2>
+
+  <sect2 id="data-null-grouping">
+   <title>NULLs When Grouping</title>
+   <para>
+    In the context of both DISTINCT and GROUP BY it is necessary that all inputs
+    resolve to being either equal to or not equal to all other values.  These features
+    use distinctness (see ...) instead of simple equality in order to handle
+    NULL like a definite value equal to itself and unequal to all other values.
+   </para>
+  </sect2>
+ </sect1>
+
+</chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 38ec362d8f..2b4fc79b8a 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -14,6 +14,7 @@
 
 <!-- user's guide -->
 <!ENTITY array      SYSTEM "array.sgml">
+<!ENTITY data       SYSTEM "data.sgml">
 <!ENTITY datatype   SYSTEM "datatype.sgml">
 <!ENTITY ddl        SYSTEM "ddl.sgml">
 <!ENTITY dml        SYSTEM "dml.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index ec9f90e283..bd0e4007ee 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -101,6 +101,7 @@ break is not needed in a wider output rendering.
   </partintro>
 
   &syntax;
+  &data;
   &ddl;
   &dml;
   &queries;
-- 
2.34.1

Reply via email to