Re: Clarification on the column order of UNION, INTERSECT, and EXCEPT

2025-07-24 Thread David G. Johnston
On Thu, Jul 24, 2025, 09:22 PG Doc comments form 
wrote:

> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/17/queries-union.html
> Description:
>
> When combining queries with UNION, INTERSECT, or EXCEPT, columns are
> matched
> in the order in which they occur, and not by column name. This can be
> unexpected, and I think it would be helpful to call it out explicitly in
> the
> documentation.


The docs never imply that matching by name is a thing in SQL generally.
People make assumptions all the time and we generally don't try and write
to anticipate and disabuse people of random assumptions.  We instead
document how things work.  We don't document here that all column names in
both queries must be identical which is what your assumption would require.

The existing wording strongly implies that corresponding matches are done
by index.  Then concretely an example on that page unions columns, A and X,
this clearly showing that names are immaterial and, like the requirements
state, data type matching columns at each position are paired.

I could maybe see adding "position" somewhere but calling it out with a
warning seems unnecessary.  Even if one gets confused it is immediately
obvious when checking the result that there is an issue.

David J.


Clarification on the column order of UNION, INTERSECT, and EXCEPT

2025-07-24 Thread PG Doc comments form
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/17/queries-union.html
Description:

When combining queries with UNION, INTERSECT, or EXCEPT, columns are matched
in the order in which they occur, and not by column name. This can be
unexpected, and I think it would be helpful to call it out explicitly in the
documentation.

Consider the following query:

create table widgets (x text, y text);
insert into widgets (x, y) values ('x', 'y');
(select x, y from widgets) union all (select y, x from widgets);

On Postges 17.5, this returns:

x y
─ ─
x y
y x

Note that it returns a value 'y' for column 'x', despite column x only
containing the value 'x'. This result makes sense if you think of rows as
anonymous tuples (ordered lists of values), but it can be surprising when
you expect rows to behave like structs/dictionaries with named fields.

This behavior can be especially tricky when columns x and y have the same
data type, such that confusing them does not trigger an error. In particular
when the values are similar (e.g. numeric data in the same range), it may
not be obvious from the result that data from different columns got mixed
up.

I read the documentation at

 - https://www.postgresql.org/docs/17/queries-union.html
 - https://www.postgresql.org/docs/17/typeconv-union-case.html
 - https://www.postgresql.org/docs/17/sql-select.html#SQL-UNION

These pages mention that the operands need to be compatible, e.g. ‘In order
to calculate the union, intersection, or difference of two queries, the two
queries must be “union compatible”, which means that they return the same
number of columns and the corresponding columns have compatible data types,
as described in Section 10.5.’ and ‘operands of the UNION must produce the
same number of columns, and corresponding columns must be of compatible data
types’. It does not explicitly state that “corresponding columns” means
“corresponding by index”, and not “corresponding by name”. I think it would
be helpful to clarify that, and maybe even warn about it.

Kind regards,
Ruud