Hi,

I've revisited the idea to somehow use foreign keys to do joins,
in the special but common case when joining on columns that exactly match a 
foreign key.

The idea is to add a new ternary operator, which would be allowed only in the 
FROM clause.

It would take three operands:

1) referencing_table_alias
2) foreign_key_constraint_name
3) referenced_table_alias

POSSIBLE BENEFITS

* Eliminate risk of joining on the wrong columns
Although probably an uncommon class of bugs, a join can be made on the wrong 
columns, which could go undetected if the desired row is included by 
coincidence, such as if the test environment might only contain a single row in 
some table, and the join condition happened to be always true.
By joining using the foreign key, it can be verified at compile time, that the 
referenced_table_alias is actually an alias for the table referenced by the 
foreign key. If some other alias would be given, an error would be thrown, to 
avoid failure.

* Conciser syntax
In a traditional join, you have to explicitly state all columns for the 
referencing and referenced table.
I think writing joins feels like you are repeating the same table aliases and 
column names over and over again, all the time.
This is especially true for multiple-column joins.
This is somewhat addressed by the USING join form, but USING has other 
drawbacks, why I tend to avoid it except for one-off queries.
When having to use fully-qualified table aliases, that adds even further to the 
verboseness.

* Makes abnormal joins stand out
If joining on something else than foreign key columns, or some inequality 
expression, such joins will continue to be written in the traditional way, and 
will therefore stand out and be more visible, if all other foreign key-based 
joins are written using the new syntax.
When reading SQL queries, I think this would be a great improvement, since the 
boring normal joins on foreign keys could be given less attention, and focus 
could instead be made on making sure you understand the more complex joins.

* Explicit direction of the join
In a traditional join on foreign key columns, it's not possible to derive if 
the join is a one-to-many or many-to-one join, by just looking at the SQL code 
itself. One must also know/inspect the data model or make assumptions based on 
the naming of columns and tables. This is perhaps the least interesting benefit 
though, since good naming makes the direction quite obvious anyway. But I think 
it at least reduces the total cognitive load of reading a SQL query.

POSSIBLE DRAWBACKS

* Another thing users would have to learn
* Would require changes to the SQL standard, i.e. SQL committee work
* Introduces a hard dependency on foreign keys, they cannot be dropped

SYNTAX

Syntax is hard, but here is a proposal to start the discussion:

    from_item join_type from_item WITH 
[referencing_table_alias]->[foreign_key_constraint_name] = 
[referenced_table_alias] [ AS join_using_alias ]

EXAMPLE

To experiment with the idea, I wanted to find some real-world queries written 
by others,
to see how such SQL queries would look like, using traditional joins vs foreign 
key joins.

I came up with the idea of searching Github for "LEFT JOIN", since just 
searching for "JOIN" would match a lot of non-SQL code as well.
Here is one of the first examples I found, a query below from the Grafana 
project [1]
[1] 
https://github.com/grafana/grafana/blob/main/pkg/services/accesscontrol/database/resource_permissions.go

SELECT
    p.*,
    ? AS resource_id,
    ur.user_id AS user_id,
    u.login AS user_login,
    u.email AS user_email,
    tr.team_id AS team_id,
    t.name AS team,
    t.email AS team_email,
    r.name as role_name
FROM permission p
    LEFT JOIN role r ON p.role_id = r.id
    LEFT JOIN team_role tr ON r.id = tr.role_id
    LEFT JOIN team t ON tr.team_id = t.id
    LEFT JOIN user_role ur ON r.id = ur.role_id
    LEFT JOIN user u ON ur.user_id = u.id
WHERE p.id = ?

Here is how the FROM clause could be rewritten:

FROM permission p
    LEFT JOIN role r WITH p->permission_role_id_fkey = r
    LEFT JOIN team_role tr WITH tr->team_role_role_id_fkey = r
    LEFT JOIN team t WITH tr->team_role_team_id_fkey = t
    LEFT JOIN user_role ur WITH ur->user_role_role_id_fkey = r
    LEFT JOIN "user" u WITH ur->user_role_user_id_fkey = u
WHERE p.id = 1;

In PostgreSQL, the foreign keys could also be given shorter names, since they 
only need to be unique per table and not per namespace. I think a nice 
convention is to give the foreign keys the same name as the referenced table, 
except if the same table is referenced multiple times or is self-referenced.

Rewriting our example, using such naming convention for the foreign keys:

FROM permission p
    LEFT JOIN role r WITH p->role = r
    LEFT JOIN team_role tr WITH tr->role = r
    LEFT JOIN team t WITH tr->team = t
    LEFT JOIN user_role ur WITH ur->role = r
    LEFT JOIN "user" u WITH ur->user = u
WHERE p.id = 1;

A better example to illustrate how conciseness is improved, would be one with 
lots of multi-column joins.
Please feel free to share better query examples to evaluate.

I cannot stop thinking about this idea, I really think it would greatly improve 
SQL as a language.
Foreign keys feels like such an underused valuable potential resource!
If someone can convince me this is a bad idea, that would help me forget about 
all of this,
so I would greatly appreciate your thoughts, no matter how negative or positive.

Thank you for digesting.

/Joel

PS.

To readers who might remember the old flawed version of this new idea:

In the old proposal, the third operand (referenced_table_alias) was missing.
There wasn't a way of specifying what table alias the join was supposed to be 
made against.
It was assumed the referenced table was always the one being joined in,
which is not always the case, since the referenced table
might already be in scope, and it's instead the referencing table which is 
being joined in.

Another problem with the old idea was you were forced to write the joins in a 
the same order
as the foreign keys, which often resulted in an awkward join order.

These two problems have now been solved with this new proposal.
Perhaps new problems have been introduced though?

Reply via email to