Thank you for the great idea Tom. While yes I can't modify the original WHERE clause I do think I'll be able to introspect it or get the system generating it to tell me which columns it references and then add an OR x is NULL OR y is NULL ...
For context, just in case it's interesting, I store Parquet statistics in a Postgres table and run the output of this thing on them: https://github.com/apache/datafusion/blob/f92442ea8e8944c78f8e40d6648d049ff8e335ec/datafusion/physical-optimizer/src/pruning.rs#L146-L456 Hence why I can't really control the WHERE clause (at least not without re-implementing a bunch of finicky error prone code). On Wed, Dec 18, 2024 at 10:38 PM Tom Lane <t...@sss.pgh.pa.us> wrote: > "David G. Johnston" <david.g.johns...@gmail.com> writes: > > On Wednesday, December 18, 2024, Adrian Garcia Badaracco < > > adr...@adriangb.com> wrote: > >> Is there any way to include the rows where the predicate evaluates to > null > >> while still using an index? > > > ... A btree index, which handles =, can’t be told to behave > > differently and so cannot fulfill your desire to produce rows where the > > stored value is null; it can only produce those equal to 5000. > > Not in a single scan, no. But multiple scans are possible: > > regression=# create table t (id int unique); > CREATE TABLE > regression=# explain select * from t where id = 5000 or id is null; > QUERY PLAN > > > ------------------------------------------------------------------------------ > Bitmap Heap Scan on t (cost=8.42..18.98 rows=14 width=4) > Recheck Cond: ((id IS NULL) OR (id = 5000)) > -> BitmapOr (cost=8.42..8.42 rows=14 width=0) > -> Bitmap Index Scan on t_id_key (cost=0.00..4.25 rows=13 > width=0) > Index Cond: (id IS NULL) > -> Bitmap Index Scan on t_id_key (cost=0.00..4.16 rows=1 > width=0) > Index Cond: (id = 5000) > (7 rows) > > The OP was quite unclear about what semantics he wants for > multiple-variable WHERE clauses, but maybe something like this > would work: > > WHERE (original-clause) OR x IS NULL OR y IS NULL OR ... > > where each variable mentioned in original-clause is allowed > to also be NULL. Or perhaps what is wanted is > > WHERE (original-clause) OR (x IS NULL AND y IS NULL AND ...) > > ?? > > regards, tom lane >