Thanks for looking at it.

I think these two queries are equivalent, as shown by the explain.

In both cases the index scan only selects tuples with xx_season=3 as shown in 
both explains:

         Index Cond: (tmaster.t1_season = 3)
               Index Cond: (tfact.t2_season = 3)
So no tuple can have a null value for xx_season.

My point is the construction of the hash table, wich includes the t2_season 
even if it is constant and not null. From explain:

with overhead:
   Hash Cond: ((tmaster.t1_season = tfact.t2_season) AND (tmaster.t1_id_t2 = 
tfact.id_t2))

optimized:
   Hash Cond: (tmaster.t1_id_t2 = tfact.id_t2)

The planner correctly sets the index conditions (knows that the xx_season 
columns are constant), but fails to apply this constantness to the hash 
conditions by discarding a constant column in a hash table.

In my real application most of the xx_season columns are declared not null, but 
this should not change the outcome.

The performance difference is slightly lower when the created tables are 
previously analyzed (what I forgot).

But the percentual gain is much higher considering only the construction of the 
hash table, the only part of the query execution altered by this optimization.

In my opinion this scenario could be quite common in multi-tenant cases, in 
logging, time based data sets etc.

I tried to look at the pg source code but could not yet find the place where 
the hash conditions are selected and potentially tested.

When optimizing the constants away there my be a special case where all hash 
conditions are constants, so a hash table has not to be build (or at least one 
hash cond has to be preserved). 


Hans Buschmann



Reply via email to