Re: Expr. extended stats are skipped with equality operator

2022-08-07 Thread Tomas Vondra



On 8/5/22 11:50, Danny Shemesh wrote:
> Hey all !
> 
> I'm on a quest to help the planner (on pg14) use the best of several
> partial, expressional indices we have on some large tables (few TBs in
> size, billions of records).
> 
> As we know, stats for expressions in partial indices aren't gathered by
> default - so I'm tinkering with expressional extended stats to cover for
> those.
> 
> I've tackled two interesting points there:
> 1. Seems like expressional stats involving the equality operator are
> skipped or mismatched (fiddle
> )
> Let's take the following naive example:
> /create table t1 (x integer[]);
> insert into t1 select array[1]::integer[] from generate_series(1,
> 10, 1);
> create statistics s1 on (x[1] = 1) from t1;
> analyze t1;
> /
> /explain analyze select * from t1 where x[1] = 1;/
> /> Seq Scan on t1 (cost=0.00..1986.00 rows=500 width=29) (actual
> time=0.009..36.035 rows=10 loops=1)/
> /
> /
> Now, of course one can just create the stat on x[1] directly in this
> case, but I have a more complex use case where an equality operator is
> beneficial; 
> should the above case be supported ? feels like I'm just missing
> something fundamental.
> 

Hmmm. The problem here is that the expression may be interpreted either
as an operator clause (A=B), or as a boolean expression ((A=B)=TRUE). In
principle we might check which option matches extended statistics, but
that's not for free :-(

So what the current code does is trying to match the more specific cases
first, leaving the "bool expression" as a last resort. That seems like a
reasonable trade off to me, but I'll think about this a bit more.

There are probably other ways to match expressions, and we can't
possibly explore all of them. For example you may create statistics on
(x=1 AND y=2) and I doubt we'll match that, because we'll try matching
individual clauses not some arbitrary combinations of clauses. (Maybe we
shouldn't even allow creating such statistics ...)

> 2. Less important, just a minor note - feel free to ignore - although
> the eq. operator above seems to be skipped when matching the ext. stats,
> I can work around this by using a CASE expression (fiddle
> );
> Building on the above example, we can:
> /create statistics s2 on (case x[1] when 1 then true else false end)
> from t1;/
> /explain analyze select * from t1 where (case x[1] when 1 then true else
> false end/
> />  Seq Scan on t1 (cost=0.00..1986.00 rows=10 width=25) (actual
> time=0.011..33.721 rows=10 loops=1)/
> /

Yes, because this end ups not being matches as opclause, and therefore
goes all the way to the boolvarsel() in clause_selectivity_ext.

> /
> What's a bit problematic here, though, is that if we mix other dependent
> columns to the extended stat, and specifically if we create an mcv, 
> queries involving the CASE expression throw with `error: unknown clause
> type 130`, where clause type == T_CaseExpr.
> 
> The second point for me would be that I've found it a bit non intuitive
> that creating an extended statistic can fail queries at query time; it
> makes sense that the mcv wouldn't work for case expressions, but it
> might've been a bit clearer to:
> 
> a. Fail this at statistic creation time, potentially, or 
> b. Convert the type numeric in the above error to its text
> representation, if we can extract it out at runtime somehow - 
> I couldn't find a mapping of clause type numerics to their names, and as
> the node tags are generated at compile time, it could be build-dependent
> and a bit hard to track down if one doesn't control the build flags
> 

Yeah, this seems like a clear bug - we should not fail queries like
this. It's a sign statext_is_compatible_clause() and the MCV code
disagrees which clauses are compatible.

Can you share an example triggering this?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Expr. extended stats are skipped with equality operator

2022-08-07 Thread Tom Lane
Tomas Vondra  writes:
> Yeah, this seems like a clear bug - we should not fail queries like
> this. It's a sign statext_is_compatible_clause() and the MCV code
> disagrees which clauses are compatible.

Indeed.  I attempted to fix that at e33ae53dd.

regards, tom lane




Re: Expr. extended stats are skipped with equality operator

2022-08-07 Thread Tomas Vondra
On 8/7/22 19:28, Tom Lane wrote:
> Tomas Vondra  writes:
>> Yeah, this seems like a clear bug - we should not fail queries like
>> this. It's a sign statext_is_compatible_clause() and the MCV code
>> disagrees which clauses are compatible.
> 
> Indeed.  I attempted to fix that at e33ae53dd.
> 

Thanks! That probably explains why I've been unable to reproduce that, I
haven't realized there's a fix already.



-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-08-07 Thread Aleš Zelený
Hello,

The problem Joe spotted is that a variable name is misspelled in one
> place. It should (probably) be external_complete_id, not
> _external_complete_id).


Oh, copy-paste issue :-)


> > better solution proposal?
>
> I think you can make that clearer by using IS [NOT] DISTINCT FROM:
>
> SELECT ... simple join of two tables...
> WHERE opd.id_data_provider = _id_data_provider
> AND external_id IS NOT DISTINCT FROM _external_id
> AND external_complete_id IS NOT DISTINCT FROM _external_complete_id
> ;
>
> However, your version may be faster, as last time I looked (it's been
> some time) the optimizer wasn't especially good at handlung DISTINCT
> FROM (probably because it's so rarely used).g
>
> Distinct from was the original version, but it ignores indexes, 130ms  vs
0.3 ms, you are absolutely correct.

Kind regards
Ales Zeleny