On 26.10.2023 22:58, Robert Haas wrote:
On Thu, Oct 26, 2023 at 3:47 PM Alena Rybakina
<a.rybak...@postgrespro.ru> wrote:
With small amounts of "OR" elements, the cost of orexpr is lower than with
"ANY", on the contrary, higher.
Alexander's example seems to show that it's not that simple. If I'm
reading his example correctly, with things like aid = 1, the
transformation usually wins even if the number of things in the OR
expression is large, but with things like aid + 1 * bid = 1, the
transformation seems to lose at least with larger numbers of items. So
it's not JUST the number of OR elements but also what they contain,
unless I'm misunderstanding his point.
Yes, I agree, with Alexander's example, this option will not help and
here I need to look inside Expr itself. But I noticed that such a
complex non-constant expression is always an OpExpr type, otherwise if
the non-constant part contains only one variable, then it is a Var type.
We can add a constraint that we will transform expressions with the
simple variables like x=1 or x=2 or x=3, etc., but expressions like
x*1+y=1 or x*2+y=2... we ignore.
But then, we do not consider expressions when the nonconstant part is
always the same for expressions. For example, we could transform x*1+y=1
or x*1+y=2... to x*1+y = ANY([1,2,...]). But I think it's not so
critical, because such cases are rare.
Index Scan using pg_class_oid_index on pg_class (cost=0.27..2859.42 rows=414
width=68) (actual time=1.504..34.183 rows=260 loops=1)
Index Cond: (oid = ANY (ARRAY['1'::oid, '2'::oid, '3'::oid, '4'::oid,
'5'::oid, '6'::oid, '7'::oid,
Bitmap Heap Scan on pg_class (cost=43835.00..54202.14 rows=414 width=68)
(actual time=39.958..41.293 rows=260 loops=1)
Recheck Cond: ((oid = '1'::oid) OR (oid = '2'::oid) OR (oid = '3'::oid) OR
(oid = '4'::oid) OR (oid =
I think we could see which value is lower, and if lower with expressions converted to
ANY, then work with it further, otherwise work with the original "OR"
expressions. But we still need to make this conversion to find out its cost.
To me, this sort of output suggests that perhaps the transformation is
being done in the wrong place. I expect that we have to decide whether
to convert from OR to = ANY(...) at a very early stage of the planner,
before we have any idea what the selected path will ultimately be. But
this output suggests that we want the answer to depend on which kind
of path is going to be faster, which would seem to argue for doing
this sort of transformation as part of path generation for only those
paths that will benefit from it, rather than during earlier phases of
expression processing/simplification.
I'm not sure I have the full picture here, though, so I might have
this all wrong.
This would be the most ideal option, and to be honest, I like the
conversion at an early stage also because there are no problems with
selectivity or link updates if we changed the structure of RestrictInfo
of relation.
But in terms of calculating which option is better to use transformed or
original, I think this solution might be complicated, since we need not
only to highlight the cases in which the transformation wins in
principle, but also with which types of data it will work best and there
is a risk of missing some cases and we may need the own evaluation
model. Now it's hard for me to come up with something simple.
The cost option seems simpler and clearer to me, but yes, it is
difficult to decide when it is better to do the conversion for the most
correct estimate.
--
Regards,
Alena Rybakina