On 3/26/20 4:58 PM, Li, Zheng wrote:
>BTW, so far as I can see, the only reason you're bothering with the whole
thing is to compare the size of the subquery output with work_mem, because
that's all that subplan_is_hashable does. I wonder whether that
consideration is even sti
You should do small rebase (conflict with 911e7020770) and pgindent of
the patch to repair problems with long lines and backspaces.
I am reviewing your patch in small steps. Questions:
1. In the find_innerjoined_rels() routine you stop descending on
JOIN_FULL node type. I think it is wrong beca
>BTW, so far as I can see, the only reason you're bothering with the whole
thing is to compare the size of the subquery output with work_mem, because
that's all that subplan_is_hashable does. I wonder whether that
consideration is even still necessary in the wake of 1f39bce02. If
"Li, Zheng" writes:
> * I find it entirely unacceptable to stick some planner temporary
> fields into struct SubLink. If you need that storage you'll have
> to find some other place to put it. But in point of fact I don't
> think you need it; it doesn't look to me to be critical
Hi Tom,
Thanks for the feedback.
* I find it entirely unacceptable to stick some planner temporary
fields into struct SubLink. If you need that storage you'll have
to find some other place to put it. But in point of fact I don't
think you need it; it doesn't look to me to be
On 1/7/20 12:34 AM, Li, Zheng wrote:
Hi Andrey,
Thanks for the comment!
The unimproved cases you mentioned all fall into the category “correlated
subquery”. This category is explicitly disallowed by existing code to convert
to join in convert_ANY_sublink_to_join:
/*
* The sub-se
Hi Andrey,
Thanks for the comment!
The unimproved cases you mentioned all fall into the category “correlated
subquery”. This category is explicitly disallowed by existing code to convert
to join in convert_ANY_sublink_to_join:
/*
* The sub-select must not refer to any Vars of the paren
At the top of the thread your co-author argued the beginning of this
work with "findings about the performance of PostgreSQL, MySQL, and
Oracle on various subqueries:"
https://antognini.ch/2017/12/how-well-a-query-optimizer-handles-subqueries/
I launched this test set with your "not_in ..." pa
On Wed, Jun 26, 2019 at 09:26:16PM +, Li, Zheng wrote:
> Let me know if you have any comments.
I have one: the latest patch visibly applies, but fails to build
because of the recent API changes around lists in the backend code.
So a rebase is in order. The discussion has not moved a iota in t
On 2019-Aug-02, Thomas Munro wrote:
> Hi Zheng, Jim,
>
> With my Commitfest doozer hat on, I have moved this entry to the
> September 'fest. I noticed in passing that it needs to be adjusted
> for the new pg_list.h API. It'd be good to get some feedback from
> reviewers on these two competing p
On Sat, Jun 29, 2019 at 4:19 AM Li, Zheng wrote:>
> Resending patch v2.2, looks like the previous submission did not get attached
> to the original thread.
>
> This version fixed an issue that involves CTE. Because we call
> subquery_planner before deciding whether to proceed with the transforma
On Wed, 6 Mar 2019 at 12:25, David Rowley wrote:
> That sounds fine. I'll take mine elsewhere since I didn't start this thread.
Moved to
https://www.postgresql.org/message-id/CAKJS1f82pqjqe3WT9_xREmXyG20aOkHc-XqkKZG_yMA7JVJ3Tw%40mail.gmail.com
--
David Rowley http://www.2ndQ
On Wed, 6 Mar 2019 at 03:37, Tom Lane wrote:
>
> David Steele writes:
> > I'm not sure if I have an issue with competing patches on the same
> > thread. I've seen that before and it can lead to a good outcome. It
> > case, as you say, also lead to confusion.
>
> It's a bit of a shame that the c
David Steele writes:
> I'm not sure if I have an issue with competing patches on the same
> thread. I've seen that before and it can lead to a good outcome. It
> case, as you say, also lead to confusion.
It's a bit of a shame that the cfbot will only be testing one of them
at a time if we lea
On Sun, 3 Mar 2019 at 01:34, Jim Finnerty wrote:
> I agree with Tom's comment above - when the cost of the NOT IN is dominated
> by the cost of the outer scan (i.e. when the cardinality of the outer
> relation is large relative to the cardinality of the subquery), and if the
> inner cardinality is
On 3/5/19 10:53 AM, David Rowley wrote:
On Tue, 5 Mar 2019 at 21:21, David Steele wrote:
On 2/27/19 2:26 AM, David Rowley wrote:
FWIW, I did add this to the March CF, but I set the target version to
13. I wasn't considering this for PG12. I see Zheng was, but I agree
with you on PG13 being t
On Tue, 5 Mar 2019 at 21:21, David Steele wrote:
>
> On 2/27/19 2:26 AM, David Rowley wrote:
> > FWIW, I did add this to the March CF, but I set the target version to
> > 13. I wasn't considering this for PG12. I see Zheng was, but I agree
> > with you on PG13 being the target for this.
>
> Looks
On 2/27/19 2:26 AM, David Rowley wrote:
On Wed, 27 Feb 2019 at 13:13, Tom Lane wrote:
"Li, Zheng" writes:
However, given that the March CommitFest is imminent and the runtime smarts
patent concerns David had pointed out (which I was not aware of before), we
would not move that direction at
On Mon, 4 Mar 2019 at 11:06, Tom Lane wrote:
>
> David Rowley writes:
> > On Mon, 4 Mar 2019 at 04:42, Tom Lane wrote:
> >> You absolutely will get errors during btree insertions and searches
> >> if a datatype's btree comparison functions ever return NULL (for
> >> non-NULL inputs).
>
> > I und
David Rowley writes:
> On Mon, 4 Mar 2019 at 04:42, Tom Lane wrote:
>> You absolutely will get errors during btree insertions and searches
>> if a datatype's btree comparison functions ever return NULL (for
>> non-NULL inputs).
> I understand this is the case if an index happens to be used, but
On Mon, 4 Mar 2019 at 04:42, Tom Lane wrote:
>
> David Rowley writes:
> > On Sun, 3 Mar 2019 at 17:11, Tom Lane wrote:
> >> (At the code level, this is implicit in the fact that the comparison
> >> function will be called via FunctionCall2Coll or a sibling, and those
> >> all throw an error if t
David Rowley writes:
> On Sun, 3 Mar 2019 at 17:11, Tom Lane wrote:
>> (At the code level, this is implicit in the fact that the comparison
>> function will be called via FunctionCall2Coll or a sibling, and those
>> all throw an error if the called function returns NULL.)
> Ah okay. I can get it
On Sun, 3 Mar 2019 at 17:11, Tom Lane wrote:
> (At the code level, this is implicit in the fact that the comparison
> function will be called via FunctionCall2Coll or a sibling, and those
> all throw an error if the called function returns NULL.)
>
> Now, it doesn't say in so many words that the c
David Rowley writes:
> On Sun, 3 Mar 2019 at 05:25, Tom Lane wrote:
>> My initial thought about plugging that admittedly-academic point is
>> to insist that the join operator be both strict and a member of a
>> btree opclass (hash might be OK too; less sure about other index types).
> Why strict
On Sun, 3 Mar 2019 at 05:25, Tom Lane wrote:
> Hmm ... thinking about the strictness angle some more: what we really
> need to optimize NOT IN, IIUC, is an assumption that the join operator
> will never return NULL. While not having NULL inputs is certainly a
> *necessary* condition for that (ass
David Rowley writes:
> On Sat, 2 Mar 2019 at 13:45, Tom Lane wrote:
>> Yeah --- that has a nontrivial risk of making things significantly worse,
>> which makes it a hard sell. I think the most reasonable bet here is
>> simply to not perform the transformation if we can't prove the inner side
>>
On Sat, 2 Mar 2019 at 13:45, Tom Lane wrote:
>
> David Rowley writes:
> > I think you're fighting a losing battle here with adding OR quals to
> > the join condition.
>
> Yeah --- that has a nontrivial risk of making things significantly worse,
> which makes it a hard sell. I think the most reas
David Rowley writes:
> I think you're fighting a losing battle here with adding OR quals to
> the join condition.
Yeah --- that has a nontrivial risk of making things significantly worse,
which makes it a hard sell. I think the most reasonable bet here is
simply to not perform the transformation
On Sat, 2 Mar 2019 at 12:39, Li, Zheng wrote:
> However, if s.a is nullable, we would do this transformation:
> select count(*) from big b where not exists(select 1 from small s
> where s.a = b.a or s.a is null);
I understand you're keen to make this work, but you're assuming again
that f
The current transformation would not add "or s.a is NULL" in the example
provided since it is non-nullable. You will be comparing these two cases in
terms of the transformation:
select count(*) from big b where not exists(select 1 from small s
where s.a = b.a);
Time: 51.416 ms
select count(*) f
On Sat, 2 Mar 2019 at 12:13, Tom Lane wrote:
>
> "Li, Zheng" writes:
> > Although adding "or var is NULL" to the anti join condition forces the
> > planner to choose nested loop anti join, it is always faster compared to
> > the original plan.
>
> TBH, I am *really* skeptical of sweeping claims
"Li, Zheng" writes:
> Although adding "or var is NULL" to the anti join condition forces the
> planner to choose nested loop anti join, it is always faster compared to the
> original plan.
TBH, I am *really* skeptical of sweeping claims like that. The existing
code will typically produce a has
Thanks all for the feedbacks! I'm working on a refined patch.
Although adding "or var is NULL" to the anti join condition forces the planner
to choose nested loop anti join, it is always faster compared to the original
plan. In order to enable the transformation from NOT IN to anti join when the
David Rowley writes:
> On Sat, 2 Mar 2019 at 05:44, Tom Lane wrote:
>> I'm not sure if the second one is actually a semantics bug or just a
>> misoptimization? But yeah, +1 for putting in some simple tests for
>> corner cases right now. Anyone want to propose a specific patch?
> The second is
On Sat, 2 Mar 2019 at 05:44, Tom Lane wrote:
>
> Andres Freund writes:
> > I've not checked, but could we please make sure these cases are covered
> > in the regression tests today with a single liner?
>
> I'm not sure if the second one is actually a semantics bug or just a
> misoptimization? Bu
Andres Freund writes:
> On March 1, 2019 4:53:03 AM PST, David Rowley
> wrote:
>> On Fri, 1 Mar 2019 at 15:27, Richard Guo wrote:
>>> 1. The patch would give wrong results when the inner side is empty.
>>> 2. Because of the new added predicate 'OR (var is NULL)', we cannot
>>> use hash join or
Hi,
On March 1, 2019 4:53:03 AM PST, David Rowley
wrote:
>On Fri, 1 Mar 2019 at 15:27, Richard Guo wrote:
>> I have reviewed your patch. Good job except two issues I can find:
>>
>> 1. The patch would give wrong results when the inner side is empty.
>In this
>> case, the whole data from outer s
On Fri, 1 Mar 2019 at 15:27, Richard Guo wrote:
> I have reviewed your patch. Good job except two issues I can find:
>
> 1. The patch would give wrong results when the inner side is empty. In this
> case, the whole data from outer side should be in the outputs. But with the
> patch, we will lose t
On Tue, Feb 26, 2019 at 6:51 AM Li, Zheng wrote:
> Resend the patch with a whitespace removed so that "git apply patch" works
> directly.
>
>
Hi Zheng,
I have reviewed your patch. Good job except two issues I can find:
1. The patch would give wrong results when the inner side is empty. In this
On Wed, Feb 27, 2019 at 4:52 AM David Rowley
wrote:
> On Wed, 27 Feb 2019 at 03:07, Jim Finnerty wrote:
>
> If you're proposing to do that for this thread then I can take my
> planner only patch somewhere else. I only posted my patch as I pretty
> much already had what I thought you were origin
On Wed, 27 Feb 2019 at 13:41, Li, Zheng wrote:
> We still check for inner side's nullability, when it is nullable we
> append a "var is NULL" to the anti join condition. So every outer
> tuple is going to evaluate to true on the join condition when there
> is indeed a null entry in the inner.
Tha
I'm totally fine with setting the target to PG13.
--
I'm interested to know how this works without testing for inner
nullability. If any of the inner side's join exprs are NULL then no
records can match. What do you propose to work around that?
--
We still check for inner side's nullability, whe
On Wed, 27 Feb 2019 at 13:13, Tom Lane wrote:
>
> "Li, Zheng" writes:
> > However, given that the March CommitFest is imminent and the runtime smarts
> > patent concerns David had pointed out (which I was not aware of before), we
> > would not move that direction at the moment.
>
> > I propose
On Wed, 27 Feb 2019 at 13:05, Li, Zheng wrote:
> -With the latest fix (for the empty table case), our patch does the
> transformation as long as the outer is non-nullable regardless of the inner
> nullability, experiments show that the results are always faster.
Hi Zheng,
I'm interested to kno
"Li, Zheng" writes:
> However, given that the March CommitFest is imminent and the runtime smarts
> patent concerns David had pointed out (which I was not aware of before), we
> would not move that direction at the moment.
> I propose that we collaborate to build one patch from the two patches
I agree we will need some runtime smarts (such as a new anti join type as
pointed out by Richard) to "ultimately" account for all the cases of NOT IN
queries.
However, given that the March CommitFest is imminent and the runtime smarts
patent concerns David had pointed out (which I was not aware
On Wed, 27 Feb 2019 at 03:07, Jim Finnerty wrote:
>
> The problem is that the special optimization that was proposed for the case
> where the subquery has no WHERE clause isn't valid when the subquery returns
> no rows. That additional optimization needs to be removed, and preferably
> replaced w
Greenplum Database does this optimization. The idea is to use a new join
type, let's call it JOIN_LASJ_NOTIN, and its semantic regarding NULL is
defined as below:
1. If there is a NULL in the outer side, and the inner side is empty, the
NULL should be part of the outputs.
2. If there is a NULL
On Tue, 26 Feb 2019 at 11:51, Li, Zheng wrote:
> Resend the patch with a whitespace removed so that "git apply patch" works
> directly.
I had a quick look at this and it seems to be broken for the empty
table case I mentioned up thread.
Quick example:
Setup:
create table t1 (a int);
create ta
On Fri, 22 Feb 2019 at 09:44, David Rowley wrote:
> I've attached the rebased and still broken version.
I set about trying to make a less broken version of this.
A quick reminder of the semantics of NOT IN:
1. WHERE NOT IN(SELECT FROM table);
If table is non-empty:
will filter out rows where
On Thu, 21 Feb 2019 at 16:27, Jim Finnerty wrote:
> We can always correctly transform a NOT IN to a correlated NOT EXISTS. In
> almost all cases it is more efficient to do so. In the one case that we've
> found that is slower it does come down to a more general costing issue, so
> that's probabl
Jim Finnerty writes:
> re: The idea that's been kicked around in the past is to detect whether the
> subselect's output column(s) can be proved NOT NULL, and if so, convert
> to an antijoin just like NOT EXISTS
> basically, yes. this will handle nullability of both the outer and inner
> correlat
Jim Finnerty writes:
> We have been working on improved optimization of NOT IN, and we would like
> to share this optimizaton with the community.
The idea that's been kicked around in the past is to detect whether the
subselect's output column(s) can be proved NOT NULL, and if so, convert
to an a
53 matches
Mail list logo