Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-19 Thread David Rowley
On Mon, 19 Oct 2020 at 13:06, Tom Lane wrote: > > David Rowley writes: > > I guess we could resolve that concern by just changing the failing > > assert to become: Assert(outer_skip_rows <= outer_rows || > > isinf(outer_rows)); > > I can't really object to just weakening the Assert a tad. > My th

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-18 Thread David Rowley
On Mon, 19 Oct 2020 at 13:06, Tom Lane wrote: > (BTW, am I wrong to suppose that the same case fails the same > way in our older branches? Certainly that Assert has been there > a long time.) I only tested as back as far as 9.5, but it does fail there. David

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-18 Thread Tom Lane
David Rowley writes: > It would be good to hear Onder's case to see if he has a good argument > for having a vested interest in pg13 not failing this way with assets > enabled. Yeah, some context for this report would be a good thing. (BTW, am I wrong to suppose that the same case fails the same

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-18 Thread David Rowley
On Mon, 19 Oct 2020 at 12:25, Tom Lane wrote: > > David Rowley writes: > > On Mon, 19 Oct 2020 at 12:10, Tom Lane wrote: > >> TBH, I see no need to do anything in the back branches. This is not > >> an issue for production usage. > > > I understand the Assert failure is pretty harmless, so non-

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-18 Thread Tom Lane
David Rowley writes: > On Mon, 19 Oct 2020 at 12:10, Tom Lane wrote: >> TBH, I see no need to do anything in the back branches. This is not >> an issue for production usage. > I understand the Assert failure is pretty harmless, so non-assert > builds shouldn't suffer too greatly. I just assume

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-18 Thread David Rowley
On Mon, 19 Oct 2020 at 12:10, Tom Lane wrote: > > David Rowley writes: > > For the backbranches, I think I go with something more minimal in the > > form of adding: > > TBH, I see no need to do anything in the back branches. This is not > an issue for production usage. I understand the Assert f

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-18 Thread Tom Lane
David Rowley writes: > On Sat, 17 Oct 2020 at 06:00, Tom Lane wrote: >> I'm good with the v2 patch. > Thanks a lot for having a look. I'll proceed in getting the v2 which I > sent earlier into master. > For the backbranches, I think I go with something more minimal in the > form of adding: TBH

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-18 Thread David Rowley
On Sat, 17 Oct 2020 at 06:00, Tom Lane wrote: > I'm confused now, because the v2 patch does remove those isnan calls? I think that was a case of a last-minute change of mind and forgetting to attach the updated patch. > I rechecked the archives, and I agree that there's no data about > exactly h

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-16 Thread Tom Lane
I wrote: > David Rowley writes: >> I've ended up leaving the NaN checks in the join costing functions. >> There was no case mentioned in [1] that showed how we hit that >> reported test case, so I'm not really confident enough to know I'm not >> just reintroducing the same problem again by removin

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-13 Thread Tom Lane
David Rowley writes: > On Wed, 14 Oct 2020 at 04:16, Tom Lane wrote: >> So now I'm imagining something like >> #define MAXIMUM_ROWCOUNT 1e100 > That seems more reasonable. We likely could push it a bit higher, but > I'm not all that motivated to since if that was true, then you could > expect th

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-13 Thread David Rowley
Thanks for having a look at this. On Wed, 14 Oct 2020 at 04:16, Tom Lane wrote: > I'm on board with trying to get rid of NaN rowcount estimates more > centrally. I do not think it is a good idea to try to wire in a > prohibition against zero rowcounts. That is actually the correct > thing in as

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-13 Thread Tom Lane
David Rowley writes: > Because there's been quite a few of these, and this report is yet > another one, I wonder if it's time to try and stamp these out at the > source rather than where the row counts are being used. I'm on board with trying to get rid of NaN rowcount estimates more centrally.

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-13 Thread David Rowley
On Sat, 10 Oct 2020 at 02:19, Tom Lane wrote: > I'm fairly certain that every one of the existing NaN checks was put > there on the basis of hard experience. Possibly digging in the git > history would offer more info about exactly where the NaNs came from. I had a look at this and found there'

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-09 Thread Tom Lane
David Rowley writes: > On Fri, 9 Oct 2020 at 15:06, Tom Lane wrote: >> I notice there are some other ad-hoc isnan() checks scattered >> about costsize.c, too. Maybe we should indeed consider fixing >> clamp_row_estimate to get rid of inf (and nan too, I suppose) >> so that we'd not need those.

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-08 Thread David Rowley
On Fri, 9 Oct 2020 at 15:06, Tom Lane wrote: > I notice there are some other ad-hoc isnan() checks scattered > about costsize.c, too. Maybe we should indeed consider fixing > clamp_row_estimate to get rid of inf (and nan too, I suppose) > so that we'd not need those. I don't recall the exact cas

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-08 Thread Tom Lane
David Rowley writes: > Are you worried about the costs above the join that triggers that > coming out as NaN with that fix? It appears that's the case. [ pokes at that... ] Yeah, it looks like nestloop cost estimation also has some issues with inf-times-zero producing NaN; it's just not assertin

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-08 Thread David Rowley
On Fri, 9 Oct 2020 at 12:59, Tom Lane wrote: > If we did want to do something here, I'd consider something like > > if (isnan(outer_skip_rows)) > outer_skip_rows = 0; > if (isnan(inner_skip_rows)) > inner_skip_rows = 0; Are you worried about the costs above

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-08 Thread Tom Lane
David Rowley writes: > I admit it's annoying to add cycles to clamp_row_est() for such insane cases. I poked at this a bit more closely, and noted that the actual problem is that when we do this: outer_skip_rows = rint(outer_path_rows * outerstartsel); we have outer_path_rows = inf, out

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-08 Thread David Rowley
On Fri, 9 Oct 2020 at 12:16, Tom Lane wrote: > > > Perhaps the right fix is to modify clamp_row_est() with: > > I thought of that too, but as you say, if the rowcount has overflowed a > double then we've got way worse problems. It'd make more sense to try > to keep the count to a saner value in t

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-08 Thread Tom Lane
David Rowley writes: > The reason it fails is that outer_path_rows has become infinity due to > calc_joinrel_size_estimate continually multiplying in the join > selectivity of 0.05 (due to our 200 default num distinct from lack of > any stats) which after a number of iterations causes the number t

Re: Assertion failure with LEFT JOINs among >500 relations

2020-10-08 Thread David Rowley
On Fri, 9 Oct 2020 at 08:16, Onder Kalaci wrote: > I hit an assertion failure. When asserts disabled, it works fine even with > more tables (>5000). > > Steps to reproduce: > CREATE TABLE users_table (user_id int, time timestamp, value_1 int, value_2 > int, value_3 float, value_4 bigint); > 250