Re: Optimizing count(), but Explain estimates wildly off

2024-03-06 Thread Chema
> Yours will be different, as I cannot exactly duplicate your schema or data
> distribution, but give "SELECT 1" a try. This was on Postgres 16, FWIW,
> with a default_statistics_target of 100.

Select 1 produces a sequential scan, like Select * did before Vacuum Full. But if I force an index s
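For reference, a session-local way to test whether an index scan would actually beat the sequential scan the planner chose (a sketch only; the table and filter below are placeholders, not the thread's real query):

```sql
-- Discourage sequential scans for this session only, then compare plans.
-- RESET restores the default afterwards.
SET enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT 1 FROM tenders WHERE country = 'Mexico';  -- placeholder query
RESET enable_seqscan;
```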

Re: Optimizing count(), but Explain estimates wildly off

2024-03-05 Thread Greg Sabino Mullane
> columns has not improved the planner's estimates, which are off by almost
> 1M, and there's been no suggestion of what could cause that.

You are asking a lot of the planner - how would it know that the average number of items is much higher for ids derived indirectly from "Mexico" versus ids der

Re: Optimizing count(), but Explain estimates wildly off

2024-03-05 Thread Chema
On Mon, Mar 4, 2024 at 7:50 p.m., Greg Sabino Mullane (htamf...@gmail.com) wrote:

> On Mon, Mar 4, 2024 at 2:14 PM Chema wrote:
>
>> There's one JSON column in each table with a couple fields, and a column
>> with long texts in Items.
>
> and earlier indicated the query was:
>
>> Select

Re: Optimizing count(), but Explain estimates wildly off

2024-03-04 Thread Greg Sabino Mullane
On Mon, Mar 4, 2024 at 2:14 PM Chema wrote:

> There's one JSON column in each table with a couple fields, and a column
> with long texts in Items.

and earlier indicated the query was:

> Select * from tenders inner join items

You do not want to do a "select star" on both tables unless you 100
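A sketch of the point above: run the count as a separate, narrow query so the executor never has to drag the wide JSON and text columns through the join (the join column name here is an assumption, not from the thread):

```sql
-- count(*) only needs the join keys, never the wide columns
SELECT count(*)
FROM tenders t
JOIN items i ON i.transaction_id = t.transaction_id  -- assumed join key
WHERE t.country = 'Mexico';
```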

Re: Optimizing count(), but Explain estimates wildly off

2024-03-04 Thread Chema
> > -> Parallel Seq Scan on pricescope_items (cost=0.00..1027794.01
> > rows=3277101 width=522) (actual time=0.753..41654.507 rows=2621681 loops=3)
>
> Why does it take over 41 seconds to read a table with less than
> 3 million rows? Are the rows so large? Is the table bloated?
> What is the siz

Re: Optimizing count(), but Explain estimates wildly off

2024-03-01 Thread Laurenz Albe
On Thu, 2024-02-29 at 17:15 -0600, Chema wrote:

> No major changes after doing Analyze, and also Vacuum Analyze.

Indeed. This caught my attention:

> -> Parallel Seq Scan on pricescope_items (cost=0.00..1027794.01
> rows=3277101 width=522) (actual time=0.753..41654.507 rows=2621681 loops=3)

W
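One way to answer the size/bloat questions raised here, using standard catalog queries (a sketch; nothing below comes from the thread itself):

```sql
-- On-disk size vs. the planner's live-row estimate: a very large
-- bytes-per-row ratio hints at bloat.
SELECT pg_size_pretty(pg_relation_size('pricescope_items')) AS table_size,
       (SELECT reltuples::bigint
          FROM pg_class
         WHERE relname = 'pricescope_items')                AS estimated_rows;

-- For an exact live/dead tuple breakdown, the pgstattuple extension helps:
-- CREATE EXTENSION pgstattuple;
-- SELECT * FROM pgstattuple('pricescope_items');
```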

Re: Optimizing count(), but Explain estimates wildly off

2024-02-29 Thread Chema
> > Reading around, seems many people are still using this 2005 snippet to
> > obtain the row count estimate from Explain:
>
> I recommend using FORMAT JSON and extracting the top row count from that.
> It is simpler and less error-prone.

Good tip, thanks Laurenz!

> > Is this still the
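A minimal sketch of Laurenz's suggestion, modeled on the widely circulated count-estimate function but reading the top-level "Plan Rows" from EXPLAIN (FORMAT JSON) instead of parsing text output; treat it as a sketch rather than a drop-in:

```sql
CREATE OR REPLACE FUNCTION count_estimate(query text) RETURNS bigint AS $$
DECLARE
    plan json;
BEGIN
    EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO plan;
    -- row estimate of the outermost plan node
    RETURN (plan -> 0 -> 'Plan' ->> 'Plan Rows')::bigint;
END;
$$ LANGUAGE plpgsql;

-- Usage:
-- SELECT count_estimate('SELECT * FROM tenders WHERE country = ''Mexico''');
```

Since the function only runs EXPLAIN, it returns in milliseconds regardless of table size, at the cost of being exactly as wrong as the planner's estimate.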

Re: Optimizing count(), but Explain estimates wildly off

2024-02-27 Thread Alvaro Herrera
Hi Chema,

On 2024-Feb-26, Chema wrote:

> Dear pgsqlers,
>
> I'm trying to optimize simple queries on two tables (tenders & items) with
> a couple million records. Besides the resulting records, the app also
> displays the count of total results. Doing count() takes as much time as
> the other

Re: Optimizing count(), but Explain estimates wildly off

2024-02-26 Thread Laurenz Albe
On Mon, 2024-02-26 at 18:25 -0600, Chema wrote:

> I'm trying to optimize simple queries on two tables (tenders & items) with
> a couple million records. Besides the resulting records, the app also
> displays the count of total results. Doing count() takes as much time as
> the other query (wh

Re: Optimizing count(), but Explain estimates wildly off

2024-02-26 Thread Vitalii Tymchyshyn
Is your transaction id more or less monotonic according to the date? If so, something like the following can help:

with tenders_filtered as (select * from tenders where country = 'Mexico'
and "date" >= '2023-01-01' and "date" < '2024-01-01')
Select * from tenders_filtered inner join items on transactio
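The quoted query is cut off above, so here is a hedged sketch of the idea it describes: when ids grow with the date, bounds derived from the filtered tenders let the planner prune most of items before the join (the items join column and bound predicates are assumptions):

```sql
WITH tenders_filtered AS (
    SELECT * FROM tenders
    WHERE country = 'Mexico'
      AND "date" >= '2023-01-01' AND "date" < '2024-01-01'
)
SELECT *
FROM tenders_filtered t
JOIN items i ON i.transaction_id = t.transaction_id  -- assumed join key
-- extra range predicate: only valid if transaction ids are monotonic
-- with the date, which is exactly the condition asked about above
WHERE i.transaction_id
      BETWEEN (SELECT min(transaction_id) FROM tenders_filtered)
          AND (SELECT max(transaction_id) FROM tenders_filtered);
```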

Optimizing count(), but Explain estimates wildly off

2024-02-26 Thread Chema
Dear pgsqlers, I'm trying to optimize simple queries on two tables (tenders & items) with a couple million records. Besides the resulting records, the app also displays the count of total results. Doing count() takes as much time as the other query (which can be 30+ secs), so it's an obvious tar
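Since the exact count() costs as much as the result query itself, a common workaround is to show the planner's own per-table estimate, which is instant but approximate (a sketch; it reflects the last ANALYZE/autovacuum and cannot account for the join or filters):

```sql
-- Approximate row count maintained by ANALYZE/autovacuum
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'tenders';
```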