Re: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation

2025-07-04 Thread Greg Sabino Mullane
It's hard to give generic recommendations for what really depends on your specific needs, but here is one attempt: using HikariCP for connection pooling. For better scaling, look into PGBouncer, which has very fast "transaction" and "statement" modes. ... manage 10-12 TB of data in a production

Re: Bulk DML performance

2025-03-17 Thread Greg Sabino Mullane
On Mon, Mar 17, 2025 at 4:19 AM wrote: Can you help me understand why performing 3 million lookups on a b-tree > index with all pages cached in memory takes so long? It's not the lookup, it's writing the 3 million rows (and in this particular upsert case, deleting 3 million, then inserting 3 m

Re: Poor performance with row wise comparisons

2025-02-07 Thread Greg Sabino Mullane
On Fri, Feb 7, 2025 at 2:05 AM Jon Emord wrote: > but with limit 101, the extra shared hits return > Can you show the explain analyze for the limit 101 case? Cheers, Greg -- Crunchy Data - https://www.crunchydata.com Enterprise Postgres Software Products & Tech Support

Re: Simple query with Planner underestimating rows.

2025-01-28 Thread Greg Sabino Mullane
On Tue, Jan 28, 2025 at 5:30 PM Felipe López Montes wrote: > For the sake of clarity and simplicity, I have disabled the nestloop join > in the session because it involved a gather clause and parallel workers and > was still underestimating rows, so the same problem happens with nestloop > strate

Re: Question: consolidating strpos searches?

2025-01-04 Thread Greg Sabino Mullane
On Sat, Jan 4, 2025 at 12:16 PM James Addison wrote: > In the context of considering writing a patch: would the complexity of > implementing such a feature for PostgreSQL be worth the potential > performance benefits? Probably not. As Tom said, this sounds like it should be tried as an extensio

Re: Why a bitmap scan in this case?

2024-12-19 Thread Greg Sabino Mullane
> > Why wouldn't it do an index (or, really, an index only) scan in this case Well, it did do an index scan (and a bitmap scan is a pretty good solution here), but as to why no indexonly scan, there is probably not enough assurance that it won't have to hit the heap heavily anyway. Try doing a SE

Re: PostgreSQL and a Catch-22 Issue related to dead rows

2024-12-10 Thread Greg Sabino Mullane
Thanks for that link; seeing actual queries is a big help. One thing to try is to get some index-only scans to run. Regular indexes need to consult the heap (main table) for visibility information, and a bloated table can make that consultation expensive. For example, an index like this should wor

Re: PostgreSQL and a Catch-22 Issue related to dead rows

2024-12-10 Thread Greg Sabino Mullane
On Tue, Dec 10, 2024 at 3:55 AM Lars Aksel Opsahl wrote: > Is it difficult to add parameter like force-dead-rows-removal that we send > to the vacuum job that will remove this rows like this ? > > I'm still not sure what the ask here is - complete literal removal of the dead rows? That's not how

Re: PostgreSQL and a Catch-22 Issue related to dead rows

2024-12-09 Thread Greg Sabino Mullane
On Mon, Dec 9, 2024 at 6:03 AM Lars Aksel Opsahl wrote: > In one case, we processed a total of 750 cells, with an overall runtime of > 40 hours. However, one specific cell took over 12 hours to complete, most > of which was spent on removing small areas by deleting edges in PostGIS > Topology. Th

Re: Unexpected Performance for the Function simplify_function

2024-10-24 Thread Greg Sabino Mullane
On Thu, Oct 24, 2024 at 3:49 PM Shiv Iyer wrote: > Hello, > > > The query plans and results you shared illustrate the unexpected > performance differences between using and bypassing the > simplify_function() logic in PostgreSQL’s optimizer. Here’s an in-depth > analysis and thoughts on optimizin

Re: Postgres index usage

2024-08-07 Thread Greg Sabino Mullane
On Wed, Aug 7, 2024 at 1:06 PM Dirschel, Steve < steve.dirsc...@thomsonreuters.com> wrote: > I’ve found multiple postings out there saying you can query > pg_stat_all_indexes and look at idx_scan to know if an index has been used > by queries. I want to be 100% sure I can rely on that table/colum

Re: Distinct performance dropped by multiple times in v16

2024-06-10 Thread Greg Sabino Mullane
On Mon, Jun 10, 2024 at 3:32 AM Vitaliy Litovskiy < vitaliy.litovs...@gmail.com> wrote: > 1. query runs for 4-5 seconds on v16 and less than a second on v15 > Yeah, that's a big regression. Seeing the actual EXPLAIN ANALYZE output for both systems would be very helpful to us. Also nice to see th

Re: Optimizing count(), but Explain estimates wildly off

2024-03-05 Thread Greg Sabino Mullane
> columns has not improved the planner's estimates, which are off by almost > 1M, and there's been no suggestion of what could cause that. You are asking a lot of the planner - how would it know that the average number of items is much higher for ids derived indirectly from "Mexico" versus ids der

Re: Optimizing count(), but Explain estimates wildly off

2024-03-04 Thread Greg Sabino Mullane
On Mon, Mar 4, 2024 at 2:14 PM Chema wrote: > There's one JSON column in each table with a couple fields, and a column > with long texts in Items. and earlier indicated the query was: > Select * from tenders inner join items You do not want to do a "select star" on both tables unless you 100