On Thu, Jan 30, 2020 at 2:40 PM Peter Geoghegan <p...@bowt.ie> wrote:
> On Thu, Jan 30, 2020 at 11:16 AM Peter Geoghegan <p...@bowt.ie> wrote:
> > I prefer to think of the patch as being about improving the stability
> > and predictability of Postgres with certain workloads, rather than
> > being about overall throughput. Postgres has an ongoing need to VACUUM
> > indexes, so making indexes smaller is generally more compelling than
> > it would be with another system. That said, there are certainly quite
> > a few cases that have big improvements in throughput and latency.
>
> I also reran TPC-C/benchmarksql with the patch (v30). TPC-C has hardly
> any non-unique indexes, which is a little unrealistic. I found that
> the patch was up to 7% faster in the first few hours, since it can
> control the bloat from certain non-HOT updates. This isn't a
> particularly relevant workload, since almost all UPDATEs don't affect
> indexed columns. The incoming-item-is-duplicate heuristic works well
> with TPC-C, so there is probably hardly any possible downside there.
>
> I think that I should tentatively commit the patch without the GUC.
> Just have the storage parameter, so that everyone gets the
> optimization without asking for it. We can then review the decision to
> enable deduplication generally after the feature has been in the tree
> for several months.
>
> There is no need to make a final decision about whether or not the
> optimization gets enabled before committing the patch.
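[For context: the per-index storage parameter Peter proposes here corresponds to what was eventually committed for PostgreSQL 13 as the B-tree `deduplicate_items` reloption, enabled by default. The parameter name was not settled at the time of this message, and the table and index names below are made up for illustration, so treat this as a sketch of the interface rather than part of the proposal:]

```sql
-- Deduplication would be on by default for every B-tree index;
-- a user who hits a regression can opt out per index via the
-- storage parameter, with no GUC involved:
CREATE INDEX orders_customer_idx ON orders (customer_id)
    WITH (deduplicate_items = off);

-- It can be switched back on later; the new setting applies to
-- subsequent deduplication passes (a REINDEX rebuilds the whole
-- index under the current setting):
ALTER INDEX orders_customer_idx SET (deduplicate_items = on);
```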
That seems reasonable. I suspect that you're right that the worst-case
downside is not big enough to really be a problem given all the upsides.
But the advantage of getting things committed is that we can find out
what users think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company