On Fri, 25 Oct 2024 at 13:14, Greg Sabino Mullane <htamf...@gmail.com> wrote:
>
> On Mon, Oct 14, 2024 at 10:20 PM David Rowley <dgrowle...@gmail.com> wrote:
>>
>> Yeah, I think any effort to change the default value for this setting
>> would require some analysis to prove that the newly proposed default is a
>> more suitable setting than the current default. I mean, why 1.2 and not
>> 1.1 or 1.3? Where's the evidence that 1.2 is the best value for this?
>
> As I said, I was just throwing that 1.2 number out there. It felt right,
> although perhaps a tad high (which seems right as we keep things very
> conservative). I agree we should make a best effort to have an accurate,
> defendable default. We all know (I hope) that 4.0 is wrong for SSDs.
I don't think we're going to find the correct new value for this setting by
throwing randomly chosen numbers at each other on an email thread.
Unfortunately, someone is going to have to do some work to figure out what
the number should be, and then hopefully someone else can verify that work
to check that it's correct. I'm not trying to be smart or funny here, but
I'm just failing to comprehend why you think that offering a number, without
any information about how you selected it, would be acceptable as the new
default random_page_cost. Are you expecting someone else to go and do the
work to prove that your selected number is the correct one? It's been 4
weeks since your first email and nobody has done that yet, so you might need
to consider other ways to achieve your goal.

>> I don't think just providing evidence that random read times on SSDs are
>> closer to sequential read times than they are on HDDs is going to be
>> enough.
>
> ...
>
>> It would be nice to have this as a script so that other people could
>> easily run it on their hardware to ensure that the random_page_cost we
>> choose as the new default is representative of the average hardware.
>
> Heh, this is starting to feel like belling the cat (see
> https://fablesofaesop.com/belling-the-cat.html)

I don't see the similarity. Changing the default random_page_cost requires
analysis to find what the new default should be; the execution of the actual
change is dead simple. With belling the cat, the execution is the hard part
and nobody is debating the idea itself.

> Remember this is still just a default, and we should encourage people to
> tweak it themselves based on their own workloads. I just want people to
> start in the right neighborhood. I'll see about working on some more
> research / generating a script, but help from others is more than welcome.
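As a starting point for such a script, here is a minimal Python sketch that
times sequential vs. random 8 kB reads of a scratch file. The file size and
block size here are arbitrary choices of mine, and because a file this small
sits entirely in the OS page cache the measured ratio reflects cached reads,
not the device; a real benchmark would need a dataset much larger than RAM
(or direct I/O, e.g. via fio) to say anything about actual disks:

```python
import os
import random
import tempfile
import time

BLOCK = 8192   # PostgreSQL's default page size
FILE_MB = 64   # illustration only; a real test must far exceed RAM
NBLOCKS = FILE_MB * 1024 * 1024 // BLOCK

# Build a scratch file of FILE_MB MiB.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
    f.write(os.urandom(BLOCK) * NBLOCKS)

def timed_read(offsets):
    """Read one BLOCK at each byte offset; return elapsed seconds."""
    start = time.perf_counter()
    with open(path, "rb") as fh:
        for off in offsets:
            fh.seek(off)
            fh.read(BLOCK)
    return time.perf_counter() - start

# Sequential pass: offsets in file order.
seq = timed_read(range(0, NBLOCKS * BLOCK, BLOCK))

# Random pass: same offsets, shuffled.
rnd_offsets = [i * BLOCK for i in range(NBLOCKS)]
random.shuffle(rnd_offsets)
rnd = timed_read(rnd_offsets)

print(f"sequential: {seq:.3f}s  random: {rnd:.3f}s  ratio: {rnd / seq:.2f}")
os.unlink(path)
```

Even once the caching caveat is handled, note that the ratio this prints is
raw I/O behaviour only, which (per below) is not by itself the right value
for random_page_cost.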
You might be mistakenly thinking that the best random_page_cost is an exact
ratio of how much slower a random seek and read is than a sequential read.
Unfortunately, there are many other factors to consider. The correct setting
is the one where the chosen plan uses the scan method that's actually the
fastest, and knowing that is going to take some benchmarks on PostgreSQL
itself. Our cost model just isn't accurate enough for you to assume that I/O
is the only factor that changes between an Index Scan and a Seq Scan.

I'd say it's not overly difficult to come up with test cases that
demonstrate the value you select is "correct". I've done this before for
CPU-related costs. I think with I/O the main difference will be that your
tests should be much larger, and that will mean getting the results takes
much more time. Here's a link to some analysis I did to help solve a problem
relating to partition-wise aggregates [1]. Maybe you can use a similar
method to determine random_page_cost.

David

[1] https://www.postgresql.org/message-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B%3DZRh-rxy9qxfPA5Gw%40mail.gmail.com
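To make the point above concrete — that random_page_cost changes which plan
wins, rather than directly encoding a hardware ratio — here is a toy sketch
of the disk+CPU cost arithmetic. The constants loosely mirror PostgreSQL's
defaults (seq_page_cost=1.0, cpu_tuple_cost=0.01, cpu_index_tuple_cost=0.005),
but the formulas are grossly simplified compared to costsize.c (no
effective_cache_size, no correlation, no index page reads), and the table
sizes and selectivity are made up:

```python
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01
CPU_INDEX_TUPLE_COST = 0.005

def seq_scan_cost(pages, tuples):
    # Every page read sequentially, every tuple inspected.
    return pages * SEQ_PAGE_COST + tuples * CPU_TUPLE_COST

def index_scan_cost(random_page_cost, heap_pages_fetched, matched_tuples):
    # Toy model: each matching tuple costs one random heap fetch plus
    # per-tuple CPU work; index page reads themselves are ignored.
    return (heap_pages_fetched * random_page_cost
            + matched_tuples * (CPU_TUPLE_COST + CPU_INDEX_TUPLE_COST))

# A made-up 10,000-page, 1M-tuple table where a filter matches 0.5% of rows,
# each match hitting a distinct heap page.
pages, tuples = 10_000, 1_000_000
matched = int(tuples * 0.005)

for rpc in (4.0, 1.2, 1.1):
    seq = seq_scan_cost(pages, tuples)
    idx = index_scan_cost(rpc, heap_pages_fetched=matched, matched_tuples=matched)
    winner = "Index Scan" if idx < seq else "Seq Scan"
    print(f"random_page_cost={rpc}: seq={seq:.0f} index={idx:.0f} -> {winner}")
```

With these invented numbers the choice flips from Seq Scan at 4.0 to Index
Scan at 1.2, which is exactly why picking the default needs benchmarks: the
cost model's job is to pick the faster plan, and only running both plans on
real hardware tells you whether a given setting makes it do so.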