On Mon, Aug 14, 2017 at 11:58 PM, Michael Paquier <michael.paqu...@gmail.com> wrote: > As $subject has been touched on two threads recently > (https://www.postgresql.org/message-id/CAB7nPqTbHLcHFn6m11tfpwAdgz8BmnBza2jjN9AK=sdx_kb...@mail.gmail.com > and > https://www.postgresql.org/message-id/20170808213537.wkmmagf2a6i3hjyi@alvherre.pgsql), > the list of wait event and their category are becoming harder to > maintain because of their number and the fact that: > 1) It is easy to add a wait event and... > 2) It is easy to not update its documentation. > 3) the documentation tables get easily broken.
The documentation provides nice "developer-perspective" descriptions of the wait events. It's a great start. If you have some familiarity with the code, or if you're willing to dig into it, then the descriptions get you going. However I think that it would be immensely helpful to start gathering knowledge somewhere on wait events from the user perspective. Listing a few of the wait events which users will probably see most often along with practical suggestions of what users could further examine or even what they could change to make improvements on their database systems. When Postgres 10 goes GA and users start adopting it, we will start getting a real-world sense of which wait events are most commonly observed. But I don't think it's too early to stub out a wiki page for gathering user-level community knowledge on this. As collective knowledge about wait event meaning and action matures, the best advice might even be considered for eventual incorporation back into the official documentation. I do realize there are some big pitfalls in the area of documenting performance optimization. Workloads vary significantly and it's all to easy for sound bites to devolve into "golden rules" or "best practices" and then become mantras and silver bullets. Nonetheless I believe it is possible to make some basic, scientific and well-qualified observations. Not only is this immensely useful but in fact it's critical to users' ability to have a good experience when they dramatically ramp up their usage of PostgreSQL. One particular section of Oracle's perf tuning doc actually isn't a bad example in giving a few qualified and actionable user-perspective wait event descriptions: http://docs.oracle.com/database/122/TGDBA/instance-tuning-using-performance-views.htm#TGDBA94472 I'm just making thus up off the top of my head, so I'm sure it will be bad... but in the interest of starting a conversation here's a example "stub/starter": ^-^-^-^-^-^-^-^-^-^-^-^ https://wiki.postgresql.org/wiki/Wait_Events Lock:tuple - this means that the SQL is waiting because some application has previously locked the tuple (row) with a different database connection and has not yet committed or rolled back that transaction. if you frequently see this wait event then you probably need to look for application-level optimizations to decrease insert/update/delete/lock contention on the same rows. IO:DataFileRead - this means that the SQL is waiting for data to be physically read into PostgreSQL's internal memory cache. if you frequently see this wait event then you might want to examine the SQL statements which cause the most I/O and look for optimizations. Can you accomplish the same result while examining less data? If not, is there an index that might make the processing more efficient? Can the SQL be restructured to generate a more optimal plan? If no application-level optimizations are possible, then can the I/O subsystem be optimized somehow? IO:XactSync and IO:WALWrite - [what's the exact difference between these?] this probably means that the SQL is waiting for WAL data to either be asynchronously written to disk, or for the final fsync to complete before returning from a transactions's COMMIT call. if you frequently see this wait event then you might want to review the content and frequency of updates and inserts generated by your application. If your application cannot be refactored to remove unneeded DML then you might example the WAL I/O path for opportunities to optimize or increase capacity. Notes about scaling out/up hardware: IO-related waits can often be somewhat remediated by scaling out or scaling up your hardware. However it's generally worthwhile to first review your application for optimization opportunities. A small application-level change can make orders-of-magnitude greater improvements over hardware scaling! Note that locking and concurrency related waits can often become even worse with scaling out/up. (Waiting for something on another machine can be slower than waiting for something on another processor.) Also, scaling the wrong hardware component can exacerbate IO-related waits if you don't scale in a way that directly addresses the actual bottleneck. ^-^-^-^-^-^-^-^-^-^-^-^ What do others think about this? Is the wiki.postgresql.org the right place for something like this, or should it just start out as blog posts or tweets or people's personal websites until the ideas are a little more mature? -Jeremy P.S. wasn't sure if this discussion was most appropriate for hackers, users, or www list as it seems to overlap with all of them :) -- http://about.me/jeremy_schneider -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers