we faced this issue 3 times this week, and each time it lasted only about 2 seconds, so
it is not easy to run perf during peak business hours to capture it. Anyway, I will
try. Before that, I want to understand whether the OS page cache or the PG buffer
cache can contribute to the time of the wait events "extend" and "DataFileRead",
or whether the bgwriter or checkpointer flushing data to disk can impact that too. We
have the bgwriter enabled, and we see the checkpointer being triggered by "wal" during that
time, so I just increased max_wal_size so that checkpoints are scheduled at longer
intervals.
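
For reference, this is roughly what I plan to check before the next occurrence (just a
sketch using the standard pg14 views; untested here):

    # were recent checkpoints requested by WAL volume or triggered by checkpoint_timeout?
    psql -c "SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint, buffers_backend FROM pg_stat_bgwriter;"

    # sample the wait events repeatedly while the ~2 second stall is happening
    psql -c "SELECT wait_event_type, wait_event, count(*) FROM pg_stat_activity WHERE state = 'active' GROUP BY 1, 2 ORDER BY 3 DESC;"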

Thanks,

James


Frits Hoogland <frits.hoogl...@gmail.com> wrote on Thursday, 26 June 2025 at 2:40 PM:

> Okay. So it's a situation that is reproducible.
> And as was mentioned, the system time (percentage) is very high.
> Is this a physical machine or a virtual machine?
>
> The next thing to do is to use perf to record for about 20 seconds during
> a period when you see this behavior (perf record -g, so the backtraces are
> captured as well).
> This samples the backtraces of on-CPU tasks, from which you can then derive
> what they are doing; you should see lots of tasks in kernel space, and perf
> report will show you what that kernel time is spent on.
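>
> Something along these lines should do (just a sketch, adjust the duration as needed):
>
>     # sample on-CPU stacks system-wide for ~20 seconds, including kernel frames
>     perf record -g -a -- sleep 20
>
>     # then look at where the samples land; the kernel-space call chains are what you want to see
>     perf report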
>
> *Frits Hoogland*
>
>
>
>
> On 26 Jun 2025, at 04:32, James Pang <jamespang...@gmail.com> wrote:
>
> thanks for your suggestions. We have iowait from the sar command too, copied here;
> checking with the infra team, they did not find any abnormal IO activity either.
> 02:00:01 PM  CPU     %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> 02:00:03 PM  all    15.92    0.00   43.02    0.65    0.76    2.56    0.00    0.00    0.00   37.09
> 02:00:03 PM    0    17.59    0.00   46.73    1.01    0.50    0.50    0.00    0.00    0.00   33.67
> 02:00:03 PM    1     9.50    0.00   61.50    0.50    0.50    1.00    0.00    0.00    0.00   27.00
> 02:00:03 PM    2    20.71    0.00   44.44    1.01    0.51    0.51    0.00    0.00    0.00   32.83
> 02:00:03 PM    3    14.00    0.00   51.50    2.00    1.00    1.00    0.00    0.00    0.00   30.50
> 02:00:03 PM    4     6.57    0.00   52.53    0.51    0.51    3.54    0.00    0.00    0.00   36.36
> 02:00:03 PM    5    10.20    0.00   49.49    1.02    1.53    0.00    0.00    0.00    0.00   37.76
> 02:00:03 PM    6    27.64    0.00   41.21    0.50    0.50    0.50    0.00    0.00    0.00   29.65
> 02:00:03 PM    7     9.05    0.00   50.75    0.50    1.01    0.50    0.00    0.00    0.00   38.19
> 02:00:03 PM    8    12.18    0.00   49.75    0.51    0.51    0.51    0.00    0.00    0.00   36.55
> 02:00:03 PM    9    13.00    0.00    9.50    0.50    1.50   15.50    0.00    0.00    0.00   60.00
> 02:00:03 PM   10    15.58    0.00   46.23    0.00    0.50    0.50    0.00    0.00    0.00   37.19
> 02:00:03 PM   11    20.71    0.00   10.10    0.00    1.01   14.14    0.00    0.00    0.00   54.04
> 02:00:03 PM   12    21.00    0.00   37.00    0.50    1.00    1.00    0.00    0.00    0.00   39.50
> 02:00:03 PM   13    13.57    0.00   45.73    1.01    1.01    1.01    0.00    0.00    0.00   37.69
> 02:00:03 PM   14    18.18    0.00   39.39    1.01    0.51    0.51    0.00    0.00    0.00   40.40
> 02:00:03 PM   15    14.00    0.00   49.50    0.50    0.50    3.50    0.00    0.00    0.00   32.00
> 02:00:03 PM   16    19.39    0.00   39.80    1.02    1.53    0.51    0.00    0.00    0.00   37.76
> 02:00:03 PM   17    16.75    0.00   45.18    1.52    1.02    2.54    0.00    0.00    0.00   32.99
> 02:00:03 PM   18    12.63    0.00   50.00    0.00    1.01    0.00    0.00    0.00    0.00   36.36
> 02:00:03 PM   19     5.56    0.00   82.32    0.00    0.00    0.00    0.00    0.00    0.00   12.12
> 02:00:03 PM   20    15.08    0.00   48.24    0.50    0.50    3.52    0.00    0.00    0.00   32.16
> 02:00:03 PM   21    17.68    0.00    9.09    0.51    1.52   13.64    0.00    0.00    0.00   57.58
> 02:00:03 PM   22    13.13    0.00   43.94    0.51    0.51    0.51    0.00    0.00    0.00   41.41
> 02:00:03 PM   23    14.07    0.00   42.71    0.50    0.50    0.50    0.00    0.00    0.00   41.71
> 02:00:03 PM   24    13.13    0.00   41.92    1.01    0.51    0.51    0.00    0.00    0.00   42.93
> 02:00:03 PM   25    16.58    0.00   47.74    0.50    1.01    0.50    0.00    0.00    0.00   33.67
> 02:00:03 PM   26    16.58    0.00   46.73    0.50    1.01    0.50    0.00    0.00    0.00   34.67
> 02:00:03 PM   27    45.50    0.00   54.50    0.00    0.00    0.00    0.00    0.00    0.00    0.00
> 02:00:03 PM   28     6.06    0.00   32.32    0.00    0.51   13.13    0.00    0.00    0.00   47.98
> 02:00:03 PM   29    13.93    0.00   44.78    1.00    1.00    0.50    0.00    0.00    0.00   38.81
> 02:00:03 PM   30    11.56    0.00   57.79    0.00    0.50    1.01    0.00    0.00    0.00   29.15
> 02:00:03 PM   31    33.85    0.00    9.23    0.51    1.54    0.51    0.00    0.00    0.00   54.36
> 02:00:03 PM   32    30.15    0.00   41.71    0.50    0.50    1.51    0.00    0.00    0.00   25.63
>
> Thanks,
>
> James
>
Frits Hoogland <frits.hoogl...@gmail.com> wrote on Wednesday, 25 June 2025 at 10:27 PM:
>
>>
>>
>> > On 25 Jun 2025, at 07:59, Laurenz Albe <laurenz.a...@cybertec.at> wrote:
>> >
>> > On Wed, 2025-06-25 at 11:15 +0800, James Pang wrote:
>> >> pgv14, RHEL8, xfs , we suddenly see tens of sessions waiting on
>> "DataFileRead" and
>> >> "extend", it last about 2 seconds(based on pg_stat_activity query) ,
>> during the
>> >> waiting time, "%sys" cpu increased to 80% , but from "iostat" , no
>> high iops and
>> >> io read/write latency increased either.
>> >
>> > Run "sar -P all 1" and see if "%iowait" is high.
>> I would (strongly) advise against the use of iowait as an indicator. It
>> is a kernel approximation of time spent in IO which cannot be used in any
>> sensible way other than to say you are possibly doing IO.
>> First of all, iowait is not a kernel state, and therefore it's taken from
>> idle. This means that if there is no, or too little, idle time, iowait that
>> should be there is gone.
>> Second, the calculation that transfers idle time into iowait is done for
>> synchronous IO calls only. Currently that is not a problem for postgres,
>> because it uses exactly that, but in the future it might be.
>> Very roughly put, the kernel keeps a counter of tasks currently in certain
>> system IO calls and then tries to express that as iowait. The time in iowait
>> cannot be used to calculate any IO facts.
>>
>> In that sense it sits in the same category as the load figure: indicative,
>> but mostly useless, because it doesn't give you any facts about what it is
>> expressing.
>> >
>> > Check if you have transparent hugepages enabled:
>> >
>> >  cat /sys/kernel/mm/transparent_hugepage/enabled
>> >
>> > If they are enabled, disable them and see if it makes a difference.
>> >
>> > I am only guessing here.
>> Absolutely. Anything that uses significant amounts of memory and was not
>> written to take advantage of transparent hugepages will probably experience
>> more downsides than benefits from THP.
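>> For completeness, roughly how to test that (run as root; making the change
>> persistent across reboots is left out here):
>>
>>     # the value in brackets is the active setting
>>     cat /sys/kernel/mm/transparent_hugepage/enabled
>>
>>     # disable THP at runtime as an experiment
>>     echo never > /sys/kernel/mm/transparent_hugepage/enabled
>>     echo never > /sys/kernel/mm/transparent_hugepage/defrag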
>> >
>> >> many sessions were running the same "DELETE FROM xxxx" in parallel, waiting on "extend"
>> >> and "DataFileRead"; there are "after delete" triggers on this table that insert into and
>> >> delete from other tables.
>> >
>> > One thing that almost certainly would improve your situation is to run
>> fewer
>> > concurrent statements, for example by using a reasonably sized
>> connection pool.
>> This is true if the limits of the IO device, or of anything on the path to the IO
>> device or devices, are hit.
>> In general, a high "%sys", alias lots of time spent in kernel mode, alias system
>> time, indicates lots of time spent in system calls, which is what the read and
>> write calls in postgres are.
>> Therefore these figures suggest blocking on IO, for which Laurenz's advice to
>> lower the number of concurrent sessions doing IO in general makes sense.
>> A more nuanced analysis: if IO requests get queued, the tasks will wait in the
>> 'D' state in linux, which by definition is off-CPU, and thus they do not spend
>> CPU (system/kernel) time.
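>> As a sketch, you can see such queued tasks directly:
>>
>>     # tasks in uninterruptible sleep ('D') and the kernel function they are waiting in
>>     ps -eo pid,state,wchan:32,comm | awk '$2 == "D"'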
>>
>> What sounds suspicious is that you indicate you see no significant change in
>> the amount of IO in iostat.
>>
>> In order to understand this, you will first have to carefully find the actual
>> physical IO devices that you are using for postgres IO.
>> On current linux this can be tricky, depending on what the hardware or virtual
>> machine looks like and how the disks are arranged in linux.
>> What you need to determine is which actual disk devices are used and what their
>> limits are.
>> The limits for any disk are IOPS (operations per second) and MBPS (megabytes per
>> second, i.e. bandwidth).
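>> Roughly like this (the data directory path below is only an example, adjust it to yours):
>>
>>     # map the postgres data directory to its filesystem and block device(s)
>>     df /var/lib/pgsql/14/data
>>     lsblk -o NAME,TYPE,SIZE,MOUNTPOINT
>>
>>     # then watch per-device throughput, queue size and latency
>>     iostat -x 1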
>>
>> There is an additional thing to realize, which makes this really tricky:
>> postgres uses buffered IO for its common IO path.
>> Buffered IO means any read or write goes through the linux buffer cache, and
>> reads or writes can be served from that cache when possible.
>>
>> So in your case, even if you made the database perform identical read or write
>> requests, a different share of those reads and writes might be served from the
>> cache, which can make an enormous difference in how fast the requests complete.
>> If somehow you made the operating system take the physical IO path instead, you
>> will see significant amounts of time spent on that, which will show up as
>> IO-related wait events.
>>
>> Not a simple answer, but this is how it works.
>>
>> So I would suggest comparing the situation where the workload is doing the same
>> thing and performing well with the situation where it is performing badly.
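>> As a sketch of how to compare the two periods (view and column names are the
>> standard postgres ones; the exact approach is up to you):
>>
>>     # database level: blocks found in shared buffers vs. blocks read from the OS
>>     psql -c "SELECT datname, blks_hit, blks_read FROM pg_stat_database WHERE datname = current_database();"
>>
>>     # OS level: page cache activity (page-ins, page-outs, major faults) during both periods
>>     sar -B 1 20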
>>
>>
>> >
>> > Yours,
>> > Laurenz Albe
>> >
>> >
>>
>>
>
