in addition to "DataFileRead", actually we have more session waiting on
"extend", and we enabled log_lock, for example
2025-06-24 18:00:11.368 xxxx:[1865315]:[4-1]:mbsLOG:  process 1865315 still
waiting for ExclusiveLock on extension of relation 14658239 of database
16384 after 1000.161 ms
2025-06-24 18:00:11.368 xxxx:mbs@pgaaweb:[1865315]:[5-1]:mbsDETAIL:
 Process holding the lock: 1748100. Wait queue: 232658, 606235, 1865315,
1939296, 1368901, 2472806, 2072266, 963893, 1304751, 1084094, 875968,
451652, 672314, 1240983, 2085501, 851401, 2043926, 2522611, 889816, 191663,
1090944, 536057, 1810868, 1240510, 1195564, 1067298.

see both "DataFileRead" and "extend" , each time last 2 seconds and
automatically recoverred.  for this "extend" wait_event, in addtion IO, OS
page_cache and PG buffer cache, or bgwriter,checkpointer could be impact
that too?
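
For reference, something along these lines could be used to identify the
relation being extended and to sample the wait events while the spike is
happening (a rough sketch; the database name pgaaweb is inferred from the
log prefix above and may need adjusting):

  # resolve the relation OID from the lock-wait message to a table name
  psql -d pgaaweb -c "SELECT 14658239::regclass;"

  # sample active sessions' wait events once per second during the spike
  while true; do
    psql -d pgaaweb -Atc "SELECT now(), wait_event_type, wait_event, count(*)
                          FROM pg_stat_activity
                          WHERE state = 'active'
                          GROUP BY 1, 2, 3
                          ORDER BY 4 DESC;"
    sleep 1
  done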

Thanks,

James

On Thu, Jun 26, 2025 at 2:47 PM, James Pang <jamespang...@gmail.com> wrote:

>    we faced this issue 3 times this week, each time last only 2 seconds,
> so not easy to run perf in peak business time to capture that, anyway, I
> will try. before that, I want to understand if "os page cache" or "pg
> buffer cache" can contribute to the wait_event time "extend" and
> "DataFileRead", or bgwriter ,checkpoint flushing data to disk can impact
> that too ?  we enable bgwriter , and we see checkpointer get scheduled by
> "wal" during the time, so I just increased max_wal_size to make checkpoint
> scheduled in longer time.
>
> Thanks,
>
> James
>
>
> Frits Hoogland <frits.hoogl...@gmail.com> 於 2025年6月26日週四 下午2:40寫道:
>
>> Okay. So it's a situation that is reproducable.
>> And like was mentioned, the system time (percentage) is very high.
>> Is this a physical machine, or a virtual machine?
>>
>> The next thing to do, is use perf to record about 20 seconds or so during
>> a period of time when you see this behavior (perf record -g, taking the
>> backtrace with it).
>> This records (samples) the backtraces of on cpu tasks, from which you
>> then can derive what they are doing, for which you should see lots of tasks
>> in kernel space, and what that is, using perf report.
>>
>> *Frits Hoogland*
>>
>>
>>
>>
>> On 26 Jun 2025, at 04:32, James Pang <jamespang...@gmail.com> wrote:
>>
>> thans for you suggestions, we have iowait from sar command too, copy
>> here, checking with infra team not found abnormal IO activities either.
>> 02:00:01 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal
>>  %guest  %gnice   %idle
>> 02:00:03 PM  all         15.92    0.00   43.02    0.65    0.76    2.56
>>  0.00    0.00    0.00   37.09
>> 02:00:03 PM    0         17.59    0.00   46.73    1.01    0.50    0.50
>>  0.00    0.00    0.00   33.67
>> 02:00:03 PM    1          9.50     0.00   61.50    0.50    0.50    1.00
>>  0.00    0.00    0.00   27.00
>> 02:00:03 PM    2          20.71    0.00   44.44    1.01    0.51    0.51
>>  0.00    0.00    0.00   32.83
>> 02:00:03 PM    3         14.00    0.00   51.50    2.00    1.00    1.00
>>  0.00    0.00    0.00   30.50
>> 02:00:03 PM    4          6.57     0.00   52.53    0.51    0.51    3.54
>>  0.00    0.00    0.00   36.36
>> 02:00:03 PM    5          10.20    0.00   49.49    1.02    1.53    0.00
>>  0.00    0.00    0.00   37.76
>> 02:00:03 PM    6          27.64    0.00   41.21    0.50    0.50    0.50
>>  0.00    0.00    0.00   29.65
>> 02:00:03 PM    7          9.05    0.00   50.75    0.50    1.01    0.50
>>  0.00    0.00    0.00   38.19
>> 02:00:03 PM    8          12.18    0.00   49.75    0.51    0.51    0.51
>>  0.00    0.00    0.00   36.55
>> 02:00:03 PM    9          13.00    0.00    9.50    0.50    1.50   15.50
>>  0.00    0.00    0.00   60.00
>> 02:00:03 PM   10         15.58    0.00   46.23    0.00    0.50    0.50
>>  0.00    0.00    0.00   37.19
>> 02:00:03 PM   11          20.71    0.00   10.10    0.00    1.01   14.14
>>  0.00    0.00    0.00   54.04
>> 02:00:03 PM   12         21.00    0.00   37.00    0.50    1.00    1.00
>>  0.00    0.00    0.00   39.50
>> 02:00:03 PM   13         13.57    0.00   45.73    1.01    1.01    1.01
>>  0.00    0.00    0.00   37.69
>> 02:00:03 PM   14   18.18    0.00   39.39    1.01    0.51    0.51    0.00
>>    0.00    0.00   40.40
>> 02:00:03 PM   15   14.00    0.00   49.50    0.50    0.50    3.50    0.00
>>    0.00    0.00   32.00
>> 02:00:03 PM   16   19.39    0.00   39.80    1.02    1.53    0.51    0.00
>>    0.00    0.00   37.76
>> 02:00:03 PM   17   16.75    0.00   45.18    1.52    1.02    2.54    0.00
>>    0.00    0.00   32.99
>> 02:00:03 PM   18   12.63    0.00   50.00    0.00    1.01    0.00    0.00
>>    0.00    0.00   36.36
>> 02:00:03 PM   19    5.56    0.00   82.32    0.00    0.00    0.00    0.00
>>    0.00    0.00   12.12
>> 02:00:03 PM   20   15.08    0.00   48.24    0.50    0.50    3.52    0.00
>>    0.00    0.00   32.16
>> 02:00:03 PM   21   17.68    0.00    9.09    0.51    1.52   13.64    0.00
>>    0.00    0.00   57.58
>> 02:00:03 PM   22   13.13    0.00   43.94    0.51    0.51    0.51    0.00
>>    0.00    0.00   41.41
>> 02:00:03 PM   23   14.07    0.00   42.71    0.50    0.50    0.50    0.00
>>    0.00    0.00   41.71
>> 02:00:03 PM   24   13.13    0.00   41.92    1.01    0.51    0.51    0.00
>>    0.00    0.00   42.93
>> 02:00:03 PM   25   16.58    0.00   47.74    0.50    1.01    0.50    0.00
>>    0.00    0.00   33.67
>> 02:00:03 PM   26   16.58    0.00   46.73    0.50    1.01    0.50    0.00
>>    0.00    0.00   34.67
>> 02:00:03 PM   27   45.50    0.00   54.50    0.00    0.00    0.00    0.00
>>    0.00    0.00    0.00
>> 02:00:03 PM   28    6.06    0.00   32.32    0.00    0.51   13.13    0.00
>>    0.00    0.00   47.98
>> 02:00:03 PM   29   13.93    0.00   44.78    1.00    1.00    0.50    0.00
>>    0.00    0.00   38.81
>> 02:00:03 PM   30   11.56    0.00   57.79    0.00    0.50    1.01    0.00
>>    0.00    0.00   29.15
>> 02:00:03 PM   31   33.85    0.00    9.23    0.51    1.54    0.51    0.00
>>    0.00    0.00   54.36
>> 02:00:03 PM   32   30.15    0.00   41.71    0.50    0.50    1.51    0.00
>>    0.00    0.00   25.63
>>
>> Thanks,
>>
>> James
>>
>> Frits Hoogland <frits.hoogl...@gmail.com> 於 2025年6月25日週三 下午10:27寫道:
>>
>>>
>>>
>>> > On 25 Jun 2025, at 07:59, Laurenz Albe <laurenz.a...@cybertec.at>
>>> wrote:
>>> >
>>> > On Wed, 2025-06-25 at 11:15 +0800, James Pang wrote:
>>> >> pgv14, RHEL8, xfs , we suddenly see tens of sessions waiting on
>>> "DataFileRead" and
>>> >> "extend", it last about 2 seconds(based on pg_stat_activity query) ,
>>> during the
>>> >> waiting time, "%sys" cpu increased to 80% , but from "iostat" , no
>>> high iops and
>>> >> io read/write latency increased either.
>>> >
>>> > Run "sar -P all 1" and see if "%iowait" is high.
>>> I would (strongly) advise against the use of iowait as an indicator. It
>>> is a kernel approximation of time spent in IO from which cannot be use used
>>> in any sensible way other than possibly you're doing IO.
>>> First of all, iowait is not a kernel state, and therefore it's taken
>>> from idle. This means that if there is no, or too little, idle time, iowait
>>> that should be there is gone.
>>> Second, the calculation to transfer idle time to iowait is done for
>>> synchronous IO calls only. Which currently is not a problem for postgres
>>> because it uses exactly that, but in the future it might.
>>> Very roughly put, what the kernel does is keep a counter of tasks
>>> currently in certain system IO calls, and then try to express that using
>>> iowait. The time in IO wait can't be used calculate any IO facts.
>>>
>>> In that sense, it puts it in the same area as the load figure:
>>> indicative, but mostly useless because it doesn't give you any facts about
>>> what it is expressing.
>>> >
>>> > Check if you have transparent hugepages enabled:
>>> >
>>> >  cat /sys/kernel/mm/transparent_hugepage/enabled
>>> >
>>> > If they are enabled, disable them and see if it makes a difference.
>>> >
>>> > I am only guessing here.
>>> Absolutely. Anything that is using signficant amounts of memory and is
>>> not created to take advantage of transparent hugepages will probably
>>> experience more downsides from THP than it helps.
>>> >
>>> >> many sessions were running same "DELETE FROM xxxx" in parallel
>>> waiting on "extend"
>>> >> and "DataFileRead", there are triggers in this table "After delete"
>>> to insert/delete
>>> >> other tables in the tigger.
>>> >
>>> > One thing that almost certainly would improve your situation is to run
>>> fewer
>>> > concurrent statements, for example by using a reasonably sized
>>> connection pool.
>>> This is true if the limits of the IO device, or anything towards to IO
>>> device or devices are hit.
>>> And in general, high "%sys", alias lots of time spent in kernel mode
>>> alias system time indicates lots of time spent in system calls, which is
>>> what the read and write calls in postgres are.
>>> Therefore these figures suggest blocking for IO, for which Laurenz'
>>> advise to lower the amount of concurrent sessions doing IO in general makes
>>> sense.
>>> A more nuanced analysis: if IO requests get queued, these will wait in
>>> 'D' state in linux, which by definition is off cpu, and thus do not spent
>>> cpu (system/kernel) time.
>>>
>>> What sounds suspicious is that you indicate you indicate there is you
>>> see no signficant change in the amount of IO in iostat.
>>>
>>> In order to understand this, you will have to first carefully find the
>>> actual IO physical IO devices that you are using for postgres IO.
>>> In current linux this can be tricky, depending on how the hardware or
>>> virtual machine looks like, and how the disks are arranged in linux.
>>> What you need to determine is which actual disk devices are used, and
>>> what their limits are.
>>> Limits for any disk are IOPS (operations per second) and MBPS (megabytes
>>> per second -> bandwdith).
>>>
>>> There is an additional thing to realize, which makes this really tricky:
>>> postgres for common IO uses buffered IO.
>>> Buffered IO means any read or write will use the linux buffercache, and
>>> read or writes can be served from the buffercache if possible.
>>>
>>> So in your case, if you managed to make the database perform identical
>>> read or write requests, this could result in a difference of amounts of
>>> read and write IOs served from the cache, which can make an enormous
>>> amounts of difference for how fast these requests are served. If somehow
>>> you managed to make the operating system choose to use the physical IO
>>> path, you will see significant amounts time spent on that, which will have
>>> IO related wait events.
>>>
>>> Not a simple answer, but this is how it works.
>>>
>>> So I would suggest checking the difference between the situation of when
>>> it's doing the same which is considered well performing versus badly
>>> performing.
>>>
>>>
>>> >
>>> > Yours,
>>> > Laurenz Albe
>>> >
>>> >
>>>
>>>
>>
