On 8/22/24 10:50 AM, 陈宗志 wrote:
I disagree with the point made in the article. The article mentions that ‘prevents the kernel from reordering reads and writes to optimize performance,’ which might be referring to the file system’s IO scheduling and merging. However, this can be handled within the database itself, where IO scheduling and merging can be done even better.
The database does not have all the information that the OS has. That said, I suspect that the advantages of direct IO outweigh the disadvantages in this regard. But the only way to know for sure would be for someone to provide a benchmark.
Regarding ‘does not allow free memory to be used as kernel cache,’ I believe the database itself should manage memory well, and most of the memory should be managed by the database rather than handed over to the operating system. Additionally, the database’s use of the page cache should be restricted.
That all depends on your use case. If the database is running alone or almost alone on a machine, direct IO is likely the optimal strategy, but if more services are running on the same machine (e.g. if you run PostgreSQL on your personal laptop), you want to use buffered IO.
But as far as I know, the long-term plan of the async IO project is to support both direct and buffered IO so people can pick the right choice for their workload.
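
To make the distinction concrete, here is a minimal sketch of what the two modes look like at the system call level on Linux. It is not PostgreSQL code. The names read_block and _read_first_block and the 4096-byte BLOCK size are made up for illustration, and the fallback to buffered IO when the filesystem rejects O_DIRECT is my own assumption about what a caller might want.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment and read size; real devices may need another value


def _read_first_block(path, flags):
    """Read up to BLOCK bytes from the start of path using the given open flags."""
    fd = os.open(path, flags)
    try:
        # An anonymous mmap is page-aligned, which O_DIRECT reads require.
        buf = mmap.mmap(-1, BLOCK)
        try:
            n = os.readv(fd, [buf])
            return buf[:n]  # slicing copies the data out as bytes
        finally:
            buf.close()
    finally:
        os.close(fd)


def read_block(path, direct):
    """Read the first block, bypassing the kernel page cache when direct is true."""
    if direct and hasattr(os, "O_DIRECT"):
        try:
            return _read_first_block(path, os.O_RDONLY | os.O_DIRECT)
        except OSError:
            # Some filesystems reject O_DIRECT; fall back to buffered IO.
            pass
    return _read_first_block(path, os.O_RDONLY)
```

With direct IO the application has to supply aligned buffers and do its own caching and read-ahead. With buffered IO the kernel does that work, at the cost of keeping a second copy of the data in the page cache.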
Andreas