On 2020/07/09 15:17, osumi.takami...@fujitsu.com wrote:
Fujii-san
Thank you for your interest in this idea.
This feature can work safely with wal_level=replica or logical?
Or it can work only with wal_level=minimal?
If yes, what is the main difference
between this method and wal_skip_threshold?
I'm thinking this feature can be used
when you set any parameters of wal_level.
Besides that, data loading into a table *with some data*
should be allowed. This means I don't want to limit
the usage of this feature only for initial load
for empty table or under condition of 'minimal' wal_level in other words.
Let me explain more detail of the background.
I got a report that one of my customers says that
multiple COPY from multiple sessions have a bottleneck to write WAL.
Her use case was DWH system using postgres mainly to load dozens of GB (or
more) log data
from multiple data sources to execute night batch processing everyday.
Her scenario included both initial load to empty table
and load to table that already has records.
Yes, I understand this use case.
In passing, she also used our company's product of parallel loader,
to load data with dozens of, nearly 100, BGWs at the same time.
Through investigation of iostat,
they found the same problem that CPU worked for WAL write intensively.
This could happen after the implementation
of Parallel copy that is really hotly discussed and reviewed in the mailing
lists now.
So I thought it's good to discuss this in advance.
4. Execute the data loading, bypassing WAL generation for data.
5. Sync the data to disk by performing checkpoint.
What happens if the server crashes before #5? Since no WAL for data-loading can
be replayed, the target table should be truncated?
My answer for this is just to load that COPY data again.
It's because the application itself knows what kind of data was loaded
from the command.
When the server crashes before #5, some table and index pages that #4 loaded
the data into might have been partially synced to the disk because of bgwriter
or shared buffer replacement. So ISTM that the target table needs to be
truncated to empty during recovery and users need to load whole the data into
the table again. Is my understanding right? If yes, isn't this the same feature
as that UNLOGGED table provides?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION