What's also suspicious is that on the destination node, after copying
finishes,

# explain select 1 from mytable;
 Seq Scan on mytable (cost=0.00..6821514.12 rows=250410012 width=4)

# SELECT relname, n_live_tup FROM pg_stat_user_tables WHERE relname =
'mytable';
   relname   | estimated_rows
-------------+----------------
 mytable     |      150342468

Notice the discrepancy between how many rows EXPLAIN thinks there is in the
table, and how many rows pg_stat_user_tables thinks about it (250M vs.
150M).

On the source node, those 2 numbers are almost the same. It's only on the
destination they are different, right after the copying. (There are not a
lot of writes to this table happening on the source happening while copying
BTW.)

Maybe that could hint on why temp files are used?


On Mon, Feb 3, 2025 at 4:21 AM Dmitry Koterov <dmitry.kote...@gmail.com>
wrote:

> Here is the list of tmp files:
>
> postgres@pg-101a:~/17/main/base/pgsql_tmp$ ls -la
> total 5422297
> drwx------ 2 postgres postgres          9 Feb  3 04:08 .
> drwx------ 8 postgres postgres          8 Jan 29 01:27 ..
> -rw------- 1 postgres postgres 1073741824 Feb  3 04:05 pgsql_tmp196534.0
> -rw------- 1 postgres postgres 1073741824 Feb  3 04:05 pgsql_tmp196534.1
> -rw------- 1 postgres postgres 1073741824 Feb  3 04:05 pgsql_tmp196534.2
> -rw------- 1 postgres postgres 1073741824 Feb  3 04:06 pgsql_tmp196534.3
> -rw------- 1 postgres postgres 1073741824 Feb  3 04:07 pgsql_tmp196534.4
> -rw------- 1 postgres postgres 1073741824 Feb  3 04:08 pgsql_tmp196534.5
> -rw------- 1 postgres postgres  819396608 Feb  3 04:08 pgsql_tmp196534.6
>
> With work_mem=4GB, all those files *on the destination node* seemed to
> appear immediately with 4GB size and keep growing since then, while COPY
> progresses on the source node (i.e. it looked like PG tried hard to utilize
> work_mem, but after reaching the limit, dumped everything to pgsql_tmp
> still).
>
> The table structure being copied (just 1 index there):
>
> CREATE TABLE mytable (
>     id bigint NOT NULL PRIMARY KEY,
>     snippet bytea,
>     title bytea,
>     updated_at timestamp with time zone,
>     rich_snippet bytea
> );
>
> Directories sizes on the destination node while tablesync is working (it's
> copied in to an almost empty database):
>
> $ watch du -sh 17/main/base/* 17/main/pg_wal
> 2.2M    17/main/base/1
> 14G     17/main/base/16385
> 2.3M    17/main/base/16387
> 2.2M    17/main/base/4
> 2.3M    17/main/base/5
> 12G     17/main/base/pgsql_tmp
> 6.3G    17/main/pg_wal
>
> So the question, why does it use temp files. Why not just writes directly
> to WAL+data.
>
>
> On Mon, Feb 3, 2025 at 3:04 AM Dmitry Koterov <dmitry.kote...@gmail.com>
> wrote:
>
>> Hi.
>>
>> 1. Those are temp files on the destination node (where the logical
>> subscription exists and tablesync worker runs), not on the source. On the
>> source, it’s all clear.
>>
>> 2. No “spill” suffix/substring in the file names. I tried to look at the
>> content of these temp files, I I saw some text fragments from the original
>> table’s text column there. I.e. it looks like for some reason, the stream
>> received from the source node’s COPY command goes to that temp files (at
>> least partially).
>>
>> 3. I made several more experiments, increasing work_mem to several GB
>> (for the role which tablesync worker uses when copying) definitely helps
>> with temp files.
>>
>> Thanks!
>>
>> On Sun, Feb 2, 2025 at 19:10 Amit Kapila <amit.kapil...@gmail.com> wrote:
>>
>>> On Sun, Feb 2, 2025 at 5:13 PM Dmitry Koterov <dmitry.kote...@gmail.com>
>>> wrote:
>>> >
>>> > Trying to monitor perf during the initial tablesync phase (COPY) right
>>> after CREATE SUBSCRIPTION. I noticed that the size of
>>> 17/main/base/pgsql_tmp on the destination node grows (tens of gigabytes) as
>>> the COPY command (running internally on the publisher) progresses. Then in
>>> the end (when its "EXPLAIN SELECT 1 FROM tbl" on the destination shows the
>>> approximate number of rows equals to the number of rows on the source node)
>>> it hangs for several minutes, and then 17/main/base/pgsql_tmp empties, and
>>> the subscription progresses.
>>> >
>>> > It seems like if I increase work_mem to several GB, then the growth of
>>> 17/main/base/pgsql_tmp becomes less significant.
>>> >
>>> > Questions:
>>> >
>>> > 1. Are there some diagnostics commands that would allow me to figure
>>> out what is in those tmp files? Why does the subscriber create those tmp
>>> files and not just write directly to the data files and WAL? (The table has
>>> 2 bytea columns, i.e. it's TOASTed for sure.)
>>> >
>>>
>>> We do write spill files (ending with '.spill') if the changes are
>>> large. Can you please share the name of tmp files to avoid any
>>> assumptions?
>>>
>>> --
>>> With Regards,
>>> Amit Kapila.
>>>
>>

Reply via email to