What's also suspicious is what I see on the destination node right after copying finishes:
# explain select 1 from mytable;
 Seq Scan on mytable  (cost=0.00..6821514.12 rows=250410012 width=4)

# SELECT relname, n_live_tup AS estimated_rows
  FROM pg_stat_user_tables WHERE relname = 'mytable';
   relname   | estimated_rows
-------------+----------------
 mytable     |      150342468

Notice the discrepancy between how many rows EXPLAIN thinks there are in
the table and how many pg_stat_user_tables reports (250M vs. 150M). On the
source node, those two numbers are almost the same. It's only on the
destination that they differ, right after the copy. (There are not a lot of
writes happening to this table on the source while copying, BTW.) Maybe
that could hint at why temp files are used?
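
(A quick way to tell which of the two numbers is stale -- just a sketch,
using the same table name as above: the EXPLAIN row count is derived from
pg_class.reltuples scaled by relpages, while n_live_tup is maintained by the
cumulative statistics system, so the two can be compared directly and the
planner-side estimate refreshed with ANALYZE.)

-- Planner-side estimate (what EXPLAIN bases its row count on):
SELECT relname, reltuples::bigint AS planner_estimate, relpages
FROM pg_class
WHERE relname = 'mytable';

-- Counter kept by the statistics system (what pg_stat_user_tables shows):
SELECT relname, n_live_tup
FROM pg_stat_user_tables
WHERE relname = 'mytable';

-- Refresh the planner-side estimate after the initial copy:
ANALYZE mytable;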

On Mon, Feb 3, 2025 at 4:21 AM Dmitry Koterov <dmitry.kote...@gmail.com> wrote:

> Here is the list of tmp files:
>
> postgres@pg-101a:~/17/main/base/pgsql_tmp$ ls -la
> total 5422297
> drwx------ 2 postgres postgres          9 Feb  3 04:08 .
> drwx------ 8 postgres postgres          8 Jan 29 01:27 ..
> -rw------- 1 postgres postgres 1073741824 Feb  3 04:05 pgsql_tmp196534.0
> -rw------- 1 postgres postgres 1073741824 Feb  3 04:05 pgsql_tmp196534.1
> -rw------- 1 postgres postgres 1073741824 Feb  3 04:05 pgsql_tmp196534.2
> -rw------- 1 postgres postgres 1073741824 Feb  3 04:06 pgsql_tmp196534.3
> -rw------- 1 postgres postgres 1073741824 Feb  3 04:07 pgsql_tmp196534.4
> -rw------- 1 postgres postgres 1073741824 Feb  3 04:08 pgsql_tmp196534.5
> -rw------- 1 postgres postgres  819396608 Feb  3 04:08 pgsql_tmp196534.6
>
> With work_mem=4GB, all those files *on the destination node* seemed to
> appear immediately with 4GB size and kept growing from then on, while COPY
> progressed on the source node (i.e., it looked like PG tried hard to
> utilize work_mem, but after reaching the limit, dumped everything to
> pgsql_tmp anyway).
>
> The table structure being copied (just 1 index there):
>
> CREATE TABLE mytable (
>     id bigint NOT NULL PRIMARY KEY,
>     snippet bytea,
>     title bytea,
>     updated_at timestamp with time zone,
>     rich_snippet bytea
> );
>
> Directory sizes on the destination node while tablesync is working (it's
> copied into an almost empty database):
>
> $ watch du -sh 17/main/base/* 17/main/pg_wal
> 2.2M    17/main/base/1
> 14G     17/main/base/16385
> 2.3M    17/main/base/16387
> 2.2M    17/main/base/4
> 2.3M    17/main/base/5
> 12G     17/main/base/pgsql_tmp
> 6.3G    17/main/pg_wal
>
> So the question is: why does it use temp files at all? Why not just write
> directly to WAL + data files?
>
>
> On Mon, Feb 3, 2025 at 3:04 AM Dmitry Koterov <dmitry.kote...@gmail.com>
> wrote:
>
>> Hi.
>>
>> 1. Those are temp files on the destination node (where the logical
>> subscription exists and the tablesync worker runs), not on the source.
>> On the source, it's all clear.
>>
>> 2. There is no "spill" suffix/substring in the file names. I tried to
>> look at the content of these temp files, and I saw some text fragments
>> from the original table's text column there. I.e., it looks like, for
>> some reason, the stream received from the source node's COPY command
>> goes into those temp files (at least partially).
>>
>> 3. I ran several more experiments; increasing work_mem to several GB
>> (for the role which the tablesync worker uses when copying) definitely
>> helps with the temp files.
>>
>> Thanks!
>>
>> On Sun, Feb 2, 2025 at 19:10 Amit Kapila <amit.kapil...@gmail.com> wrote:
>>
>>> On Sun, Feb 2, 2025 at 5:13 PM Dmitry Koterov <dmitry.kote...@gmail.com>
>>> wrote:
>>> >
>>> > Trying to monitor perf during the initial tablesync phase (COPY)
>>> > right after CREATE SUBSCRIPTION. I noticed that the size of
>>> > 17/main/base/pgsql_tmp on the destination node grows (tens of
>>> > gigabytes) as the COPY command (running internally on the publisher)
>>> > progresses. Then, at the end (when "EXPLAIN SELECT 1 FROM tbl" on the
>>> > destination shows an approximate row count equal to the number of
>>> > rows on the source node), it hangs for several minutes, then
>>> > 17/main/base/pgsql_tmp empties, and the subscription progresses.
>>> >
>>> > It seems like if I increase work_mem to several GB, the growth of
>>> > 17/main/base/pgsql_tmp becomes less significant.
>>> >
>>> > Questions:
>>> >
>>> > 1. Are there some diagnostics commands that would allow me to figure
>>> > out what is in those tmp files? Why does the subscriber create those
>>> > tmp files instead of just writing directly to the data files and WAL?
>>> > (The table has 2 bytea columns, i.e. it's TOASTed for sure.)
>>> >
>>>
>>> We do write spill files (ending with '.spill') if the changes are
>>> large. Can you please share the names of the tmp files to avoid any
>>> assumptions?
>>>
>>> --
>>> With Regards,
>>> Amit Kapila.
>>>
>>
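
For reference, the number embedded in those temp file names
(pgsql_tmp196534.*) is the PID of the backend that created them, so it can
be mapped to a session from SQL, and the temp directory can be listed
without shell access. A sketch, assuming PostgreSQL 12+ for pg_ls_tmpdir()
and sufficient privileges (superuser or pg_monitor):

-- Which backend owns pgsql_tmp196534.* (the number in the name is the PID):
SELECT pid, backend_type, state, query
FROM pg_stat_activity
WHERE pid = 196534;

-- List temp files of the default tablespace from SQL:
SELECT name, size, modification
FROM pg_ls_tmpdir()
ORDER BY size DESC;

-- Optionally, log every temporary file (name and size) when it is removed:
ALTER SYSTEM SET log_temp_files = 0;
SELECT pg_reload_conf();

With log_temp_files enabled, the server log entries for the temp files
usually also carry the statement that produced them, which helps answer
what the tablesync worker was doing when it spilled.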