RE: Implementing Incremental View Maintenance
Hi Nagata-san,

Sorry for the late reply.

> However, even if we create triggers recursively on the parents or children, we would still
> need more consideration. This is because we will have to convert the format of tuple of
> modified table to the format of the table specified in the view for cases that the parent
> and some children have different format.
>
> I think supporting partitioned tables can be left for the next release.

OK. I understand.
In the v24 patch, creating an IMMV on a partition or a partitioned table is
prohibited (a sketch of the rejected cases follows below). That is fine, but it
should be documented. Perhaps the following statement is intended to describe
this. If so, I think the definition of "simple base table" is ambiguous for
some users.

+     IMMVs must be based on simple base tables. It's not supported to
+     create them on top of views or materialized views.

> DEPENDENCY_IMMV was added to clear that a certain trigger is related to IMMV.
> We dropped the IVM trigger and its dependencies from IMMV when REFRESH ... WITH NO DATA
> is executed. Without the new deptype, we may accidentally delete a dependency created
> with an intention other than the IVM trigger.

OK. I understand.

> I think it is harder than you expected. When an IMMV is switched to a normal
> materialized view, we needs to drop hidden columns (__ivm_count__ etc.), and in
> the opposite case, we need to create them again. The former (IMMV->IVM) might be
> easer, but for the latter (IVM->IMMV) I wonder we would need to re-create IMMV.

OK. I understand.

Regards,
Ryohei Takahashi
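For reference, a minimal sketch of the cases the v24 patch prohibits, with
illustrative table names (the exact error text is not quoted in this thread):

```
CREATE TABLE parent (c int) PARTITION BY RANGE (c);
CREATE TABLE child PARTITION OF parent FOR VALUES FROM (1) TO (100);

-- Both of these are rejected in v24:
CREATE INCREMENTAL MATERIALIZED VIEW ivm_parent AS SELECT c FROM parent;  -- partitioned table
CREATE INCREMENTAL MATERIALIZED VIEW ivm_child  AS SELECT c FROM child;   -- partition
```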
RE: Implementing Incremental View Maintenance
Hi Nagata-san,

> Ok. I'll fix _copyIntoClause() and _equalIntoClause() as well as
> _readIntoClause() and _outIntoClause().

OK.

> > ivm=# create table t (c1 int, c2 int);
> > CREATE TABLE
> > ivm=# create incremental materialized view ivm_t as select distinct c1 from t;
> > NOTICE:  created index "ivm_t_index" on materialized view "ivm_t"
> > SELECT 0
> >
> > Then I executed pg_dump.
> >
> > In the dump, the following SQLs appear.
> >
> > CREATE INCREMENTAL MATERIALIZED VIEW public.ivm_t AS
> >  SELECT DISTINCT t.c1
> >    FROM public.t
> >   WITH NO DATA;
> >
> > ALTER TABLE ONLY public.ivm_t
> >     ADD CONSTRAINT ivm_t_index UNIQUE (c1);
> >
> > If I execute psql with the result of pg_dump, the following error occurs.
> >
> > ERROR:  ALTER action ADD CONSTRAINT cannot be performed on relation "ivm_t"
> > DETAIL:  This operation is not supported for materialized views.
>
> Good catch! It was my mistake creating unique constraints on IMMV in spite of
> we cannot defined them via SQL. I'll fix it to use unique indexes instead of
> constraints.

I checked the same procedure on the v24 patch, but the following error occurs
instead of the original one.

ERROR:  relation "ivm_t_index" already exists

Regards,
Ryohei Takahashi
pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout
Hi,

pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout
in the following environment.

[Environment]
Postgres 13dev (master branch)
Red Hat Enterprise Linux 7.4

[Error]
$ pg_basebackup -F t --progress --verbose -h  -D
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/5A60 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_15647"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

[Analysis]
- pg_basebackup -F t creates a tar file and does fsync() for each tablespace.
  (In contrast, -F p does fsync() only once at the end.)
- While pg_basebackup is doing fsync() for the tar file of one tablespace,
  the WAL sender sends the content of the next tablespace. When fsync() takes
  a long time, the TCP socket of pg_basebackup returns "zero window" packets
  to the WAL sender, because the TCP socket buffer of pg_basebackup is
  exhausted while it cannot receive during fsync().
- The WAL sender's socket retries sending the packet, but resets the
  connection after tcp_user_timeout. After the WAL sender resets the
  connection, pg_basebackup cannot receive data and fails with the above
  error.

[Solution]
I think fsync() for each tablespace is not necessary. Like pg_basebackup -F p,
fsync() is necessary only once at the end.

Could you give me any comment?

Regards,
Ryohei Takahashi
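A minimal reproduction sketch for the server side, assuming superuser access
and a Linux server (tcp_user_timeout only takes effect where the OS supports
TCP_USER_TIMEOUT); the timeout value, table name, and row count are
illustrative choices, not taken from the report:

```
-- Lower the server's TCP user timeout (the GUC accepts time units).
ALTER SYSTEM SET tcp_user_timeout = '5s';
SELECT pg_reload_conf();

-- Create enough data that writing and fsyncing one tablespace's tar file on
-- the client side can outlast the timeout (slow client storage helps).
CREATE TABLE fsync_filler AS
  SELECT g AS id, repeat(md5(g::text), 100) AS payload
    FROM generate_series(1, 500000) AS g;
```

Running pg_basebackup -F t against this server from a client with slow
storage should then be able to reproduce the connection reset.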
RE: pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout
Hi Michael-san,

> Attached is a patch to do that, which should go down to v12 where
> tcp_user_timeout has been introduced. Takahashi-san, what do you
> think?

Thank you for creating the patch.
This patch is what I expected.

I'm not sure whether this patch should be applied to v11 and earlier, since
I'm not sure whether the OS parameters (e.g. tcp_retries2) cause the same
error there.

Regards,
Ryohei Takahashi
RE: Support escape sequence for cluster_name in postgres_fdw.application_name
Hi,

Thank you for developing this feature.
I think adding an escape sequence for cluster_name is useful too.

> Is the reason for 'C' in upper-case to avoid possible conflict with
> 'c' of log_line_prefix?  I'm not sure that preventive measure is worth
> doing.  Looking the escape-sequence spec alone, it seems to me rather
> strange that an upper-case letter is used in spite of its lower-case
> is not used yet.

I think %c of log_line_prefix (the session ID) would also be useful for
postgres_fdw.application_name.
Therefore, how about adding both %c (session ID) and %C (cluster_name)?

Regards,
Ryohei Takahashi
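A hypothetical usage sketch, assuming the escapes under discussion are
adopted as proposed; the foreign table name and expanded value are
illustrative:

```
-- %C would expand to cluster_name and %c to the local backend's session ID
-- when a new foreign connection is opened.
SET postgres_fdw.application_name TO 'app_%C_%c';
SELECT 1 FROM ft1 LIMIT 1;  -- opens a connection named e.g. app_mycluster_63f2a1b4.5c3e

-- Check the expanded name on the remote server:
SELECT application_name FROM pg_stat_activity;
```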
RE: Support escape sequence for cluster_name in postgres_fdw.application_name
Hi,

Thank you for updating the patch.
I agree with the documentation and the code.

How about adding a test for %c (session ID)?
(Adding a test for %C (cluster_name) seems difficult.)

Regards,
Ryohei Takahashi
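A sketch of what such a test might look like, assuming the usual postgres_fdw
regression setup where the foreign server is a loopback connection (so both
sessions appear in the same pg_stat_activity), and assuming %c expands like
log_line_prefix's %c: backend start time and PID in hex, joined by a dot.
All object names are illustrative:

```
SET postgres_fdw.application_name TO 'fdw_%c';
SELECT 1 FROM ft1 LIMIT 1;  -- open a foreign connection
-- (a connection cached before the SET would keep its old name; a real test
-- would first drop cached connections, e.g. with postgres_fdw_disconnect_all())

-- The remote session's application_name should embed the local session ID.
SELECT count(*) > 0 AS ok
  FROM pg_stat_activity
 WHERE application_name =
       (SELECT 'fdw_' ||
               to_hex(trunc(EXTRACT(EPOCH FROM backend_start))::bigint) ||
               '.' || to_hex(pid)
          FROM pg_stat_activity
         WHERE pid = pg_backend_pid());
```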
RE: Support escape sequence for cluster_name in postgres_fdw.application_name
Hi Fujii-san,

Thank you for updating the patch.
I have no additional comments.

Regards,
Ryohei Takahashi
RE: Transactions involving multiple postgres foreign servers, take 2
Hi,

I'm interested in this patch, and I also ran the same test with Ikeda-san's
fxact_update.pgbench.

In my environment (a poor-spec VM), the results are the following.

* foreign_twophase_commit = disabled
  363 tps
* foreign_twophase_commit = required
  (It is necessary to set -R ${RATE}, as Ikeda-san said.)
  13 tps

I analyzed the bottleneck using pstack and strace, and noticed that the
open() during the "COMMIT PREPARED" command is very slow. In my environment
the latency of "COMMIT PREPARED" is 16 ms, while the latency of "COMMIT" and
"PREPARE TRANSACTION" is 1 ms. Within the "COMMIT PREPARED" command, the
open() of the WAL segment file takes 14 ms. Therefore, open() is the
bottleneck of "COMMIT PREPARED".

Furthermore, I noticed that the backend process almost always opens the same
WAL segment file. In the current patch, the backend process on the foreign
server which is associated with the connection from the resolver process
always runs the "COMMIT PREPARED" command. Therefore, the WAL segment file of
the current "COMMIT PREPARED" command is probably the same as that of the
previous one.

In order to improve the performance of the resolver process, I think it is
useful to skip closing the WAL segment file during "COMMIT PREPARED" and
reuse the file descriptor. Is it possible?

Regards,
Ryohei Takahashi
RE: Multi-Master Logical Replication
Hi,

In addition to the use cases mentioned above, some users want to use n-way
replication of a partial database. The following is a typical use case.

* There are several data centers. (ex. Japan and India)
* The database in each data center has its own unique data.
  (ex. the database in Japan has the data related to Japan)
* There are some common data.
  (ex. the shipment data from Japan to India should be stored in both databases)
* To replicate the common data, users want to use n-way replication.

The current POC patch seems to support only n-way replication of an entire
database, but I think we should support n-way replication of a partial
database to achieve the above use case (a sketch of how this might look
follows below).

> I don't know if it requires the kind of code you are thinking but I
> agree that it is worth considering implementing it as an extension.

I think the other advantage of implementing it as an extension is that users
could install the extension on older Postgres versions. As mentioned in a
previous email, one use case of n-way replication is migration from an older
Postgres to a newer one. If we implement it as an extension, users could use
n-way replication for migration from PG10 to PG16.

Regards,
Ryohei Takahashi
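A minimal two-node sketch of the "common data" case using plain PG16 logical
replication with the subscription option origin = none (the option referenced
later in this thread); all names are illustrative, and the country-specific
tables simply stay out of the shared publication:

```
-- On the Japan node: publish the shared table and subscribe to India's copy.
CREATE TABLE shipment (id bigint PRIMARY KEY, payload text);
CREATE PUBLICATION pub_common FOR TABLE shipment;
CREATE SUBSCRIPTION sub_from_india
  CONNECTION 'host=india.example.com dbname=appdb'
  PUBLICATION pub_common
  WITH (origin = none, copy_data = false);  -- do not re-send applied changes

-- On the India node, the mirror image:
CREATE TABLE shipment (id bigint PRIMARY KEY, payload text);
CREATE PUBLICATION pub_common FOR TABLE shipment;
CREATE SUBSCRIPTION sub_from_japan
  CONNECTION 'host=japan.example.com dbname=appdb'
  PUBLICATION pub_common
  WITH (origin = none, copy_data = false);
```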
RE: Multi-Master Logical Replication
Hi Kuroda-san,

> I think even if LRG is implemented as contrib modules or any extensions,
> it will deeply depend on the subscription option "origin" proposed in [1].
> So LRG cannot be used for older version, only PG16 or later.

Sorry, I misunderstood. I understand now.

Regards,
Ryohei Takahashi
RE: Implementing Incremental View Maintenance
Hi Nagata-san,

I'm still reading the patch. I have additional comments.

(1)
In v23-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch, an ivm
member is added to the IntoClause struct. I think it is necessary to modify
the _copyIntoClause() and _equalIntoClause() functions as well.

(2)
By executing pg_dump with
v23-0005-Add-Incremental-View-Maintenance-support-to-pg_d.patch, the
constraint which is automatically created during "CREATE INCREMENTAL
MATERIALIZED VIEW" is also dumped. This causes an error during restore, as
follows.

ivm=# create table t (c1 int, c2 int);
CREATE TABLE
ivm=# create incremental materialized view ivm_t as select distinct c1 from t;
NOTICE:  created index "ivm_t_index" on materialized view "ivm_t"
SELECT 0

Then I executed pg_dump.

In the dump, the following SQLs appear.

CREATE INCREMENTAL MATERIALIZED VIEW public.ivm_t AS
 SELECT DISTINCT t.c1
   FROM public.t
  WITH NO DATA;

ALTER TABLE ONLY public.ivm_t
    ADD CONSTRAINT ivm_t_index UNIQUE (c1);

If I execute psql with the result of pg_dump, the following error occurs.

ERROR:  ALTER action ADD CONSTRAINT cannot be performed on relation "ivm_t"
DETAIL:  This operation is not supported for materialized views.

Regards,
Ryohei Takahashi
RE: Transactions involving multiple postgres foreign servers, take 2
Hi Sawada-san,

Thank you for your reply.

> Not sure but it might be possible to keep holding an xlogreader for
> reading PREPARE WAL records even after the transaction commit. But I
> wonder how much open() for wal segment file accounts for the total
> execution time of 2PC. 2PC requires 2 network round trips for each
> participant. For example, if it took 500ms in total, we would not get
> benefits much from the point of view of 2PC performance even if we
> improved it from 14ms to 1ms.

I made a patch based on your advice and re-ran the test on a new machine.
(The attached patch is just for test purposes.)

* foreign_twophase_commit = disabled
  2686 tps
* foreign_twophase_commit = required
  (It is necessary to set -R ${RATE}, as Ikeda-san said.)
  311 tps
* foreign_twophase_commit = required, with the attached patch
  (It is not necessary to set -R ${RATE}.)
  2057 tps

This indicates that if we can reduce the number of open() calls on WAL
segment files during "COMMIT PREPARED", the performance can be improved.

This patch skips closing the WAL segment file, but I don't know when we
should close it. One idea is to close it when the WAL segment file is
recycled, but that seems difficult for a backend process to do.

BTW, in a previous discussion, "Send COMMIT PREPARED to remote servers in
bulk" was proposed. I imagined a new SQL interface like
"COMMIT PREPARED 'prep_1', 'prep_2', ... 'prep_n'".
If we can keep the WAL segment file open during a bulk COMMIT PREPARED, we
can reduce not only the number of round trips but also the number of open()
calls on WAL segment files.

Regards,
Ryohei Takahashi

Hold_xlogreader.patch
Description: Hold_xlogreader.patch
RE: Transactions involving multiple postgres foreign servers, take 2
Hi,

> Wouldn't it be better to explicitly initialize the pointer with NULL?

Thank you for your advice. You are correct.
Anyway, I fixed it and re-ran the performance test; it of course does not
affect the tps.

Regards,
Ryohei Takahashi
Speed up COMMIT PREPARED
Hi,

I noticed that the COMMIT PREPARED command is slow in the discussion [1].

First, I made the following simple script for pgbench.

```
prepare.pgbench

\set id random(1, 100)
BEGIN;
UPDATE test_table SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
PREPARE TRANSACTION 'prep_:client_id';
COMMIT PREPARED 'prep_:client_id';
```

I ran pgbench as follows.

```
pgbench -f prepare.pgbench -c 1 -j 1 -T 60 -d postgres -r
```

The result is the following.

tps: 287.259
Latency:
  UPDATE               0.207 ms
  PREPARE TRANSACTION  0.212 ms
  COMMIT PREPARED      2.982 ms

Next, I analyzed the bottleneck using pstack and strace, and noticed that the
open() during COMMIT PREPARED takes 2.7 ms. Furthermore, I noticed that the
backend process almost always opens the same WAL segment file.

During the COMMIT PREPARED command, there are two ways to find the 2PC state
data.
- If it is stored in a WAL segment file, open and read the WAL segment file.
- If not, read the 2PC state file.

The above script runs the COMMIT PREPARED command just after the PREPARE
TRANSACTION command. I think an XA transaction will also typically run COMMIT
PREPARED soon after PREPARE TRANSACTION. Therefore, the WAL segment file
which is opened during COMMIT PREPARED is probably the current WAL segment
file.

To speed up the COMMIT PREPARED command, I made two patches for testing.

(1) Hold_xlogreader.patch
Skip closing the WAL segment file after the COMMIT PREPARED command
completes. If the next COMMIT PREPARED command uses the same WAL segment
file, it is fast since the process does not need to open the WAL segment file
again. However, I don't know when we should close the WAL segment file.
Moreover, it might not be useful when COMMIT PREPARED commands are run
infrequently and use a different WAL segment file each time.

tps: 1750.81
Latency:
  UPDATE               0.156 ms
  PREPARE TRANSACTION  0.184 ms
  COMMIT PREPARED      0.179 ms

(2) Read_from_walbuffer.patch
Read the data from the WAL buffers if it is still there. If the COMMIT
PREPARED command is run just after the PREPARE TRANSACTION command, the WAL
may still be in the WAL buffers. However, the period during which the WAL is
in the WAL buffers is not so long, since the WAL writer recycles the buffers
soon. Moreover, it may affect other performance, such as UPDATE, since it
needs to take a lock on the WAL buffers.

tps: 446.371
Latency:
  UPDATE               0.187 ms
  PREPARE TRANSACTION  0.196 ms
  COMMIT PREPARED      1.974 ms

Which approach do you think is better?

[1] https://www.postgresql.org/message-id/20191206.173215.1818665441859410805.horikyota.ntt%40gmail.com

Regards,
Ryohei Takahashi

Hold_xlogreader.patch
Description: Hold_xlogreader.patch

Read_from_walbuffer.patch
Description: Read_from_walbuffer.patch
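The setup for test_table is not shown in the post; a minimal sketch
consistent with the script (100 rows, to match random(1, 100)) might be:

```
-- Hypothetical setup, not from the post. PREPARE TRANSACTION also requires
-- max_prepared_transactions > 0 on the server (the default is 0), e.g.
-- max_prepared_transactions = 10, which needs a server restart.
CREATE TABLE test_table (id int PRIMARY KEY, md5 text);
INSERT INTO test_table
  SELECT g, md5(g::text) FROM generate_series(1, 100) AS g;
```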
RE: Transactions involving multiple postgres foreign servers, take 2
Hi Sawada-san,

Thank you for your reply.

> BTW did you test on the local? That is, the foreign servers are
> located on the same machine?

Yes, I tested locally, since I cannot prepare a good network right now.

> I guess it would be better to start a new thread for this improvement.

Thank you for your advice. I started a new thread [1].

> What if we successfully committed 'prep_1' but an error happened
> during committing another one for some reason (i.g., corrupted 2PC
> state file, OOM etc)? We might return an error to the client but have
> already committed 'prep_1'.

Sorry, I don't have a good idea now.
I imagine the command could return the list of the transaction IDs which
ended with an error.

[1] https://www.postgresql.org/message-id/OS0PR01MB56828019B25CD5190AB6093282129%40OS0PR01MB5682.jpnprd01.prod.outlook.com

Regards,
Ryohei Takahashi
RE: Speed up COMMIT PREPARED
I noticed that the previous Read_from_walbuffer.patch has a mistake in
xlogreader.c. Could you please use the attached v2 patch instead?

The performance results in my previous mail were taken with the v2 patch.

Regards,
Ryohei Takahashi

v2-Read_from_walbuffer.patch
Description: v2-Read_from_walbuffer.patch
RE: Speed up COMMIT PREPARED
Hi,

I noticed that anti-virus software was slowing down the open() calls.
I stopped the anti-virus software and re-ran the test. (Average of 10 runs)

master:                     1924 tps
Hold_xlogreader.patch:      1993 tps (+3.5%)
Read_from_walbuffer.patch:  1954 tps (+1.5%)

Therefore, the effect of my patches is limited.
I'm sorry for the confusion.

Regards,
Ryohei Takahashi
RE: Implementing Incremental View Maintenance
Hi Nagata-san,

I am interested in this patch since it is a good feature.
I ran some simple tests and found the following problems.

(1)
"make world" fails. I think there are extra "" in
doc/src/sgml/ref/create_materialized_view.sgml (lines 110 and 117).

(2)
In the case of partitions, IVM does not seem to work well.
I ran the following.

postgres=# create table parent (c int) partition by range (c);
CREATE TABLE
postgres=# create table child partition of parent for values from (1) to (100);
CREATE TABLE
postgres=# create incremental materialized view ivm_parent as select c from parent;
NOTICE:  could not create an index on materialized view "ivm_parent" automatically
HINT:  Create an index on the materialized view for efficient incremental maintenance.
SELECT 0
postgres=# create incremental materialized view ivm_child as select c from child;
NOTICE:  could not create an index on materialized view "ivm_child" automatically
HINT:  Create an index on the materialized view for efficient incremental maintenance.
SELECT 0
postgres=# insert into parent values (1);
INSERT 0 1
postgres=# insert into child values (2);
INSERT 0 1
postgres=# select * from parent;
 c
---
 1
 2
(2 rows)

postgres=# select * from child;
 c
---
 1
 2
(2 rows)

postgres=# select * from ivm_parent;
 c
---
 1
(1 row)

postgres=# select * from ivm_child;
 c
---
 2
(1 row)

I think ivm_parent and ivm_child should both return 2 rows. (The row inserted
into parent is routed to child, so both relations contain both rows, but only
the trigger on the directly modified relation fires.)

(3)
I think IVM does not support foreign tables, but I tried to create an IVM on
one anyway.

postgres=# create incremental materialized view ivm_foreign as select c from foreign_table;
NOTICE:  could not create an index on materialized view "ivm_foreign" automatically
HINT:  Create an index on the materialized view for efficient incremental maintenance.
ERROR:  "foreign_table" is a foreign table
DETAIL:  Triggers on foreign tables cannot have transition tables.

The creation does fail in the end, but I think this should be checked
earlier.

Regards,
Ryohei Takahashi
RE: Implementing Incremental View Maintenance
Hi Nagata-san,

Thank you for your reply.

> I'll investigate this more, but we may have to prohibit views on partitioned
> table and partitions.

I think this restriction is too strict. This feature is useful when the base
table is large, and partitioning is also useful in such cases.

I have several additional comments on the patch.

(1)
The following features are added for transition tables.
- Prolonging the lifespan of transition tables
- If a table has row security policies, setting them on the transition table
- Calculating the pre-state of the table
Are these features only for IVM? If there are other useful cases, they should
be separated from the IVM patch and become an independent patch for
transition tables.

(2)
DEPENDENCY_IMMV (m) is added as a deptype of pg_depend.
What is the difference compared with existing deptypes such as
DEPENDENCY_INTERNAL (i)?

(3)
Converting from a normal materialized view to an IVM, or from an IVM to a
normal materialized view, is not implemented yet. Is it difficult?
I think creating/dropping the triggers and the __ivm_ columns could achieve
this feature.

Regards,
Ryohei Takahashi