回复: 回复: [PATCH] BUG FIX: Core dump could happen when VACUUM FULL in standalone mode
I think you are right after reading code in compute_parallel_vacuum_workers() :) 发件人: Tom Lane 发送时间: 2020年12月1日 2:54 收件人: Yulin PEI 抄送: Masahiko Sawada ; pgsql-hackers@lists.postgresql.org 主题: Re: 回复: [PATCH] BUG FIX: Core dump could happen when VACUUM FULL in standalone mode Yulin PEI writes: > Yes, I agree because (IsNormalProcessingMode() ) means that current process > is not in bootstrap mode and postmaster process will not build index. > So my new modified patch is attached. This is a good catch, but the proposed fix still seems pretty random and unlike how it's done elsewhere. It seems to me that since index_build() is relying on plan_create_index_workers() to assess parallel safety, that's where to check IsUnderPostmaster. Moreover, the existing code in compute_parallel_vacuum_workers (which gets this right) associates the IsUnderPostmaster check with the initial check on max_parallel_maintenance_workers. So I think that the right fix is to adopt the compute_parallel_vacuum_workers coding in plan_create_index_workers, and thereby create a model for future uses of max_parallel_maintenance_workers to follow. regards, tom lane
[PATCH] BUG FIX: Core dump could happen when VACUUM FULL in standalone mode
Hi hackers, I found that when running command vaccum full in stand-alone mode there will be a core dump. The core stack looks like this: backend> vacuum full TRAP: FailedAssertion("IsUnderPostmaster", File: "dsm.c", Line: 439) ./postgres(ExceptionalCondition+0xac)[0x55c664c36913] ./postgres(dsm_create+0x3c)[0x55c664a79fee] ./postgres(GetSessionDsmHandle+0xdc)[0x55c6645f8296] ./postgres(InitializeParallelDSM+0xf9)[0x55c6646c59ca] ./postgres(+0x16bdef)[0x55c664692def] ./postgres(+0x16951e)[0x55c66469051e] ./postgres(btbuild+0xcc)[0x55c6646903ec] ./postgres(index_build+0x322)[0x55c664719749] ./postgres(reindex_index+0x2ee)[0x55c66471a765] ./postgres(reindex_relation+0x1e5)[0x55c66471acca] ./postgres(finish_heap_swap+0x118)[0x55c6647c8db1] ./postgres(+0x2a0848)[0x55c6647c7848] ./postgres(cluster_rel+0x34e)[0x55c6647c727f] ./postgres(+0x3414cf)[0x55c6648684cf] ./postgres(vacuum+0x4f3)[0x55c664866591] ./postgres(ExecVacuum+0x736)[0x55c66486609b] ./postgres(standard_ProcessUtility+0x840)[0x55c664ab74fc] ./postgres(ProcessUtility+0x131)[0x55c664ab6cb5] ./postgres(+0x58ea69)[0x55c664ab5a69] ./postgres(+0x58ec95)[0x55c664ab5c95] ./postgres(PortalRun+0x307)[0x55c664ab5184] ./postgres(+0x587ef6)[0x55c664aaeef6] ./postgres(PostgresMain+0x819)[0x55c664ab3271] ./postgres(main+0x2e1)[0x55c6648f9df5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f65376192e1] ./postgres(_start+0x2a)[0x55c6645df86a] Aborted (core dumped) I think that when there is a btree index in a table, vacuum full tries to rebuild the btree index in a parallel way. This will launch several workers and each of then will try to apply a dynamic shared memory segment, which is not allowed in stand-alone mode. I think it is better not to use btree index build in parallel in stand-alone mode. My patch is attached below. Best Regards! Yulin PEI standalone_vacuum_full.patch Description: standalone_vacuum_full.patch
回复: [PATCH] BUG FIX: Core dump could happen when VACUUM FULL in standalone mode
Yes, I agree because (IsNormalProcessingMode() ) means that current process is not in bootstrap mode and postmaster process will not build index. So my new modified patch is attached. 发件人: Masahiko Sawada 发送时间: 2020年11月30日 17:27 收件人: Yulin PEI 抄送: pgsql-hackers@lists.postgresql.org 主题: Re: [PATCH] BUG FIX: Core dump could happen when VACUUM FULL in standalone mode On Mon, Nov 30, 2020 at 5:45 PM Yulin PEI mailto:ype...@connect.ust.hk>> wrote: Hi hackers, I found that when running command vaccum full in stand-alone mode there will be a core dump. The core stack looks like this: backend> vacuum full TRAP: FailedAssertion("IsUnderPostmaster", File: "dsm.c", Line: 439) ./postgres(ExceptionalCondition+0xac)[0x55c664c36913] ./postgres(dsm_create+0x3c)[0x55c664a79fee] ./postgres(GetSessionDsmHandle+0xdc)[0x55c6645f8296] ./postgres(InitializeParallelDSM+0xf9)[0x55c6646c59ca] ./postgres(+0x16bdef)[0x55c664692def] ./postgres(+0x16951e)[0x55c66469051e] ./postgres(btbuild+0xcc)[0x55c6646903ec] ./postgres(index_build+0x322)[0x55c664719749] ./postgres(reindex_index+0x2ee)[0x55c66471a765] ./postgres(reindex_relation+0x1e5)[0x55c66471acca] ./postgres(finish_heap_swap+0x118)[0x55c6647c8db1] ./postgres(+0x2a0848)[0x55c6647c7848] ./postgres(cluster_rel+0x34e)[0x55c6647c727f] ./postgres(+0x3414cf)[0x55c6648684cf] ./postgres(vacuum+0x4f3)[0x55c664866591] ./postgres(ExecVacuum+0x736)[0x55c66486609b] ./postgres(standard_ProcessUtility+0x840)[0x55c664ab74fc] ./postgres(ProcessUtility+0x131)[0x55c664ab6cb5] ./postgres(+0x58ea69)[0x55c664ab5a69] ./postgres(+0x58ec95)[0x55c664ab5c95] ./postgres(PortalRun+0x307)[0x55c664ab5184] ./postgres(+0x587ef6)[0x55c664aaeef6] ./postgres(PostgresMain+0x819)[0x55c664ab3271] ./postgres(main+0x2e1)[0x55c6648f9df5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f65376192e1] ./postgres(_start+0x2a)[0x55c6645df86a] Aborted (core dumped) I think that when there is a btree index in a table, vacuum full tries to rebuild the btree index in a parallel way. This will launch several workers and each of then will try to apply a dynamic shared memory segment, which is not allowed in stand-alone mode. I think it is better not to use btree index build in parallel in stand-alone mode. My patch is attached below. Good catch. This is a bug in parallel index(btree) creation. I could reproduce this assertion failure with HEAD even by using CREATE INDEX command. - indexRelation->rd_rel->relam == BTREE_AM_OID) + indexRelation->rd_rel->relam == BTREE_AM_OID && IsPostmasterEnvironment && !IsBackendStandAlone()) +#define IsBackendStandAlone() (!IsBootstrapProcessingMode() && !IsPostmasterEnvironment) /* * Auxiliary-process type identifiers. These used to be in bootstrap.h * but it seems saner to have them here, with the ProcessingMode stuff. I think we can use IsUnderPostmaster instead. If it's false we should not enable parallel index creation. Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/ standalone_vacuum_full.patch Description: standalone_vacuum_full.patch
Core dump happens when execute sql CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10));
HI hackers, I found it could cause a crash when executing sql statement: `CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10)); ` in postgres 13.2 release. The crash happens at view.c:89 and I did some analysis: ``` ColumnDef *def = makeColumnDef(tle->resname, exprType((Node *) tle->expr), exprTypmod((Node *) tle->expr), exprCollation((Node *) tle->expr)); /* * It's possible that the column is of a collatable type but the * collation could not be resolved, so double-check. */ // Here is the analysis: //example : ('4' COLLATE "C")::INT //exprCollation((Node *) tle->expr) is the oid of collate "COLLATE 'C'" so def->collOid is valid //exprType((Node *) tle->expr)) is 23 which is the oid of type int4. //We know that int4 is not collatable by calling type_is_collatable() if (type_is_collatable(exprType((Node *) tle->expr))) { if (!OidIsValid(def->collOid)) ereport(ERROR, (errcode(ERRCODE_INDETERMINATE_COLLATION), errmsg("could not determine which collation to use for view column \"%s\"", def->colname), errhint("Use the COLLATE clause to set the collation explicitly."))); } else // So we are here! int is not collatable and def->collOid is valid. Assert(!OidIsValid(def->collOid)); ``` I am not sure whether to fix this bug in function DefineVirtualRelation or to fix this bug in parse tree and analyze procedure, so maybe we can discuss. Best Regard! Yulin PEI
回复: Core dump happens when execute sql CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10));
After reading the code and the patch, I think the patch is good. If the type is non-collatable, we do not add a CollateExpr node as a 'parent' node to the coerced node. 发件人: Tom Lane 发送时间: 2021年4月13日 0:59 收件人: Yulin PEI 抄送: pgsql-hackers@lists.postgresql.org 主题: Re: Core dump happens when execute sql CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10)); Yulin PEI writes: > I found it could cause a crash when executing sql statement: `CREATE VIEW > v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10)); ` in > postgres 13.2 release. Nice catch. I don't think the code in DefineVirtualRelation is wrong: exprCollation shouldn't report any collation for an expression of a non-collatable type. Rather the problem is with an old kluge in coerce_type(), which will push a type coercion underneath a CollateExpr ... without any mind for the possibility that the coercion result isn't collatable. So the right fix is more or less the attached. regards, tom lane
回复: Core dump happens when execute sql CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10));
I think it is better to add this test case to regress. 发件人: Tom Lane 发送时间: 2021年4月13日 0:59 收件人: Yulin PEI 抄送: pgsql-hackers@lists.postgresql.org 主题: Re: Core dump happens when execute sql CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10)); Yulin PEI writes: > I found it could cause a crash when executing sql statement: `CREATE VIEW > v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10)); ` in > postgres 13.2 release. Nice catch. I don't think the code in DefineVirtualRelation is wrong: exprCollation shouldn't report any collation for an expression of a non-collatable type. Rather the problem is with an old kluge in coerce_type(), which will push a type coercion underneath a CollateExpr ... without any mind for the possibility that the coercion result isn't collatable. So the right fix is more or less the attached. regards, tom lane 0001-add-regress.patch Description: 0001-add-regress.patch
回复: Core dump happens when execute sql CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10));
After several tests, I found that this patch do not fix the bug well. I think we should use the same logic to treat parent CollateExpr and child CollateExpr. In your patch, if the parent node is CollateExpr and the target type is non-collatable, we coerce CollateExpr->arg. If the child node is CollateExpr and the target type is non-collatable, we just skip. Some types can be casted to another type even if type_is_collatable returns false. Like bytea to int (It depends on the content of the string). If we simply skip, bytea will never be casted to int even if the content is all digits. So the attachment is my patch and it works well as far as I tested. 发件人: Tom Lane 发送时间: 2021年4月13日 0:59 收件人: Yulin PEI 抄送: pgsql-hackers@lists.postgresql.org 主题: Re: Core dump happens when execute sql CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10)); Yulin PEI writes: > I found it could cause a crash when executing sql statement: `CREATE VIEW > v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10)); ` in > postgres 13.2 release. Nice catch. I don't think the code in DefineVirtualRelation is wrong: exprCollation shouldn't report any collation for an expression of a non-collatable type. Rather the problem is with an old kluge in coerce_type(), which will push a type coercion underneath a CollateExpr ... without any mind for the possibility that the coercion result isn't collatable. So the right fix is more or less the attached. regards, tom lane fix-collation-coercion.patch Description: fix-collation-coercion.patch
回覆: 回复: Core dump happens when execute sql CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10));
Consider the SQL statement 'SELECT (('1' COLLATE "C") ||(B'1'));' . Intuitively, the result will be '11' and the result is '11' in pg 13.2 release as well. The function stack is make_fn_arguments -> coerce_type, which means that the param "Node *node" of function coerce_type could be a CollateExpr Node. Let's look at your patch: ``` // node is ('1' COLLATE "C") // targetType is varbit and it is non-collatable if (IsA(node, CollateExpr) && type_is_collatable(targetTypeId)) { // we will not reach here. CollateExpr *coll = (CollateExpr *) node; CollateExpr *newcoll = makeNode(CollateExpr); // An error will be generated. "failed to find conversion function" } ``` So I suggest: ``` // node is ('1' COLLATE "C") if (IsA(node, CollateExpr)) { CollateExpr *coll = (CollateExpr *) node; CollateExpr *newcoll = makeNode(CollateExpr); //targetType is varbit and it is non-collatable if (!type_is_collatable(targetTypeId)) { // try to convert '1'(string) to varbit // We do not make a new CollateExpr here, but don't forget to coerce coll->arg. return coerce_type(pstate, (Node *) coll->arg, inputTypeId, targetTypeId, targetTypeMod, ccontext, cformat, location); } ... } ``` 寄件者: Tom Lane 寄件日期: 2021年4月19日 1:46 收件者: Yulin PEI 副本: pgsql-hackers@lists.postgresql.org 主旨: Re: 回复: Core dump happens when execute sql CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10)); Yulin PEI writes: > After several tests, I found that this patch do not fix the bug well. What do you think is wrong with it? > So the attachment is my patch and it works well as far as I tested. This seems equivalent to the already-committed patch [1] except that it wastes a makeNode call in the coerce-to-uncollatable-type case. regards, tom lane [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c402b02b9fb53aee2a26876de90a8f95f9a9be92