On Tue, Feb 27, 2018 at 6:21 AM, Rajkumar Raghuwanshi <rajkumar.raghuwan...@enterprisedb.com> wrote: > I have applied 0001 on pg-head, and while playing with regression tests, it > crashed with below test case. > > postgres=# SET min_parallel_table_scan_size=0; > SET > postgres=# SELECT * FROM information_schema.foreign_data_wrapper_options > ORDER BY 1, 2, 3; > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed.
Hmm, nice catch. I think part of the problem here is that commit 69f4b9c85f168ae006929eec44fc44d569e846b9, wherein Andres introduced the ProjectSet node, didn't really do the right thing in terms of testing parallel-safety. Before that commit, is_parallel_safe(root, (Node *) scanjoin_target->exprs)) was really testing whether the tlist produced during the scan/join phase was parallel-safe. However, after that commit, scanjoin_target->exprs wasn't really the final target for the scan/join phase any more; instead, it was the first of possibly several targets computed by split_pathtarget_at_srfs(). Really, the right thing to do is to test the *last* element in that list for parallel-safety, but as the code stands we end up testing the *first* element. So, if there's a parallel-restricted item in the target list (this query ends up putting a CoerceToDomain in the target list, which we currently consider parallel-restricted), it looks we can nevertheless end up trying to project it in what is supposed to be a partial path. There are a large number of cases where this doesn't end up mattering because the next upper_rel created will not get marked consider_parallel because its target list will also contain the same parallel-restricted construct, and therefore the partial paths generated for the scan/join rel will never get used -- except to generate Gather/Gather Merge paths, which has already happened; but that step didn't know about the rest of the scan/join targets either, so it won't have used them. However, both create_distinct_paths() and the code in grouping_planner that populates final_rel assume that they don't need to retest the target for parallel-safety because no projection is done at those levels; they just inherit the parallel-safety marking of the input rel, so in those cases if the input rel's marking is wrong the result is populated upward. There's another way final_rel->consider_parallel can be wrong, too: if the FROM-list is empty, then we create a join rel and set its consider_parallel flag without regard to the parallel-safety of the target list. There are comments in query_planner() says that this will be dealt with "later", but this seems not to be true. (Before Tom's commit da1c91631e3577ea5818f855ebb5bd206d559006, the comments simply ignored the question of whether a check was needed here, but Tom seems to have inserted an incorrect justification for the already-wrong code.) I'm not sure to what degree, if at all, any of these problems are visible given that we don't use final_rel->consider_parallel for much of anything. Certainly, it gets much easier to trigger a problem with 0001 applied, as the test case shows. But I'm not entirely convinced that there's no problem even without that. It seems like every upper rel that is setting its consider_parallel flag based on the first element of some list of targets rather than the last is potentially vulnerable to ending up with the wrong answer, and I'm afraid that might have some adverse consequence that I haven't quite pinned down yet. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company