Horiguchi-san,

On 2017/01/31 12:45, Kyotaro HORIGUCHI wrote:
> I noticed that this patch is conflicting with 665d1fa (Logical
> replication) so I rebased this. Only executor/Makefile conflicted.
With the latest set of patches, I observe a crash due to an Assert failure:

#0  0x0000003969632625 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003969633e05 in *__GI_abort () at abort.c:92
#2  0x000000000098b22c in ExceptionalCondition (conditionName=0xb30e02 "!(added)", errorType=0xb30d77 "FailedAssertion", fileName=0xb30d50 "execAsync.c", lineNumber=345) at assert.c:54
#3  0x00000000006883ed in ExecAsyncEventWait (estate=0x13c01b8, timeout=-1) at execAsync.c:345
#4  0x0000000000687ed5 in ExecAsyncEventLoop (estate=0x13c01b8, requestor=0x13c1640, timeout=-1) at execAsync.c:186
#5  0x00000000006a5170 in ExecAppend (node=0x13c1640) at nodeAppend.c:257
#6  0x0000000000692b9b in ExecProcNode (node=0x13c1640) at execProcnode.c:411
#7  0x00000000006bf4d7 in ExecResult (node=0x13c1170) at nodeResult.c:113
#8  0x0000000000692b5c in ExecProcNode (node=0x13c1170) at execProcnode.c:399
#9  0x00000000006a596b in fetch_input_tuple (aggstate=0x13c06a0) at nodeAgg.c:587
#10 0x00000000006a8530 in agg_fill_hash_table (aggstate=0x13c06a0) at nodeAgg.c:2272
#11 0x00000000006a7e76 in ExecAgg (node=0x13c06a0) at nodeAgg.c:1910
#12 0x0000000000692d69 in ExecProcNode (node=0x13c06a0) at execProcnode.c:514
#13 0x00000000006c1a42 in ExecSort (node=0x13c03d0) at nodeSort.c:103
#14 0x0000000000692d3f in ExecProcNode (node=0x13c03d0) at execProcnode.c:506
#15 0x000000000068e733 in ExecutePlan (estate=0x13c01b8, planstate=0x13c03d0, use_parallel_mode=0 '\000', operation=CMD_SELECT, sendTuples=1 '\001', numberTuples=0, direction=ForwardScanDirection, dest=0x7fa368ee1da8) at execMain.c:1609
#16 0x000000000068c751 in standard_ExecutorRun (queryDesc=0x135c568, direction=ForwardScanDirection, count=0) at execMain.c:341
#17 0x000000000068c5dc in ExecutorRun (queryDesc=0x135c568,
<snip>

I was running a query whose plan looked like this:

explain (costs off) select tableoid::regclass, a, min(b), max(b)
from ptab group by 1,2 order by 1;
                       QUERY PLAN
------------------------------------------------------
 Sort
   Sort Key: ((ptab.tableoid)::regclass)
   ->  HashAggregate
         Group Key: (ptab.tableoid)::regclass, ptab.a
         ->  Result
               ->  Append
                     ->  Foreign Scan on ptab_00001
                     ->  Foreign Scan on ptab_00002
                     ->  Foreign Scan on ptab_00003
                     ->  Foreign Scan on ptab_00004
                     ->  Foreign Scan on ptab_00005
                     ->  Foreign Scan on ptab_00006
                     ->  Foreign Scan on ptab_00007
                     ->  Foreign Scan on ptab_00008
                     ->  Foreign Scan on ptab_00009
                     ->  Foreign Scan on ptab_00010
<snip>

The snipped part contains Foreign Scans on 90 more foreign partitions (in fact, I could see the crash even with just 10 foreign table partitions for the same query).
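For what it's worth, here is a rough sketch of what the failed assertion seems to guard; this is not the patch's actual code, and GetWaitableSocket() is an invented placeholder for however the patch obtains a ForeignScan's connection socket.  The idea is that the event loop must have registered at least one wait event before it blocks, and with many Foreign Scans under a single Append that apparently stops holding somewhere along the way:

#include "postgres.h"

#include "nodes/execnodes.h"
#include "storage/latch.h"

/*
 * Rough sketch (not the patch's code) of the pattern implied by the failed
 * assertion: before blocking, the async event loop asks each node that is
 * still waiting on a remote result to register a wait event, and asserts
 * that at least one event was actually added.
 */
static void
async_event_wait_sketch(WaitEventSet *wes, PlanState **waiting, int nwaiting)
{
    bool        added = false;
    int         i;

    for (i = 0; i < nwaiting; i++)
    {
        pgsocket    sock = GetWaitableSocket(waiting[i]);   /* hypothetical */

        if (sock != PGINVALID_SOCKET)
        {
            AddWaitEventToSet(wes, WL_SOCKET_READABLE, sock, NULL, waiting[i]);
            added = true;
        }
    }

    /*
     * This is the check that fires in frame #3 above ("!(added)" at
     * execAsync.c:345): no waiting node managed to register an event, so
     * blocking in WaitEventSetWait() would sleep with nothing to wake us.
     */
    Assert(added);
}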
There is a crash in one more case, which seems related to how WaitEventSet objects are manipulated during resource-owner-mediated cleanup of a failed query, such as after the FDW has returned an error like the one below:

ERROR:  relation "public.ptab_00010" does not exist
CONTEXT:  Remote SQL command: SELECT a, b FROM public.ptab_00010

The backtrace in this case looks like the following:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000009c4c35 in ResourceArrayRemove (resarr=0x7f7f7f7f7f7f80bf, value=20645152) at resowner.c:301
301             lastidx = resarr->lastidx;
(gdb) bt
#0  0x00000000009c4c35 in ResourceArrayRemove (resarr=0x7f7f7f7f7f7f80bf, value=20645152) at resowner.c:301
#1  0x00000000009c6578 in ResourceOwnerForgetWES (owner=0x7f7f7f7f7f7f7f7f, events=0x13b0520) at resowner.c:1317
#2  0x0000000000806098 in FreeWaitEventSet (set=0x13b0520) at latch.c:600
#3  0x00000000009c5338 in ResourceOwnerReleaseInternal (owner=0x12de768, phase=RESOURCE_RELEASE_BEFORE_LOCKS, isCommit=0 '\000', isTopLevel=1 '\001') at resowner.c:566
#4  0x00000000009c5155 in ResourceOwnerRelease (owner=0x12de768, phase=RESOURCE_RELEASE_BEFORE_LOCKS, isCommit=0 '\000', isTopLevel=1 '\001') at resowner.c:485
#5  0x0000000000524172 in AbortTransaction () at xact.c:2588
#6  0x0000000000524854 in AbortCurrentTransaction () at xact.c:3016
#7  0x0000000000836aa6 in PostgresMain (argc=1, argv=0x12d7b08, dbname=0x12d7968 "postgres", username=0x12d7948 "amit") at postgres.c:3860
#8  0x00000000007a49d8 in BackendRun (port=0x12cdf00) at postmaster.c:4310
#9  0x00000000007a4151 in BackendStartup (port=0x12cdf00) at postmaster.c:3982
#10 0x00000000007a0885 in ServerLoop () at postmaster.c:1722
#11 0x000000000079febf in PostmasterMain (argc=3, argv=0x12aacc0) at postmaster.c:1330
#12 0x00000000006e7549 in main (argc=3, argv=0x12aacc0) at main.c:228

The segfault occurs when accessing the WaitEventSet (the set/events variable in frames #1 and #2), whose members appear to have already been pfree'd:

(gdb) f 2
#2  0x0000000000806098 in FreeWaitEventSet (set=0x13b0520) at latch.c:600
600             ResourceOwnerForgetWES(set->resowner, set);
(gdb) p *set
$5 = {nevents = 2139062143, nevents_space = 2139062143,
  resowner = 0x7f7f7f7f7f7f7f7f, events = 0x7f7f7f7f7f7f7f7f,
  latch = 0x7f7f7f7f7f7f7f7f, latch_pos = 2139062143,
  epoll_fd = 2139062143, epoll_ret_events = 0x7f7f7f7f7f7f7f7f}
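The 0x7f7f... pattern is what CLOBBER_FREED_MEMORY writes into pfree'd chunks in assert-enabled builds, so the set's memory has already been released by the time the resource owner reaches it during abort.  Here is a rough sketch of the lifecycle the patch seems to intend, written as though it lived in latch.c (WaitEventSet's definition is private to that file); only ResourceOwnerForgetWES() and the resowner field are visible in the backtrace, while ResourceOwnerRememberWES() and the CreateWaitEventSetTracked() name are my guesses:

#include "postgres.h"

#include "storage/latch.h"
#include "utils/resowner.h"

/*
 * Guessed creation-side counterpart of the cleanup seen in the backtrace:
 * the set remembers its owner and the owner remembers the set, so that an
 * abort can reach FreeWaitEventSet() via ResourceOwnerReleaseInternal().
 */
WaitEventSet *
CreateWaitEventSetTracked(MemoryContext context, ResourceOwner owner, int nevents)
{
    WaitEventSet *set = CreateWaitEventSet(context, nevents);

    set->resowner = owner;                  /* field added by the patch (latch.c:600) */
    ResourceOwnerRememberWES(owner, set);   /* assumed counterpart of ForgetWES */
    return set;
}

/*
 * The crash suggests the two bookkeeping paths are not kept in sync: the
 * set's memory is pfree'd (or its containing context is reset) during error
 * cleanup without the resource owner forgetting it, so the later
 * FreeWaitEventSet(set) call made from ResourceOwnerReleaseInternal() reads
 * the clobbered set->resowner pointer and crashes in ResourceArrayRemove().
 */

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers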