At Sat, 12 Dec 2020 18:25:57 +0900, Etsuro Fujita <etsuro.fuj...@gmail.com> wrote in > On Fri, Nov 20, 2020 at 3:51 PM Kyotaro Horiguchi > <horikyota....@gmail.com> wrote: > > At Tue, 17 Nov 2020 18:56:02 +0900, Etsuro Fujita <etsuro.fuj...@gmail.com> > > wrote in > > > * In Robert's patch [1] (and Horiguchi-san's, which was created based > > > on Robert's), ExecAppend() was modified to retrieve tuples from > > > async-aware children *before* the tuples will be needed, but I don't > > > > The "retrieve" means the move of a tuple from fdw to executor > > (ExecAppend or ExecAsync) layer? > > Yes, that's what I mean. > > > > think that's really a good idea, because the query might complete > > > before returning the tuples. So I modified that function so that a > > > > I'm not sure how it matters. Anyway the fdw holds up to tens of tuples > > before the executor actually make requests for them. The reason for > > the early fetching is letting fdw send the next request as early as > > possible. (However, I didn't measure the effect of the > > nodeAppend-level prefetching.) > > I agree that that would lead to an improved efficiency in some cases, > but I still think that that would be useless in some other cases like > SELECT * FROM sharded_table LIMIT 1. Also, I think the situation > would get worse if we support Append on top of joins or aggregates > over ForeignScans, which would be more expensive to perform than these > ForeignScans.
I'm not sure which gain we weigh on, but if doing "LIMIT 1" on Append for many times is more common than fetching all or "LIMIT <many multiples of fetch_size>", that discussion would be convincing... Is it really the case? Since core knows of async execution, I think if we disable async exection, it should be decided by planner, which knows how many tuples are expected to be returned. On the other hand the most apparent criteria for whether to enable async or not would be fetch_size, which is fdw's secret. Thus we could rename ForeignPathAsyncCapable() to something like ForeignPathRunAsync(), true from which means "the FDW is telling that it can run async and is thinking that the given number of tuples will be fetched at once.". > If we do prefetching, I think it would be better that it’s the > responsibility of the FDW to do prefetching, and I think that that > could be done by letting the FDW to start another data fetch, > independently of the core, in the ForeignAsyncNotify callback routine, FDW does prefetching (if it means sending request to remote) in my patch, so I agree to that. It suspect that you were intended to say the opposite. The core (ExecAppendAsyncGetNext()) controls prefetching in your patch. > which I revived from Robert's original patch. I think that that would > be more efficient, because the FDW would no longer need to wait until > all buffered tuples are returned to the core. In the WIP patch, I I don't understand. My patch sends a prefetch-query as soon as all the tuples of the last remote-request is stored into FDW storage. The reason for removing ExecAsyncNotify() was it is just redundant as far as concerning Append asynchrony. But I particulary oppose to revive the function. > only allowed the callback routine to put the corresponding ForeignScan > node into a state where it’s either ready for a new request or needing > a callback for another data fetch, but I think we could probably relax > the restriction so that the ForeignScan node can be put into another > state where it’s ready for a new request while needing a callback for > the prefetch. I don't understand this, too. ExecAsyncNotify() doesn't touch any of the bitmaps, as_needrequest, callback_pending nor as_asyncpending in your patch. Am I looking into something wrong? I'm looking async-wip-2020-11-17.patch. (By the way, it is one of those that make the code hard to read to me that the "callback" means "calling an API function". I think none of them (ExecAsyncBegin, ExecAsyncRequest, ExecAsyncNotify) are a "callback".) > > > tuple is retrieved from an async-aware child *when* it is needed, like > > > Thomas' patch. I used FDW callback functions proposed by Robert, but > > > introduced another FDW callback function ForeignAsyncBegin() for each > > > async-aware child to start an asynchronous data fetch at the first > > > call to ExecAppend() after ExecInitAppend() or ExecReScanAppend(). > > > > Even though the terminology is not officially determined, in the past > > discussions "async-aware" meant "can handle async-capable subnodes" > > and "async-capable" is used as "can run asynchronously". > > Thanks for the explanation! > > > Likewise you > > seem to have changed the meaning of as_needrequest from "subnodes that > > needs to request for the next tuple" to "subnodes that already have > > got query-send request and waiting for the result to come". > > No. I think I might slightly change the original definition of > as_needrequest, though. Mmm, sorry. I may have been perplexed by the comment below, which is also added to ExecAsyncNotify(). ExecAppendAsyncRequest: > Assert(bms_is_member(i, node->as_needrequest)); > > /* Perform the actual callback. */ > ExecAsyncRequest(areq); > if (ExecAppendAsyncResponse(areq)) > { > Assert(!TupIsNull(areq->result)); > *result = areq->result; > return true; > } > > I would > > argue to use the words and variables (names) in such meanings. > > I think the word "aware" has a broader meaning, so the naming as > proposed would be OK IMO. But actually, I don't have any strong > opinion about that, so I'll change it as explained. Thanks. > > > * For EvalPlanQual, I modified the patch so that async-aware children > > > are treated as if they were synchronous when executing EvalPlanQual. > > > > Doesn't async execution accelerate the epq-fetching? Or does > > async-execution goes into trouble in the EPQ path? > > The reason why I disabled async execution when executing EPQ is to > avoid sending asynchronous queries to the remote sides, which would be > useless, because scan tuples for an EPQ recheck are obtained in a > dedicated way. If EPQ is performed onto Append, I think it should gain from asynchronous execution since it is going to fetch *a* tuple from several partitions or children. I believe EPQ doesn't contain Append in major cases, though. (Or I didn't come up with the steps for the case to happen...) > > > * In Robert's patch, all async-aware children below Append nodes in > > > the query waiting for events to occur were managed by a single EState, > > > but I modified the patch so that such children are managed by each > > > Append node, like Horiguchi-san's patch and Thomas'. > > > > Managing in Estate give advantage for push-up style executor but > > managing in node_state is simpler. > > What do you mean by "push-up style executor"? The reverse of the volcano-style executor, which enters from the topmost node and down to the bottom. In the "push-up stule executor", the bottom-most nodes fires by a certain trigger then every intermediate nodes throws up the result to the parent until reaching the topmost node. regards. -- Kyotaro Horiguchi NTT Open Source Software Center