On 1/4/19 8:08 AM, Surafel Temesgen wrote:
>
> On Tue, Jan 1, 2019 at 10:08 PM Tomas Vondra
> <tomas.von...@2ndquadrant.com <mailto:tomas.von...@2ndquadrant.com>> wrote:
>
>     The execution part of the patch seems to be working correctly, but I
>     think there's an improvement - we don't need to execute the outer plan
>     to completion before emitting the first row. For example, let's say the
>     outer plan produces 10000 rows in total and we're supposed to return the
>     first 1% of those rows. We can emit the first row after fetching the
>     first 100 rows, we don't have to wait for fetching all 10k rows.
>
> my other concern is in this case we are going to determine limit and
> safe row to return on the fly
> which have additional computation and i fair that it will not perform
> better on most of the case
I very much doubt recomputing the number of rows to return will be
measurable at all, considering those are int64 values. It's likely
negligible compared to all the other overhead (reading rows from the
subplan, etc.).

And even if it turns out expensive, there are ways to make it cheaper
(e.g. not computing it every time, but only when it can actually
increment the value - for example with 1% of rows to return, it's
enough to recompute it every 100 rows).

It also very much depends on which use cases you consider important.
Clearly, you're only thinking about the case that actually requires
fetching everything. But there's also the use case where you have
another limit node somewhere above.

So let's do a simple "cost" analysis of these two use cases:

1) case #1 (no LIMIT above, requires fetching everything)

Let's assume the "cost" of the current approach (fetching everything in
advance, computing node->count once) is the baseline here. With the
incremental approach there is likely some extra overhead. Let's
overshoot it a bit - with 5% overhead, it's 1.05x the baseline.

2) case #2 (LIMIT/EXISTS above, only needs to fetch the first few rows,
perhaps just a single one)

Here, we take the incremental approach as the baseline. With the "fetch
everything first" approach we need to evaluate the whole subplan and
fetch all the results, instead of just some small fraction. But the
exact ratio is pretty much arbitrary - I can easily construct queries
that need to fetch 100x or 100000x more data from the subplan.

So essentially the "incremental fetch" approach means we may pay a 5%
overhead in cases where we need to fetch everything, while the "fetch
everything first" approach adds an arbitrary amount of overhead to
cases where we don't need to fetch everything.

Of course, one way to deal with this would be to argue that the cases
benefiting from the incremental approach are very rare / non-existent.
Perhaps that's the case, although I don't see why it would be.
regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services