On Mon, Nov 20, 2017 at 12:05 PM, Antonin Houska <a...@cybertec.at> wrote:
> Robert Haas <robertmh...@gmail.com> wrote:
>> On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada
>> <yamada.tats...@lab.ntt.co.jp> wrote:
>> > 1. scanning heap
>> > 2. sort tuples
>>
>> These two phases overlap, though. I believe progress reporting for
>> sorts is really hard. In the simple case where the data fits in
>> work_mem, none of the work of the sort gets done until all the data
>> is read. Once you switch to an external sort, you're writing batch
>> files, so a lot of the work is now being done during data loading.
>> But as the number of batch files grows, the final merge at the end
>> becomes an increasingly noticeable part of the cost, and eventually
>> you end up needing multiple merge passes. I think we need some smart
>> way to report on sorts so that we can tell how much of the work has
>> really been done, but I don't know how to do it.
>
> Whatever complexity is hidden in the sort, cost_sort() should have
> taken it into consideration when called via plan_cluster_use_sort().
> Thus I think that once we have both the startup and total cost, the
> current progress of the sort stage can be estimated from the current
> number of input and output rows. Please correct me if my proposal is
> too simplistic.
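If I understand correctly, the estimate you're proposing amounts to a
linear interpolation between the planner's startup and total cost,
something like the sketch below. To be clear, this is my attempt to
restate the idea, not code from the tree; all of the variable names
are invented, and cost_sort()'s outputs are abstract cost units, not
time:

    /*
     * Hypothetical sort-progress estimate: the startup cost is assumed
     * to be "paid" in proportion to the input rows consumed so far, and
     * the remaining (run) cost in proportion to the output rows emitted.
     */
    double  input_done = (double) input_rows_seen / input_rows_total;
    double  output_done = (double) output_rows_seen / input_rows_total;
    double  sort_progress =
        (startup_cost * input_done +
         (total_cost - startup_cost) * output_done) / total_cost;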
I think it is far too simplistic. If the sort is being fed by a
sequential scan, reporting the number of blocks scanned so far,
compared to the total number that will be scanned, would be a fine way
of reporting on the progress of the sequential scan -- and it's better
to use blocks, which we know for certain, than rows, which we can only
estimate. But that's the *scan* progress, not the *sort* progress.
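For concreteness, that kind of block-based reporting might look
something like the sketch below. pgstat_progress_update_param() and
the HeapScanDesc fields are real; the PROGRESS_CLUSTER_* parameter
indexes are hypothetical, invented here for illustration:

    #include "pgstat.h"
    #include "access/heapam.h"

    /* Hypothetical progress-parameter slots for CLUSTER. */
    #define PROGRESS_CLUSTER_TOTAL_BLOCKS 0
    #define PROGRESS_CLUSTER_BLOCKS_DONE  1

    /*
     * Report heap-scan progress in blocks: rs_nblocks is the number of
     * blocks the scan will read, rs_cblock the block currently being
     * scanned (InvalidBlockNumber before the scan starts).
     */
    static void
    report_scan_progress(HeapScanDesc scan)
    {
        pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_BLOCKS,
                                     scan->rs_nblocks);
        if (BlockNumberIsValid(scan->rs_cblock))
            pgstat_progress_update_param(PROGRESS_CLUSTER_BLOCKS_DONE,
                                         scan->rs_cblock);
    }

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company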