On Fri, Dec 11, 2015 at 12:59 AM, Robert Haas <robertmh...@gmail.com> wrote: > On Thu, Dec 10, 2015 at 9:49 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: >> Robert Haas <robertmh...@gmail.com> writes: >>> Oh, please, no. Gosh, this is supposed to be a lightweight facility! >>> Just have a chunk of shared memory and write the data in there. If >>> you try to feed this through the stats collector you're going to >>> increase the overhead by 100x or more, and there's no benefit. We've >>> got to do relation stats that way because there's no a priori bound on >>> the number of relations, so we can't just preallocate enough shared >>> memory for all of them. But there's no similar restriction here: the >>> number of backends IS fixed at startup time. As long as we limit the >>> amount of progress information that a backend can supply to some fixed >>> length, which IMHO we definitely should, there's no need to add the >>> expense of funneling this through the stats collector. >> >> I agree with this, and I'd further add that if we don't have a >> fixed-length progress state, we've overdesigned the facility entirely. >> People won't be able to make sense of anything that acts much more >> complicated than "0% .. 100% done". So you need to find a way of >> approximating progress of a given command in terms more or less >> like that, even if it's a pretty crude approximation.
Check. My opinion is based on the fact that most of the backends are not going to use the progress facility at all, and we actually do not need a high level of precision for VACUUM reports: we could simply send messages with a certain delay between two messages. And it looks like a waste to allocate that for all the backends. But I am going to withdraw here, two committers is by far too much pressure. > That I don't agree with. Even for something like VACUUM, it's pretty > hard to approximate overall progress - because, for example, normally > we'll only have 1 index scan per index, but we might have multiple > index scans or none if maintenance_work_mem is too small or if there > aren't any dead tuples after all. I don't want our progress reporting > facility to end up with this reputation: > > https://xkcd.com/612/ This brings memories. Who has never faced that... > This point has already been discussed rather extensively upthread, but > to reiterate, I think it's much better to report slightly more > detailed information and let the user figure out what to do with it. > For example, for a VACUUM, I think we should report something like > this: > 1. The number of heap pages scanned thus far. > 2. The number of dead tuples found thus far. > 3. The number of dead tuples we can store before we run out of > maintenance_work_mem. > 4. The number of index pages processed by the current index vac cycle > (or a sentinel value if none is in progress). > 5. The number of heap pages for which the "second heap pass" has been > completed > Now, if the user wants to flatten this out to a progress meter, they > can write an SQL expression which does that easily enough, folding the > sizes of the table and its indices and whatever assumptions they want > to make about what will happen down the road. If we all agree on how > that should be done, it can even ship as a built-in view. But I > *don't* think we should build those assumptions into the core progress > reporting facility. For one thing, that would make updating the > progress meter considerably more expensive - you'd have to recompute a > new completion percentage instead of just saying "heap pages processed > went up by one". This stuff I agree. Having global counters, and have user compute any kind of percentage or progress bar is definitely the way to go. > For another thing, there are definitely going to be > some people that want the detailed information - and I can practically > guarantee that if we don't make it available, at least one person will > write a tool that tries to reverse-engineer the detailed progress > information from whatever we do report. OK, so this justifies the fact of having detailed information, but does it justify the fact of having precise and accurate data? ISTM that having detailed information and precise information are two different things. The level of details is defined depending on how verbose we want the information to be, and the list you are giving would fulfill this requirement nicely for VACUUM. The level of precision/accuracy at which this information is provided though depends at which frequency we want to send this information. For long-running VACUUM it does not seem necessary to update the fields of the progress tracker each time a counter needs to be incremented. CLUSTER has been mentioned as well as a potential target for the progress facility, but it seems that it enters as well in the category of things that would need to be reported on a slower frequency pace than "each-time-a-counter-is-incremented". My impression is just based on the needs of VACUUM and CLUSTER. Perhaps I am lacking imagination regarding the potential use cases of the progress facility though in cases where we'd want to provide extremely precise progress information :) It just seems to me that this is not a requirement for VACUUM or CLUSTER. That's all. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers