Re: [HACKERS] ANALYZE command progress checker

Ashutosh Sharma Sun, 19 Mar 2017 01:57:01 -0700

Hi,

I didn't find any major issues with the patch. It works as expected.
However, I have few minor comments that I would like to share,


+     <entry>
+       Total number of sample rows. The sample it reads is taken randomly.
+       Its size depends on the <varname>default_statistics_target</>
parameter value.
+     </entry>

1) I think it would be better if you could specify reference link to
the guc variable 'default_statistics_target'. Something like this,

<xref linkend='guc-default-statistics-target'>.

If you go through monitoring.sgml, you would find that there is
reference link to all guc variables being used.

2) I feel that the 'computing_statistics' phase could have been
splitted into 'computing_column_stats', 'computing_index_stats'....
Please let me know your thoughts on this.


+   certain commands during command execution.  Currently, the command
+   which supports progress reporting is <command>VACUUM</> and
<command>ANALYZE</>.  This may be
    expanded in the future.

3) I think above needs to be rephrased. Something like...Currently,
the supported progress reporting commands are 'VACUUM' and
'ANALYZE'.

Moreover, I also ran your patch on Windows platform and didn't find
any issues. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Sat, Mar 18, 2017 at 5:30 PM, Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
> Hi Vinayak,
>
> Here are couple of review comments that may need your attention.
>
> 1. Firstly, I am seeing some trailing whitespace errors when trying to
> apply your v3 patch using git apply command.
>
> [ashu@localhost postgresql]$ git apply pg_stat_progress_analyze_v3.patch
> pg_stat_progress_analyze_v3.patch:155: trailing whitespace.
> CREATE VIEW pg_stat_progress_analyze AS
> pg_stat_progress_analyze_v3.patch:156: trailing whitespace.
>     SELECT
> pg_stat_progress_analyze_v3.patch:157: trailing whitespace.
>         S.pid AS pid, S.datid AS datid, D.datname AS datname,
> pg_stat_progress_analyze_v3.patch:158: trailing whitespace.
>         S.relid AS relid,
> pg_stat_progress_analyze_v3.patch:159: trailing whitespace.
>         CASE S.param1 WHEN 0 THEN 'initializing'
> error: patch failed: doc/src/sgml/monitoring.sgml:521
>
> 2) The test-case 'rules' is failing.  I think it's failing because in
> rules.sql 'ORDERBY viewname' is used which will put
> 'pg_stat_progress_analyze' ahead of 'pg_stat_progress_vacuum' in the
> list of catalog views. You may need to correct rules.out file
> accordingly. This is what i could see in rules.sql,
>
> SELECT viewname, definition FROM pg_views WHERE schemaname <>
> 'information_schema' ORDER BY viewname;
>
> I am still reviewing your patch and doing some testing. Will update if
> i find any issues. Thanks.
>
> --
> With Regards,
> Ashutosh Sharma
> EnterpriseDB:http://www.enterprisedb.com
>
> On Fri, Mar 17, 2017 at 3:16 PM, vinayak
> <pokale_vinayak...@lab.ntt.co.jp> wrote:
>>
>> On 2017/03/17 10:38, Robert Haas wrote:
>>>
>>> On Fri, Mar 10, 2017 at 2:46 AM, vinayak
>>> <pokale_vinayak...@lab.ntt.co.jp> wrote:
>>>>
>>>> Thank you for reviewing the patch.
>>>>
>>>> The attached patch incorporated Michael and Amit comments also.
>>>
>>> I reviewed this tonight.
>>
>> Thank you for reviewing the patch.
>>>
>>> +        /* Report compute index stats phase */
>>> +        pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
>>> +
>>> PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS);
>>>
>>> Hmm, is there really any point to this?  And is it even accurate?  It
>>> doesn't look to me like we are computing any index statistics here;
>>> we're only allocating a few in-memory data structures here, I think,
>>> which is not a statistics computation and probably doesn't take any
>>> significant time.
>>>
>>> +        /* Report compute heap stats phase */
>>> +        pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
>>> +
>>> PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS);
>>>
>>> The phase you label as computing heap statistics also includes the
>>> computation of index statistics.  I wouldn't bother separating the
>>> computation of heap and index statistics; I'd just have a "compute
>>> statistics" phase that begins right after the comment that starts
>>> "Compute the statistics."
>>
>> Understood. Fixed in the attached patch.
>>>
>>>
>>> +    /* Report that we are now doing index cleanup */
>>> +    pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
>>> +                                PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP);
>>>
>>> OK, this doesn't make any sense either.  We are not cleaning up the
>>> indexes here.  We are just closing them and releasing resources.  I
>>> don't see why you need this phase at all; it can't last more than some
>>> tiny fraction of a second.  It seems like you're trying to copy the
>>> exact same phases that exist for vacuum instead of thinking about what
>>> makes sense for ANALYZE.
>>
>> Understood. I have removed this phase.
>>>
>>>
>>> +        /* Report number of heap blocks scaned so far*/
>>> +        pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED,
>>> targblock);
>>>
>>> While I don't think this is necessarily a bad thing to report, I find
>>> it very surprising that you're not reporting what seems to me like the
>>> most useful thing here - namely, the number of blocks that will be
>>> sampled in total and the number of those that you've selected.
>>> Instead, you're just reporting the length of the relation and the
>>> last-sampled block, which is a less-accurate guide to total progress.
>>> There are enough counters to report both things, so maybe we should.
>>>
>>> +    /* Report total number of sample rows*/
>>> +    pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_SAMPLE_ROWS,
>>> numrows);
>>>
>>> As an alternative to the previous paragraph, yet another thing you
>>> could report is number of rows sampled out of total rows to be
>>> sampled.  But this isn't the way to do it: you are reporting the
>>> number of rows you sampled only once you've finished sampling.  The
>>> point of progress reporting is to report progress -- that is, to
>>> report this value as it's updated, not just to report it when ANALYZE
>>> is practically done and the value has reached its maximum.
>>
>> Understood.
>> I have reported number of rows sampled out of total rows to be sampled.
>> I have not reported the number of blocks that will be sampled in total.
>> So currect pg_stat_progress_analyze view looks like:
>>
>> =# \d+ pg_stat_progress_analyze
>>                          View "pg_catalog.pg_stat_progress_analyze"
>>          Column         |  Type   | Collation | Nullable | Default | Storage
>> | Descripti
>> on
>> ------------------------+---------+-----------+----------+---------+----------+----------
>> ---
>>  pid                    | integer |           |          |         | plain
>> |
>>  datid                  | oid     |           |          |         | plain
>> |
>>  datname                | name    |           |          |         | plain
>> |
>>  relid                  | oid     |           |          |         | plain
>> |
>>  phase                  | text    |           |          |         |
>> extended |
>>  num_target_sample_rows | bigint  |           |          |         | plain
>> |
>>  num_rows_sampled       | bigint  |           |          |         | plain
>> |
>> View definition:
>>  SELECT s.pid,
>>     s.datid,
>>     d.datname,
>>     s.relid,
>>         CASE s.param1
>>             WHEN 0 THEN 'initializing'::text
>>             WHEN 1 THEN 'collecting sample rows'::text
>>             WHEN 2 THEN 'computing statistics'::text
>>             ELSE NULL::text
>>         END AS phase,
>>     s.param2 AS num_target_sample_rows,
>>     s.param3 AS num_rows_sampled
>>    FROM pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid,
>> param1, param2, p
>> aram3, param4, param5, param6, param7, param8, param9, param10)
>>      LEFT JOIN pg_database d ON s.datid = d.oid;
>>
>> Comment?
>>>
>>> The documentation for this patch isn't very good, either.  You forgot
>>> to update the part of the documentation that says that VACUUM is the
>>> only command that currently supports progress reporting, and the
>>> descriptions of the phases and progress counters are brief and not
>>> entirely accurate - e.g. heap_blks_scanned is not actually the number
>>> of heap blocks scanned, because we are sampling, not reading all the
>>> blocks.
>>
>> Understood. I have updated the documentation.
>> I will also try to improve documentation.
>>>
>>> When adding calls to the progress reporting functions, please consider
>>> whether a blank line should be added before or after the new code, or
>>> both, or neither.  I'd say some blank lines are needed in a few places
>>> where you didn't add them.
>>
>> +1. I have added blank lines in a few places.
>>
>> Regards,
>> Vinayak Pokale
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ANALYZE command progress checker

Reply via email to