[PG13] Planning (time + buffers) data structure in explain plan (format text)

2020-08-07 Thread Pierre Giraud
Hi all,

As far as I understand, in the upcoming version 13, information about
buffers used during planning is now available in the explain plan.

[…]
 Planning Time: 0.203 ms
   Buffers: shared hit=14
[…]

In the JSON format, the data structure is a bit different:

[…]
 "Planning": {
   "Planning Time": 0.533,
   "Shared Hit Blocks": 14,
   "Shared Read Blocks": 0,
   "Shared Dirtied Blocks": 0,
   "Shared Written Blocks": 0,
   "Local Hit Blocks": 0,
   "Local Read Blocks": 0,
   "Local Dirtied Blocks": 0,
   "Local Written Blocks": 0,
   "Temp Read Blocks": 0,
   "Temp Written Blocks": 0
 },
[…]

For a matter of consistency, I wonder if it would be possible to format
it like the following:

[…]
 Planning:
   Planning Time: 0.203 ms
   Buffers: shared hit=14
[…]


Note: a similar way to format information is already used for JIT.

[…]
 JIT:
   Functions: 3
   Options: Inlining false, Optimization false, Expressions true,
Deforming true
   Timing: Generation 0.340 ms, Inlining 0.000 ms, Optimization 0.168
ms, Emission 1.907 ms, Total 2.414 ms
[…]

Regards,
Pierre




Re: [PG13] Planning (time + buffers) data structure in explain plan (format text)

2020-08-07 Thread Pierre Giraud



Le 07/08/2020 à 14:52, Julien Rouhaud a écrit :
> Hi,
> 
> On Fri, Aug 7, 2020 at 2:30 PM Pierre Giraud  wrote:
>>
>> Hi all,
>>
>> As far as I understand, in the upcoming version 13, information about
>> buffers used during planning is now available in the explain plan.
> 
> Indeed.
> 
>> […]
>>  Planning Time: 0.203 ms
>>Buffers: shared hit=14
>> […]
>>
>> In the JSON format, the data structure is a bit different:
>>
>> […]
>>  "Planning": {
>>"Planning Time": 0.533,
>>"Shared Hit Blocks": 14,
>>"Shared Read Blocks": 0,
>>"Shared Dirtied Blocks": 0,
>>"Shared Written Blocks": 0,
>>"Local Hit Blocks": 0,
>>"Local Read Blocks": 0,
>>"Local Dirtied Blocks": 0,
>>"Local Written Blocks": 0,
>>"Temp Read Blocks": 0,
>>"Temp Written Blocks": 0
>>  },
>> […]
>>
>> For a matter of consistency, I wonder if it would be possible to format
>> it like the following:
>>
>> […]
>>  Planning:
>>Planning Time: 0.203 ms
>>Buffers: shared hit=14
>> […]
> 
> I agree that this output looks more consistent with other output,
> including JIT as you mentioned.  I'll send a patch for that if there's
> no objection.

Thanks a lot!

> 
> Out of curiosity, is the current text output actually harder to parse
> than the one you're suggesting?
> 

I don't want to speak in the name of developers of others parsing tools
but this should not require a lot of work to parse the output I'm proposing.
It would be nice to have their opinion though, especially Hubert depesz
Lubaczewski's since he already integrated the change:
https://gitlab.com/depesz/Pg--Explain/-/commit/4a760136ee69ee4929625d4e4022f79ac60b763f

However, as far as I know, he's not doing anything with the buffers
information with the "Planning" section yet.

To answer your question, I think that the new output would make the
parser a little bit easier to write because it would make things a bit
clearer (ie. more separated) so less prone to errors.




Re: [PG13] Planning (time + buffers) data structure in explain plan (format text)

2020-08-20 Thread Pierre Giraud
Can you please show what the plan would look like for?

=# explain (buffers on, summary on, format JSON) select * from t;



Le 20/08/2020 à 09:58, Fujii Masao a écrit :
> 
> 
> On 2020/08/20 13:00, David Rowley wrote:
>> On Thu, 20 Aug 2020 at 03:31, Fujii Masao
>>  wrote:
>>> With the patch, for example, whatever "summary" settng is, "buffers on"
>>> displays the planner's buffer usage if it happens.
>>
>> I forgot to mention earlier, you'll also need to remove the part in
>> the docs that mentions BUFFERS requires ANALYZE.
> 
> Thanks for the review! I removed that.
> Attached is the updated version of the patch.
> I also added the regression test performing "explain (buffers on)"
> without "analyze" option.
> 
> Regards,
> 




Re: [PG13] Planning (time + buffers) data structure in explain plan (format text)

2020-08-20 Thread Pierre Giraud
Le 20/08/2020 à 17:41, Fujii Masao a écrit :
> 
> 
> On 2020/08/20 22:34, Julien Rouhaud wrote:
>> On Thu, Aug 20, 2020 at 12:29 PM David Rowley 
>> wrote:
>>>
>>> On Thu, 20 Aug 2020 at 19:58, Fujii Masao
>>>  wrote:

 On 2020/08/20 13:00, David Rowley wrote:
> I forgot to mention earlier, you'll also need to remove the part in
> the docs that mentions BUFFERS requires ANALYZE.

 Thanks for the review! I removed that.
 Attached is the updated version of the patch.
 I also added the regression test performing "explain (buffers on)"
 without "analyze" option.
>>>
>>> Looks good to me.
>>
>> Looks good to me too.
> 
> David and Julien, thanks for the review! I'd like to wait for
> Pierre's opinion about this change before committing the patch.
> 
> Pierre,
> could you share your opinion about this change?

It looks good to me too. Thanks a lot!
Let's not forget to notify Hubert (depesz) once integrated.

> 
> Regards,
> 




Explain: Duplicate key "Workers" in JSON format

2019-08-23 Thread Pierre Giraud
Hi,

I'm currently working on a tool to visualize an execution plan [1]. For
those who know PEV, it's actually a fork of this great tool since it
hasn't been active for more than 2 years.

Among other things, I'd like to show information for the parallel
queries. First of which is information about the workers.

I'm facing a problem when I am trying to parse a plan in the JSON
format. The "Workers" key may be duplicated.

While it's not invalid to have keys with the same name several times at
the same level in a JSON object, it makes it almost impossible to get
the full info when parsed. Indeed when parsing such a JSON string only
the last key is kept. Part of the information is lost.

JSON validators warn us with the following message : "Duplicate key,
names should be unique."

Here's an example of a plan in VERBOSE mode.

[
  {
"Plan": {
  "Node Type": "Gather Merge",
  "Parallel Aware": false,
  "Actual Startup Time": 1720.052,
  "Actual Total Time": 4252.290,
  "Actual Rows": 1000,
  "Actual Loops": 1,
  "Output": ["c1", "c2"],
  "Workers Planned": 2,
  "Workers Launched": 2,
  "Plans": [
{
  "Node Type": "Sort",
  "Parent Relationship": "Outer",
  "Parallel Aware": false,
  "Actual Startup Time": 1558.638,
  "Actual Total Time": 2127.522,
  "Actual Rows": 333,
  "Actual Loops": 3,
  "Output": ["c1", "c2"],
  "Sort Key": ["t1.c1"],
  "Sort Method": "external merge",
  "Sort Space Used": 126152,
  "Sort Space Type": "Disk",
  "Workers": [
{
  "Worker Number": 0,
  "Sort Method": "external merge",
  "Sort Space Used": 73552,
  "Sort Space Type": "Disk"
},
{
  "Worker Number": 1,
  "Sort Method": "external merge",
  "Sort Space Used": 73320,
  "Sort Space Type": "Disk"
}
  ],
  "Workers": [
{
  "Worker Number": 0,
  "Actual Startup Time": 1487.846,
  "Actual Total Time": 1996.879,
  "Actual Rows": 2692973,
  "Actual Loops": 1
},
{
  "Worker Number": 1,
  "Actual Startup Time": 1468.256,
  "Actual Total Time": 2012.744,
  "Actual Rows": 2684443,
  "Actual Loops": 1
}
  ],
  "Plans": [
{
  "Node Type": "Seq Scan",
  "Parent Relationship": "Outer",
  "Parallel Aware": true,
  "Relation Name": "t1",
  "Schema": "public",
  "Alias": "t1",
  "Actual Startup Time": 0.211,
  "Actual Total Time": 372.858,
  "Actual Rows": 333,
  "Actual Loops": 3,
  "Output": ["c1", "c2"],
  "Workers": [
{
  "Worker Number": 0,
  "Actual Startup Time": 0.029,
  "Actual Total Time": 368.356,
  "Actual Rows": 2692973,
  "Actual Loops": 1
},
{
  "Worker Number": 1,
  "Actual Startup Time": 0.033,
  "Actual Total Time": 368.874,
  "Actual Rows": 2684443,
  "Actual Loops": 1
}
  ]
}
  ]
}
  ]
},
"Planning Time": 0.170,
"Triggers": [
],
"Execution Time": 4695.141
  }
]

As you can see, the "Workers" key is duplicated in the Sort node.

Here's the equivalent in TEXT format:

-
 Gather Merge  (cost=735306.27..1707599.95 rows=864 width=17)
(actual time=1560.468..3749.583 rows=1000 loops=1)
   Output: c1, c2
   Workers Planned: 2
   Workers Launched: 2
   ->  Sort  (cost=734306.25..744722.95 rows=4166682 width=17) (actual
time=1474.182..1967.788 rows=333 loops=3)
 Output: c1, c2
 Sort Key: t1.c1
 Sort Method: external merge  Disk: 125168kB
 Worker 0:  Sort Method: external merge  Disk: 73768kB
 Worker 1:  Sort Method: external merge  Disk: 74088kB
 Worker 0: actual time=1431.136..1883.370 rows=2700666 loops=1
 Worker 1: actual time=1431.175..1891.630 rows=2712505 loops=1
 ->  Parallel Seq Scan on public.t1  (cost=0.00..105264.82
rows=4166682 width=17) (actual time=0.214..386.014 rows=333 loops=3)
   Output: c1, c2
   Worker 0: actual time=0.027..382.325 rows=2700666 loops=1
   Worker 1: actual time=0.038..384.951 rows=2712505 loops=1
 Planning Time: 0.180 ms
 Execution Time: 4166.867 ms
(18 rows)
-

I think that the text format should stay as is.

For the JSON format however it would be better in my opinion if
"Workers" data i

Re: Poll: are people okay with function/operator table redesign?

2020-04-15 Thread Pierre Giraud


Le 16/04/2020 à 00:18, Tom Lane a écrit :
> As I threatened to do earlier, I made a pass at converting table 9.10
> to a couple of the styles under discussion.  (This is just a
> draft-quality patch, so it might have some minor bugs --- the point
> is just to see what these styles look like.)
> 
> I've concluded after looking around that the ideas involving not having
> a  at all, but just a  or the like, are not very
> well-advised.  That would eliminate, or at least greatly degrade, the
> visual distinction between the per-function material and the surrounding
> commentary.  Which does not seem like a winner to me; for example it
> would make it quite hard to skip over the detailed material when you're
> just trying to skim the docs.
> 
> We did have a number of people suggesting that just reordering things as
> "description, signature, examples" might be a good idea, so I gave that
> a try; attached is a rendition of a portion of 9.10 in that style (the
> "v1" image).  It's not bad, but there's still going to be a lot of
> wasted whitespace in tables that include even one long function name.
> (9.10's longest is "regexp_split_to_array", so it's showing this problem
> significantly.)
> 
> I also experimented with Jonathan's idea of dropping the separate
> function name and allowing the function signature to span left into
> that column -- see "v2" images.  This actually works really well,
> and would work even better (IMO) if we could get rid of the inter-row
> and inter-column rules within a function entry.  I failed to
> accomplish that with rowsep/colsep annotations, but from remarks
> upthread I suppose there might be a CSS way to accomplish it.  (But
> the rowsep/colsep annotations *do* work in PDF output, so I kept them;
> that means we only need a CSS fix and not some kind of flow-object
> magic for PDF.)
> 
> To allow direct comparison of these 9.10 images against the situation
> in HEAD, I've also attached an extract of 9.10 as rendered by my
> browser with "STYLE=website".  As you can see this is *not* quite
> identical to how it renders on postgresql.org, so there is still some
> unexplained differential in font or margins or something.  But if you
> look at those three PNGs you can see that either v1 or v2 has a pretty
> substantial advantage over HEAD in terms of the amount of space
> needed.  v2 would be even further ahead if we could eliminate some of
> the vertical space around the intra-function row split, which again
> might be doable with CSS magic.
> 
> The main disadvantage I can see to the v2 design is that we're back
> to having two  per function, which is inevitably going to result
> in PDF builds putting page breaks between those rows.  But you can't
> have everything ... and maybe we could find a way to discourage such
> breaks if we tried.

What about putting everything into one  and use a block with
some left padding/margin for description + example.
This would solve the PDF page break issue as well as the column
separation border one.

The screenshot attached uses a  tag for the descrition/example block.

> 
> Another issue is that v2 won't adapt real well to operator tables;
> the operator name won't be at the left.  I don't have a lot of faith
> in the proposal to fix that with font tricks.  Maybe we could stick
> to something close to the layout that table 9.30 has in HEAD (ie
> repeating the operator name in column 1), since we won't have long
> operator names messing up the format.  Again, CSS'ing our way
> out of the internal lines and extra vertical space within a single
> logical table cell would make that layout look nicer.
> 
> On balance I quite like the v2 layout and would prefer to move forward
> with that, assuming we can solve the remaining issues via CSS or style
> sheets.
> 
> In addition to screenshots, I've attached patches against HEAD that
> convert both tables 9.10 and 9.33 into v1 and v2 styles.
> 
>   regards, tom lane
> 


Re: Poll: are people okay with function/operator table redesign?

2020-04-16 Thread Pierre Giraud


Le 16/04/2020 à 16:43, Tom Lane a écrit :
> Pierre Giraud  writes:
>> Le 16/04/2020 à 00:18, Tom Lane a écrit :
>>> The main disadvantage I can see to the v2 design is that we're back
>>> to having two  per function, which is inevitably going to result
>>> in PDF builds putting page breaks between those rows.  But you can't
>>> have everything ... and maybe we could find a way to discourage such
>>> breaks if we tried.
> 
> Further experimentation shows that the PDF toolchain is perfectly willing
> to put a page break *within* a multi-line ; if there is any
> preference to break between rows instead, it's pretty weak.  So that
> argument is a red herring and we shouldn't waste time chasing it.
> However, there'd still be some advantage in not being dependent on CSS
> hackery to make it look nice in HTML.
> 
> What we're down to wanting, at this point, is basically a para with
> hanging indent.
> 
>> What about putting everything into one  and use a block with
>> some left padding/margin for description + example.
>> This would solve the PDF page break issue as well as the column
>> separation border one.
>> The screenshot attached uses a  tag for the descrition/example block.
> 
> That looks about right, perhaps, but could you be a little clearer about
> how you accomplished that?

Attached you will find the HTML structure with associated styles.
Sorry I haven't tried to do this from the DocBook sources.
I hope this helps though.

Regards
# HTML

  

  

  Function
  
Description 
Example
  

  


  

  age(timestamp, timestamp) → interval
  
Subtract arguments, producing a “symbolic” result that uses years and months, rather 
than just days



  age(timestamp '2001-04-10', timestamp 
'1957-06-13') → 43 years 9 mons 27 days


  

  

  

# CSS

#docContent table.table code.literal {
font-style: italic;
}

code.returnvalue {
font-weight: bold;
}

code.function {
font-weight: bold;
}

.replaceable > code {
font-weight: normal !important;
}

#docContent table.table dl {
margin-left: 30px;
margin-top: 5px;
}

#docContent table.table dl > dd {
margin-bottom: 0;
}


Re: [PATCH] Add extra statistics to explain for Nested Loop

2020-10-19 Thread Pierre Giraud



Le 17/10/2020 à 06:26, Julien Rouhaud a écrit :
> On Sat, Oct 17, 2020 at 12:15 PM Pavel Stehule  
> wrote:
>>
>> so 17. 10. 2020 v 0:11 odesílatel Anastasia Lubennikova 
>>  napsal:
>>>
>>> On 16.10.2020 12:07, Julien Rouhaud wrote:
>>>
>>> Le ven. 16 oct. 2020 à 16:12, Pavel Stehule  a 
>>> écrit :



 pá 16. 10. 2020 v 9:43 odesílatel  napsal:
>
> Hi, hackers.
> For some distributions of data in tables, different loops in nested loop
> joins can take different time and process different amounts of entries.
> It makes average statistics returned by explain analyze not very useful
> for DBA.
> To fix it, here is the patch that add printing of min and max statistics
> for time and rows across all loops in Nested Loop to EXPLAIN ANALYSE.
> Please don't hesitate to share any thoughts on this topic!


 +1

 This is great feature - sometimes it can be pretty messy current limited 
 format
>>>
>>>
>>> +1, this can be very handy!
>>>
>>> Cool.
>>> I have added your patch to the commitfest, so it won't get lost.
> 
> Thanks!  I'll also try to review it next week.
> 
>>> https://commitfest.postgresql.org/30/2765/
>>>
>>> I will review the code next week.  Unfortunately, I cannot give any 
>>> feedback about usability of this feature.
>>>
>>> User visible change is:
>>>
>>> -   ->  Nested Loop (actual rows=N loops=N)
>>> +  ->  Nested Loop (actual min_rows=0 rows=0 max_rows=0 loops=2)
>>
>>
>> This interface is ok - there is not too much space for creativity.
> 
> Yes I also think it's ok. We should also consider usability for tools
> like explain.depesz.com, I don't know if the current output is best.
> I'm adding Depesz and Pierre which are both working on this kind of
> tool for additional input.

Same for me and PEV2. It should be fairly easy to parse this new format.

> 
>> I can imagine displaying variance or average - but I am afraid about very 
>> bad performance impacts.
> 
> The original counter (rows here) is already an average right?
> Variance could be nice too.  Instrumentation will already spam
> gettimeofday() calls for nested loops, I don't think that computing
> variance would add that much overhead?

Thus, it's an average value. And to be mentioned: a rounded one! Which I
found a bit tricky to figure out.




doc: Remove LC_COLLATE and LC_CTYPE from SHOW command

2024-12-04 Thread Pierre Giraud

Hi,

Since version 16, the read-only server settings lc_collate and lc_ctype 
have been removed [1]. Unfortunately, the documentation has been updated 
only partially.


Attached is a patch to also update the page for the "SHOW" SQL command 
and remove the corresponding "LC_COLLATE" and "LC_CTYPE" parameters. I 
successfully built the doc in HTML locally.


This patch has been created against the current master branch but it 
should equally be applied to REL_16_STABLE and REL_17_STABLE.


Thanks,
Pierre

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b0f6c437160db640d4ea3e49398ebc3ba39d1982
From 32575b9478ecff4dd27a7215f9d34e6eb2abf6d0 Mon Sep 17 00:00:00 2001
From: Pierre GIRAUD 
Date: Wed, 4 Dec 2024 09:03:02 +0100
Subject: [PATCH v1] doc: remove LC_COLLATE and LC_CTYPE from SHOW command

The corresponding read-only server settings have been removed since
in PG16. See commit b0f6c437160db6.
---
 doc/src/sgml/ref/show.sgml | 24 
 1 file changed, 24 deletions(-)

diff --git a/doc/src/sgml/ref/show.sgml b/doc/src/sgml/ref/show.sgml
index b3747b119f..5fbb5bbe01 100644
--- a/doc/src/sgml/ref/show.sgml
+++ b/doc/src/sgml/ref/show.sgml
@@ -77,30 +77,6 @@ SHOW ALL
 

 
-   
-LC_COLLATE
-
- 
-  Shows the database's locale setting for collation (text
-  ordering).  At present, this parameter can be shown but not
-  set, because the setting is determined at database creation
-  time.
- 
-
-   
-
-   
-LC_CTYPE
-
- 
-  Shows the database's locale setting for character
-  classification.  At present, this parameter can be shown but
-  not set, because the setting is determined at database creation
-  time.
- 
-
-   
-

 IS_SUPERUSER
 
-- 
2.43.0