Fix incorrect UUID index entry in function documentation

2025-06-20 Thread Fujii Masao

Hi,

Both the UUID data type and UUID functions pages define an index entry
for "UUID" that points to the data type section. As a result, the index
includes two identical entries linking to the UUID type docs,
which seems strange.

I believe the UUID functions page should instead define its own index
entry that links to itself. Currently, the indexterm is written as:


 
  UUID Functions

  
   UUID
   generating
  


I suspect that "datatype-uuid" is a copy-paste error and should be
"functions-uuid" to reflect the correct section. The attached patch
updates this accordingly.

Thoughts?

Regards,

--
Fujii Masao
NTT DATA Japan Corporation
From e4fbb1929ccb43ff3adc282893167a2df6cbc1d2 Mon Sep 17 00:00:00 2001
From: Fujii Masao 
Date: Fri, 20 Jun 2025 22:54:52 +0900
Subject: [PATCH v1] doc: Fix incorrect UUID index entry in function
 documentation.

Previously, the UUID functions documentation defined the "UUID" index entry
to link to the UUID data type page, even though that entry already exists there.
Instead, the UUID functions page should define its own index entry linking
to itself.

This commit updates the UUID index entry in the UUID functions documentation
to point to the correct section, improving navigation and avoiding duplication.

Back-patch to all supported versions.
---
 doc/src/sgml/func.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8d7d9a2f3e8..a0b9044e358 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14374,7 +14374,7 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
  
   UUID Functions
 
-  
+  
UUID
generating
   
-- 
2.49.0



Cleanup of syntax.sgml

2025-06-20 Thread Joshua Drake
To make it more consumable.

-- 

   - Founder - https://commandprompt.com/ - 24x7x365 Postgres since 1997
   - Founder and Co-Chair - https://postgresconf.org/
   - Founder - https://postgresql.us - United States PostgreSQL
   - Public speaker, published author, postgresql expert, and people
   believer.
   - Host - More than a refresh: A podcast about data and the people who wrangle it.
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 916189a7d68..900f8f8e441 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -15,10 +15,10 @@
  
 
  
-  We also advise users who are already familiar with SQL to read this
-  chapter carefully because it contains several rules and concepts that
-  are implemented inconsistently among SQL databases or that are
-  specific to PostgreSQL.
+  We advise users who to read this chapter carefully because it 
+  contains several rules and concepts that are implemented 
+  inconsistently among SQL databases or that are specific to 
+  PostgreSQL.
  
 
  
@@ -29,53 +29,48 @@
   
 
   
-   SQL input consists of a sequence of
+   SQL consists of a sequence of
commands.  A command is composed of a
sequence of tokens, terminated by a
-   semicolon (;).  The end of the input stream also
-   terminates a command.  Which tokens are valid depends on the syntax
-   of the particular command.
+   semicolon (;). Which tokens are valid depends on 
+   the syntax of the particular command.
   
 
   
A token can be a key word, an
identifier, a quoted
identifier, a literal (or
-   constant), or a special character symbol.  Tokens are normally
-   separated by whitespace (space, tab, newline), but need not be if
-   there is no ambiguity (which is generally only the case if a
-   special character is adjacent to some other token type).
+   constant), or a special character symbol.
   
 

-For example, the following is (syntactically) valid SQL input:
+The following is (syntactically) valid SQL input:
 
 SELECT * FROM MY_TABLE;
 UPDATE MY_TABLE SET A = 5;
 INSERT INTO MY_TABLE VALUES (3, 'hi there');
 
-This is a sequence of three commands, one per line (although this
-is not required; more than one command can be on a line, and
-commands can usefully be split across lines).
+This is a sequence of three commands, one per line. More
+than one command can be on a line, and commands can also be split 
+across lines.

 
   
-   Additionally, comments can occur in SQL
-   input.  They are not tokens, they are effectively equivalent to
-   whitespace.
+   Additionally, comments are not tokens and 
+   can occur in SQL input.
   
 
   
-   The SQL syntax is not very consistent regarding what tokens
-   identify commands and which are operands or parameters.  The first
-   few tokens are generally the command name, so in the above example
-   we would usually speak of a SELECT, an
-   UPDATE, and an INSERT command.  But
-   for instance the UPDATE command always requires
-   a SET token to appear in a certain position, and
-   this particular variation of INSERT also
-   requires a VALUES in order to be complete.  The
-   precise syntax rules for each command are described in .
+   SQL is not consistent regarding what tokens identify commands and 
+   which are operands or parameters.  The first few tokens are 
+   generally the command name, so in the above example we would 
+   speak of a SELECT, an UPDATE, and 
+   an INSERT command.  But for the 
+   UPDATE command always requires a SET 
+   token to appear in a certain position, and this particular 
+   variation of INSERT also requires a 
+   VALUES in order to be complete.  The precise syntax 
+   rules for each command are described in .
   
 
   
@@ -98,11 +93,10 @@ INSERT INTO MY_TABLE VALUES (3, 'hi there');
 

 Tokens such as SELECT, UPDATE, or
-VALUES in the example above are examples of
-key words, that is, words that have a fixed
-meaning in the SQL language.  The tokens MY_TABLE
-and A are examples of
-identifiers.  They identify names of
+VALUES in the example above are 
+key words and they have a fixed meaning in 
+SQL.  The tokens MY_TABLE and A are
+examples of identifiers.  They identify names of
 tables, columns, or other database objects, depending on the
 command they are used in.  Therefore they are sometimes simply
 called names.  Key words and identifiers have the
@@ -119,24 +113,12 @@ INSERT INTO MY_TABLE VALUES (3, 'hi there');
 (_).  Subsequent characters in an identifier or
 key word can be letters, underscores, digits
 (0-9), or dollar signs
-($).  Note that dollar signs are not allowed in identifiers
-according to the letter of the SQL standard, so their use might render
-applications less portable.
-The SQL standard will not define a key word that contains
-digits or starts or e

Re: Cleanup of syntax.sgml

2025-06-20 Thread David G. Johnston
On Fri, Jun 20, 2025 at 12:33 PM Joshua Drake  wrote:

> To make it more consumable.
>

Overall I'm good with the attempt to trim, and most of the changes, but
feel it tries too hard and ends up being too "matter-of-fact"; the
conjunctions that exist make reading a wall of text easier.  I agree that
some of them could be removed as being more judgemental than mechanical.

Reviewing this reminds me we are inconsistent regarding "key word" vs.
"keyword".

"We advise users who to read this chapter carefully  ..." - botched surgery
on this one

Not sure I agree with removing the comment regarding "end of the input
stream".
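The sentence under discussion says a semicolon terminates a command and that the end of the input stream does too. A minimal Python sketch of that rule follows; it assumes only single-quoted literals need special handling (comments, double-quoted identifiers and dollar quoting are ignored), and `split_commands` is a hypothetical name, not part of PostgreSQL or psql:

```python
def split_commands(sql: str) -> list[str]:
    """Split SQL input into commands (illustrative sketch only).

    Assumption: only single-quoted literals are recognized; comments,
    double-quoted identifiers and dollar quoting are NOT handled.
    """
    commands: list[str] = []
    current: list[str] = []
    in_quote = False
    for ch in sql:
        if ch == "'":
            # A doubled '' inside a literal toggles twice, so it stays quoted.
            in_quote = not in_quote
            current.append(ch)
        elif ch == ";" and not in_quote:
            # A semicolon outside a literal terminates the current command.
            commands.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    # The end of the input also terminates a command, if anything is left.
    tail = "".join(current).strip()
    if tail:
        commands.append(tail)
    return commands
```

With this rule, a final command without a trailing semicolon is still returned, which is exactly the case the "end of the input stream" wording covers.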

I think I'm ok with leaving token separation unspecified here, especially
since it isn't totally accurate (at least in regards to "special character
symbol" which often are grouped together).

Why leave "(syntactically)" in parentheses?  Also, you got rid of the word
"input" in SQL input above but left it here.  I think leaving "SQL input
consists of..." is better.

For the examples, I would put "values" on its own line.  And I would add a
delete command on the same line as the update command.  Then just describe
that.

Select...;
update...; delete...;
insert ...
values ...;

I really don't like the re-wording regarding comments.

"But for the UPDATE command always ..." - another
botched surgery
I'm not sure what the entire paragraph really gives the reader though,
besides a pointer to the reference chapter.  It needs more pruning than
given here IMO.


I feel like if we want to enhance clarity about where we differ from the
standard that we use callouts for those items instead of burying the
information in walls of text.  Like the point about accepting dollar signs
in unquoted identifiers.


-A convention often used is to write key words in upper
+The recommened convention is to write key words in upper  [recommended needs a d]
Both should be avoided.  We can say "It is the convention in this
documentation to write key words in upper case and names in lower case."
Let other places than our syntax reference speak to real-world conventions
besides ours.

Where we introduce "quoted identifiers" link to the description for the
formal syntax - then it's ok to remove discussions of minutia like
including double quotes in a quoted identifier.

punctuation:
+Inside the quotes, Unicode characters can be specified in escaped
+form by writing a backslash followed by the four-digit hexadecimal
+code point number or[,] alternatively[,] a backslash followed by a plus
+sign [(+)] followed by a six-digit hexadecimal code point number.
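For reference, the two escape forms in that sentence can be sketched in Python. This assumes the default backslash escape character (no UESCAPE clause), treats a doubled backslash as a literal backslash, and does not combine UTF-16 surrogate pairs; `decode_unicode_escapes` is a hypothetical helper, not a PostgreSQL function:

```python
import re

# One valid unit: any non-backslash character, a doubled backslash,
# \XXXX (4 hex digits), or \+XXXXXX (6 hex digits).
_VALID = re.compile(r"(?:[^\\]|\\\\|\\[0-9A-Fa-f]{4}|\\\+[0-9A-Fa-f]{6})*")
_ESCAPE = re.compile(r"\\(\\|[0-9A-Fa-f]{4}|\+[0-9A-Fa-f]{6})")


def decode_unicode_escapes(body: str) -> str:
    """Decode the text between the quotes of U&'...' or U&"..." (sketch)."""
    if _VALID.fullmatch(body) is None:
        raise ValueError("invalid Unicode escape")

    def replace(m: re.Match) -> str:
        code = m.group(1)
        if code == "\\":           # doubled escape character -> literal backslash
            return "\\"
        if code.startswith("+"):   # \+XXXXXX form
            return chr(int(code[1:], 16))
        return chr(int(code, 16))  # \XXXX form

    return _ESCAPE.sub(replace, body)
```

For example, the body of U&"d\0061t\+000061" decodes to "data".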


I've kind of grown fond of "This slightly bizarre behavior"... ;)


+ If you can use Unicode escapes or the alternative Unicode escape syntax,
+ explained in ; then the server

Prefer the existing.  This lacks commas or other ways to make it read
well.  Removing "useful" judgement is probably sufficient.  Or maybe try a
different approach.

I concur we should remove the discussion regarding the GUCs at this point.

Maybe also include the correct way of writing the U & 'foo' operation in
the ambiguity discussion?

"optional tag of zero or more characters" is redundant.  Optional is
sufficient.

But much more concisely:
"""
A dollar-quoted string surrounds the content with user-specified tags of
the form  $label$ instead of quotation marks.  The label may be the empty
string.  For example, here are two different ways...
"""
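As an illustration of the proposed wording, here is a Python sketch that extracts the body of a dollar-quoted string. It assumes tags are either empty or ASCII identifier-like (letter or underscore first, no dollar sign) and are matched case-sensitively; `dollar_quoted_body` is a hypothetical name:

```python
import re

# Opening tag, body (non-greedy), then the identical closing tag.
# Group 1 always participates (possibly as ""), so \1 also matches $$.
_DOLLAR_QUOTED = re.compile(
    r"\$((?:[A-Za-z_][A-Za-z0-9_]*)?)\$(.*?)\$\1\$", re.DOTALL
)


def dollar_quoted_body(literal: str) -> str:
    """Return the content of a dollar-quoted string constant (sketch)."""
    m = _DOLLAR_QUOTED.fullmatch(literal)
    if m is None:
        raise ValueError("not a dollar-quoted string")
    return m.group(2)
```

A $q$...$q$ string nested inside a $function$ body does not end the outer string, because the closing tag must repeat the opening one exactly.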

- used without needing to be escaped.  Indeed, no characters inside
+ used without needing to be escaped. No characters inside
- Here, the sequence $q$[\t\r\n\v\\]$q$ represents a
+ The sequence $q$[\t\r\n\v\\]$q$ represents a
- PostgreSQL.  But since the sequence does not match
+ PostgreSQL.  Since the sequence does not match
Removing the word "Indeed, " isn't an improvement.  I get the desire to
remove the "commentary" filler fragments but this one isn't a judgement but
a highlight and seems quite appropriate.  Same goes for removing "Here" and
"But" - conjunctions are good.


"Bit-string constants is a string constant with a  "  plural needs "are",
not "is"

- described below.  Note that any leading plus or minus sign is not actually
+ described below.  Any leading plus or minus sign is not considered part of
"Note" is also a perfectly fine conjunction, and you haven't claimed your
fixes are to bring things in line with a style guideline, which I don't
think exists at this level of specificity.

- These are some examples of valid non-decimal integer constants:
+ Examples of valid non-decimal integer constants:
Status quo preferred.


Note, the stuff I'm not calling out does seem ok to remove in context.

 A comment is removed from the input stream before further syntax
-analysis and is effectively replaced by whitespace.
+analysis and is replaced by whitespace.

This seems repetitive with an earlier change...also, is a 20 character
comment replaced with 20 spaces?  Why 

Re: Document if width_bucket's low and high are inclusive/exclusive

2025-06-20 Thread Tom Lane
I wrote:
> Another thing I just remembered (think I knew it once) is the
> behavior of the first form when low > high.  It's not an error!

So concretely, how about the attached?  In addition to what we
mentioned so far, I made the sentence about out-of-range cases
more explicit.

regards, tom lane

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8d7d9a2f3e8..11676b63c82 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1824,13 +1824,24 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
 which operand falls in a histogram
 having count equal-width buckets spanning the
 range low to high.
-Returns 0
+The buckets have inclusive lower bounds, and therefore exclusive
+upper bounds.
+Returns 0 for an input less
+than low,
 or count+1 for an input
-outside that range.
+greater than or equal to high.
+If low > high,
+the behavior is mirror-reversed, with bucket 1
+now being the one just below low, and the
+inclusive bounds now being on the upper side.


 width_bucket(5.35, 0.024, 10.06, 5)
 3
+   
+   
+width_bucket(9, 10, 0, 10)
+2

   
 
@@ -1842,8 +1853,8 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in

 Returns the number of the bucket in
 which operand falls given an array listing the
-lower bounds of the buckets.  Returns 0 for an
-input less than the first lower
+inclusive lower bounds of the buckets.
+Returns 0 for an input less than the first lower
 bound.  operand and the array elements can be
 of any type having standard comparison operators.
 The thresholds array must be

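The semantics described in this patch can be checked against a small Python model. This is a sketch, not the server's implementation: `width_bucket` and `width_bucket_array` are hypothetical Python stand-ins for the two SQL forms, the thresholds are assumed to be sorted ascending, and the `min(..., count)` guard against floating-point rounding is an assumption of the sketch.

```python
import bisect
import math
from typing import Sequence


def width_bucket(operand: float, low: float, high: float, count: int) -> int:
    """Model of the first form: count equal-width buckets from low to high."""
    if count <= 0:
        raise ValueError("count must be greater than zero")
    if low == high:
        raise ValueError("low must not equal high")
    if low < high:
        # Inclusive lower bounds, exclusive upper bounds.
        if operand < low:
            return 0
        if operand >= high:
            return count + 1
        fraction = (operand - low) / (high - low)
    else:
        # Mirror image: bucket 1 lies just below low, bounds inclusive on top.
        if operand > low:
            return 0
        if operand <= high:
            return count + 1
        fraction = (low - operand) / (low - high)
    # Guard (sketch assumption) against rounding pushing the result past count.
    return min(math.floor(fraction * count) + 1, count)


def width_bucket_array(operand, thresholds: Sequence) -> int:
    """Model of the array form: thresholds are sorted inclusive lower bounds."""
    # Number of thresholds <= operand; 0 means below the first lower bound.
    return bisect.bisect_right(thresholds, operand)
```

Both documented examples, width_bucket(5.35, 0.024, 10.06, 5) = 3 and width_bucket(9, 10, 0, 10) = 2, come out the same way in this model.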

Re: Cleanup of syntax.sgml

2025-06-20 Thread Joshua Drake
>
> Overall I'm good with the attempt to trim, and most of the changes, but
> feel it tries too hard and ends up being too "matter-of-fact"; the
> conjunctions that exist make reading a wall of text easier.  I agree that
> some of them could be removed as being more judgemental than mechanical.
>

Fair enough and granted some of this is subjective. I went matter-of-fact
because using less text to make the point is, IMO, always better.


>  Reviewing this reminds me we are inconsistent regarding "key word" vs.
> "keyword".
>
>
It is funny you bring that up. I actually googled the difference and just
stared at it because, well, nowadays they are the same thing and yes we
should be consistent.



> "We advise users who to read this chapter carefully  ..." - botched
> surgery on this one
>

Not sure what you mean here? I reviewed the source sgml (that I modified):

We advise users who to read this chapter carefully because it
 contains several rules and concepts that are implemented
 inconsistently among SQL databases or that are specific to
 PostgreSQL.

If anything, I missed the overall paragraph. I would have removed the word
carefully.

>
>
> Not sure I agree with removing the comment regarding "end of the input
> stream".
>

It seemed unnecessary as well as potentially confusing to a newer user.
What is the end of an input stream? How do we know... etc?

>
> I think I'm ok with leaving token separation unspecified here, especially
> since it isn't totally accurate (at least in regards to "special character
> symbol" which often are grouped together).
>
> Why leave "(syntactically)" in parentheses?
>

Oversight. I agree that it shouldn't be in ().


> Also, you got rid of the word "input" in SQL input above but left it
> here.  I think leaving "SQL input consists of..." is better.
>

Sure

>
> For the examples, I would put "values" on its own line.  And I would add a
> delete command on the same line as the update command.  Then just describe
> that.
>
> Select...;
> update...; delete...;
> insert ...
> values ...;
>
> I really don't like the re-wording regarding comments.
>
> "But for the UPDATE command always ..." - another
> botched surgery
>

Yep, that's bad. Will fix it.


> I'm not sure what the entire paragraph really gives the reader though,
> besides a pointer to the reference chapter.  It needs more pruning than
> given here IMO.
>

I will take a look.

>
>
> I feel like if we want to enhance clarity about where we differ from the
> standard that we use callouts for those items instead of burying the
> information in walls of text.  Like the point about accepting dollar signs
> in unquoted identifiers.
>
>
> -A convention often used is to write key words in upper
> +The recommened convention is to write key words in upper
> [recommended needs a d]
> Both should be avoided.  We can say "It is the convention in this
> documentation to write key words in upper case and names in lower case."
> Let other places than our syntax reference speak to real-world conventions
> besides ours.
>

agreed.


>
> Where we introduce "quoted identifiers" link to the description for the
> formal syntax - then it's ok to remove discussions of minutia like
> including double quotes in a quoted identifier.
>
> punctuation:
> +Inside the quotes, Unicode characters can be specified in escaped
> +form by writing a backslash followed by the four-digit hexadecimal
> +code point number or[,] alternatively[,] a backslash followed by a
> plus
> +sign [(+)] followed by a six-digit hexadecimal code point number.
>
>
> I've kind of grown fond of "This slightly bizarre behavior"... ;)
>

I don't disagree :). I was trying to remove the subjectiveness of it.

I'll review the rest of what you said.


Re: Fix incorrect UUID index entry in function documentation

2025-06-20 Thread Masahiko Sawada
On Fri, Jun 20, 2025 at 11:33 PM Fujii Masao
 wrote:
>
> Hi,
>
> Both the UUID data type and UUID functions pages define an index entry
> for "UUID" that points to the data type section. As a result, the index
> includes two identical entries linking to the UUID type docs,
> which seems strange.
>
> I believe the UUID functions page should instead define its own index
> entry that links to itself. Currently, the indexterm is written as:
>
> 
>   
>UUID Functions
>
>
> UUID
> generating
>
> 
>
> I suspect that "datatype-uuid" is a copy-paste error and should be
> "functions-uuid" to reflect the correct section. The attached patch
> updates this accordingly.
>
> Thoughts?

+1. I think it also makes sense that "UUID generating" has the link to
"UUID Functions".

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Cleanup of syntax.sgml

2025-06-20 Thread David G. Johnston
On Fri, Jun 20, 2025 at 2:21 PM Joshua Drake  wrote:

>
>
>> "We advise users who to read this chapter carefully  ..." - botched
>> surgery on this one
>>
>
> Not sure what you mean here? I reviewed the source sgml (that I modified):
>
> We advise users who to read this chapter carefully because it
>  contains several rules and concepts that are implemented
>  inconsistently among SQL databases or that are specific to
>  PostgreSQL.
>
>
We advise users who to read this chapter carefully... is not proper
English - drop the "who"?

We advise users to read this chapter carefully

Honestly, though, that is just bad advice.  It is not possible to read it
carefully; it's too much material.  Apply ruthless subjective-ness removal
protocol here too.  Just state what the chapter covers.

David J.