Optimizing the documentation

2020-12-14 Thread Joshua Drake
-hackers,

The community has spent a lot of time optimizing features over the years.
Excellent examples include parallel query and partitioning, which have been
multi-year efforts to improve the quality and performance, and extend the
features, of the original commit. We should consider the documentation in a
similar manner. Just like code, documentation can sometimes use a bug fix,
optimization, and/or new features added to the original implementation.

Technical documentation should only be as verbose as needed to illustrate
the concept or task that we are explaining. It should not be redundant, nor
should it use 50-cent words when a 10-cent word would suffice. I would
like to put effort into optimizing the documentation and am requesting
general consensus that this would be a worthwhile effort before I begin to
dust off my DocBook skills.

I have provided an example below:

Original text (79 words):

This book is the official documentation of PostgreSQL. It has been written
by the PostgreSQL developers and other volunteers in parallel to the
development of the PostgreSQL software. It describes all the functionality
that the current version of PostgreSQL officially supports.

To make the large amount of information about PostgreSQL manageable, this
book has been organized in several parts. Each part is targeted at a
different class of users, or at users in different stages of their
PostgreSQL experience:

Optimized text (35 words):

This is the official PostgreSQL documentation. It is written by the
PostgreSQL community in parallel with the development of the software. We
have organized it by the type of user and their stages of experience:

Issues that are resolved with the optimized text:

   - Succinct text is more likely to be read than skimmed
   - Removal of extraneous mentions of PostgreSQL
   - Removal of unneeded justifications
   - Joining of two paragraphs into one that provides only the needed
     information to the user
   - Word count decreased by over 50%. As changes such as these are adopted,
     they would make the documentation more consumable.
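To double-check the counts quoted above, here is a minimal Python sketch. It assumes words were counted by splitting on whitespace (so "PostgreSQL." counts as one word) and also counts the mentions of "PostgreSQL" in each passage:

```python
# Passages copied from the message above; whitespace splitting treats
# "PostgreSQL." and "experience:" as single words, as a reader would.
ORIGINAL = (
    "This book is the official documentation of PostgreSQL. It has been written "
    "by the PostgreSQL developers and other volunteers in parallel to the "
    "development of the PostgreSQL software. It describes all the functionality "
    "that the current version of PostgreSQL officially supports. "
    "To make the large amount of information about PostgreSQL manageable, this "
    "book has been organized in several parts. Each part is targeted at a "
    "different class of users, or at users in different stages of their "
    "PostgreSQL experience:"
)

OPTIMIZED = (
    "This is the official PostgreSQL documentation. It is written by the "
    "PostgreSQL community in parallel with the development of the software. We "
    "have organized it by the type of user and their stages of experience:"
)


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


before = word_count(ORIGINAL)
after = word_count(OPTIMIZED)
reduction = (before - after) / before  # fraction of words removed

print(before, after, f"{reduction:.0%}")
print(ORIGINAL.count("PostgreSQL"), OPTIMIZED.count("PostgreSQL"))
```

It prints `79 35 56%` and `6 2`: the word counts match the ones stated, and the cut is about 56%.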

Thanks,
JD

-- 
Founder - https://commandprompt.com/ - 24x7x365 Postgres since 1997
Co-Chair - https://postgresconf.org/ - Postgres Education at its finest
People, Postgres, Data


Re: Optimizing the documentation

2020-12-14 Thread Joshua Drake
>
>
>>
>> Technical documentation should only be as verbose as needed to illustrate
>> the concept or task that we are explaining. It should not be redundant, nor
>> should it use 50-cent words when a 10-cent word would suffice. I would
>> like to put effort into optimizing the documentation and am requesting
>> general consensus that this would be a worthwhile effort before I begin to
>> dust off my DocBook skills.
>>
>>
> As a quick observation, it would be more immediately helpful to add to the
> existing proposal to add more details about architecture and get that
> committed before embarking on a new documentation project.
>
> https://commitfest.postgresql.org/31/2541/
>

I considered just starting to review patches as such, but even so, doesn't
it make sense to reach a general consensus first if I am going to put a
particular thought process into my efforts? For example, what would be
exceedingly helpful is a canonical documentation style guide that we can
review documentation against. Currently our documentation is all over the
place. It isn't that it is not technically accurate or comprehensive.


>> Optimized text (35 words):
>>
>> This is the official PostgreSQL documentation. It is written by the
>> PostgreSQL community in parallel with the development of the software. We
>> have organized it by the type of user and their stages of experience:
>>
>> Issues that are resolved with the optimized text:
>>
>>    - Succinct text is more likely to be read than skimmed
>>    - Removal of extraneous mentions of PostgreSQL
>>    - Removal of unneeded justifications
>>    - Joining of two paragraphs into one that provides only the needed
>>      information to the user
>>    - Word count decreased by over 50%. As changes such as these are
>>      adopted it would make the documentation more consumable.
>>
> That actually exists in our documentation?
>

Yes. https://www.postgresql.org/docs/13/preface.html


> I suspect changing it isn't all that worthwhile as the typical user isn't
> reading the documentation like a book and with the entry point being the
> table of contents most of that material is simply gleaned from observing
> the presented structure without words needed to describe it.
>

It is a matter of consistency.


>
> While I don't think making readability changes is a bad thing, and maybe
> my perspective is a bit biased and negative right now, but the attention
> given to the existing documentation patches in the commitfest isn't that
> great - so adding another mass of patches fixing up items that haven't
> provoked complaints seems likely to just make the list longer.
>

One of the issues is that editing documentation with patches is a pain. It
is simpler, and less effort, to pull up an existing section of DocBook and
edit it (just like code) than to extract specific text from a patch. That
said, I would be happy to take a swipe at reviewing a specific
documentation patch (such as the one you linked).


>
> In short, I don't think optimization should be a goal in its own right;
> but rather changes should mostly be driven by questions asked by our
> users.  I don't think reading random chapters of the documentation to find
> non-optimal exposition is going to be a good use of time.
>

I wasn't planning on reading random chapters. I was planning on walking
through the documentation as it is written, and hopefully others would join.
Doing this completely is a monumental effort. Also consider the overall
benefit, not just one specific piece. Would you not consider it a net win
if certain questions were answered succinctly enough that users could rely
on the documentation instead of asking the most novice of questions on
various channels?

JD

>
>


Re: Optimizing the documentation

2020-12-14 Thread Joshua Drake
>
>
>
> > This is the official PostgreSQL documentation. It is written by the
> > PostgreSQL community in parallel with the development of the software.
> > We have organized it by the type of user and their stages of experience:
>
> Some thoughts on this example:
>
> - Changing "has been" to "is" changes the tone here. "Is" implies that
> it is being written continuously, whereas "has been" implies that it's
> finished. We do update the docs continuously, but point of the sentence
> is that the docs were developed together with the features, so "has
> been" seems more accurate.
>

No argument.


>
> - I like "PostgreSQL developers and other volunteers" better than the
> "PostgreSQL community". This is the very first introduction to
> PostgreSQL, so we can't expect the reader to know what the "PostgreSQL
> community" is. I like the "volunteers" word here a lot.
>
>
There is a huge community for PostgreSQL; the developers are only a
small (albeit critical) part of it. By using the term "PostgreSQL
community" we are providing equity to all those who participate in the
success of the project. I could definitely see saying "PostgreSQL
volunteers".



> - I think a little bit of ceremony is actually OK in this particular
> paragraph, since it's the very first one in the docs.
>
> - I agree with dropping the "to make the large amount of information
> manageable".
>
> So I would largely keep this example unchanged, changing it into:
>
> ---
> This book is the official documentation of PostgreSQL. It has been
> written by the PostgreSQL developers and other volunteers in parallel to
> the development of the PostgreSQL software. It describes all the
> functionality that the current version of PostgreSQL officially supports.
>
> This book has been organized in several parts. Each part is targeted at
> a different class of users, or at users in different stages of their
> PostgreSQL experience:
> ---
>
>
I appreciate the feedback, and before we get too far down the rabbit hole,
I would like to note that I am not tied to an exact wording, as my post was
more about the general goal and the results based on that goal.


> I agree with these goals in general. I like to refer to
> http://www.plainenglish.co.uk/how-to-write-in-plain-english.html when
> writing documentation. Or anything else, really.
>

Great resource!

JD


>
> - Heikki
>


Re: Optimizing the documentation

2020-12-14 Thread Joshua Drake
>
>
>
> In short, the devil's in the details.  Maybe there are lots of
> places where this type of approach would help, but I think it's
> going to be a case-by-case discussion not something where there's
> a clear win overall.
>

Certainly, and I didn't want to just start dumping patches. Part of this is
just style, for example:

Thus far, our queries have only accessed one table at a time. Queries can
access multiple tables at once, or access the same table in such a way that
multiple rows of the table are being processed at the same time. A query
that accesses multiple rows of the same or different tables at one time is
called a join query. As an example, say you wish to list all the weather
records together with the location of the associated city. To do that, we
need to compare the city column of each row of the weather table with the
name column of all rows in the cities table, and select the pairs of rows
where these values match.

It isn't "terrible" but can definitely be optimized. In a quick review, I
would put it something like this:

Queries can also access multiple tables at once, or access the same table
in a way that multiple rows are processed. A query that accesses multiple
rows of the same or different tables at one time is a join. For example, if
you wish to list all of the weather records with the location of the
associated city, we would compare the city column of each row of the weather
table with the name column of all rows in the cities table, and select the
rows *WHERE* the values match.

The reason I bolded and capitalized WHERE was to provide a visual signal to
the example that is on the page. I could also argue that we could remove
"For example," though I understand its purpose here.

Again, this was just a quick review.
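The paragraph under discussion describes the logical result of a join: compare each weather row with every cities row and keep the matching pairs. A minimal Python sketch of that nested-loop description follows. The rows are illustrative, loosely modeled on the tutorial's weather and cities tables, and PostgreSQL's planner may pick a hash or merge join instead; only the result is the same.

```python
from typing import NamedTuple


class Weather(NamedTuple):
    city: str
    temp_lo: int
    temp_hi: int
    date: str


class City(NamedTuple):
    name: str
    location: tuple[float, float]


def join_weather_cities(
    weather: list[Weather], cities: list[City]
) -> list[tuple[Weather, City]]:
    """Nested-loop inner join on weather.city = cities.name."""
    pairs = []
    for w in weather:             # each row of the weather table...
        for c in cities:          # ...is compared with every row of cities
            if w.city == c.name:  # keep only the pairs whose values match
                pairs.append((w, c))
    return pairs


# Illustrative rows (assumed values, not taken from the thread).
weather = [
    Weather("San Francisco", 46, 50, "1994-11-27"),
    Weather("San Francisco", 43, 57, "1994-11-29"),
    Weather("Hayward", 37, 54, "1994-11-29"),
]
cities = [City("San Francisco", (-194.0, 53.0))]

pairs = join_weather_cities(weather, cities)
for w, c in pairs:
    print(w.city, w.date, c.location)
```

The Hayward row has no matching city, so it does not appear in the output. That is the inner-join behavior the prose is trying to convey.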

JD


Re: Optimizing the documentation

2020-12-14 Thread Joshua Drake
>
>
>
> > Queries can also access multiple tables at once, or access the same table
> > in a way that multiple rows are processed. A query that accesses multiple
> > rows of the same or different tables at one time is a join. For example, if
> > you wish to list all of the weather records with the location of the
> > associated city, we would compare the city column of each row of the weather
> > table with the name column of all rows in the cities table, and select the
> > rows *WHERE* the values match.
>
> TBH, I'm not sure that that is an improvement at all.  I'm constantly
> reminded that for many of our users, English is not their first language.
> A little bit of redundancy in wording is often helpful for them.
>

Interesting point; it is certainly true that many of our users are ESL
folks. I would expect a succinct version to be easier to understand, but I
have no idea.


>
> The places where I think the docs need help tend to be places where
> assorted people have added information over time, such that there's
> not a consistent style throughout a section; or maybe the information
> could be presented in a better order.  We don't need to be taking a
> hacksaw to text that's perfectly clear as it stands.
>

The term "perfectly clear" is part of the problem I am trying to address. I
can pick and pull at the documentation all day long and show things that
are not perfectly clear. They are clear to you, to me, and, I imagine, to
most of the readers on this list. Generally speaking, we are not the target
of the documentation, and we can easily settle for "good enough" when in
reality it could be so much better. I have gotten so used to our
documentation that I literally skip over unneeded words to get to the
answer I am looking for. I don't think that is the target we want to hit.

Wouldn't we want the reader to spend as little mental energy as possible
understanding a concept? Every extra word that isn't needed, every extra
adjective, repeated term or "very unique" is extra energy spent to
understand what the writer is trying to say. That mental energy can be
exhausted quickly, especially with dense technical topics.



> (If I were thinking of rewriting this text, I'd probably think of
> removing the references to self-joins and covering that topic
> in a separate para.  But that's because self-joins aren't basic
> usage, not because I think the text is unreadable.)
>

That makes sense. I was just taking the direct approach of making existing
content better as an example. I would agree with your assessment if it were
to be submitted as a patch.
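The self-join case Tom suggests splitting out pairs rows of one table with other rows of the same table. A minimal Python sketch follows. The condition (w2's temperature range lies strictly inside w1's) and the rows are illustrative assumptions, not text proposed for the docs.

```python
from itertools import product
from typing import NamedTuple


class Weather(NamedTuple):
    city: str
    temp_lo: int
    temp_hi: int


def narrower_ranges(rows: list[Weather]) -> list[tuple[Weather, Weather]]:
    """Self-join: pair every row w1 with every row w2 of the same table and
    keep pairs where w2's temperature range lies strictly inside w1's."""
    return [
        (w1, w2)
        for w1, w2 in product(rows, rows)  # the table joined with itself
        if w1.temp_lo < w2.temp_lo and w1.temp_hi > w2.temp_hi
    ]


weather = [
    Weather("San Francisco", 46, 50),
    Weather("San Francisco", 43, 57),
    Weather("Hayward", 37, 54),
]

for w1, w2 in narrower_ranges(weather):
    print(w1, "contains", w2)
```

The strict inequalities also exclude pairing a row with itself, which is why a separate paragraph explaining the extra care is arguably warranted.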


> > The reason I bolded and capitalized WHERE was to provide a visual signal to
> > the example that is on the page.
>
> IMO, typographical tricks are not something to lean on heavily.
>

Fair enough.

JD


Re: Proposed patch for key management

2020-12-31 Thread Joshua Drake
On Wed, Dec 30, 2020 at 3:47 PM Bruce Momjian wrote:

>
> I will say that if the community feels external-only should be the only
> option, I will stop working on this feature because I feel the result
> would be too fragile to be reliable, and I would not want to be
> associated with it.
>
>
I can say that the people I work with would prefer an "officially"
supported mechanism from PostgreSQL.org. A generic API would be useful for
those in the ecosystem who want to build on core functionality, but
PostgreSQL should have competent built-in support as well.

JD

>
>


Re: Proposed patch for key management

2020-12-31 Thread Joshua Drake
>
>
> > >I will say that if the community feels external-only should be the only
> > >option, I will stop working on this feature because I feel the result
> > >would be too fragile to be reliable,
> >
> > I do not see why it would be the case. I'm just arguing to have key
> > management in a separate, possibly suid something-else, process, which given
> > the security concerns which dictates the feature looks like a must have, or
> > at least must be possible. From a line count point of view, it should be a
> > small addition to the current code.
>
> All of this hand-waving really isn't helping.
>
> If it's a small addition to the current code then it'd be fantastic if
> you'd propose a specific patch which adds what you're suggesting.  I
> don't think either Bruce or I would have any issue with others helping
> out on this effort, but let's be clear- we need something that *is* part
> of core PG, even if we have an ability to have other parts exist outside
> of PG.
>

+1

JD


Re: Parser Hook

2021-03-15 Thread Joshua Drake
>
>
>
> Also, I'm not sure that many extensions would really benefit from custom
> utility command, as you can already do pretty much anything you want using
> SQL functions.  For instance it would be nice for hypopg to be able to support
>
> CREATE HYPOTHETICAL INDEX ...
>
> rather than
>
> SELECT hypopg_create_index('CREATE INDEX...')
>
> But really the only benefit would be autocompletion, which still wouldn't be
> possible as psql autocompletion won't be extended.  And even if it somehow was,
> I wouldn't expect all psql clients to be set up as needed.
>

"Technically" speaking you are correct; from a usability standpoint you are
not. We ran into this discussion previously when dealing with replication.
There is certainly a history of calling functions to do what the grammar
(from a usability perspective) should do, and it is not a good history. It
is just what we are all used to. Looking at what you wrote above as a DBA
or even an average developer, CREATE HYPOTHETICAL INDEX makes much more
sense than the SELECT execution.
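To illustrate that the two forms carry the same information, so the difference is purely one of usability, here is a toy Python sketch that rewrites the proposed syntax into the existing hypopg call. `CREATE HYPOTHETICAL INDEX` is hypothetical syntax, `rewrite_hypothetical` is a made-up helper, and a real parser hook would be C code working on the grammar, not a regex over strings.

```python
import re

# Hypothetical syntax: PostgreSQL has no CREATE HYPOTHETICAL INDEX command.
_HYPOTHETICAL = re.compile(
    r"^\s*CREATE\s+HYPOTHETICAL\s+INDEX\s+(.*?)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def rewrite_hypothetical(stmt: str) -> str:
    """Toy rewrite of the friendlier syntax into the hypopg function call.

    Statements that do not match are returned unchanged.
    """
    m = _HYPOTHETICAL.match(stmt)
    if m is None:
        return stmt
    inner = "CREATE INDEX " + m.group(1)
    # Double any single quotes so the statement survives as a SQL string literal.
    literal = inner.replace("'", "''")
    return f"SELECT hypopg_create_index('{literal}')"


# "orders" and "customer_id" are made-up names for illustration.
print(rewrite_hypothetical("CREATE HYPOTHETICAL INDEX ON orders (customer_id);"))
```

The quote-doubling step hints at the usability cost of the function form: the user has to embed one statement inside a string literal of another.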

JD

P.S. I had to write HYPOTHETICAL 4 times; I kept typing HYPOTECHNICAL :/


Re: Should we document IS [NOT] OF?

2020-11-19 Thread Joshua Drake
Howdy,

Well, I certainly wasn't trying to make work out of that blog post, but I am
glad to see it was productive.

JD

On Thu, Nov 19, 2020 at 2:43 PM Tom Lane wrote:

> After digging a bit more I noticed that we'd discussed removing
> IS OF in the 2007 thread, but forebore because there wasn't an easy
> replacement.  pg_typeof() was added a year later (b8fab2411), so we
> could have done this at any point since then.
>
> Pushed.
>
> regards, tom lane
>
>
>


Re: Extensibility of the PostgreSQL wire protocol

2021-02-11 Thread Joshua Drake
On Wed, Feb 10, 2021 at 11:04 AM Tom Lane wrote:

> "Jonah H. Harris" writes:
> > On Wed, Feb 10, 2021 at 1:10 PM Tom Lane wrote:
> >> ...  If we start having
> >> modes for MySQL identifier quoting, Oracle outer join syntax, yadda
> >> yadda, it's going to be way more of a maintenance nightmare than some
> >> hook functions.  So if we accept any patch along this line, I want to
> >> drive a hard stake in the ground that the answer to that sort of thing
> >> will be NO.
>
> > Actually, a substantial amount can be done with hooks. For Oracle, which is
> > substantially harder than MySQL, I have a completely separate parser that
> > generates a PG-compatible parse tree packaged up as an extension. To handle
> > autonomous transactions, database links, hierarchical query conversion,
> > hints, and some execution-related items requires core changes.
>
> That is a spot-on definition of where I do NOT want to end up.  Hooks
> everywhere and enormous extensions that break anytime we change anything
> in the core.  It's not really clear that anybody is going to find that
> more maintainable than a straight fork, except to the extent that it
> enables the erstwhile forkers to shove some of their work onto the PG
> community.
>
> My feeling about this is if you want to use Oracle, go use Oracle.
> Don't ask PG to take on a ton of maintenance issues so you can have
> a frankenOracle.
>

Over the last decade, we spent a considerable amount of time making
PostgreSQL extensible outside of core. We are now useful in workloads
nobody would have considered in 2004 or 2008.

The more extensibility we add, the LESS we maintain. It is a lot easier to
maintain an API than it is an entire kernel. When I look at all the
interesting features coming from the ecosystem, they are all built on the
hooks that this community worked so hard to create. This idea is an
extension of that and a result of the community's success.
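In PostgreSQL's C code, hooks such as `planner_hook` are global function pointers: an extension typically saves the previous value in its `_PG_init()` function, installs its own function, and calls the saved hook (or `standard_planner()`) from it. The Python toy below is modeled loosely on that pattern; all names in it are stand-ins, not real APIs. It shows why the core's maintenance surface stays small: core owns one slot and one call site, while extensions stack on top.

```python
from typing import Callable, Optional

Planner = Callable[[str], str]


def standard_planner(query: str) -> str:
    """Stand-in for the core's built-in behavior."""
    return f"plan({query})"


# The only thing "core" commits to: one hook slot and one call site.
planner_hook: Optional[Planner] = None


def plan(query: str) -> str:
    """Core entry point: defer to an installed hook, else use the default."""
    if planner_hook is not None:
        return planner_hook(query)
    return standard_planner(query)


def install_extension(tag: str) -> None:
    """Hypothetical extension init: remember the previous hook and chain to it."""
    global planner_hook
    prev = planner_hook

    def extension_planner(query: str) -> str:
        inner = prev(query) if prev is not None else standard_planner(query)
        return f"{tag}[{inner}]"

    planner_hook = extension_planner


install_extension("ext_a")
install_extension("ext_b")
result = plan("SELECT 1")
print(result)
```

Two independently loaded extensions compose without either one, or core, knowing about the other; the output is `ext_b[ext_a[plan(SELECT 1)]]`.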

The more extensible we make PostgreSQL, the more the hacker community can
innovate without damaging the PostgreSQL reputation as a rock solid
database system.

Features like these only enable the entire community to innovate. Is the
real issue that the more extensible PostgreSQL is, the more boring it will
become?

JD



>
> regards, tom lane
>
>
>


Re: making relfilenodes 56 bits

2022-07-28 Thread Joshua Drake
On Thu, Jul 28, 2022 at 9:52 AM Robert Haas wrote:

> On Thu, Jul 28, 2022 at 11:59 AM Alvaro Herrera wrote:
> > I do wonder why do we keep relfilenodes limited to decimal digits.  Why
> > not use hex digits?  Then we know the limit is 14 chars, as in
> > 0x00FF in the MAX_RELFILENUMBER definition.
>
> Hmm, but surely we want the error messages to be printed using the
> same format that we use for the actual filenames. We could make the
> filenames use hex characters too, but I'm not wild about changing
> user-visible details like that.
>

From a DBA perspective this would be a regression in usability.
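For a sense of what is being traded off, a minimal Python sketch that counts the characters needed for the largest relfilenumber in each base. It assumes the limit is the largest unsigned 56-bit value, 2^56 - 1; the constant name is illustrative.

```python
# Largest value representable in 56 unsigned bits (assumed limit).
MAX_56_BIT = (1 << 56) - 1

decimal = str(MAX_56_BIT)
hexadecimal = format(MAX_56_BIT, "X")

print(decimal, len(decimal))          # 72057594037927935 17
print(hexadecimal, len(hexadecimal))  # FFFFFFFFFFFFFF 14
```

Hex saves three characters at the maximum, at the cost of filenames that no longer match the decimal numbers DBAs are used to seeing.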

JD

-- 

   - Founder - https://commandprompt.com/ - 24x7x365 Postgres since 1997
   - Founder and Co-Chair - https://postgresconf.org/
   - Founder - https://postgresql.us - United States PostgreSQL
   - Public speaker, published author, postgresql expert, and people
   believer.
   - Host - More than a refresh: A podcast about data and the people who
   wrangle it.


Serializable wrong?

2020-06-12 Thread Joshua Drake
-Hackers,

I came across this today [1]:

"3 Results

In most respects, PostgreSQL behaved as expected: both read uncommitted and
read committed prevent write skew and aborted reads. We observed no
internal consistency violations. However, we have two surprising results to
report. The first is that PostgreSQL’s “repeatable read” is weaker than
repeatable read, at least as defined by Berenson, Adya, Bailis, et al. This
is not necessarily wrong: the ANSI SQL standard is ambiguous. The second
result, which is definitely wrong, is that PostgreSQL’s “serializable”
isolation level isn’t serializable: it allows G2-item during normal
operation."

Thanks!

JD

1. https://jepsen.io/analyses/postgresql-12.3


Re: Just for fun: Postgres 20?

2020-02-11 Thread Joshua Drake
>
>
> From: Jose Luis Tallon 
>
> >  Musing some other date-related things I stumbled upon the thought
> > that naming the upcoming release PostgreSQL 20 might be preferable to
> > the current/expected "PostgreSQL 13".
>
> +1
>
> Users can easily know how old/new the release is that they are using.
>
>
There are multiple pros and cons to this idea. There is an argument that,
since we are on annual releases, 20 makes sense, and what would have been
14 would be 21, etc. However, there is a significant problem with that. Our
annual releases are a relatively new thing, and I can definitely see a
situation in the future where we move away from annual releases to a more
conservative timeline. Further, the jump in the version number is going to
be seen as a marketing ploy, and if we are going to be doing marketing
ploys, then we should have the new feature set to back it up upon release.

JD