Re: sunsetting md5 password support

2024-10-28 Thread Nathan Bossart
On Mon, Oct 28, 2024 at 04:10:29PM -0500, Jim Nasby wrote:
> Patch itself looks good, but it does leave me wondering if cleartext
> should also be deprecated?

I see that Tom has already chimed in on this point.  In any case, this is
probably a topic for another thread.

> Might also be worth mentioning deprecation in pg_hba.conf.

Yeah.  I vaguely recall waffling on whether to add one there, and for
whatever reason, I decided against it.  I've added it in v4.
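As a quick illustration of the intended behaviour with v4 applied (a sketch
only; the role name is made up and the exact warning wording is whatever the
committed patch ends up emitting):

    SET password_encryption = 'md5';
    CREATE ROLE sketch_md5_role PASSWORD 'secret';
    -- WARNING:  setting an MD5-encrypted password
    -- DETAIL:   MD5 password support is deprecated and will be removed in a
    --           future release of PostgreSQL.

    SET md5_password_warnings = off;
    ALTER ROLE sketch_md5_role PASSWORD 'secret';   -- no warning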

-- 
nathan
>From 716ca7332ed3e9e7e23146c926f93cb759a0d227 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 28 Oct 2024 19:54:02 -0500
Subject: [PATCH v4 1/1] Deprecate MD5 passwords.

MD5 has been considered to be unsuitable for use as a cryptographic
hash algorithm for some time.  Furthermore, MD5 password hashes in
PostgreSQL are vulnerable to pass-the-hash attacks, i.e., knowing
the username and hashed password is sufficient to authenticate.
The SCRAM-SHA-256 method added in v10 is not subject to these
problems and is considered to be superior to MD5.

This commit marks MD5 password support in PostgreSQL as deprecated
and to be removed in a future release.  The documentation now
contains several deprecation notices, and CREATE ROLE and ALTER
ROLE now emit deprecation warnings when setting MD5 passwords.  The
warnings can be disabled by setting the md5_password_warnings
parameter to "off".

Reviewed-by: Greg Sabino Mullane, Jim Nasby
Discussion: https://postgr.es/m/ZwbfpJJol7lDWajL%40nathan
---
 .../passwordcheck/expected/passwordcheck.out  |  1 +
 .../expected/passwordcheck_1.out  |  1 +
 contrib/passwordcheck/sql/passwordcheck.sql   |  1 +
 doc/src/sgml/catalogs.sgml|  9 +++
 doc/src/sgml/client-auth.sgml | 17 +
 doc/src/sgml/config.sgml  | 24 +++
 doc/src/sgml/libpq.sgml   |  9 +++
 doc/src/sgml/protocol.sgml|  8 +++
 doc/src/sgml/ref/create_role.sgml |  8 +++
 doc/src/sgml/runtime.sgml | 10 
 src/backend/libpq/crypt.c | 10 
 src/backend/utils/misc/guc_tables.c   |  9 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/libpq/crypt.h |  3 +++
 src/test/regress/expected/password.out| 15 
 src/test/regress/expected/password_1.out  |  9 +++
 16 files changed, 135 insertions(+)

diff --git a/contrib/passwordcheck/expected/passwordcheck.out 
b/contrib/passwordcheck/expected/passwordcheck.out
index 2027681daf..dfb2ccfe00 100644
--- a/contrib/passwordcheck/expected/passwordcheck.out
+++ b/contrib/passwordcheck/expected/passwordcheck.out
@@ -1,3 +1,4 @@
+SET md5_password_warnings = off;
 LOAD 'passwordcheck';
 CREATE USER regress_passwordcheck_user1;
 -- ok
diff --git a/contrib/passwordcheck/expected/passwordcheck_1.out 
b/contrib/passwordcheck/expected/passwordcheck_1.out
index 5d8d5dcc1c..9519d60a49 100644
--- a/contrib/passwordcheck/expected/passwordcheck_1.out
+++ b/contrib/passwordcheck/expected/passwordcheck_1.out
@@ -1,3 +1,4 @@
+SET md5_password_warnings = off;
 LOAD 'passwordcheck';
 CREATE USER regress_passwordcheck_user1;
 -- ok
diff --git a/contrib/passwordcheck/sql/passwordcheck.sql 
b/contrib/passwordcheck/sql/passwordcheck.sql
index 1fbd6b0e96..5953ece5c2 100644
--- a/contrib/passwordcheck/sql/passwordcheck.sql
+++ b/contrib/passwordcheck/sql/passwordcheck.sql
@@ -1,3 +1,4 @@
+SET md5_password_warnings = off;
 LOAD 'passwordcheck';
 
 CREATE USER regress_passwordcheck_user1;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 964c819a02..0b9ca087c8 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1618,6 +1618,15 @@
will store the md5 hash of xyzzyjoe.
   
 
+  
+   
+Support for MD5-encrypted passwords is deprecated and will be removed in a
+future release of PostgreSQL.  Refer to
+ for details about migrating to another
+password type.
+   
+  
+
   
If the password is encrypted with SCRAM-SHA-256, it has the format:
 
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 51343de7ca..782b49c85a 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -531,6 +531,15 @@ include_dir directory
   user's password. See 
   for details.
  
+ 
+  
+   Support for MD5-encrypted passwords is deprecated and will be
+   removed in a future release of
+   PostgreSQL.  Refer to
+for details about migrating to
+   another password type.
+  
+ 
 

 
@@ -1260,6 +1269,14 @@ omicron bryanh  guest1
server is encrypted for SCRAM (see below), then SCRAM-based
authentication will automatically be chosen instead.
   
+
+  
+   
+Support for MD5-encrypted

freespace.c modifies buffer without any locks

2024-10-28 Thread Andres Freund
Hi,

I just noticed that fsm_vacuum_page() modifies a buffer without even holding a
shared lock.  That quite obviously seems like a violation of the buffer
locking protocol:

/*
 * Try to reset the next slot pointer. This encourages the use of
 * low-numbered pages, increasing the chances that a later vacuum can
 * truncate the relation.  We don't bother with a lock here, nor with
 * marking the page dirty if it wasn't already, since this is just a 
hint.
 */
if (BufferPrepareToSetHintBits(buf))
{
((FSMPage) PageGetContents(page))->fp_next_slot = 0;
BufferFinishSetHintBits(buf);
}


In the commit (15c121b3ed7) adding the current freespace code, there wasn't
even a comment remarking upon that oddity.  10 years later Tom added a
comment, in 2b1759e2675f.


I noticed this while adding a debug mode in which buffers are mprotected
PROT_NONE/PROT_READ/PROT_READ|PROT_WRITE depending on the buffer's state.


Is there any good reason to avoid a lock here? Compared to the cost of
exclusively locking buffers during RecordAndGetPageWithFreeSpace() the cost of
doing so during FreeSpaceMapVacuum*() seems small?
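For illustration, something like the following minimal sketch is what I have
in mind (taking the buffer lock around the hint update, and ignoring the
question of marking the page dirty since the existing comment says it's just
a hint; whether the extra LockBuffer cost is acceptable is exactly the
question above):

    LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
    ((FSMPage) PageGetContents(page))->fp_next_slot = 0;
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);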




Somewhat relatedly, I don't think I understand why it's a good idea to
reset fp_next_slot to 0 in fsm_vacuum_page(), at least when doing so
unconditionally.

When extending a relation, it seems we'll constantly reset the search back to
the start of the range, even though we pretty much know that there's no space
earlier in the relation - otherwise we'd not have extended.

And when called from FreeSpaceMapVacuumRange() we'll reset fp_next_slot to
somewhere that wasn't actually vacuumed, afaict?

Greetings,

Andres Freund




Re: EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on

2024-10-28 Thread David G. Johnston
On Mon, Oct 28, 2024 at 6:03 PM David Rowley  wrote:

> We don't seem to be agreeing on much here... :-(
>
> On Tue, 29 Oct 2024 at 13:30, David G. Johnston
>  wrote:
> >
> > On Mon, Oct 28, 2024 at 3:54 PM David Rowley 
> wrote:
> >> I'm concerned about the wording "all index-scan related".  It's not
> >> that clear if that would include Bitmap Index Scans or not.
> >
> >
> > That was partially the point of writing "all" there - absent other
> information, and seeing how index-only scans were treated, I presumed it
> was indeed actually or effectively a switch for all.  If it is not it
> should be made clear which node types with the word index in them are not
> affected.
>
> I'm very much against mentioning which things are *not* affected by
> settings. It doesn't seem like a very sustainable way to write
> documentation.
>

The documentation presently uses the term "index-scan related" and it is
unclear what exactly that is supposed to cover.  My addition of the word
"all" doesn't materially change this, other than making certain that the
"index-only-scan related" nodes are covered, which gets clarified and is
cross-referenced.  If you are uncertain whether adding "all" is meant to
cover Bitmap Index Scans, then that uncertainty already exists in the current
wording.  I just added "all" to be explicit about that fact, or at least
that is what I thought I did.

For me, the answer to "are bitmap index scans disabled" by setting
enable_indexscans to off is "yes" and does not require explanation.  If the
real answer is "no" then please propose a change that can disabuse me of my
belief.


> > Is there a listing of all node types produced by PostgreSQL (with the
> explain output naming) along with which ones are affected by which enable_*
> knobs (possibly multiple for something like Bitmap Index Scan)?
>
> No. We purposefully do our best not to document executor nodes. The
> enable_* GUCs is one place where it's hard to avoid.
>

For education, mainly mine, not to add to the documentation; though our
lack of detail here for what are user-facing things is IMO unfortunate.


> >>
> >> Could we just add "The  setting
> >> must also be enabled to have the query planner consider
> >> index-only-scans"?
> >
> >
> > I'd like to stick with a conjunction there but agree the "must be
> enabled" wording is preferrable, avoiding the double-negative.
> >
> > "The default is on, but the  setting must also be enabled."
> >
> > The 'to have the...' part seems to just be redundant.
>
> I think it's confusing to include this as part of the mention of what
> the default value is. The default value and enable_indexscans being
> the master switch aren't at all related.
>
>
Fair point.  I'm good with your proposed change here.

David J.


Re: detoast datum into the given buffer as a optimization.

2024-10-28 Thread Andy Fan


Hi Tom,

> Andy Fan  writes:
>>  *   Note if caller provides a non-NULL buffer, it is the duty of caller
>>  * to make sure it has enough room for the detoasted format (Usually
>>  * they can use toast_raw_datum_size to get the size)
>
> This is a pretty awful, unsafe API design.

Sorry that I expressed my thoughts incorrectly. I'm not going to
refactor "detoast_attr"; I'm going to add a new API named
"detoast_attr_buffer", which is very similar to text_to_cstring and
text_to_cstring_buffer. Most users can still use detoast_attr; only the
users who care about the MemoryContext or memcpy issue would use the
detoast_attr_buffer variant.

> It puts it on the caller
> to know how to get the detoasted length, and it implies double
> decoding of the toast datum.

I very nearly gave up on this idea today because of this; later I
realized that something I knew when writing the first message had since
slipped my mind. Since it is really confusing, I want to highlight it
here so that more people can double-check it.

I thought 'toast_raw_datum_size' was the existing function to get the
"detoasted length" for the caller of detoast_attr_buffer, so it seemed
reasonable to assume the caller knows how to get the detoasted
length.

What suddenly confused me was whether it is really correct to use
"toast_raw_datum_size". In "toast_raw_datum_size":

if (VARATT_IS_EXTERNAL_ONDISK(attr))
{
/* va_rawsize is the size of the original datum -- including 
header */
struct varatt_external toast_pointer;

VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
result = toast_pointer.va_rawsize;
}

  We just return the va_rawsize directly. Does that work for a datum which
  is compressed first and then stored externally on disk? After some
  more research, it turns out to be correct, since the rawsize is the size of
  the uncompressed data, and we also rely on that fact in
  1b393f4e5db4fd6bbc86a4e88785b6945a1541d0. This is why I said:

 > One of the key point is we can always get the varlena rawsize cheaply
 > without any real detoast activity in advance, thanks to the existing
 > varlena design.

 This is the fact I had forgotten today.

Since the user can get the size easily now, is it cheap to do so? Looking
at the code in toast_raw_datum_size, I find it hard to say it is
expensive.

>
> How about a variant like
>
> struct varlena *
> detoast_attr_cxt(struct varlena *attr, MemoryContext cxt)
>
> which promises to allocate the result in the specified context?
> That would cover most of the practical use-cases, I think.

Yes, it works for some use cases, and it is similar to what I did in
[1] (search for detoast_attr_ext).  However, it can't support the case where
the user wants to detoast the data into a given buffer (to avoid the later
memcpy), so detoast_attr_buffer is my preferred API right now. If that is
not doable, detoast_attr_cxt also works for me.

Do you still think detoast_attr_buffer is not an acceptable API at the
high-level design? I'm working on an implementation, but I want to have
some agreement on the high-level design first.
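To make the intended usage concrete, here is a rough sketch of how a caller
could use the proposed function (detoast_attr_buffer does not exist yet and
its exact signature is still open; toast_raw_datum_size is the existing
function, and toasted_datum is just a placeholder Datum for illustration):

    /* caller supplies the destination buffer, sized via toast_raw_datum_size */
    Size            rawsize = toast_raw_datum_size(toasted_datum);
    struct varlena *buf = (struct varlena *) palloc(rawsize);

    /* detoast directly into the caller's buffer, avoiding a later memcpy */
    detoast_attr_buffer((struct varlena *) DatumGetPointer(toasted_datum), buf);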

[1]
https://www.postgresql.org/message-id/attachment/160491/v10-0001-shared-detoast-datum.patch

-- 
Best Regards
Andy Fan





Re: ActiveState Perl is not valid anymore to build PG17 on the Windows 10/11 platforms, So Documentation still suggesting it should be updated

2024-10-28 Thread Michael Paquier
On Mon, Oct 28, 2024 at 01:07:16PM +0100, Daniel Gustafsson wrote:
> +1 for applying backpatched to at least 17 but possibly further down judging
> by the linked threads.

When using ActiveState perl, being able to call `perl` from a PATH
requires one to register with a central service related to the company
that provides these binaries.  Still recommending it even on stable
branches makes me really uneasy.
--
Michael


signature.asc
Description: PGP signature


Re: Why don't we consider explicit Incremental Sort?

2024-10-28 Thread Andrei Lepikhov

On 10/10/24 09:18, Richard Guo wrote:

On Sun, Sep 22, 2024 at 1:38 PM David Rowley  wrote:
I've pushed this patch after tweaking this part in the commit message.
Thank you both for your reviews.

My apologies for the late review, but IMO there are some minor weaknesses.
While working on improving the cost_sort model [1], adding to the model the
number of columns to be sorted and the number of comparisons to be made
(through the stadistinct of the first column), I found that one test in
aggregates.sql doesn't like IncrementalSort:


'Utilise the ordering of merge join to avoid a Sort operation'

After digging a little, I realised that although the idea that
IncrementalSort is better than a plain Sort is generally correct, the
assumption that it is better all the time seems wrong.
For example, an IncrementalSort following an index can choose a sort
order whose first columns have few distinct values, causing more
comparisons - remember, we still sort tuples inside a group; or it can
use an index on multiple columns while needing only the first column,
with an additional Sort on many other columns fetched from the heap, etc.


Of course, I am providing highly skewed cases, and Sort + SeqScan will
basically resolve the issue. But I think both possible sort strategies
should be considered; don't heuristically give a chance only to
IncrementalSort. At the very least, an IndexScan can beat a SeqScan
because of index clause selectivity.


[1] 
https://www.postgresql.org/message-id/8742aaa8-9519-4a1f-91bd-364aec65f5cf%40gmail.com


--
regards, Andrei Lepikhov





Re: protocol-level wait-for-LSN

2024-10-28 Thread Tatsuo Ishii
> The patch adds a protocol extension called _pq_.wait_for_lsn as well
> as a libpq connection option wait_for_lsn to activate the same.  (Use
> e.g., psql -d 'wait_for_lsn=1'.)
> 
> With this protocol extension, two things are changed:
> 
> - The ReadyForQuery message sends back the current LSN.

If another protocol extension X also tries to add something to the
ReadyForQuery message, what would happen?
Currently the ReadyForQuery message looks like this:

Byte1('Z')
Int32
Byte1

With the wait_for_lsn extension, it becomes:

Byte1('Z')
Int32
Byte1
String

Suppose extension X wants to extend it like this:

Byte1('Z')
Int32
Byte1
Int32

It seems impossible for both to coexist.

Does this mean that once the wait_for_lsn extension is brought into the
frontend/backend protocol specification, no other extension that touches
ReadyForQuery can be defined?

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on

2024-10-28 Thread David Rowley
On Wed, 23 Oct 2024 at 13:51, David G. Johnston
 wrote:
> Went with a slightly different wording that seems to flow better with the 
> xrefs I added between the two options.

-Enables or disables the query planner's use of index-scan plan
-types. The default is on.
+Enables or disables the query planner's use of all index-scan related plan

I'm concerned about the wording "all index-scan related".  It's not
that clear if that would include Bitmap Index Scans or not. I think
it's better to explicitly mention index-only-scans to make it clear
which nodes are affected.

+types. The default is on. The
index-only-scan plan types
+can be independently disabled by setting 
+to off.

I wondered if it's better to reference the enable_indexonlyscan GUC
here rather than document what enable_indexonlyscan does from the
enable_indexscan docs. Maybe just a "Also see enable_indexonlyscans."
could be added?

-The default is on.
+The default is on. However, this setting
has no effect if
+ is set to
off.

Could we just add "The  setting
must also be enabled to have the query planner consider
index-only-scans"?

I've attached that in patch form.

David
Title: 19.7. Query Planning

19.7. Query Planning

19.7.1. Planner Method Configuration
19.7.2. Planner Cost Constants
19.7.3. Genetic Query Optimizer
19.7.4. Other Planner Options

19.7.1. Planner Method Configuration
   These configuration parameters provide a crude method of
   influencing the query plans chosen by the query optimizer. If
   the default plan chosen by the optimizer for a particular query
   is not optimal, a temporary solution is to use one
   of these configuration parameters to force the optimizer to
   choose a different plan.
   Better ways to improve the quality of the
   plans chosen by the optimizer include adjusting the planner cost
   constants (see Section 19.7.2),
   running ANALYZE manually, increasing
   the value of the default_statistics_target configuration parameter,
   and increasing the amount of statistics collected for
   specific columns using ALTER TABLE SET
   STATISTICS.
enable_async_append (boolean)
    Enables or disables the query planner's use of async-aware
    append plan types. The default is on.

enable_bitmapscan (boolean)
    Enables or disables the query planner's use of bitmap-scan plan
    types. The default is on.

enable_gathermerge (boolean)
    Enables or disables the query planner's use of gather
    merge plan types. The default is on.

enable_group_by_reordering (boolean)
    Controls if the query planner will produce a plan which will provide
    GROUP BY keys sorted in the order of keys of
    a child node of the plan, such as an index scan.  When disabled, the
    query planner will produce a plan with GROUP BY
    keys only sorted to match the ORDER BY clause,
    if any. When enabled, the planner will try to produce a more
    efficient plan. The default value is on.

enable_hashagg (boolean)
    Enables or disables the query planner's use of hashed
    aggregation plan types. The default is on.

enable_hashjoin (boolean)
    Enables or disables the query planner's use of hash-join plan
    types. The default is on.

enable_incremental_sort (boolean)
    Enables or disables the query planner's use of incremental sort steps.
    The default is on.

enable_indexscan (boolean)
    Enables or disables the query planner's use of index-scan and
    index-only-scan plan types.  The default is on.
    Also see enable_indexonlyscan.

enable_indexonlyscan (boolean)
    Enables or disables the query planner's use of index-only-scan plan
    types (see Section 11.9).
    The default is on.  The
    enable_indexscan setting must also be
    enabled to have the query planner consider index-only-scans.

enable_material (boolean)
    Enables or disables the query planner's use of materialization.
    It is impossible to suppress materialization entirely,
    but turning this variable off prevents the planner from inserting
    materialize nodes except in cases where it is required for correctness.
    The default is on.

enable_memoize (boolean)
    Enables or disables the query planner's use of memoize plans for
    caching results from parameterized scans inside nested-loop joins.
    This plan type allows scans to the underlying plans to be skipped when
    the results for the current parameters are already in the cache.  Less
    comm

Re: Considering fractional paths in Append node

2024-10-28 Thread Andy Fan
Nikita Malakhov  writes:

> Hi,
>
> Andy, thank you, I've checked this thread out along with run-time
> partition pruning.
I'm not sure about the relationship between this topic and run-time
partition pruning...

> I've spend some time hovering on the tuple_fraction field usage and would 
> disagree
> with you on this topic - it is already used on the RelOptInfo level later on, 
> in
> generate_orderedappend_paths()

It looks like you are right that root->tuple_fraction is already used at the
RelOptInfo level in generate_orderedappend_paths(). But we have also tried
not to use it at the RelOptInfo level, for example in set_subquery_pathlist.
See:

"""
/*
 * We can safely pass the outer tuple_fraction down to the subquery if the
 * outer level has no joining, aggregation, or sorting to do. Otherwise
 * we'd better tell the subquery to plan for full retrieval. (XXX This
 * could probably be made more intelligent ...)
 */
"""

I'm not sure the "more intelligent" would be just use it directly. 

So I'm not saying we can't do this, just that the facts are:
(a)  root->tuple_fraction is not exactly the same as a RelOptInfo's
tuple_fraction.
(b)  We have used root->tuple_fraction at the RelOptInfo level in some cases
and also tried not to use it in some other cases (only using it in
situations similar to what I did before).

It looks like different committers have different opinions on this.

-- 
Best Regards
Andy Fan





Questions About TODO: Issuing NOTICEs for row count differences in EXPLAIN ANALYZE

2024-10-28 Thread KAZAR Ayoub
Hello Hackers,
I am currently looking into the following TODO item, "Have EXPLAIN ANALYZE
issue NOTICE messages when the estimated and actual row counts differ by a
specified percentage."
What's the current status of this TODO, and is there any prior discussion
or rationale behind it?
Specifically:
- How should we come up with a percentage for row count differences (a fixed
value or a dynamic one)?
- Should we consider a configurable parameter for users?
- Is there anything extra to consider?

Thank you.


Re: On disable_cost

2024-10-28 Thread David Rowley
On Sat, 19 Oct 2024 at 01:09, Laurenz Albe  wrote:
> Here is my attempt on that paragraph:
>
>   When using the enable/disable flags to disable plan node types, many of
>   the flags only discourage the use of the corresponding plan node and don't
>   outright disallow the planner's ability to use the plan node type.
>   Otherwise, certain queries could not be executed for lack of an alternative
>   to using a disabled plan node.  As a consequence, it is possible that the
>   planner chooses a plan using a node that has been disabled.  When this
>   happens, the EXPLAIN output will indicate this fact.

I think that looks pretty good. However, I would like to keep the part
saying that the possibility of disabled nodes still being used is
intentional, mostly just to make it clear that it's not a bug. We get so
many false bug reports that I feel it's worthwhile mentioning that
explicitly.

Maybe since you dropped that sentence to shorten the paragraph, we
could instead just drop the "Otherwise, certain" sentence.

Also, the concern about using "this". How about we just write "When
the resulting plan contains a disabled node, the
EXPLAIN output will indicate this fact.", which
makes that self-contained.

That becomes:

When using the enable/disable flags to disable plan node types, many of
the flags only discourage the use of the corresponding plan node and don't
outright disallow the planner's ability to use the plan node type.  This
is by design so that the planner still maintains the ability to form a
plan for a given query.  When the resulting plan contains a disabled node,
the EXPLAIN output will indicate this fact.

David


disabled_docs_v3.patch
Description: Binary data


Re: pgsql: Implement pg_wal_replay_wait() stored procedure

2024-10-28 Thread Alexander Korotkov
On Mon, Oct 28, 2024 at 11:36 AM Heikki Linnakangas  wrote:
>
> On 25/10/2024 14:56, Alexander Korotkov wrote:
> > I see that pg_wal_replay_wait_status() might look weird, but it seems
> > to me like the best of feasible solutions.
>
> I haven't written many procedures, but our docs say:
>
>  > Procedures do not return a function value; hence CREATE PROCEDURE
> lacks a RETURNS clause. However, procedures can instead return data to
> their callers via output parameters.
>
> Did you consider using an output parameter?

Yes I did consider them and found two issues.
1) You still need to pass something to them.  And that couldn't be
default values.  That's a bit awkward.
2) Usage of them causes extra snapshot to be held.
I'll recheck whether it's possible to work around either of these.
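For instance, with a hypothetical OUT-parameter signature (just to
illustrate point 1 - a CALL issued from SQL still has to supply a
placeholder for the OUT argument; the third parameter below is made up):

    CALL pg_wal_replay_wait('0/55DA24F', 1000, NULL);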

> > Given that
> > pg_wal_replay_wait() procedure can't work concurrently to a query
> > involving pg_wal_replay_wait_status() function, I think
> > pg_wal_replay_wait_status() should be stable and parallel safe.
>
> If you call pg_wal_replay_wait() in the backend process, and
> pg_wal_replay_wait_status() in a parallel worker process, it won't
> return the result of the wait. Probably not what you'd expect. So I'd
> argue that it should be parallel unsafe.

Oh, sorry.  You're absolutely correct.  That should be parallel unsafe.

> > This is the brief answer.  I will be able to come back with more
> > details on Monday.
>
> Thanks. A few more minor issues I spotted while playing with this:
>
> - If you pass a very high value as the timeout, e.g. INT_MAX-1, it wraps
> around and doesn't wait at all
> - You can pass NULLs as arguments. That should probably not be allowed,
> or we need to document what it means.
>
> This is disappointing:
>
> > postgres=# set default_transaction_isolation ='repeatable read';
> > SET
> > postgres=# call pg_wal_replay_wait('0/55DA24F');
> > ERROR:  pg_wal_replay_wait() must be only called without an active or 
> > registered snapshot
> > DETAIL:  Make sure pg_wal_replay_wait() isn't called within a transaction 
> > with an isolation level higher than READ COMMITTED, another procedure, or a 
> > function.
>
> Is there any way we could make that work? Otherwise, the feature just
> basically doesn't work if you use repeatable read.

Thank you for catching this.  The last one is really disappointing.
I'm exploring what could be done there.

--
Regards,
Alexander Korotkov
Supabase




Re: MergeAppend could consider sorting cheapest child path

2024-10-28 Thread Nikita Malakhov
Hi!

I've checked this thread and the examples in it, and do not see stable
improvements in the base tests. Sometimes the base tests are considerably
slower with the patch, for example:


explain analyze
select t1.* from matest0 t1, matest0 t2
where t1.b = t2.b and t2.c = t2.d
order by t1.b limit 10;
  QUERY PLAN
--
 Limit  (cost=0.46..19.90 rows=10 width=16) (actual time=0.007..0.008
rows=0 loops=1)
   ->  Merge Join  (cost=0.46..181.24 rows=93 width=16) (actual
time=0.007..0.007 rows=0 loops=1)
 Merge Cond: (t1.b = t2.b)
 ->  Merge Append  (cost=0.17..90.44 rows=1851 width=16) (actual
time=0.006..0.007 rows=0 loops=1)
   Sort Key: t1.b
   ->  Sort  (cost=0.01..0.02 rows=1 width=16) (actual
time=0.004..0.004 rows=0 loops=1)
 Sort Key: t1_1.b
 Sort Method: quicksort  Memory: 25kB
 ->  Seq Scan on matest0 t1_1  (cost=0.00..0.00 rows=1
width=16) (actual time=0.002..0.002 rows=0 loops=1)
   ->  Index Scan using matest1i on matest1 t1_2
 (cost=0.15..71.90 rows=1850 width=16) (actual time=0.002..0.002 rows=0
loops=1)
 ->  Materialize  (cost=0.29..84.81 rows=10 width=4) (never
executed)
   ->  Merge Append  (cost=0.29..84.78 rows=10 width=4) (never
executed)
 Sort Key: t2.b
 ->  Index Scan using matest0i on matest0 t2_1
 (cost=0.12..8.14 rows=1 width=4) (never executed)
   Filter: (c = d)
 ->  Index Scan using matest1i on matest1 t2_2
 (cost=0.15..76.53 rows=9 width=4) (never executed)
   Filter: (c = d)
 Planning Time: 0.252 ms
 Execution Time: 0.048 ms
(19 rows)

explain analyze
select t1.* from matest0 t1, matest0 t2
where t1.b = t2.b and t2.c = t2.d
order by t1.b limit 10;
  QUERY PLAN
--
 Limit  (cost=0.57..20.88 rows=10 width=16) (actual time=0.004..0.004
rows=0 loops=1)
   ->  Merge Join  (cost=0.57..189.37 rows=93 width=16) (actual
time=0.003..0.004 rows=0 loops=1)
 Merge Cond: (t1.b = t2.b)
 ->  Merge Append  (cost=0.29..98.56 rows=1851 width=16) (actual
time=0.002..0.003 rows=0 loops=1)
   Sort Key: t1.b
   ->  Index Scan using matest0i on matest0 t1_1
 (cost=0.12..8.14 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=1)
   ->  Index Scan using matest1i on matest1 t1_2
 (cost=0.15..71.90 rows=1850 width=16) (actual time=0.001..0.001 rows=0
loops=1)
 ->  Materialize  (cost=0.29..84.81 rows=10 width=4) (never
executed)
   ->  Merge Append  (cost=0.29..84.78 rows=10 width=4) (never
executed)
 Sort Key: t2.b
 ->  Index Scan using matest0i on matest0 t2_1
 (cost=0.12..8.14 rows=1 width=4) (never executed)
   Filter: (c = d)
 ->  Index Scan using matest1i on matest1 t2_2
 (cost=0.15..76.53 rows=9 width=4) (never executed)
   Filter: (c = d)
 Planning Time: 0.278 ms
 Execution Time: 0.025 ms
(16 rows)

(patched)
explain analyze
select t1.* from matest0 t1, matest0 t2
where t1.b = t2.b and t2.c = t2.d
order by t1.b limit 10;
  QUERY PLAN
--
 Limit  (cost=0.46..19.90 rows=10 width=16) (actual time=0.007..0.008
rows=0 loops=1)
   ->  Merge Join  (cost=0.46..181.24 rows=93 width=16) (actual
time=0.007..0.007 rows=0 loops=1)
 Merge Cond: (t1.b = t2.b)
 ->  Merge Append  (cost=0.17..90.44 rows=1851 width=16) (actual
time=0.006..0.007 rows=0 loops=1)
   Sort Key: t1.b
   ->  Sort  (cost=0.01..0.02 rows=1 width=16) (actual
time=0.004..0.004 rows=0 loops=1)
 Sort Key: t1_1.b
 Sort Method: quicksort  Memory: 25kB
 ->  Seq Scan on matest0 t1_1  (cost=0.00..0.00 rows=1
width=16) (actual time=0.002..0.002 rows=0 loops=1)
   ->  Index Scan using matest1i on matest1 t1_2
 (cost=0.15..71.90 rows=1850 width=16) (actual time=0.002..0.002 rows=0
loops=1)
 ->  Materialize  (cost=0.29..84.81 rows=10 width=4) (never
executed)
   ->  Merge Append  (cost=0.29..84.78 rows=10 width=4) (never
executed)
 Sort Key: t2.b
 ->  Index Scan using matest0i on matest0 t2_1
 (cost=0.12..8.14 rows=1 width=4) (never executed)
   Filter: (c = d)
  

Re: Considering fractional paths in Append node

2024-10-28 Thread Nikita Malakhov
Hi,

Andy, thank you, I've checked this thread out along with run-time partition
pruning.
I've spent some time looking at the tuple_fraction field usage and would
disagree with you on this topic - it is already used at the RelOptInfo level
later on, in generate_orderedappend_paths().
I mean the following piece:
if (root->tuple_fraction > 0)
{
double path_fraction = (1.0 / root->tuple_fraction);
Path cheapest_consider_fraction;

cheapest_fractional =
get_cheapest_fractional_path_for_pathkeys(childrel->pathlist,
pathkeys, NULL, path_fraction);
...

function, so it does not seem incorrect to use its value for a single
relation in a subquery - I agree that we do not have an accurate estimate at
this level, but we could use the one we already have.
I've also tried hard to find an example where this patch could break
something, but without success.

--
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: Assertion failure when autovacuum drops orphan temp indexes.

2024-10-28 Thread Nathan Bossart
Committed.

-- 
nathan




Re: Conflict Detection and Resolution

2024-10-28 Thread Diego Fronza
Hello hackers,

Hi, I'm Diego. I work for Percona and have recently started working on
PostgreSQL, and I would like to contribute to the project moving forward.

I have been following this thread since the beginning, but due to my
limited knowledge of the overall code structure, my first review of the
provided patches was more focused on validating the logic and general flow.

I have been testing the provided patches and so far the only issue I have
is the one reported about DirtySnapshot scans over a B-tree with parallel
updates, which may skip/not find some records.

That said, I'd like to know whether it's worthwhile to pull the proposed fix
from [0] and validate/update the code to fix the issue, or whether other,
better solutions are being discussed?

Thanks for your attention,
Diego

[0]:
https://www.postgresql.org/message-id/flat/cantu0oizitbm8+wdtkktmzv0rhgbroygwwqsqw+mzowpmk-...@mail.gmail.com#74f5f05594bb6f10b1d882a1ebce377c

On Mon, Oct 21, 2024 at 2:04 AM shveta malik  wrote:

> On Fri, Oct 18, 2024 at 4:30 PM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > On Wednesday, October 9, 2024 2:34 PM shveta malik <
> shveta.ma...@gmail.com> wrote:
> > >
> > > On Wed, Oct 9, 2024 at 8:58 AM shveta malik 
> > > wrote:
> > > >
> > > > On Tue, Oct 8, 2024 at 3:12 PM Nisha Moond
> > >  wrote:
> > > > >
> > > >
> > >
> > > Please find few comments on v14-patch004:
> > >
> > > patch004:
> > > 1)
> > > GetConflictResolver currently errors out when the resolver is
> last_update_wins
> > > and track_commit_timestamp is disabled. It means every conflict
> resolution
> > > with this resolver will keep on erroring out. I am not sure if we
> should emit
> > > ERROR here. We do emit ERROR when someone tries to configure
> > > last_update_wins but track_commit_timestamp is disabled. I think that
> should
> > > suffice. The one in GetConflictResolver can be converted to WARNING
> max.
> > >
> > > What could be the side-effect if we do not emit error here? In such a
> case, the
> > > local timestamp will be 0 and remote change will always win.
> > > Is that right? If so, then if needed, we can emit a warning saying
> something like:
> > > 'track_commit_timestamp is disabled and thus remote change is applied
> > > always.'
> > >
> > > Thoughts?
> >
> > I think simply reporting a warning and applying remote changes without
> further
> > action could lead to data inconsistencies between nodes. Considering the
> > potential challenges and time required to recover from these
> inconsistencies, I
> > prefer to keep reporting errors, in which case users have an opportunity
> to
> > resolve the issue by enabling track_commit_timestamp.
> >
>
> Okay, makes sense. We should raise ERROR then.
>
> thanks
> Shveta
>
>
>


Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Peter Smith
Here are my review comments for v44-0001.

==
doc/src/sgml/ref/create_publication.sgml

1.
-  When a column list is specified, only the named columns are replicated.
+  When a column list is specified, all columns (except generated columns)
+  of the table are replicated.
   If no column list is specified, all columns of the table are replicated
   through this publication, including any columns added later. It has no

Huh? This seems very wrong.

I think it should have been like:
When a column list is specified, only the named columns are
replicated. If no column list is specified, all table columns (except
generated columns) are replicated...

==
src/backend/replication/logical/proto.c

2.
+bool
+logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns)
+{
+ if (att->attisdropped)
+ return false;
+
+ /*
+ * Skip publishing generated columns if they are not included in the
+ * column list.
+ */
+ if (!columns && att->attgenerated)
+ return false;
+
+ /*
+ * Check if a column is covered by a column list.
+ */
+ if (columns && !bms_is_member(att->attnum, columns))
+ return false;
+
+ return true;
+}

I thought this could be more simply written as:

{
if (att->attisdropped)
  return false;

/* If a column list was specified only publish the specified columns. */
if (columns)
  return bms_is_member(att->attnum, columns);

/* If a column list was not specified publish everything except
generated columns. */
return !att->attgenerated;
}

==
src/backend/replication/pgoutput/pgoutput.c

3.
- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
+ continue;
+
+ if (att->attgenerated)
+ {
+ if (bms_is_member(att->attnum, cols))
+ gencolpresent = true;
+
  continue;
+ }
+

  nliveatts++;
  }

  /*
- * If column list includes all the columns of the table,
- * set it to NULL.
+ * If column list includes all the columns of the table
+ * and there are no generated columns, set it to NULL.
  */
- if (bms_num_members(cols) == nliveatts)
+ if (bms_num_members(cols) == nliveatts && !gencolpresent)
  {
  bms_free(cols);
  cols = NULL;
~

That code still looks strange to me. I think that unconditional
'continue' for attgenerated is breaking the meaning of 'nliveatts'
(which I take to mean 'count-of-the-attrs-to-be-published').

AFAICT the code should be more like this:

if (att->attgenerated)
{
  /* Generated cols are skipped unless they are present in a column list. */
  if (!bms_is_member(att->attnum, cols))
continue;

  gencolpresent = true;
}

==
src/test/regress/sql/publication.sql

4.
 ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;

+-- ok: generated column "d" can be in the list too
+ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (d);
+ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;

Maybe you can change this test to do "SET TABLE testpub_tbl5 (a,d);"
instead of ADD TABLE, so then you can remove the earlier DROP and DROP
the table only once.

==
src/test/subscription/t/031_column_list.pl

5.
+# TEST: Dropped columns are not considered for the column list, and generated
+# columns are not replicated if they are not explicitly included in the column
+# list. So, the publication having a column list except for those columns and a
+# publication without any column (aka all columns as part of the columns list)
+# are considered to have the same column list.

Hmm. I don't think this wording is quite right "without any column".
AFAIK the original intent of this test was to prove only that
dropped/generated columns were ignored for the NULL column list logic.

That last sentence maybe should say more like:

So a publication with a column list specifying all table columns
(excluding only dropped and generated columns) is considered to be the
same as a publication that has no column list at all for that table.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on

2024-10-28 Thread David G. Johnston
On Mon, Oct 28, 2024 at 3:54 PM David Rowley  wrote:

>
> I've attached that in patch form.
>
>
 -Enables or disables the query planner's use of index-scan plan
-types. The default is on.
+Enables or disables the query planner's use of index-scan and
+index-only-scan plan types.  The default is on.
+Also see .

I think the original wording "index-scan plan types" is what is confusing
me.  The plural "types" turns "index-scan plan" into a category of plans
rather than the single plan type "index scan".

Your proposed wording actually (accidentally?) fixes this, because now the
plural "types" actually refers to two individual plan nodes, "index scan" and
"index-only scan".

The hyphenation still reads a bit odd but ok.

I am ok with this revision (and the patch as a whole) I suppose but I still
feel like something is missing here.  Though probably that something would
fit better in an overview page rather than trying to get the settings to
explain all this to the reader.

David J.


Re: Consider the number of columns in the sort cost model

2024-10-28 Thread Andrei Lepikhov

On 10/28/24 16:48, Alena Rybakina wrote:

On 23.10.2024 04:39, Andrei Lepikhov wrote:

On 15/10/2024 12:15, David Rowley wrote:
And the last patch is a demo of how I'm going to use the previous 
three patches and add one more strategy to improve the order of 
columns in the GROUP-BY clause.


To be honest, I didn’t find information about this in the code, but did 
I understand everything correctly?

Yes


2. I noticed that statistics of distinct values are calculated several
times. Maybe I am missing something, but this can be avoided if these
statistics can be saved after calculation. For example, I saw that it is
possible that you calculate the same statistic information for the same
equivalence members in cost_incremental_sort and identify_sort_ecmember.
Is it possible to store the information in a structure and use it later?
Hmm, I don't see multiple calculations. em_ndistinct was added
specifically for this purpose. Can you provide a specific case and code lines?


3. I think you should initialize the variable ndist in your patch. I 
faced the compile complaining during your code compilation.


costsize.c: In function ‘identify_sort_ecmember’:
costsize.c:6694:42: error: ‘ndist’ may be used uninitialized 
[-Werror=maybe-uninitialized]

  6694 | em->em_ndistinct = ndist;
   | ~^
costsize.c:6680:33: note: ‘ndist’ was declared here
  6680 | double  ndist;
   | ^~~
cc1: all warnings being treated as errors
gmake[4]: *** [: costsize.o] Error 1
I think you can just update your compiler. But I added the ndist 
initialisation to make more compilers happy :).



+        Assert (node != NULL);
+
          examine_variable(root, node, 0, &vardata);
          if (!HeapTupleIsValid(vardata.statsTuple))
              continue;
I don't think so, at least until you provide a case where the
get_sortgroupclause_expr function returns NULL.
What's more, remember that patch 0004 here is just to show the
perspective and is still under construction.
Anyway, thanks, I found out that the patch set doesn't apply correctly 
because of 828e94c. So, see the new version in the attachment.


--
regards, Andrei Lepikhov
From 5eb884cbbd9c2e356d5d855da46d7e62d101b8b9 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" 
Date: Tue, 29 Oct 2024 08:49:33 +0700
Subject: [PATCH v2 1/4] Stabilise incremental sort cost calculation.

Carefully identify a column/expression that can represent the path key in the
cost calculation of a specific sort operator. Columns may have different
numbers of distinct values, which is why the order of columns in the sort
operation may impact the number of calls to the comparison function.
Sorting has only pathkeys as input for the cost estimation. This patch, instead
of blindly choosing the first equivalence class member, attempts to find the
expression with the smallest ndistinct value.

TODO: Filtering out EC members external to this sort operator is not a big
deal. But in that case it would be necessary to pass the underlying relids to
the cost calculation routine, which would cause an API change. So, here we stay
as simple as possible.

Add the number of distinct values - em_ndistinct - into EquivalenceMember.
It may additionally be used later in group number estimations.
---
 src/backend/optimizer/path/costsize.c | 72 +--
 src/backend/optimizer/path/equivclass.c   |  1 +
 src/include/nodes/pathnodes.h |  2 +
 .../regress/expected/incremental_sort.out | 51 +
 src/test/regress/sql/incremental_sort.sql | 31 
 5 files changed, 152 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 2bb6db1df7..686d5883d1 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -203,6 +203,8 @@ static int32 get_expr_width(PlannerInfo *root, const Node *expr);
 static double relation_byte_size(double tuples, int width);
 static double page_size(double tuples, int width);
 static double get_parallel_divisor(Path *path);
+static EquivalenceMember *identify_sort_ecmember(PlannerInfo *root,
+ EquivalenceClass *ec);
 
 
 /*
@@ -2052,22 +2054,21 @@ cost_incremental_sort(Path *path,
 	 */
 	foreach(l, pathkeys)
 	{
-		PathKey*key = (PathKey *) lfirst(l);
-		EquivalenceMember *member = (EquivalenceMember *)
-			linitial(key->pk_eclass->ec_members);
+		PathKey			   *key = (PathKey *) lfirst(l);
+		EquivalenceMember  *em = identify_sort_ecmember(root, key->pk_eclass);
 
 		/*
 		 * Check if the expression contains Var with "varno 0" so that we
 		 * don't call estimate_num_groups in that case.
 		 */
-		if (bms_is_member(0, pull_varnos(root, (Node *) member->em_expr)))
+		if (bms_is_member(0, pull_varnos(root, (Node *) em->em_expr)))
 		{
 			unknown_varno = true;
 			break;
 		}
 
 		/* expression not containing any Vars wi

Re: Add isolation test template in injection_points for wait/wakeup/detach

2024-10-28 Thread Michael Paquier
On Mon, Oct 28, 2024 at 07:17:28AM +, Bertrand Drouvot wrote:
> I think that we cannot be 100% sure that the s1 wait will finish before the
> s2 detach (easily reproducible with gdb attached to s1 or a hardcoded sleep)
> and that other OSes could also report the test as failing for the same reason.

Yes, the only safe thing we can do in this test is to let the wakeup2
step be last, as we are sure that the isolationtester is going to wait for
s2 to finish the call of the wakeup function before moving on to
checking the end of the wait.

> It's not ideal, but instead of removing this first permutation test, what
> about adding a "sleep2" step to it (doing, say, SELECT pg_sleep(1);) and
> calling this new step before the detach2 one?

There is no real guarantee of stability.  Under a wait of N seconds,
we could still have environments where the wait() could remain stuck
more than N seconds between the moment the condition variable is woken
up and the result of the wait() is reported back to the client.  And
hardcoded sleeps make the test slower even on fast machines.

What we have here seems to be just contention on Cirrus with the
FreeBSD hosts, while there is much more stability with the Linux hosts.
--
Michael


signature.asc
Description: PGP signature


Re: Alias of VALUES RTE in explain plan

2024-10-28 Thread Yasir
On Mon, Oct 28, 2024 at 8:16 PM Tom Lane  wrote:

> Ashutosh Bapat  writes:
> > The patch looks good to me, except the name of the new member.
>
> >   CommonTableExpr *p_parent_cte; /* this query's containing CTE */
> > + Alias*p_parent_alias; /* parent's alias for this query */
>
> > the two "parent"s here mean different things and that might lead one
> > to assume that the p_parent_alias refers to alias of CTE. The comment
> > adds to the confusion since it mentions parent. How about renaming it
> > as p_outer_alias? or something which indicates alias of the outer
> > query?
>
> Hmm, I figured the two "parent" references do mean the same thing,
> ie the immediately surrounding syntactic construct.  While I won't
> fight hard about it, I don't see an advantage in naming the new
> field differently.  We could make the comment be
>
> /* outer level's alias for this query */


This seems ok to me.




> if that helps any.
>
> regards, tom lane
>


Re: EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on

2024-10-28 Thread David Rowley
We don't seem to be agreeing on much here... :-(

On Tue, 29 Oct 2024 at 13:30, David G. Johnston
 wrote:
>
> On Mon, Oct 28, 2024 at 3:54 PM David Rowley  wrote:
>> I'm concerned about the wording "all index-scan related".  It's not
>> that clear if that would include Bitmap Index Scans or not.
>
>
> That was partially the point of writing "all" there - absent other 
> information, and seeing how index-only scans were treated, I presumed it was 
> indeed actually or effectively a switch for all.  If it is not it should be 
> made clear which node types with the word index in them are not affected.

I'm very much against mentioning which things are *not* affected by
settings. It doesn't seem like a very sustainable way to write
documentation.

>> I think
>> it's better to explicitly mention index-only-scans to make it clear
>> which nodes are affected.
>
> I hadn't considered Bitmap Index Scans but I would expect if you do not use 
> index scans then the ability to produce bitmaps from them would be precluded.
>
> I could see pointing out, in enable_bitmapscan, that enable_bitmapscan is 
> effectively disabled (for index inputs) when enable_indexscan is set to off.  
> Then, in enable_indexscan, add a "see also" to enable_bitmapscan with a brief 
> reason as well.

I don't follow this. enable_bitmapscan is completely independent from
enable_indexscan.

> Is there a listing of all node types produced by PostgreSQL (with the explain 
> output naming) along with which ones are affected by which enable_* knobs 
> (possibly multiple for something like Bitmap Index Scan)?

No. We purposefully do our best not to document executor nodes. The
enable_* GUCs is one place where it's hard to avoid.

>>
>> +types. The default is on. The
>> index-only-scan plan types
>> +can be independently disabled by setting > linkend="guc-enable-indexonlyscan"/>
>> +to off.
>>
>> I wondered if it's better to reference the enable_indexonlyscan GUC
>> here rather than document what enable_indexonlyscan does from the
>> enable_indexscan docs. Maybe just a "Also see enable_indexonlyscans."
>> could be added?
>
>
> I prefer to briefly explain why we advise the reader to go "see also" here.
>
>>
>> -The default is on.
>> +The default is on. However, this setting
>> has no effect if
>> + is set to
>> off.
>>
>> Could we just add "The  setting
>> must also be enabled to have the query planner consider
>> index-only-scans"?
>
>
> I'd like to stick with a conjunction there but agree the "must be enabled" 
> wording is preferable, avoiding the double-negative.
>
> "The default is on, but the  setting must also be enabled."
>
> The 'to have the...' part seems to just be redundant.

I think it's confusing to include this as part of the mention of what
the default value is. The default value and enable_indexscans being
the master switch aren't at all related.

David




Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Amit Kapila
On Tue, Oct 29, 2024 at 7:44 AM Peter Smith  wrote:
>
> ==
> src/backend/replication/logical/proto.c
>
> 2.
> +bool
> +logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns)
> +{
> + if (att->attisdropped)
> + return false;
> +
> + /*
> + * Skip publishing generated columns if they are not included in the
> + * column list.
> + */
> + if (!columns && att->attgenerated)
> + return false;
> +
> + /*
> + * Check if a column is covered by a column list.
> + */
> + if (columns && !bms_is_member(att->attnum, columns))
> + return false;
> +
> + return true;
> +}
>
> I thought this could be more simply written as:
>
> {
> if (att->attisdropped)
>   return false;
>
> /* If a column list was specified only publish the specified columns. */
> if (columns)
>   return bms_is_member(att->attnum, columns);
>
> /* If a column list was not specified publish everything except
> generated columns. */
> return !att->attgenerated;
> }
>

Your version is difficult to follow compared to what is proposed in
the current patch. It is a matter of personal choice, so I leave it to
the author (or others) which one they prefer. However, I suggest that
we add extra comments in the current patch where we return true at the
end of the function and also at the top of the function.
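For example, something along these lines (the comment wording is only
illustrative; the function body is the one from the current patch):

/*
 * Decide whether the given column should be published.
 */
bool
logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns)
{
	if (att->attisdropped)
		return false;

	/*
	 * Skip publishing generated columns if they are not included in the
	 * column list.
	 */
	if (!columns && att->attgenerated)
		return false;

	/*
	 * Check if a column is covered by a column list.
	 */
	if (columns && !bms_is_member(att->attnum, columns))
		return false;

	/* All checks passed, so the column is published. */
	return true;
}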

>
> ==
> src/test/regress/sql/publication.sql
>
> 4.
>  ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;
>
> +-- ok: generated column "d" can be in the list too
> +ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (d);
> +ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;
>
> Maybe you can change this test to do "SET TABLE testpub_tbl5 (a,d);"
> instead of ADD TABLE, so then you can remove the earlier DROP and DROP
> the table only once.
>

Yeah, we can do that if we want, but let's not add a dependency on
the previous test. Separate tests make it easier to extend the tests
in the future. Now, if it would have saved a noticeable amount of
time, then we could have considered it. Having said that, we can keep
both columns a and d in the column list.

> ==
> src/test/subscription/t/031_column_list.pl
>
> 5.
> +# TEST: Dropped columns are not considered for the column list, and generated
> +# columns are not replicated if they are not explicitly included in the 
> column
> +# list. So, the publication having a column list except for those columns 
> and a
> +# publication without any column (aka all columns as part of the columns 
> list)
> +# are considered to have the same column list.
>
> Hmm. I don't think this wording is quite right "without any column".
> AFAIK the original intent of this test was to prove only that
> dropped/generated columns were ignored for the NULL column list logic.
>
> That last sentence maybe should say more like:
>
> So a publication with a column list specifying all table columns
> (excluding only dropped and generated columns) is considered to be the
> same as a publication that has no column list at all for that table.
>

I think you are saying the same thing in slightly different words.
Both of those sound correct to me. So not sure if we get any advantage
by changing it.

-- 
With Regards,
Amit Kapila.




Re: Questions About TODO: Issuing NOTICEs for row count differences in EXPLAIN ANALYZE

2024-10-28 Thread David Rowley
On Tue, 29 Oct 2024 at 12:43, KAZAR Ayoub  wrote:
> I am currently looking into the following TODO item, "Have EXPLAIN ANALYZE 
> issue NOTICE messages when the estimated and actual row counts differ by a 
> specified percentage."
> What's the current status of this TODO, and is there any prior discussion or 
> rationale behind it ?

The status is that we don't have anything like that and I don't recall
it being mentioned that anyone is working on it.  Normally these items
only get added when there has been some discussion about it, but
normally that discussion gets linked along with the todo item. Clearly
that's not been done in this case.  I imagine the rationale is to make
it more clear when the estimates are off from the actual execution.

You might need to do some digging into the history of who added that
todo item and see if you can find any relevant discussion on hackers
around the time it was added.

> Specifically:
> - How can we come up with a percentage for row count differences (fixed value 
> or dynamic)?
> - Should we consider a configurable param for users ?
> - Is there anything extra to consider ?

The biggest thing to consider is if we'd want anything like this in
core PostgreSQL. It feels more like something additional tooling such
as explain.depesz.com would concern themselves with. I could also
imagine features along those lines in some sort of statistics advisor
contrib module. My personal view is that it would feel like a very
misplaced feature if we were to add only what the todo item describes
into core PostgreSQL. In any case, adding a NOTICE for this seems
horrible. Doing it that way means the information about the row
estimate's accuracy is very disconnected from the EXPLAIN line that it
belongs to.

Additionally, there are cases where we expect the actual and estimated
row counts to differ, even with perfect statistics. Consider the Seq Scan
in the following:

postgres=# explain analyze select * from pg_class limit 1;
 QUERY PLAN
-
 Limit  (cost=0.00..0.04 rows=1 width=273) (actual time=0.035..0.036
rows=1 loops=1)
   ->  Seq Scan on pg_class  (cost=0.00..18.15 rows=415 width=273)
(actual time=0.033..0.033 rows=1 loops=1)

I don't think we'd want false alarms for cases like that.

David




Re: EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on

2024-10-28 Thread David G. Johnston
On Mon, Oct 28, 2024 at 3:54 PM David Rowley  wrote:

> On Wed, 23 Oct 2024 at 13:51, David G. Johnston
>  wrote:
> > Went with a slightly different wording that seems to flow better with
> the xrefs I added between the two options.
>
> -Enables or disables the query planner's use of index-scan plan
> -types. The default is on.
> +Enables or disables the query planner's use of all index-scan
> related plan
>
> I'm concerned about the wording "all index-scan related".  It's not
> that clear if that would include Bitmap Index Scans or not.


That was partially the point of writing "all" there - absent other
information, and seeing how index-only scans were treated, I presumed it
was indeed actually or effectively a switch for all.  If it is not it
should be made clear which node types with the word index in them are not
affected.

I think
> it's better to explicitly mention index-only-scans to make it clear
> which nodes are affected.
>

I hadn't considered Bitmap Index Scans but I would expect if you do not use
index scans then the ability to produce bitmaps from them would be
precluded.

I could see pointing out, in enable_bitmapscan, that enable_bitmapscan is
effectively disabled (for index inputs) when enable_indexscan is set to
off.  Then, in enable_indexscan, add a "see also" to enable_bitmapscan with
a brief reason as well.

Is there a listing of all node types produced by PostgreSQL (with the
explain output naming) along with which ones are affected by which enable_*
knobs (possibly multiple for something like Bitmap Index Scan)?


> +types. The default is on. The
> index-only-scan plan types
> +can be independently disabled by setting  linkend="guc-enable-indexonlyscan"/>
> +to off.
>
> I wondered if it's better to reference the enable_indexonlyscan GUC
> here rather than document what enable_indexonlyscan does from the
> enable_indexscan docs. Maybe just a "Also see enable_indexonlyscans."
> could be added?
>

I prefer to briefly explain why we advise the reader to go "see also" here.


> -The default is on.
> +The default is on. However, this setting
> has no effect if
> + is set to
> off.
>
> Could we just add "The  setting
> must also be enabled to have the query planner consider
> index-only-scans"?
>

I'd like to stick with a conjunction there but agree the "must be enabled"
wording is preferable, avoiding the double negative.

"The default is on, but the  setting must also be enabled."

The 'to have the...' part seems to just be redundant.

David J.


Re: Statistics Import and Export

2024-10-28 Thread Jeff Davis
On Sun, 2024-10-27 at 14:00 +0300, Alexander Lakhin wrote:
> Please look at the following seemingly atypical behavior of the new
> functions:

...

> SELECT pg_restore_attribute_stats(
>    'relation', 'test'::regclass,
>    'attname', 'id'::name,
>    'inherited', false
> ) FROM generate_series(1, 2);
> ERROR:  XX000: tuple already updated by self

Thank you for the report!

Attached a patch to add calls to CommandCounterIncrement().
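
For context, the general rule being applied here is that when a catalog change
made earlier in the same SQL statement must be visible to a later operation in
that statement, the command counter has to be advanced.  A minimal sketch of
that pattern (not the actual patch; the function and variable names are
illustrative only):

#include "postgres.h"

#include "access/htup.h"
#include "access/xact.h"
#include "catalog/indexing.h"
#include "utils/rel.h"

/* Sketch only: update one catalog row and make it visible to later calls. */
static void
update_catalog_row(Relation catrel, HeapTuple newtup)
{
	CatalogTupleUpdate(catrel, &newtup->t_self, newtup);

	/*
	 * Without this, a second invocation within the same statement (as in the
	 * generate_series(1, 2) example above) still sees the old tuple version
	 * and tries to update it again, producing "tuple already updated by
	 * self".
	 */
	CommandCounterIncrement();
}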

Regards,
Jeff Davis

From ccff3df8b2d6f0c139a39e5aef8721b4480bdbd3 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Mon, 28 Oct 2024 18:16:09 -0700
Subject: [PATCH] Add missing CommandCounterIncrement() in stats import
 functions.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/98b2fcf0-f701-369e-d63d-6be9739ce...@gmail.com
---
 src/backend/statistics/attribute_stats.c | 11 ---
 src/backend/statistics/relation_stats.c  |  2 ++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/src/backend/statistics/attribute_stats.c b/src/backend/statistics/attribute_stats.c
index af61fd79e4..4ae0722b78 100644
--- a/src/backend/statistics/attribute_stats.c
+++ b/src/backend/statistics/attribute_stats.c
@@ -752,6 +752,8 @@ upsert_pg_statistic(Relation starel, HeapTuple oldtup,
 	}
 
 	heap_freetuple(newtup);
+
+	CommandCounterIncrement();
 }
 
 /*
@@ -762,6 +764,7 @@ delete_pg_statistic(Oid reloid, AttrNumber attnum, bool stainherit)
 {
 	Relation	sd = table_open(StatisticRelationId, RowExclusiveLock);
 	HeapTuple	oldtup;
+	bool		result = false;
 
 	/* Is there already a pg_statistic tuple for this attribute? */
 	oldtup = SearchSysCache3(STATRELATTINH,
@@ -773,12 +776,14 @@ delete_pg_statistic(Oid reloid, AttrNumber attnum, bool stainherit)
 	{
 		CatalogTupleDelete(sd, &oldtup->t_self);
 		ReleaseSysCache(oldtup);
-		table_close(sd, RowExclusiveLock);
-		return true;
+		result = true;
 	}
 
 	table_close(sd, RowExclusiveLock);
-	return false;
+
+	CommandCounterIncrement();
+
+	return result;
 }
 
 /*
diff --git a/src/backend/statistics/relation_stats.c b/src/backend/statistics/relation_stats.c
index 5a2aabc921..ed5dea2e05 100644
--- a/src/backend/statistics/relation_stats.c
+++ b/src/backend/statistics/relation_stats.c
@@ -171,6 +171,8 @@ relation_statistics_update(FunctionCallInfo fcinfo, int elevel)
 	/* release the lock, consistent with vac_update_relstats() */
 	table_close(crel, RowExclusiveLock);
 
+	CommandCounterIncrement();
+
 	return result;
 }
 
-- 
2.34.1



Re: EXPLAIN IndexOnlyScan shows disabled when enable_indexonlyscan=on

2024-10-28 Thread David Rowley
On Tue, 29 Oct 2024 at 14:41, David G. Johnston
 wrote:
>
> On Mon, Oct 28, 2024 at 3:54 PM David Rowley  wrote:
>  -Enables or disables the query planner's use of index-scan plan
> -types. The default is on.
> +Enables or disables the query planner's use of index-scan and
> +index-only-scan plan types.  The default is on.
> +Also see .
>
> I think the original wording "index-scan plan types" is what is confusing me. 
>  The plural types is turning index-scan plan into a category of plans rather 
> than the single plan type "index scan".
>
> Your proposed wording actually (accidentally?) fixes this because now the 
> plural types actually refers to two individual plan nodes, "index scan" and 
> "index-only scan".

I can't really vouch for the original wording as I didn't write it. I
agree the original use of "types" as a plural is strange and it's not
all that clear what that includes. Perhaps it was an attempt to mean
index and index-only scans.

> The hyphenation still reads a bit odd but ok.

I'm not sure where the hyphenated form of "index-scan" comes from and
I admit that I blindly copied that form when I wrote
"index-only-scans". I'd much prefer we used Index
Scan and Index Only Scan so it could more
easily be matched up to what's shown in EXPLAIN. I don't think it's up
to this patch to change that, so I've just copied the existing form. I
was also warned off using the node name from EXPLAIN in [1], and on
checking the validity of the complaint, it seems valid.

> I am ok with this revision (and the patch as a whole) I suppose but I still 
> feel like something is missing here.  Though probably that something would 
> fit better in an overview page rather than trying to get the settings to 
> explain all this to the reader.

Thanks. I'll go make it happen. It seems worthy of a backpatch because
it seems equally applicable there, plus to maintain consistency.

For the part that seems missing... I'm not sure it's a great excuse,
but we've often been quite bad at updating the documentation when
making changes to the executor or EXPLAIN.  Tom fixed a bunch of stuff
in 5caa05749 which was outdated.  I think if we wanted to try and do a
better job of documenting plan choices and EXPLAIN output, we'd need
to consider if the said documentation is worth the additional
maintenance burden. It might be quite hard to decide that unless
someone went and wrote the documentation first so that we could
consider it on its own merit. Whoever does that would have to be
willing to have the whole work rejected if we decided it wasn't worth
the trouble. It seems like a bit of a thankless task and I'm not
motivated to do it. Your pain threshold might be higher than mine,
however.

David

[1] 
https://www.postgresql.org/message-id/ccbe8ab940da76d388af7fc3fd169f1dedf751f6.ca...@cybertec.at




Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

2024-10-28 Thread Jingtang Zhang
Hi~

Sorry for sending multiple comments in separate mails. I just found that the
initialization seems redundant, since we have used palloc0?

> +istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));
> +istate->bistate = NULL;
> +istate->mistate = NULL;
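
(For reference: palloc0 returns zero-filled memory, so the explicit
assignments are indeed redundant. A small illustration, where HeapInsertState
is the struct from the patch under review:

	/* palloc0 zero-fills the allocation, so pointer members start out NULL */
	HeapInsertState *istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));

	Assert(istate->bistate == NULL);	/* explicit NULL assignment is redundant */
	Assert(istate->mistate == NULL);
)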

---
Regards, Jingtang




Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

2024-10-28 Thread jian he
On Thu, Oct 24, 2024 at 3:01 PM Tender Wang  wrote:
>
> I feel that it's hard only to use one struct(for example, X), which just 
> calls equal(X, expr)
> can check both the expression match and the collation match.
>

in RelOptInfo->partexprs, maybe we should mention that the partition
key collation is stored
in RelOptInfo->part_scheme, not here.

> Maybe we should add another collation match checks in 
> match_clause_to_partition_key(), like
> partition pruning logic does.
>
in match_clause_to_partition_key
we already have

else if (IsA(clause, OpExpr) &&
list_length(((OpExpr *) clause)->args) == 2)
{
/*
 * Partition key match also requires collation match.  There may be
 * multiple partkeys with the same expression but different
 * collations, so failure is NOMATCH.
 */
if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
return PARTCLAUSE_NOMATCH;
}
else if (IsA(clause, ScalarArrayOpExpr))
{
if (!equal(leftop, partkey) ||
!PartCollMatchesExprColl(partcoll, saop->inputcollid))
return PARTCLAUSE_NOMATCH;
}
So I think match_clause_to_partition_key handling collation is fine.

I think the problem is that match_expr_to_partition_keys
doesn't have a collation-related check.

CREATE TABLE pagg_join1 (c text collate case_insensitive) PARTITION BY
LIST(c collate "C");
CREATE TABLE pagg_join2 (c text collate "C") PARTITION BY LIST(c
collate case_insensitive);
CREATE TABLE pagg_join3 (c text collate "POSIX") PARTITION BY LIST(c
collate "C");
CREATE TABLE pagg_join4 (c text collate case_insensitive) PARTITION BY
LIST(c collate ignore_accents);

Our partition-wise join is based on Equi-join [1].
In some cases the column and partition-key collations are different,
but if both collations are deterministic, then texteq should work
as expected.
So I think pagg_join3 can do a partition-wise join,
and I think pagg_join2 can do a partition-wise join as well.

We can either say that none of them (pagg_join1, pagg_join2, pagg_join3,
pagg_join4) can do a partition-wise join (with themselves),
or we can let pagg_join2 and pagg_join3 do partition-wise joins (with
themselves).


POC attached; it lets pagg_join2 and pagg_join3 do partition-wise joins.


[1] https://en.wikipedia.org/wiki/Join_%28SQL%29#Equi-join
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index d7266e4cdb..6214d01794 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -74,7 +74,7 @@ static bool have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
    RelOptInfo *rel1, RelOptInfo *rel2,
    JoinType jointype, List *restrictlist);
 static int	match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel,
-		 bool strict_op);
+		 bool strict_op, bool *coll_inderministic);
 static void set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
 			RelOptInfo *outer_rel, RelOptInfo *inner_rel,
 			JoinType jointype);
@@ -2104,6 +2104,7 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
 		Expr	   *expr1;
 		Expr	   *expr2;
 		bool		strict_op;
+		bool		coll_inderministic = false;
 		int			ipk1;
 		int			ipk2;
 
@@ -2167,10 +2168,11 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
 		 * Only clauses referencing the partition keys are useful for
 		 * partitionwise join.
 		 */
-		ipk1 = match_expr_to_partition_keys(expr1, rel1, strict_op);
+		ipk1 = match_expr_to_partition_keys(expr1, rel1, strict_op, &coll_inderministic);
 		if (ipk1 < 0)
 			continue;
-		ipk2 = match_expr_to_partition_keys(expr2, rel2, strict_op);
+
+		ipk2 = match_expr_to_partition_keys(expr2, rel2, strict_op, &coll_inderministic);
 		if (ipk2 < 0)
 			continue;
 
@@ -2181,6 +2183,10 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
 		if (ipk1 != ipk2)
 			continue;
 
+		/* if either collation is inderministic, cannot do partitionwise join */
+		if (coll_inderministic)
+			return false;
+
 		/* Ignore clause if we already proved these keys equal. */
 		if (pk_known_equal[ipk1])
 			continue;
@@ -2296,9 +2302,12 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
  * strict_op must be true if the expression will be compared with the
  * partition key using a strict operator.  This allows us to consider
  * nullable as well as nonnullable partition keys.
+ * coll_inderministic return true if exprCollation(expr) is inderministic. if
+ * expr is inderministic, that means same value with different apperance can
+ * live in different partition. In that case, we cannot do partition-wise join.
  */
 static int
-match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
+match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op, bool *coll_inderministic)
 {
 	int			cnt;
 
@@ -2319,7 +2328,15 @@ match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
 		foreach(lc, rel->partexprs[cnt])
 		{
 			if (equal(l

Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Amit Kapila
On Tue, Oct 29, 2024 at 11:19 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> 01. fetch_remote_table_info()
>
> `bool *remotegencolpresent` is accessed unconditionally, but it can cause 
> crash
> if NULL is passed to the function. Should we add an Assert to verify it?
>

This is a static function called from just one place, so I don't
think this is required.

> 02. fetch_remote_table_info()
>
> ```
> +if (server_version >= 18)
> +*remotegencolpresent |= DatumGetBool(slot_getattr(slot, 5, 
> &isnull));
> +
> ```
>
> Can we add Assert(!isnull) like other parts?
>
> 03. fetch_remote_table_info()
>
> Also, we do not have to reach here once *remotegencolpresent becomes true.
> Based on 02 and 03, how about below?
>
> ```
> if (server_version >= 18 && !(*remotegencolpresent))
> {
> *remotegencolpresent |= 
> DatumGetBool(slot_getattr(slot, 5, &isnull));
> Assert(!isnull);
> }
> ```
>

Yeah, we can follow this suggestion, but it would be better to add a comment explaining it.
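
A hedged sketch of what the suggested check might look like with such a
comment (the column position and variable name come from the patch under
review and are illustrative, not an existing core code path):

	if (server_version >= 18 && !(*gencol_published))
	{
		/*
		 * Column 5 of the remote query's result tells us whether the column
		 * list publishes any generated column; once it has been found true,
		 * there is no need to check the remaining rows.
		 */
		*gencol_published = DatumGetBool(slot_getattr(slot, 5, &isnull));
		Assert(!isnull);
	}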

-- 
With Regards,
Amit Kapila.




RE: Pgoutput not capturing the generated columns

2024-10-28 Thread Hayato Kuroda (Fujitsu)
Dear Shubham,

Thanks for updating the patch! Here are my comments for v44.

01. fetch_remote_table_info()

`bool *remotegencolpresent` is accessed unconditionally, but it can cause a crash
if NULL is passed to the function. Should we add an Assert to verify it?

02. fetch_remote_table_info()

```
+if (server_version >= 18)
+*remotegencolpresent |= DatumGetBool(slot_getattr(slot, 5, 
&isnull));
+
```

Can we add Assert(!isnull) like other parts?

03. fetch_remote_table_info()

Also, we do not have to reach here once *remotegencolpresent becomes true.
Based on 02 and 03, how about below?

```
if (server_version >= 18 && !(*remotegencolpresent))
{
*remotegencolpresent |= DatumGetBool(slot_getattr(slot, 
5, &isnull));
Assert(!isnull);
}
```

04. pgoutput_column_list_init()

+if (att->attgenerated)
+{
+if (bms_is_member(att->attnum, cols))
+gencolpresent = true;
+
 continue;
+}

I'm not sure it is correct. Why do you skip the generated column even when it
is in the column list? Also, can you add comments explaining what you want to do?

Best regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Peter Smith
Here are my review comments for patch v44-0002.

==
Commit message.

1.
The commit message is missing.

==
src/backend/replication/logical/tablesync.c

fetch_remote_table_info:

2.
+fetch_remote_table_info(char *nspname, char *relname, LogicalRepRelation *lrel,
+ List **qual, bool *remotegencolpresent)

The name 'remotegencolpresent' sounds like it means a generated col is
present in the remote table, but don't we only care when it is being
published? So, would a better parameter name be more like
'remote_gencol_published'?

~~~

3.
Would it be better to introduce a new human-readable variable like:
bool check_for_published_gencols = (server_version >= 18);

because then you could use that instead of having the 18 check in
multiple places.

~~~

4.
-   lengthof(attrRow), attrRow);
+   server_version >= 18 ? lengthof(attrRow) : lengthof(attrRow) -
1, attrRow);

If you wish, that length calculation could be written more concisely like:
lengthof(attrRow) - (server_version >= 18 ? 0 : 1)

~~~

5.
+ if (server_version >= 18)
+ *remotegencolpresent |= DatumGetBool(slot_getattr(slot, 5, &isnull));
+

Should this also say Assert(!isnull)?

==
src/test/subscription/t/031_column_list.pl

6.
+ qq(0|1),
+ 'replication with generated columns in column list');

Perhaps this message should be worded slightly differently, to
distinguish it from the "normal" replication message.

/replication with generated columns in column list/initial replication
with generated columns in column list/

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Wrong result when enable_partitionwise_join is on if collation of PartitionKey and Column is different.

2024-10-28 Thread Tender Wang
jian he  wrote on Tue, 29 Oct 2024 at 14:15:

> On Thu, Oct 24, 2024 at 3:01 PM Tender Wang  wrote:
> >
> > I feel that it's hard only to use one struct(for example, X), which just
> calls equal(X, expr)
> > can check both the expression match and the collation match.
> >
>
> in RelOptInfo->partexprs, maybe we should mention that the partition
> key collation is stored
> in RelOptInfo->part_scheme, not here.
>
> > Maybe we should add another collation match checks in
> match_clause_to_partition_key(), like
> > partition pruning logic does.
> >
> in match_clause_to_partition_key
> we already have
>
> else if (IsA(clause, OpExpr) &&
> list_length(((OpExpr *) clause)->args) == 2)
> {
> /*
>  * Partition key match also requires collation match.  There may be
>  * multiple partkeys with the same expression but different
>  * collations, so failure is NOMATCH.
>  */
> if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
> return PARTCLAUSE_NOMATCH;
> }
> else if (IsA(clause, ScalarArrayOpExpr))
> {
> if (!equal(leftop, partkey) ||
> !PartCollMatchesExprColl(partcoll, saop->inputcollid))
> return PARTCLAUSE_NOMATCH;
> }
> So I think match_clause_to_partition_key handling collation is fine.
>
> I think the problem is match_expr_to_partition_keys
> don't have a collation related check.
>

Sorry, it's a typo. It should be  match_expr_to_partition_keys().


> CREATE TABLE pagg_join1 (c text collate case_insensitive) PARTITION BY
> LIST(c collate "C");
> CREATE TABLE pagg_join2 (c text collate "C") PARTITION BY LIST(c
> collate case_insensitive);
> CREATE TABLE pagg_join3 (c text collate "POSIX") PARTITION BY LIST(c
> collate "C");
> CREATE TABLE pagg_join4 (c text collate case_insensitive) PARTITION BY
> LIST(c collate ignore_accents);
>
> Our partition-wise join is based on Equi-join [1].
> In some cases,column and partitionkey collation are different,
> but if these two collations are deterministic, then texteq should work
> as expected.
> So I think, pagg_join3 can do partition-wise join,
> I think pagg_join2 can do partition-wise join also.
>
> we can let all (pagg_join1, pagg_join2, pagg_join3, pagg_join4) cannot
> do partition-wise join (join with themself),
> or we can let pagg_join2, pagg_join3 do partition-wise join (join with
> themself).
>
>
> POC attached, will let pagg_join2, pagg_join3 do partition-wise join.
>

Hmm, I'm not sure


>
> [1] https://en.wikipedia.org/wiki/Join_%28SQL%29#Equi-join
>


-- 
Thanks,
Tender Wang


Re: detoast datum into the given buffer as a optimization.

2024-10-28 Thread Andy Fan

Hi,

I found some independent improvement in this area. In detoast_attr,
VARATT_IS_EXTERNAL_ONDISK and VARATT_IS_EXTERNAL_INDIRECT are checked
first, and then VARATT_IS_EXTERNAL_EXPANDED is checked. However, when
VARATT_IS_EXTERNAL_EXPANDED is true, detoast_external_attr is called,
which checks the previous two cases again. The attached patch uses
more specific code to handle this.

This not only removes the double-check overhead, but also makes
my following patch easier, since I don't need to build a
detoast_external_attr_[buffer] function.

'make check-world' passed.

-- 
Best Regards
Andy Fan

>From 7208142241ac18e44be2ee87e9c83c451032ca95 Mon Sep 17 00:00:00 2001
From: Andy Fan 
Date: Tue, 29 Oct 2024 14:05:05 +0800
Subject: [PATCH v20241029 1/1] Using more specific code when detoasting an
 expanded datum.

In the detoast_attr function, VARATT_IS_EXTERNAL_ONDISK and
VARATT_IS_EXTERNAL_INDIRECT are checked first, and then
VARATT_IS_EXTERNAL_EXPANDED is checked, However it is true,
detoast_external_attr is called which would check the two cases
again. The attached patch uses a more specific code to handle this.
---
 src/backend/access/common/detoast.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/common/detoast.c b/src/backend/access/common/detoast.c
index 3547cdba56..5f191a83b5 100644
--- a/src/backend/access/common/detoast.c
+++ b/src/backend/access/common/detoast.c
@@ -161,9 +161,15 @@ detoast_attr(struct varlena *attr)
 		/*
 		 * This is an expanded-object pointer --- get flat format
 		 */
-		attr = detoast_external_attr(attr);
-		/* flatteners are not allowed to produce compressed/short output */
-		Assert(!VARATT_IS_EXTENDED(attr));
+		ExpandedObjectHeader *eoh;
+		Size		resultsize;
+		struct varlena *result;
+
+		eoh = DatumGetEOHP(PointerGetDatum(attr));
+		resultsize = EOH_get_flat_size(eoh);
+		result = (struct varlena *) palloc(resultsize);
+		EOH_flatten_into(eoh, (void *) result, resultsize);
+		attr = result;
 	}
 	else if (VARATT_IS_COMPRESSED(attr))
 	{
-- 
2.45.1



Re: -Wformat-signedness

2024-10-28 Thread Michael Paquier
On Tue, Oct 29, 2024 at 07:38:36AM +0100, Peter Eisentraut wrote:
> I think it could be useful to set up some better test coverage for various
> things overflowing signed integer maximums.  For example, maybe you could
> hack initdb to advance the OID counter to INT32_MAX+1 or thereabouts and run
> the test suites from there.  That would also catch things like inappropriate
> uses of atoi(), things beyond just the format strings.

Fun.  One way to be fancy here would be to force a pg_resetwal
--next-oid in some of the test paths (Cluster.pm and/or pg_regress),
using an environment variable to make the command trigger across
the board for all the clusters created in the tests.  initdb cannot be
used here as the TAP tests reuse a cluster already initdb'd to save
time.  No need to touch pg_regress, either, as we could count on the
pg_regress runs in 002_pg_upgrade.pl and 027_stream_regress.pl.
--
Michael




Re: Reduce one comparison in binaryheap's sift down

2024-10-28 Thread cca5507
Agreed, a new version of the patch is attached.


--
Regards,
ChangAo Chen





v2-0001-Reduce-one-comparison-in-binaryheap-s-sift-down.patch
Description: Binary data


Re: Forbid to DROP temp tables of other sessions

2024-10-28 Thread Daniil Davydov
Hi,
Thanks for your comments, I appreciate them.

As I continued to deal with the topic of working with temp tables of
other sessions, I noticed something like a bug. For example
(REL_17_STABLE):
Session 1:
=# CREATE TEMP TABLE test(id int);

Session 2:
=# INSERT INTO pg_temp_0.test VALUES (1);
=# INSERT INTO pg_temp_0.test VALUES (2);

The second INSERT command will end with the error "cannot access temporary
tables of other sessions". I checked why this is happening and found
errors in several places.
So, I attach two files to this email:
1) Isolation test, that shows an error in REL_17_STABLE (iso_1.patch)
2) Patch that fixes code that mistakenly considered temporary tables
to be permanent (I will be glad to receive feedback on these fixes) +
isolation test, which shows that now any action with temp table of
other session leads to error (temp_tbl_fix.patch)

The tests look kinda ugly, but I think that's inevitable, given that we
don't know exactly what the name of the other session's temporary schema
will be.

--
Best regards,
Daniil Davydov
From 995002793b0af57b660565e6027615c083d8ce5e Mon Sep 17 00:00:00 2001
From: Daniil Davidov 
Date: Mon, 28 Oct 2024 17:19:07 +0700
Subject: [PATCH] Add isolaton test for work with other session's temp table

---
 src/test/isolation/expected/temp-tables.out |  75 +++
 src/test/isolation/isolation_schedule   |   1 +
 src/test/isolation/specs/temp-tables.spec   | 100 
 3 files changed, 176 insertions(+)
 create mode 100644 src/test/isolation/expected/temp-tables.out
 create mode 100644 src/test/isolation/specs/temp-tables.spec

diff --git a/src/test/isolation/expected/temp-tables.out b/src/test/isolation/expected/temp-tables.out
new file mode 100644
index 00..dbd6ffc2e5
--- /dev/null
+++ b/src/test/isolation/expected/temp-tables.out
@@ -0,0 +1,75 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1_st1 s1_st2 s2_st1 s1_st3 s2_st2
+step s1_st1: CREATE TEMP TABLE test_tmp(id int);
+step s1_st2: INSERT INTO temp_schema_id SELECT pg_my_temp_schema();
+s2: NOTICE:  cannot access temporary tables of other sessions
+s2: NOTICE:  cannot access temporary tables of other sessions
+step s2_st1: 
+DO $$
+DECLARE
+schema_name text;
+BEGIN
+-- Find out name of temporary schema of first session
+SELECT nspname INTO schema_name
+  FROM pg_namespace
+ WHERE oid = (SELECT oid FROM temp_schema_id LIMIT 1);
+
+-- Execute few insert operations. We expect that behavior to be the
+-- same (both will complete successfully or both will fail). Since
+-- we have RELATION_IS_OTHER_TEMP() check in PrefetchBuffer and
+-- ReadBufferExtended functions (src/backend/storage/buffer/bufmgr.c),
+-- let's assume that both operations must fail (this is reflected
+-- in expected file)
+BEGIN
+EXECUTE format('INSERT INTO %I.test_tmp VALUES (1);', schema_name);
+EXCEPTION
+WHEN feature_not_supported
+THEN RAISE NOTICE 'cannot access temporary tables of other sessions';
+END;
+
+BEGIN
+EXECUTE format('INSERT INTO %I.test_tmp VALUES (2);', schema_name);
+EXCEPTION
+WHEN feature_not_supported
+THEN RAISE NOTICE 'cannot access temporary tables of other sessions';
+END;
+END
+$$;
+
+step s1_st3: INSERT INTO test_tmp VALUES (3);
+s2: NOTICE:  (3)
+s2: NOTICE:  cannot access temporary tables of other sessions
+s2: NOTICE:  cannot access temporary tables of other sessions
+step s2_st2: 
+DO $$
+DECLARE
+schema_name text;
+result RECORD;
+BEGIN
+-- Find out name of temporary schema of first session
+SELECT nspname INTO schema_name
+  FROM pg_namespace
+ WHERE oid = (SELECT oid FROM temp_schema_id LIMIT 1);
+
+-- Before this step call, first session inserted few tuples into
+-- test_tmp table. Let's assume that SELECT result must contain all
+-- of these tuples (based on current logic)
+FOR result IN
+   EXECUTE format('SELECT * FROM %I.test_tmp;', schema_name)
+  LOOP
+   RAISE NOTICE '%', result;
+   END LOOP;
+
+-- Now lets try to update or delete tuples from test_tmp. If we
+-- cannot insert into this table, lets assume that both UPDATE and
+-- DELETE operations must return same error as INSERT
+BEGIN
+EXECUTE format('UPDATE %I.test_tmp SET id = 100 WHERE id = 3;', schema_name);
+EXCEPTION
+WHEN feature_not_supported
+THEN RAISE NOTICE 'cannot access temporary tables of other sessions';
+END;
+ 

RE: [PoC] Partition path cache

2024-10-28 Thread Bykov Ivan
Hello

> This sounds like an interesting idea, I like it because it omit the needs for 
> "global statistics" effort for partitioned table since it just use the first 
> partition it knows. Of couse it has its drawback that "first"
> partition can't represent other partitions.

This method uses global statistics for all partitions. 
The cache uses standard path building functions (it calculates selectivity for 
path), but it avoids calling all of them for the second and later partitions in 
a group.

The concept is similar to the GEQO method used for joins.
We skip creating some path variants if building all paths would take too long.

> One of the Arguments of this patch might be "What if other partitions have a 
> pretty different statistics from the first partition?". If I were you, I 
> might check all the used statistics on this stage and try to find out a 
> similar algorithms to > prove that the best path would be similar too. This 
> can happens once when the statistics is gathered. However this might be not 
> easy.

Yes, maybe we can group partitions not only by their available index lists
but also by ranges of some statistical properties.

--
Best Regards
Ivan Bykov



Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Amit Kapila
On Tue, Oct 29, 2024 at 11:30 AM Peter Smith  wrote:
> ==
> src/backend/replication/logical/tablesync.c
>
> fetch_remote_table_info:
>
> 2.
> +fetch_remote_table_info(char *nspname, char *relname, LogicalRepRelation 
> *lrel,
> + List **qual, bool *remotegencolpresent)
>
> The name 'remotegencolpresent' sounds like it means a generated col is
> present in the remote table, but don't we only care when it is being
> published? So, would a better parameter name be more like
> 'remote_gencol_published'?
>

I feel there is no need to add 'remote' to this variable name, as the function
name itself makes that clear. Both in the function definition and at
the caller site, we can name it 'gencol_published'.

> ~~~
>
> 3.
> Would it be better to introduce a new human-readable variable like:
> bool check_for_published_gencols = (server_version >= 18);
>
> because then you could use that instead of having the 18 check in
> multiple places.
>

It is better to just add a comment, because such a variable would make this
part of the code difficult to enhance within the same version (18) if required.

> ~~~
>
> 4.
> -   lengthof(attrRow), attrRow);
> +   server_version >= 18 ? lengthof(attrRow) : lengthof(attrRow) -
> 1, attrRow);
>
> If you wish, that length calculation could be written more concisely like:
> lengthof(attrow) - (server_version >= 18 ? 0 : 1)
>

The current way of the patch seems easier to follow.

-- 
With Regards,
Amit Kapila.




Re: protocol-level wait-for-LSN

2024-10-28 Thread Peter Eisentraut

On 29.10.24 06:06, Tatsuo Ishii wrote:

The patch adds a protocol extension called _pq_.wait_for_lsn as well
as a libpq connection option wait_for_lsn to activate the same.  (Use
e.g., psql -d 'wait_for_lsn=1'.)

With this protocol extension, two things are changed:

- The ReadyForQuery message sends back the current LSN.


If other protocol extension X tries to add something to the
ReadyForQuery message too, what would happen?


I think one would have to define that somehow.  If it's useful, the 
additional fields of both extensions could be appended, in some defined 
order.  But this is an interesting question to think about.






Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Amit Kapila
On Mon, Oct 28, 2024 at 12:27 PM Peter Smith  wrote:
>
> On Mon, Oct 28, 2024 at 4:34 PM Amit Kapila  wrote:
> >
> > On Mon, Oct 28, 2024 at 7:43 AM Peter Smith  wrote:
> > >
> > > Hi, here are my review comments for patch v43-0001.
> > >
> > > ==
> > > src/backend/replication/logical/proto.c
> > >
> > > 2.
> > > +static bool
> > > +should_publish_column(Form_pg_attribute att, Bitmapset *columns)
> > > +{
> > > + if (att->attisdropped)
> > > + return false;
> > > +
> > > + /*
> > > + * Skip publishing generated columns if they are not included in the
> > > + * column list.
> > > + */
> > > + if (att->attgenerated && !columns)
> > > + return false;
> > > +
> > > + if (!column_in_column_list(att->attnum, columns))
> > > + return false;
> > > +
> > > + return true;
> > > +}
> > >
> > > Here, I wanted to suggest that the whole "Skip publishing generated
> > > columns" if-part is unnecessary because the next check
> > > (!column_in_column_list) is going to return false for the same
> > > scenario anyhow.
> > >
> > > But, unfortunately, the "column_in_column_list" function has some
> > > special NULL handling logic in it; this means none of this code is
> > > quite what it seems to be (e.g. the function name
> > > column_in_column_list is somewhat misleading)
> > >
> > > IMO it would be better to change the column_in_column_list signature
> > > -- add another boolean param to say if a NULL column list is allowed
> > > or not. That will remove any subtle behaviour and then you can remove
> > > the "if (att->attgenerated && !columns)" part.
> > >
> >
> > The NULL column list still means all columns, so changing the behavior
> > as you are proposing doesn't make sense and would make the code
> > difficult to understand.
> >
>
> My point was that the function 'column_in_column_list' would return
> true even when there is no publication column list at all, so that
> function name is misleading.
>
> And, because in patch 0001 the generated columns only work when
> specified via a column list it means now there is a difference
> between:
> - NULL (all columns specified in the column list) and
> - NULL (no column list at all).
>
> which seems strange and likely to cause confusion.
>

This is no more strange than it was before the 0001 patch. Also, the
comment atop the function clarifies the special condition of the
function. OTOH, I am fine with pulling the check outside the function, as
you are proposing, especially because now it is called from just one
place.

-- 
With Regards,
Amit Kapila.




Re: Add ExprState hashing for GROUP BY and hashed SubPlans

2024-10-28 Thread Andrei Lepikhov

On 9/1/24 18:49, David Rowley wrote:

adf97c156 added support to allow ExprStates to support hashing and
adjusted Hash Join to make use of that. That allowed a speedup in hash
value generation as it allowed JIT compilation of hash values. It also
allowed more efficient tuple deforming as all required attributes are
deformed in one go rather than on demand when hashing each join key.

The attached does the same for GROUP BY and hashed SubPlans. The win
for the tuple deformation does not exist here, but there does seem to
be some gains still to be had from JIT compilation.

Using a scale=1 TPC-H lineitem table, I ran the attached script.

The increase is far from impressive, but likely worth migrating these
over to use ExprState too.
Having remembered that SQL Server uses lightweight threads to execute 
massive hash and aggregate operations in parallel, I think this patch is 
promising. Unfortunately, it causes a SEGFAULT during 'make check'.


--
regards, Andrei Lepikhov





Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails

2024-10-28 Thread Tender Wang
Alvaro Herrera  wrote on Fri, 25 Oct 2024 at 23:14:

> On 2024-Oct-25, Alexander Lakhin wrote:
>
> > I've also discovered another anomaly with a similar setup, but it's not
> > related to 53af9491a.
>
> Hmm, it may well be a preexisting problem, but I do think it involves
> the same code.  As far as I can tell, the value "2" here
>
> > This script ends up with:
> > ERROR:  invalid attribute number 2
> > ERROR:  cache lookup failed for attribute 2 of relation 16398
>
> is coming from riinfo->confdelsetcols which was set up by
> DetachPartitionFinalize during the last DETACH operation.
>

Hmm, actually, confdelsetcols is always {2} both before and after the detach,
as below:

postgres=# select oid, conname, conrelid,conparentid,confdelsetcols from
pg_constraint where conrelid = 16397;
  oid  |  conname  | conrelid | conparentid | confdelsetcols
-------+-----------+----------+-------------+----------------
 16400 | pt_a_fkey |    16397 |       16392 | {2}
(1 row)

postgres=# ALTER TABLE pt DETACH PARTITION tp1;
ALTER TABLE
postgres=# select oid, conname, conrelid,conparentid,confdelsetcols from
pg_constraint where conrelid = 16397;
  oid  |  conname  | conrelid | conparentid | confdelsetcols
-------+-----------+----------+-------------+----------------
 16400 | pt_a_fkey |    16397 |           0 | {2}
(1 row)

Even without a detach, confdelsetcols is {2}, but no error is reported,
because rel->rd_att->natts of pt is 2.
It will not go into tp1 because tp1 is a partition of pt. But after the
detach, rel->rd_att->natts of tp1 is 1,
so "ERROR:  invalid attribute number 2" is reported.

CREATE TABLE tp1 ... ignores the dropped column of the parent, so the natts
of tp1 is 1, but its parent's is 2.

-- 
Thanks,
Tender Wang


Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

2024-10-28 Thread Jingtang Zhang
Hi! Glad to see update in this thread.

A little question about the v24 0002 patch: would it be better to move the
implementation of TableModifyIsMultiInsertsSupported somewhere at the table
AM level?
It seems to be a common function for future use, not one specific to
matviews.

---

Regards, Jingtang


Bharath Rupireddy  wrote on Sat, 26 Oct 2024 at 21:31:

> Hi,
>
> Thanks for looking into this.
>
> On Thu, Aug 29, 2024 at 12:29 PM Jeff Davis  wrote:
> >
> > I believe we need the branching in the caller anyway:
> >
> > 1. If there is a BEFORE row trigger with a volatile function, the
> > visibility rules[1] mean that the function should see changes from all
> > the rows inserted so far this command, which won't work if they are
> > still in the buffer.
> >
> > 2. Similarly, for an INSTEAD OF row trigger, the visibility rules say
> > that the function should see all previous rows inserted.
> >
> > 3. If there are volatile functions in the target list or WHERE clause,
> > the same visibility semantics apply.
> >
> > 4. If there's a "RETURNING ctid" clause, we need to either come up with
> > a way to return the tuples after flushing, or we need to use the
> > single-tuple path. (Similarly in the future when we support UPDATE ...
> > RETURNING, as Matthias pointed out.)
> >
> > If we need two paths in each caller anyway, it seems cleaner to just
> > wrap the check for tuple_modify_buffer_insert in
> > table_modify_buffer_enabled().
> >
> > We could perhaps use a one path and then force a batch size of one or
> > something, which is an alternative, but we have to be careful not to
> > introduce a regression (and it still requires a solution for #4).
>
> I chose to branch in the caller e.g. if there's a volatile function
> SELECT query of REFRESH MATERIALIZED VIEW, the caller goes
> table_tuple_insert() path, else multi-insert path.
>
> I am posting the new v24 patch set organized as follows: 0001
> introducing the new table AM, 0002 optimizing CTAS, CMV and RMV, 0003
> using the new table AM for COPY ... FROM. I, for now, discarded the
> INSERT INTO ... SELECT and Logical Replication Apply patches, the idea
> is to take the basic stuff forward.
>
> I reworked structure names, members and function names, reworded
> comments, addressed review comments in the v24 patches. Please have a
> look.
>
> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>
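
A hedged sketch of the caller-side branching described above; the
table_modify_buffer_* functions come from the proposed patch (they are not
existing core APIs), and the condition and state variables are illustrative
only:

	if (table_modify_buffer_enabled(mstate))	/* no volatile funcs, row triggers, RETURNING ctid, ... */
		table_modify_buffer_insert(mstate, slot);	/* buffered multi-insert path */
	else
		table_tuple_insert(rel, slot, mycid, ti_options, bistate);	/* existing single-tuple path */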


Re: New "raw" COPY format

2024-10-28 Thread Joel Jacobson
On Mon, Oct 28, 2024, at 10:30, Joel Jacobson wrote:
> On Mon, Oct 28, 2024, at 08:56, jian he wrote:
>>   /* Check force_quote */
>> - if (!opts_out->csv_mode && (opts_out->force_quote ||
>> opts_out->force_quote_all))
>> + if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
>> + opts_out->force_quote_all))
>>   ereport(ERROR,
>>   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>>   /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
>>
>> maybe this has a code indentation issue.
>> since "if" and "opts_out" in the same column position.
>
> Thanks for review.
>
> I've fixed the indentation issues.

I've now installed pgindent and will use it from here on, to avoid this class
of problems.

New version where all three patches are now indented using pgindent.

/Joel

v15-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch
Description: Binary data


v15-0002-Add-raw-format-to-COPY-command.patch
Description: Binary data


v15-0003-Reorganize-option-validations.patch
Description: Binary data


Re: msvc directory missing in PostgreSQL 17.0

2024-10-28 Thread 黄铎彦
On 2024-10-28 09:42 GMT+8 Umar Hayat  wrote:
>[0] 
>https://www.postgresql.org/message-id/flat/CADK3HHLQ1MNmfXqEvQi36D_MQrheOZPcXv2H3s6otMbSmfwjzg%40mail.gmail.com


It worked! Thank you, Umar! My poor searching ability QwQ.
But there may be a small bug in the build ordering: after I first executed
"build -> build solution", I got the error "libpq.pdb not found". When I
executed it a second time, it completed successfully.

Re: Avoid orphaned objects dependencies, take 3

2024-10-28 Thread Bertrand Drouvot
Hi,

On Mon, Aug 19, 2024 at 03:35:14PM +, Bertrand Drouvot wrote:
> Hi,
> 
> On Wed, Jul 10, 2024 at 07:31:06AM +, Bertrand Drouvot wrote:
> > So, to sum up:
> > 
> > A. Locking is now done exclusively with LockNotPinnedObject(Oid classid, 
> > Oid objid)
> > so that it's now always clear what object we want to acquire a lock for. It 
> > means
> > we are not manipulating directly an object address or a list of objects 
> > address
> > as it was the case when the locking was done "directly" within the 
> > dependency code.
> > 
> > B. A special case is done for objects that belong to the RelationRelationId 
> > class.
> > For those, we should be in one of the two following cases that would already
> > prevent the relation to be dropped:
> > 
> >  1. The relation is already locked (could be an existing relation or a 
> > relation
> >  that we are creating).
> > 
> >  2. The relation is protected indirectly (i.e an index protected by a lock 
> > on
> >  its table, a table protected by a lock on a function that depends the 
> > table...)
> > 
> > To avoid any risks for the RelationRelationId class case, we acquire a lock 
> > if
> > there is none. That may add unnecessary lock for 2. but that seems worth 
> > it. 
> > 
> 
> Please find attached v16, mandatory rebase due to 80ffcb8427.

rebased (v17 attached).
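
A hedged sketch of the idea summarized above, using the schema/function
scenario: take a lock that conflicts with a concurrent DROP before recording
the dependency, then re-check that the referenced object still exists.
LockNotPinnedObject() is the function introduced by the patch itself; the
surrounding calls are illustrative only:

	ObjectAddress myself;		/* the object being created, e.g. a new function */
	ObjectAddress referenced;	/* the schema it depends on */

	ObjectAddressSet(referenced, NamespaceRelationId, namespaceId);

	/* conflicts with a concurrent DROP SCHEMA until our transaction ends */
	LockNotPinnedObject(NamespaceRelationId, namespaceId);

	/* the patch re-checks that the schema still exists here, then records */
	recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);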

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From e08d57be0ae5cfe6713cfa4b241320fffd80f822 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Fri, 29 Mar 2024 15:43:26 +
Subject: [PATCH v17] Avoid orphaned objects dependencies

It's currently possible to create orphaned objects dependencies, for example:

Scenario 1:

session 1: begin; drop schema schem;
session 2: create a function in the schema schem
session 1: commit;

With the above, the function created in session 2 would be linked to a non
existing schema.

Scenario 2:

session 1: begin; create a function in the schema schem
session 2: drop schema schem;
session 1: commit;

With the above, the function created in session 1 would be linked to a non
existing schema.

To avoid those scenarios, a new lock (that conflicts with a lock taken by DROP)
has been put in place before the dependencies are being recorded. With this in
place, the drop schema in scenario 2 would be locked.

Also, after the new lock attempt, the patch checks that the object still exists:
with this in place session 2 in scenario 1 would be locked and would report an
error once session 1 committs (that would not be the case should session 1 abort
the transaction).

If the object is dropped before the new lock attempt is triggered then the patch
would also report an error (but with less details).

The patch takes into account any type of objects except the ones that are pinned
(they are not droppable because the system requires it).

A special case is done for objects that belong to the RelationRelationId class.
For those, we should be in one of the two following cases that would already
prevent the relation to be dropped:

1. The relation is already locked (could be an existing relation or a relation
that we are creating).

2. The relation is protected indirectly (i.e an index protected by a lock on
its table, a table protected by a lock on a function that depends the table...)

To avoid any risks for the RelationRelationId class case, we acquire a lock if
there is none. That may add unnecessary lock for 2. but that's worth it.

The patch adds a few tests for some dependency cases (that would currently produce
orphaned objects):

- schema and function (as the above scenarios)
- alter a dependency (function and schema)
- function and arg type
- function and return type
- function and function
- domain and domain
- table and type
- server and foreign data wrapper
---
 src/backend/catalog/aclchk.c  |   1 +
 src/backend/catalog/dependency.c  | 212 ++
 src/backend/catalog/heap.c|   7 +
 src/backend/catalog/index.c   |  26 +++
 src/backend/catalog/objectaddress.c   |  57 +
 src/backend/catalog/pg_aggregate.c|   9 +
 src/backend/catalog/pg_attrdef.c  |   1 +
 src/backend/catalog/pg_cast.c |   5 +
 src/backend/catalog/pg_collation.c|   1 +
 src/backend/catalog/pg_constraint.c   |  26 +++
 src/backend/catalog/pg_conversion.c   |   2 +
 src/backend/catalog/pg_depend.c   |  39 +++-
 src/backend/catalog/pg_operator.c |  19 ++
 src/backend/catalog/pg_proc.c |  17 +-
 src/backend/catalog/pg_publication.c  |  11 +
 src/backend/catalog/pg_range.c|   6 +
 src/backend/catalog/pg_type.c |  39 
 src/backend/catalog/toasting.c|   1 +
 src/backend/commands/alter.c  |   4 +
 src/backend/commands/amcmds.c |   1 +
 src/

Re: Add isolation test template in injection_points for wait/wakeup/detach

2024-10-28 Thread Bertrand Drouvot
Hi,

On Fri, Oct 25, 2024 at 09:33:12AM +0900, Michael Paquier wrote:
> This is in the first permutation of the test done with "wait1 wakeup2
> detach2", and the diff means that the backend running the "wait"
> callback is reported as finished after the detach is done,
> injection_points_run being only used for the waits.  Hence the wait is
> so slow to finish that the detach has time to complete and finish,
> breaking the output.

Yeah, I agree with your analysis.

> And here comes the puzzling part: all of failures involve FreeBSD 13
> in the CI.  Reproducing this failure would not be difficult, I thought
> first; we can add a hardcoded pg_usleep() to delay the end of
> injection_wait() so as we make sure that the backend doing the wait
> reports later than the detach.  Or just do the same at the end of
> injection_points_run() once the callback exits.  I've sure done that,
> placing some strategic pg_usleep() calls on Linux to make the paths
> that matter in the wait slower, but the test remains stable.

Right, I did the same and observed the exact same behavior.

Still, it's possible to observe the s1 wait finishing after the s2 detach by
running this test manually (i.e., create two sessions and run the test commands
manually) and:

1. attach a debugger to the first session (say with a breakpoint in
injection_wait()),
or
2. add a hardcoded pg_usleep() in injection_wait().

So I think that the s1 wait finishing after the s2 detach is possible if
session 1 is "frozen" (gdb case) or slow enough (pg_usleep() case).

> The CI
> on Linux is stable as well: 3rd and 4th columns of the cfbot are
> green, I did not spot any failures related to this isolation test in
> injection_points.  Only the second column about FreeBSD is going rogue
> on a periodic basis.
> 
> One fix would is to remove this first permutation test, still that's
> hiding a problem rather than solving it, and something looks wrong
> with conditional variables, specific to FreeBSD?

Hum, we would probably observe other failures in other tests no?

> Any thoughts or comments?

I think that we cannot be 100% sure that the s1 wait will finish before the
s2 detach (easily reproducible with gdb attached to s1 or a hardcoded sleep), and
that other OSes could also report the test as failing for the same reason.

It's not ideal, but instead of removing this first permutation test, what about
adding a "sleep2" step to it (doing, say, SELECT pg_sleep(1);) and calling this
new step before the detach2 one?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: POC, WIP: OR-clause support for indexes

2024-10-28 Thread jian he
 * NOTE:  returns NULL if clause is an OR or AND clause; it is the
 * responsibility of higher-level routines to cope with those.
 */
static IndexClause *
match_clause_to_indexcol(PlannerInfo *root,
 RestrictInfo *rinfo,
 int indexcol,
 IndexOptInfo *index)

the above comments need a slight change.


EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE  (thousand = 1
OR thousand = 3);
                        QUERY PLAN
----------------------------------------------------------
 Bitmap Heap Scan on tenk2
   Recheck Cond: ((thousand = 1) OR (thousand = 3))
   ->  Bitmap Index Scan on tenk2_thous_tenthous
         Index Cond: (thousand = ANY ('{1,3}'::integer[]))

EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE  (thousand in (1,3));
                        QUERY PLAN
----------------------------------------------------------
 Bitmap Heap Scan on tenk2
   Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
   ->  Bitmap Index Scan on tenk2_thous_tenthous
         Index Cond: (thousand = ANY ('{1,3}'::integer[]))

tenk2 index:
Indexes:
"tenk2_thous_tenthous" btree (thousand, tenthous)

Looking at the above cases, I found that the "Recheck Cond" is
different from the "Index Cond".
I wonder why there is a difference, or whether they should be the same.
Then I came to
match_orclause_to_indexcol:

/*
 * Finally, build an IndexClause based on the SAOP node. Use
 * make_simple_restrictinfo() to get RestrictInfo with clean selectivity
 * estimations because it may differ from the estimation made for an OR
 * clause. Although it is not a lossy expression, keep the old version of
 * rinfo in iclause->rinfo to detect duplicates and recheck the original
 * clause.
 */
iclause = makeNode(IndexClause);
iclause->rinfo = rinfo;
iclause->indexquals = list_make1(make_simple_restrictinfo(root,
  &saopexpr->xpr));
iclause->lossy = false;
iclause->indexcol = indexcol;
iclause->indexcols = NIL;

Looking at create_bitmap_scan_plan, I think "iclause->rinfo" itself won't be
able to detect duplicates, since the upper code would mostly use
"iclause->indexquals" for comparison?


typedef struct IndexClause comments says:
"
 * indexquals is a list of RestrictInfos for the directly-usable index
 * conditions associated with this IndexClause.  In the simplest case
 * it's a one-element list whose member is iclause->rinfo.  Otherwise,
 * it contains one or more directly-usable indexqual conditions extracted
 * from the given clause.  The 'lossy' flag indicates whether the
 * indexquals are semantically equivalent to the original clause, or
 * represent a weaker condition.
"
Should iclause->lossy be true at the end of match_orclause_to_indexcol,
since it meets the comment condition "semantically equivalent to the
original clause"?
Or is the above comment slightly wrong?

In match_orclause_to_indexcol
I changed
iclause->rinfo = rinfo;
to
 iclause->rinfo = make_simple_restrictinfo(root,
&saopexpr->xpr);

As expected, now the "Recheck Cond" is the same as the "Index Cond":
   Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
   ->  Bitmap Index Scan on tenk2_thous_tenthous
 Index Cond: (thousand = ANY ('{1,3}'::integer[]))

I am not sure of the implication of this change.




Re: further #include cleanup (IWYU)

2024-10-28 Thread Peter Eisentraut

On 20.10.24 11:37, Alvaro Herrera wrote:

On 2024-Oct-20, Peter Eisentraut wrote:


diff --git a/contrib/tablefunc/tablefunc.h b/contrib/tablefunc/tablefunc.h
index 2009382ce7d..b78030044b5 100644
--- a/contrib/tablefunc/tablefunc.h
+++ b/contrib/tablefunc/tablefunc.h
@@ -34,6 +34,4 @@
  #ifndef TABLEFUNC_H
  #define TABLEFUNC_H
  
-#include "fmgr.h"

-
  #endif/* TABLEFUNC_H 
*/


You could as well just delete this file.


I have committed it with the file deleted.





Re: further #include cleanup (IWYU)

2024-10-28 Thread Peter Eisentraut

On 20.10.24 11:53, Alvaro Herrera wrote:

On 2024-Oct-20, Peter Eisentraut wrote:

diff --git a/src/bin/pg_dump/pg_backup_utils.c 
b/src/bin/pg_dump/pg_backup_utils.c
index a0045cf5e58..80715979a1a 100644
--- a/src/bin/pg_dump/pg_backup_utils.c
+++ b/src/bin/pg_dump/pg_backup_utils.c
@@ -13,7 +13,9 @@
   */
  #include "postgres_fe.h"
  
+#ifdef WIN32

  #include "parallel.h"
+#endif
  #include "pg_backup_utils.h"


This seems quite weird and I think it's just because exit_nicely() wants
to do _endthreadex().  Maybe it'd be nicer to add a WIN32-specific
on_exit_nicely_list() callback that does that in parallel.c, and do away
with the inclusion of parallel.h in pg_backup_utils.c entirely?


I was thinking the same thing.  But maybe that should be a separate project.


diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index a09247fae47..78e91f6e2dc 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -63,7 +63,9 @@
  #include "fe_utils/string_utils.h"
  #include "parallel.h"
  #include "pg_backup_utils.h"
+#ifdef WIN32
  #include "port/pg_bswap.h"
+#endif


This looks really strange, but then parallel.c seems to have embedded
its own portability layer within itself.


The reason for this one is that pgpipe() uses pg_hton16() and 
pg_hton32().  We could use htons() and htonl() here instead.  That would 
effectively revert that part of commit 0ba99c84e8c.





Re: Consider the number of columns in the sort cost model

2024-10-28 Thread Alena Rybakina

Hi! Thank you for your work on this subject!

On 23.10.2024 04:39, Andrei Lepikhov wrote:

On 15/10/2024 12:15, David Rowley wrote:

On Tue, 15 Oct 2024 at 17:48, Andrei Lepikhov  wrote:
I think maybe what is worth working on is seeing if you can better
estimate the number of tiebreak comparisons by checking the number of
distinct values for each sort key.  That might require what we
discussed on another thread about having estimate_num_groups() better
use the n_distinct estimate from the EquivalenceMember with the least
distinct values.  It'll be another question if all that can be done
and this all still perform well enough for cost_sort(). You may have
to write it first before we can figure that out.  It may be possible
to buy back some of the overheads with some additional caching...
Perhaps that could be done within the EquivalenceClass.

It seems that your idea with an equivalence class works.
In the patchset attached, I show all the ideas (I initially wanted to 
discuss it step-by-step).
In the attached, the first patch is about choosing adequate expression 
from the list of EC members.
The second one is the current patch, which considers the number of 
sort columns.
Third patch - elaboration of the second patch. Use only the trustful 
statistics here - the number of ndistinct values on the first sorted 
column.
And the last patch is a demo of how I'm going to use the previous 
three patches and add one more strategy to improve the order of 
columns in the GROUP-BY clause.



I have some questions about your patches.

1. You consider two lists root->group_pathkeys and 
root->processed_groupClause.


As far as I understand, this is because the pathkey elements are identical
to the group clause, and that identity is checked precisely by this
condition:


if (foreach_current_index(lc1) >= root->num_groupby_pathkeys)

To be honest, I didn’t find information about this in the code, but did 
I understand everything correctly?


2. I noticed that statistics of distinct values are calculated several
times. Maybe I'm missing something, but this can be avoided if these
statistics are saved after calculation. For example, I saw that you may
calculate the same statistics for the same equivalence members in both
cost_incremental_sort and identify_sort_ecmember.
Is it possible to store this information in a structure and use it later?


3. I think you should initialize the variable ndist in your patch. I
got a compiler error when building your code:


costsize.c: In function ‘identify_sort_ecmember’:
costsize.c:6694:42: error: ‘ndist’ may be used uninitialized 
[-Werror=maybe-uninitialized]

 6694 | em->em_ndistinct = ndist;
  | ~^
costsize.c:6680:33: note: ‘ndist’ was declared here
 6680 | double  ndist;
  | ^~~
cc1: all warnings being treated as errors
gmake[4]: *** [: costsize.o] Error 1

4. Before calling the examine_variable function, I think you should check
that the node is not NULL.


@@ -575,6 +575,8 @@ get_useful_group_keys_orderings(PlannerInfo *root, 
Path *path)

              */
             continue;

+        Assert (node != NULL);
+
         examine_variable(root, node, 0, &vardata);
         if (!HeapTupleIsValid(vardata.statsTuple))
             continue;


--
Regards,
Alena Rybakina
Postgres Professional





Re: Can rs_cindex be < 0 for bitmap heap scans?

2024-10-28 Thread Dilip Kumar
On Sat, Oct 26, 2024 at 5:30 PM Melanie Plageman 
wrote:

> On Sat, Oct 26, 2024 at 12:17 AM Dilip Kumar 
> wrote:
> >
> > On Fri, Oct 25, 2024 at 10:07 PM Melanie Plageman <
> melanieplage...@gmail.com> wrote:
> >>
> >> On Fri, Oct 25, 2024 at 10:29 AM Melanie Plageman
> >>  wrote:
> >> >
> >> > Tom suggested off-list that if rs_cindex can't be zero, then it should
> >> > be unsigned. I checked the other scan types using the
> >> > HeapScanDescData, and it seems none of them use values of rs_cindex or
> >> > rs_ntuples < 0. As such, here is a patch making both rs_ntuples  and
> >> > rs_cindex unsigned.
> >>
> >
> > @@ -943,8 +945,8 @@ heapgettup_pagemode(HeapScanDesc scan,
> >  {
> >   HeapTuple tuple = &(scan->rs_ctup);
> >   Page page;
> > - int lineindex;
> > - int linesleft;
> > + uint32 lineindex;
> > + uint32 linesleft;
> >
> > IMHO we can't make  "lineindex" as uint32, because just see the first
> code block[1] of heapgettup_pagemode(), we use this index as +ve (Forward
> scan )as well as -ve (Backward scan).
>
> Yes, so in the case of backwards scan, if scan->rs_cindex is 0, when
> dir is -1, lineindex will wrap around, but we don't use it in that
> case because linesleft will be 0 and then we will not meet the
> conditions to execute the code in the loop under continue_page. And in
> the loop, when adding -1 to lineindex, for backwards scan, it always
> starts at linesleft and ticks down to 0.
>

Yeah, you are right: although the lineindex will wrap around when rs_cindex
is 0, it would not be used.  So, it won't actually cause any issues, but
I'm not comfortable with the unintentional wraparound. I would have left
"scan->rs_cindex" as int itself, but I am fine with whatever you prefer.

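To make the wraparound concern concrete, here is a tiny standalone
illustration (not the actual heapam code): with an unsigned index, adding
dir = -1 wraps to UINT32_MAX, but the loop guard on linesleft means the
wrapped value is never used to read a tuple.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    uint32_t    lineindex = 0;  /* like rs_cindex at the end of a backward scan */
    uint32_t    linesleft = 0;  /* nothing left to return from this page */
    int         dir = -1;       /* backward scan */

    /* Mirrors the guard shape in heapgettup_pagemode(): the body is skipped. */
    while (linesleft > 0)
    {
        /* ... process the tuple at lineindex ... */
        lineindex += dir;
        linesleft--;
    }

    /* The wrapped value exists, but nothing ever indexes with it. */
    lineindex += dir;
    printf("wrapped lineindex = %" PRIu32 "\n", lineindex);    /* 4294967295 */
    return 0;
}
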

> We could add an if statement above the goto that says something like
> if (linesleft > 0)
> goto continue_page;
>
> Would that make it clearer?
>

Not sure it would make it clearer.  In fact, in common cases it would add
an extra instruction to check if (linesleft > 0).


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Peter Smith
On Mon, Oct 28, 2024 at 5:45 PM Amit Kapila  wrote:
>
> On Mon, Oct 28, 2024 at 7:43 AM Peter Smith  wrote:
> >
> > 7.
> > +$node_publisher->wait_for_catchup('sub_gen');
> > +
> > +is( $node_subscriber->safe_psql(
> > + 'postgres', "SELECT * FROM test_gen ORDER BY a"),
> > + qq(1|2),
> > + 'replication with generated columns in column list');
> > +
> >
> > But, this is only testing normal replication. You should also include
> > some initial table data so you can test that the initial table
> > synchronization works too. Otherwise, I think current this patch has
> > no proof that the initial 'copy_data' even works at all.
> >
>
> Per my tests, the initial copy doesn't work with 0001 alone. It needs
> changes in table sync.c from the 0002 patch. Now, we can commit 0001
> after fixing comments and mentioning in the commit message that this
> patch supports only the replication of generated columns when
> specified in the column list. The initial sync and replication of
> generated columns when not specified in the column list will be
> supported in future commits. OTOH, if the change to make table sync
> work is simple, we can even combine that change.
>

If this comes to a vote, then my vote is to refactor the necessary
tablesync COPY code back into patch 0001 so that patch 0001 can
replicate initial data properly stand alone.

Otherwise (if we accept that patch 0001 only partly works, like now), users
would have to jump through hoops to get any benefit from this patch by
itself. This is particularly true because the CREATE SUBSCRIPTION
'copy_data' parameter default is true, so patch 0001 is going to be
broken by default if there is any pre-existing table data when
publishing generated columns to default subscriptions.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Inval reliability, especially for inplace updates

2024-10-28 Thread Nitin Motiani
On Thu, Oct 24, 2024 at 8:24 AM Noah Misch  wrote:
>
> With the releases wrapping in 2.5 weeks, I'm ambivalent about pushing this
> before the release or after.  Pushing before means fewer occurrences of
> corruption, but pushing after gives more bake time to discover these changes
> were defective.  It's hard to predict which helps users more, on a
> risk-adjusted basis.  I'm leaning toward pushing this week.  Opinions?
>

I lean towards pushing after the release. This is based on my
assumption that since this bug has been around for a while, it is
(probably) not hit often. And a few weeks delay is better than
introducing a new defect.

Thanks




Re: Better error reporting from extension scripts (Was: Extend ALTER OPERATOR)

2024-10-28 Thread Alexander Lakhin

Hello Tom,

27.10.2024 20:41, Tom Lane wrote:

I wrote:

In the no-good-deed-goes-unpunished department: buildfarm member
hamerkop doesn't like this patch [1].  The diffs look like
...
So what I'd like to do to fix this is to change
-   if ((file = AllocateFile(filename, PG_BINARY_R)) == NULL)
+   if ((file = AllocateFile(filename, "r")) == NULL)

Well, that didn't fix it :-(.  I went so far as to extract the raw log
files from the buildfarm database, and what they show is that there is
absolutely no difference between the lines diff is claiming are
different:

-QUERY:  CREATE FUNCTIN my_erroneous_func(int) RETURNS int LANGUAGE SQL\r\n
+QUERY:  CREATE FUNCTIN my_erroneous_func(int) RETURNS int LANGUAGE SQL\r\n

It's the same both before and after 924e03917, which made the code
change depicted above, so that didn't help.

So I'm pretty baffled.  I suppose the expected and result files must
actually be different, and something in subsequent processing is
losing the difference before it gets to the buildfarm database.
But I don't have the ability to debug that from here.  Does anyone
with access to hamerkop want to poke into this?

Without additional information, the only thing I can think of that
I have any confidence will eliminate these failures is to reformat
the affected test cases so that they produce just a single line of
output.  That's kind of annoying from a functionality-coverage point
of view, but I'm not sure avoiding it is worth moving mountains for.



I've managed to reproduce the issue with the core.autocrlf=true git setting
(which sets CR+LF line ending in test_ext7--2.0--2.1bad.sql) and with diff
from msys:
C:\msys64\usr\bin\diff.exe --version
diff (GNU diffutils) 3.8

(Gnu/Win32 Diff [1] doesn't detect those EOL differences and thus the test
doesn't fail.)

I can really see different line endings with hexdump:
hexdump -C ...testrun\test_extensions\regress\regression.diffs
0230  20 20 20 20 20 20 20 20  5e 0a 2d 51 55 45 52 59  | ^.-QUERY|
0240  3a 20 20 43 52 45 41 54  45 20 46 55 4e 43 54 49  |: CREATE FUNCTI|
0250  4e 20 6d 79 5f 65 72 72  6f 6e 65 6f 75 73 5f 66  |N my_erroneous_f|
0260  75 6e 63 28 69 6e 74 29  20 52 45 54 55 52 4e 53 |unc(int) RETURNS|
0270  20 69 6e 74 20 4c 41 4e  47 55 41 47 45 20 53 51  | int LANGUAGE SQ|
0280  4c 0a 2b 51 55 45 52 59  3a 20 20 43 52 45 41 54 |L.+QUERY:  CREAT|
0290  45 20 46 55 4e 43 54 49  4e 20 6d 79 5f 65 72 72  |E FUNCTIN my_err|
02a0  6f 6e 65 6f 75 73 5f 66  75 6e 63 28 69 6e 74 29 |oneous_func(int)|
02b0  20 52 45 54 55 52 4e 53  20 69 6e 74 20 4c 41 4e  | RETURNS int LAN|
02c0  47 55 41 47 45 20 53 51  4c 0d 0a 20 41 53 20 24  |GUAGE SQL.. AS $|

hexdump -C .../testrun/test_extensions/regress/results/test_extensions.out | 
grep -C5 FUNCTIN
0b80  20 5e 0d 0a 51 55 45 52  59 3a 20 20 43 52 45 41  | ^..QUERY:  CREA|
0b90  54 45 20 46 55 4e 43 54  49 4e 20 6d 79 5f 65 72  |TE FUNCTIN my_er|
0ba0  72 6f 6e 65 6f 75 73 5f  66 75 6e 63 28 69 6e 74 |roneous_func(int|
0bb0  29 20 52 45 54 55 52 4e  53 20 69 6e 74 20 4c 41  |) RETURNS int LA|
0bc0  4e 47 55 41 47 45 20 53  51 4c 0d 0d 0a 41 53 20  |NGUAGE SQL...AS |

whilst
hexdump -C .../src/test/modules/test_extensions/expected/test_extensions.out | 
grep -C5 FUNCTIN
0b80  20 5e 0d 0a 51 55 45 52  59 3a 20 20 43 52 45 41  | ^..QUERY:  CREA|
0b90  54 45 20 46 55 4e 43 54  49 4e 20 6d 79 5f 65 72  |TE FUNCTIN my_er|
0ba0  72 6f 6e 65 6f 75 73 5f  66 75 6e 63 28 69 6e 74 |roneous_func(int|
0bb0  29 20 52 45 54 55 52 4e  53 20 69 6e 74 20 4c 41  |) RETURNS int LA|
0bc0  4e 47 55 41 47 45 20 53  51 4c 0d 0a 41 53 20 24  |NGUAGE SQL..AS $|

It looks like --strip-trailing-cr doesn't work as desired for this diff version.

I've also dumped buf in read_whole_file() and found that in both
PG_BINARY_R and "r" modes the 0d 0a ending is preserved. But it changed
to 0a with the "rt" mode (see [2]), and that makes the test (and the whole
`meson test`) pass for me.
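
For anyone unfamiliar with the Windows behaviour involved, here is a small
standalone demonstration of the difference between binary-mode and text-mode
stdio reads; on Windows the text-mode read strips the CR, while the
binary-mode read returns the raw bytes (on POSIX both reads are identical).

#include <stdio.h>

int
main(void)
{
    const char *path = "crlf_demo.txt";
    char        buf[16];
    size_t      n;
    FILE       *f;

    /* Write "X\r\n" verbatim by using binary mode for the write. */
    f = fopen(path, "wb");
    if (f == NULL)
        return 1;
    fwrite("X\r\n", 1, 3, f);
    fclose(f);

    /* Binary-mode read ("rb"/PG_BINARY_R): always sees the raw 0x0d 0x0a. */
    f = fopen(path, "rb");
    if (f == NULL)
        return 1;
    n = fread(buf, 1, sizeof(buf), f);
    fclose(f);
    printf("binary read: %zu bytes\n", n);      /* 3 everywhere */

    /* Text-mode read ("r", or explicit "rt" with MSVC): CR stripped on Windows. */
    f = fopen(path, "r");
    if (f == NULL)
        return 1;
    n = fread(buf, 1, sizeof(buf), f);
    fclose(f);
    printf("text read:   %zu bytes\n", n);      /* 2 on Windows, 3 on POSIX */

    return 0;
}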

[1] https://gnuwin32.sourceforge.net/packages/diffutils.htm
[2] 
https://learn.microsoft.com/en-us/cpp/c-runtime-library/file-translation-constants?view=msvc-170

Best regards,
Alexander




Re: sslinfo extension - add notbefore and notafter timestamps

2024-10-28 Thread Daniel Gustafsson
The recent bump in minmum required versions of OpenSSL and LibreSSL made me
remember to revisit this patch which was previously reverted due to library
incompatibility (with *both* OpenSSL and LibreSSL on different APIs).

The attached removes the timestamp conversion workaround which is no longer
needed.

--
Daniel Gustafsson



v13-0001-Add-notBefore-and-notAfter-to-SSL-cert-info-disp.patch
Description: Binary data


Re: New "raw" COPY format

2024-10-28 Thread Joel Jacobson
On Mon, Oct 28, 2024, at 08:56, jian he wrote:
>   /* Check force_quote */
> - if (!opts_out->csv_mode && (opts_out->force_quote ||
> opts_out->force_quote_all))
> + if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
> + opts_out->force_quote_all))
>   ereport(ERROR,
>   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>   /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
>
> maybe this has a code indentation issue.
> since "if" and "opts_out" in the same column position.

Thanks for review.

I've fixed the indentation issues.

> It came to my mind,
> change errmsg occurrence of  "BINARY mode", "CSV mode" to "binary
> format",  "csv format" respectively.
> I think "format" would be more accurate.
> but the message seems invasive,
> so i guess we need to use "mode".

That would work, I'm fine with either.

> overall v13-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch
> looks good to me.

Cool.

/Joel

v14-0001-Introduce-CopyFormat-and-replace-csv_mode-and-binary.patch
Description: Binary data


v14-0002-Add-raw-format-to-COPY-command.patch
Description: Binary data


v14-0003-Reorganize-option-validations.patch
Description: Binary data


Re: Reordering DISTINCT keys to match input path's pathkeys

2024-10-28 Thread Andrei Lepikhov

On 6/7/24 16:46, Richard Guo wrote:

On Mon, Feb 5, 2024 at 11:18 AM Richard Guo  wrote:

cfbot reminds that this patch does not apply any more.  So I've rebased
it on master, and also adjusted the test cases a bit.


This patch does not apply any more, so here is a new rebase, with some
tweaks to the comments.

This patch needs a minor rebase again.
After skimming the code, I want to say that it looks good. But maybe, to 
avoid one more *_reordering GUC, it would be better to cover all pathkey 
reorderings under a single GUC.


--
regards, Andrei Lepikhov





Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails

2024-10-28 Thread Alvaro Herrera
On 2024-Oct-27, Tender Wang wrote:

> BTW, while reviewing the v2 patch, I found the parentConTup in
> foreach(cell, fks) block
> didn't need it anymore. We can remove the related codes.

True.  Done so in this v3.

I noticed another problem here: we're grabbing the wrong lock type on
the referenced rel (AccessShareLock) during detach.  (What's more: we
release it afterwards, which is the wrong thing to do.  We need to keep
such locks until end of transaction).  I didn't try to construct a case
where this would be a problem, but if I change AccessShare to NoLock,
the assertion that says we don't hold _any_ lock on that relation fires,
which means that we're not taking any locks on those rels before this
point.  So this lock strength choice is definitely wrong.  I changed it
to ShareRowExclusive, which is what we're supposed to use when adding a
trigger.  Another option might be to do find_all_inheritors() ahead of
time to grab all the necessary locks, but I didn't try to do that.  I
also added an assertion in addFkRecurseReferenced to verify that we hold
that in all paths, and after this change it doesn't fire anymore with
the regression tests.

I have still not edited the commit message.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
>From 3f18de0a263e507b79b1f1eb9a9a128cd6292779 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= 
Date: Sat, 26 Oct 2024 23:44:58 +0200
Subject: [PATCH v3] No need to use an attrmap when detaching a foreign key

The reason is that the constraint being created is on the same relation
as the constraint that it spawns from.
---
 src/backend/commands/tablecmds.c  | 25 ---
 src/test/regress/expected/foreign_key.out |  9 +---
 src/test/regress/sql/foreign_key.sql  | 10 ++---
 3 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e14bc0c0548..c993abc9792 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -10372,6 +10372,8 @@ addFkRecurseReferenced(Constraint *fkconstraint, Relation rel,
 	Oid			deleteTriggerOid,
 updateTriggerOid;
 
+	Assert(CheckRelationLockedByMe(pkrel, ShareRowExclusiveLock, true));
+
 	/*
 	 * Create the action triggers that enforce the constraint.
 	 */
@@ -19436,8 +19438,7 @@ DetachPartitionFinalize(Relation rel, Relation partRel, bool concurrent,
 	foreach(cell, fks)
 	{
 		ForeignKeyCacheInfo *fk = lfirst(cell);
-		HeapTuple	contup,
-	parentConTup;
+		HeapTuple	contup;
 		Form_pg_constraint conform;
 		Oid			insertTriggerOid,
 	updateTriggerOid;
@@ -19455,13 +19456,6 @@ DetachPartitionFinalize(Relation rel, Relation partRel, bool concurrent,
 			continue;
 		}
 
-		Assert(OidIsValid(conform->conparentid));
-		parentConTup = SearchSysCache1(CONSTROID,
-	   ObjectIdGetDatum(conform->conparentid));
-		if (!HeapTupleIsValid(parentConTup))
-			elog(ERROR, "cache lookup failed for constraint %u",
- conform->conparentid);
-
 		/*
 		 * The constraint on this table must be marked no longer a child of
 		 * the parent's constraint, as do its check triggers.
@@ -19502,7 +19496,6 @@ DetachPartitionFinalize(Relation rel, Relation partRel, bool concurrent,
 			Oid			conffeqop[INDEX_MAX_KEYS];
 			int			numfkdelsetcols;
 			AttrNumber	confdelsetcols[INDEX_MAX_KEYS];
-			AttrMap*attmap;
 			Relation	refdRel;
 
 			DeconstructFkConstraintRow(contup,
@@ -19535,20 +19528,19 @@ DetachPartitionFinalize(Relation rel, Relation partRel, bool concurrent,
 			fkconstraint->old_pktable_oid = InvalidOid;
 			fkconstraint->location = -1;
 
-			attmap = build_attrmap_by_name(RelationGetDescr(partRel),
-		   RelationGetDescr(rel),
-		   false);
+			/* set up colnames, used to generate the constraint name */
 			for (int i = 0; i < numfks; i++)
 			{
 Form_pg_attribute att;
 
 att = TupleDescAttr(RelationGetDescr(partRel),
-	attmap->attnums[conkey[i] - 1] - 1);
+	conkey[i] - 1);
+
 fkconstraint->fk_attrs = lappend(fkconstraint->fk_attrs,
  makeString(NameStr(att->attname)));
 			}
 
-			refdRel = table_open(fk->confrelid, AccessShareLock);
+			refdRel = table_open(fk->confrelid, ShareRowExclusiveLock);
 
 			addFkRecurseReferenced(fkconstraint, partRel,
    refdRel,
@@ -19565,11 +19557,10 @@ DetachPartitionFinalize(Relation rel, Relation partRel, bool concurrent,
    true,
    InvalidOid, InvalidOid,
    conform->conperiod);
-			table_close(refdRel, AccessShareLock);
+			table_close(refdRel, NoLock);	/* keep lock till end of xact */
 		}
 
 		ReleaseSysCache(contup);
-		ReleaseSysCache(parentConTup);
 	}
 	list_free_deep(fks);
 	if (trigrel)
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index b73e7dced8f..69994c98e32 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.

Re: IPC::Run::time[r|out] vs our TAP tests

2024-10-28 Thread Heikki Linnakangas

On 09/04/2024 20:10, Daniel Gustafsson wrote:

Turning the timeout into a timer and returning undef along with logging a test
failure in case of expiration seems a bit saner (maybe Andrew can suggest an
API which has a better Perl feel to it).  Most callsites don't need any changes
to accommodate for this, the attached 0002 implements this timer change and
modify the few sites that need it, converting one to plain query() where the
added complexity of query_until isn't required.


+1. The patch looks good to me. I didn't comb through the tests to check 
for bugs of omission, i.e. cases where the caller would need adjustments 
because of this, I trust that you found them all.



=item $session->quit

Close the session and clean up resources. Each test run must be closed with
C.  Returns TRUE when the session was cleanly terminated, otherwise
FALSE.  A testfailure will be issued in case the session failed to finish.


What does "session failed to finish" mean? Does it mean the same as "not 
cleanly terminated", i.e. a test failure is issued whenever this returns 
FALSE?


typo: testfailure -> test failure



diff --git a/src/test/recovery/t/031_recovery_conflict.pl 
b/src/test/recovery/t/031_recovery_conflict.pl
index d87efa823fd..62936f52d20 100644
--- a/src/test/recovery/t/031_recovery_conflict.pl
+++ b/src/test/recovery/t/031_recovery_conflict.pl
@@ -253,9 +253,7 @@ $res = $psql_standby->query_until(
 -- wait for lock held by prepared transaction
SELECT * FROM $table2;
 ]);
-ok(1,
-   "$sect: cursor holding conflicting pin, also waiting for lock, 
established"
-);
+isnt($res, undef, "$sect: cursor holding conflicting pin, also waiting for lock, 
established");
 
 # just to make sure we're waiting for lock already

 ok( $node_standby->poll_query_until(
diff --git a/src/test/recovery/t/037_invalid_database.pl 
b/src/test/recovery/t/037_invalid_database.pl
index 6d1c7117964..c8c20077f85 100644
--- a/src/test/recovery/t/037_invalid_database.pl
+++ b/src/test/recovery/t/037_invalid_database.pl
@@ -94,6 +94,7 @@ is($node->psql('postgres', 'DROP DATABASE 
regression_invalid'),
 my $cancel = $node->background_psql('postgres', on_error_stop => 1);
 my $bgpsql = $node->background_psql('postgres', on_error_stop => 0);
 my $pid = $bgpsql->query('SELECT pg_backend_pid()');
+isnt($pid, undef, 'Get backend PID');
 
 # create the database, prevent drop database via lock held by a 2PC transaction

 ok( $bgpsql->query_safe(


I'm not sure I understand these changes. These can only fail if the 
query() or query_until() function times out, right? In that case, the 
query() or query_until() would also report a test failure, so these 
additional checks after the call seem redundant. They don't do any harm 
either, but I wonder why have them in these particular call sites and 
not in other query() or query_until() calls.



The tab completion test can use the API call for automatically restarting the
timer to reduce the complexity of check_completion a hair.  Done in 0001 (but
really not necessary).


+1


Commit Af279ddd1c2 added this sequence to 040_standby_failover_slots_sync.pl in
the recovery tests:

$back_q->query_until(
qr/logical_slot_get_changes/, q(
   \echo logical_slot_get_changes
   SELECT pg_logical_slot_get_changes('test_slot', NULL, NULL);
));

...  ...

# Since there are no slots in standby_slot_names, the function
# pg_logical_slot_get_changes should now return, and the session can be
# stopped.
$back_q->quit;

There is no guarantee that pg_logical_slot_get_changes has returned when
reaching this point.  This might still work as intended, but the comment is
slightly misleading IMO.


Agreed, it would be good to actually check that it returns.


recovery/t/043_wal_replay_wait.pl calls pg_wal_replay_wait() since 06c418e163e
in a background session which it then skips terminating.  Calling ->quit is
mandated by the API, in turn required by IPC::Run.  Calling ->quit on the
process makes the test fail from the process having already exited, but we can
call ->finish directly like we do in test_misc/t/005_timeouts.pl.  0003 fixes
this.


Alexander included this fix in commit 3c5db1d6b016 already.

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Reduce one comparison in binaryheap's sift down

2024-10-28 Thread Robert Haas
On Mon, Oct 28, 2024 at 11:22 AM cca5507  wrote:
> I think we can reduce one comparison in binaryheap's sift down, right?
>
> Here is a patch to fix it.

Hmm, so at present we compare the parent to the left child and to the
right child. If it's smaller than neither, everything is OK. If it's
smaller than one, we swap it with that one. If it's smaller than both,
we compare the left and right child with each other and swap the
parent with the larger of the two. Hence, if a node has 2 children, we
always do 2 comparisons, and we sometimes do 3 comparisons.

With the patch, we first compare the two children to each other, and
then compare the larger one to the parent. If the parent is smaller
than the larger child, we swap them. Hence, if a node has 2 children,
we always do 2 comparisons.

Unless I'm missing something, that does seem significantly better.
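
For reference, a minimal sketch of the comparison order described above,
written for a max-heap stored in a 0-based array; this is an illustration of
the idea, not the binaryheap.c code.

#include <stdio.h>

static void
sift_down(int *heap, int n, int i)
{
    for (;;)
    {
        int     left = 2 * i + 1;
        int     right = left + 1;
        int     largest;
        int     tmp;

        if (left >= n)
            break;              /* no children, done */

        /* comparison 1: with two children, find the larger one first */
        largest = left;
        if (right < n && heap[right] > heap[left])
            largest = right;

        /* comparison 2: parent vs the larger child */
        if (heap[i] >= heap[largest])
            break;

        tmp = heap[i];
        heap[i] = heap[largest];
        heap[largest] = tmp;
        i = largest;
    }
}

int
main(void)
{
    int     h[] = {1, 9, 8, 4, 7, 3, 5};
    int     n = sizeof(h) / sizeof(h[0]);

    for (int i = n / 2 - 1; i >= 0; i--)    /* heapify bottom-up */
        sift_down(h, n, i);

    for (int i = 0; i < n; i++)
        printf("%d ", h[i]);
    printf("\n");               /* h[0] is now the maximum (9) */
    return 0;
}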

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PoC] Federated Authn/z with OAUTHBEARER

2024-10-28 Thread Jacob Champion
On Mon, Oct 28, 2024 at 6:24 AM Daniel Gustafsson  wrote:
> > On 25 Oct 2024, at 20:22, Jacob Champion  
> > wrote:
>
> > I have combed almost all of Daniel's feedback backwards into the main
> > patch (just the new bzero code remains, with the open question
> > upthread),
>
> Re-reading I can't see a vector there, I guess I am just scarred from what
> seemed to be harmless leaks in auth codepaths and treat every bit as
> potentially important.  Feel free to drop from the patchset for now.

Okay. For authn_id specifically, which isn't secret and doesn't have
any power unless it's somehow copied into the ClientConnectionInfo,
I'm not sure that the bzero() gives us much. But I do see value in
clearing out, say, the Bearer token once we're finished with it.

Also in this validate() code path, I'm taking a look at the added
memory management with the pfree():
1. Should we add any more ceremony to the returned struct, to try to
ensure that the ABI matches? Or is it good enough to declare that
modules need to be compiled against a specific server version?
2. Should we split off a separate memory context to contain
allocations made by the validator?
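
On question 2, the usual pattern for isolating a callback's allocations
would look roughly like this; run_validator() is a placeholder here, not the
actual hook interface, and this is only a sketch of the memory-context idea.

#include "postgres.h"
#include "utils/memutils.h"

/* Placeholder for the module's callback; not the real hook signature. */
extern bool run_validator(const char *token);

static bool
validate_in_own_context(const char *token)
{
    MemoryContext validator_cxt;
    MemoryContext oldcxt;
    bool        ok;

    validator_cxt = AllocSetContextCreate(CurrentMemoryContext,
                                          "OAuth validator",
                                          ALLOCSET_DEFAULT_SIZES);
    oldcxt = MemoryContextSwitchTo(validator_cxt);

    /* Anything the validator palloc()s lands in validator_cxt. */
    ok = run_validator(token);

    /* Copy out anything worth keeping before deleting the context. */
    MemoryContextSwitchTo(oldcxt);
    MemoryContextDelete(validator_cxt);     /* releases it all in one step */

    return ok;
}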

> Looking more at the patchset I think we need to apply conditional compilation
> of the backend for oauth like how we do with other opt-in schemes in configure
> and meson.  The attached .txt has a diff for making --with-oauth a requirement
> for compiling support into backend libpq.

Do we get the flexibility we need with that approach? With other
opt-in schemes, the backend and the frontend both need some sort of
third-party dependency, but that's not true for OAuth. I could see
some people wanting to support an offline token validator on the
server side but not wanting to build the HTTP dependency into their
clients.

I was considering going in the opposite direction: With the client
hooks, a user could plug in their own implementation without ever
having to touch the built-in flow, and I'm wondering if --with-oauth
should really just be --with-builtin-oauth or similar. Then if the
server sends OAUTHBEARER, the client only complains if it doesn't have
a flow available to use, rather than checking USE_OAUTH. This kind of
ties into the other big open question of "what do we do about users
that don't want the additional overhead of something they're not
using?"

--Jacob




Re: define pg_structiszero(addr, s, r)

2024-10-28 Thread Ranier Vilela
Em seg., 28 de out. de 2024 às 12:08, Tom Lane  escreveu:

> Ranier Vilela  writes:
> > It seems to me that [reversing the loop direction] is more optimized.
>
> That's far from clear: you're ignoring the possibility that memory
> access logic is better optimized for forward scanning than reverse
> scanning.  I'd stick with the forward scan without some extremely
> detailed testing.
>
I don't disagree.
After posting, I was wondering: if there is a first non-zero byte, it is
probably at the beginning and not at the end.
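
For readers following the thread, a simplified stand-in for the helper being
discussed (not the actual pg_structiszero() proposal) is just a forward byte
scan that stops at the first non-zero byte.

#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified stand-in: scan forward and stop at the first non-zero byte.
 * If the memory is "dirty", that byte tends to be found early, which is
 * one argument for keeping the forward scan.
 */
static bool
bytes_are_all_zeros(const void *ptr, size_t len)
{
    const unsigned char *p = (const unsigned char *) ptr;

    for (size_t i = 0; i < len; i++)
    {
        if (p[i] != 0)
            return false;
    }
    return true;
}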

best regards,
Ranier Vilela


Re: POC, WIP: OR-clause support for indexes

2024-10-28 Thread Alena Rybakina

Hi, Jian! Thank you for your work on this topic!

On 28.10.2024 10:19, jian he wrote:

  * NOTE:  returns NULL if clause is an OR or AND clause; it is the
  * responsibility of higher-level routines to cope with those.
  */
static IndexClause *
match_clause_to_indexcol(PlannerInfo *root,
  RestrictInfo *rinfo,
  int indexcol,
  IndexOptInfo *index)

the above comments need a slight change.


EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE  (thousand = 1
OR thousand = 3);
 QUERY PLAN
---
  Bitmap Heap Scan on tenk2
Recheck Cond: ((thousand = 1) OR (thousand = 3))
->  Bitmap Index Scan on tenk2_thous_tenthous
  Index Cond: (thousand = ANY ('{1,3}'::integer[]))

EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE  (thousand in (1,3));
 QUERY PLAN
---
  Bitmap Heap Scan on tenk2
Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
->  Bitmap Index Scan on tenk2_thous_tenthous
  Index Cond: (thousand = ANY ('{1,3}'::integer[]))

tenk2 index:
Indexes:
 "tenk2_thous_tenthous" btree (thousand, tenthous)

Looking at the above cases, I found that the "Recheck Cond" is
different from the "Index Cond".
I wonder why there is a difference, or if they should be the same.
Then I came to:
match_orclause_to_indexcol

 /*
  * Finally, build an IndexClause based on the SAOP node. Use
  * make_simple_restrictinfo() to get RestrictInfo with clean selectivity
  * estimations because it may differ from the estimation made for an OR
  * clause. Although it is not a lossy expression, keep the old version of
  * rinfo in iclause->rinfo to detect duplicates and recheck the original
  * clause.
  */
 iclause = makeNode(IndexClause);
 iclause->rinfo = rinfo;
 iclause->indexquals = list_make1(make_simple_restrictinfo(root,
   &saopexpr->xpr));
 iclause->lossy = false;
 iclause->indexcol = indexcol;
 iclause->indexcols = NIL;

looking at create_bitmap_scan_plan.
I think "iclause->rinfo" itself won't be able to detect duplicates,
since the upper code would mostly use "iclause->indexquals" for comparison?


typedef struct IndexClause comments says:
"
  * indexquals is a list of RestrictInfos for the directly-usable index
  * conditions associated with this IndexClause.  In the simplest case
  * it's a one-element list whose member is iclause->rinfo.  Otherwise,
  * it contains one or more directly-usable indexqual conditions extracted
  * from the given clause.  The 'lossy' flag indicates whether the
  * indexquals are semantically equivalent to the original clause, or
  * represent a weaker condition.
"
should iclause->lossy be true at the end of match_orclause_to_indexcol,
since it meets the comment condition "semantically equivalent to the
original clause"?
Or is the above comment slightly wrong?

in match_orclause_to_indexcol
i changed from
iclause->rinfo = rinfo;
to
  iclause->rinfo = make_simple_restrictinfo(root,
 &saopexpr->xpr);

as expected. now the "Recheck Cond" is same as "Index Cond"
Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
->  Bitmap Index Scan on tenk2_thous_tenthous
  Index Cond: (thousand = ANY ('{1,3}'::integer[]))

I am not sure of the implication of this change.

I may be wrong, but the original idea was to double-check the result 
with the original expression.


But I'm willing to agree with you. I think we should build the index quals 
from the transformed rinfo via the add_predicate_to_index_quals function. 
I have attached the diff to this email.


diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c

index 3da7ea8ed57..c68ac7008e6 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3463,10 +3463,11 @@ match_orclause_to_indexcol(PlannerInfo *root,
  * rinfo in iclause->rinfo to detect duplicates and recheck the 
original

  * clause.
  */
+    RestrictInfo *rinfo_new = make_simple_restrictinfo(root,
+ &saopexpr->xpr);
 iclause = makeNode(IndexClause);
-    iclause->rinfo = rinfo;
-    iclause->indexquals = list_make1(make_simple_restrictinfo(root,
- &saopexpr->xpr));
+    iclause->rinfo = rinfo_new;
+    iclause->indexquals = add_predicate_to_index_quals(index, 
list_make1(rinfo_new));

 iclause->lossy = false;
 iclause->indexcol = indexcol;
 iclause->indexcols = NIL;


I looked into the comments that you mentioned and found some additional 
explanation.


As I understand it, this processing is related to ensuring that the 
selectivity of the index is assessed correctly and that there is no 
underestimation, which can lead to the selection of a partial index in

Planner issue with BitmapScan recheck on external TOAST

2024-10-28 Thread Jim Nasby
I’ve been testing use of a BRIN index on record creation date (timestamptz) on 
a snapshot of a production system. Note that after creating the BRIN index the 
number of buffers being accessed jumps from 23838 to 191663. Based on what 
EXPLAIN is showing, I believe the issue is that the planner doesn’t understand 
that each additional row that goes through the (repsrv_account_ids(data) && …) 
recheck results in fetching at least one TOAST chunk. (I’d like to know if my 
assumption about TOAST is correct here; it’s the only thing I can think of to 
explain these block numbers from the 2nd EXPLAIN…)

   Heap Blocks: exact=11024
   Buffers: shared hit=191663
   ->  BitmapAnd  (cost=4903.00..4903.00 rows=14930 width=0) (actual 
time=74.704..74.705 rows=0 loops=1)
 Buffers: shared hit=1926


Unfortunately I haven’t been able to create an independent repro of this issue, 
so this report is based on PG 16 (most recent I can test in production). 
repsrv_account_ids() is a function that extracts a field from a JSONB document 
(the data field). create_date is timestamptz. There’s 17 other fields in the 
table that I’m omitting (I can share if needed, but would need to talk to some 
folks over here about it).

Indexes:
"task_execution_pkey" PRIMARY KEY, btree (id)
"task_execution__create_date_brin" brin (create_date)
"task_execution_create_date_idx" btree (create_date)

explain (analyze,buffers) SELECT 1  

  
FROM task_execution te  

WHERE (te.create_date 
BETWEEN '2024-7-1'::timestamptz AND '2024-9-5'::timestamptz)

  and repsrv_account_ids(te.data) && 
'{303,403,503,301,506,8805604}'::text[] 

 ;
  QUERY 
PLAN
---
 Gather  (cost=1236.73..103675.01 rows=11291 width=4) (actual 
time=11.356..41.246 rows=9303 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=23838
   ->  Parallel Bitmap Heap Scan on task_execution te  (cost=236.73..101545.91 
rows=4705 width=4) (actual time=6.659..34.198 rows=3101 loops=3)
 Recheck Cond: (repsrv_account_ids(data) && 
'{303,403,503,301,506,8805604}'::text[])
 Filter: ((create_date >= '2024-07-01 00:00:00+00'::timestamp with time 
zone) AND (create_date <= '2024-09-05 00:00:00+00'::timestamp with time zone))
 Rows Removed by Filter: 5638
 Heap Blocks: exact=14066
 Buffers: shared hit=23838
 ->  Bitmap Index Scan on task_execution__account_ids  
(cost=0.00..233.91 rows=26469 width=0) (actual time=7.304..7.304 rows=26218 
loops=1)
   Index Cond: (repsrv_account_ids(data) && 
'{303,403,503,301,506,8805604}'::text[])
   Buffers: shared hit=32
 Planning:
   Buffers: shared hit=1
 Planning Time: 0.188 ms
 Execution Time: 41.791 ms
(17 rows)

CREATE INDEX task_execution__create_date_brin ON task_execution USING brin 
(create_date) WITH (pages_per_range=8);
CREATE INDEX

explain (analyze,buffers) SELECT 1  

  
FROM task_execution te  

WHERE (te.create_date 
BETWEEN '2024-7-1'::timestamptz AND '2024-9-5'::timestamptz)

  and repsrv_account_ids(te.data) && 
'{303,403,503,301,506,8805604}'::text[] 

 ;

 QUERY PLAN

Re: protocol-level wait-for-LSN

2024-10-28 Thread Heikki Linnakangas

On 28/10/2024 17:51, Peter Eisentraut wrote:
This is something I hacked together on the way back from pgconf.eu. It's 
highly experimental.


The idea is to do the equivalent of pg_wal_replay_wait() on the protocol 
level, so that it is ideally fully transparent to the application code. 
The application just issues queries, and they might be serviced by a 
primary or a standby, but there is always a correct ordering of reads 
after writes.


Additionally, I'm exploring whether this is an idea for a protocol 
extension that might be a bit more complex than, say, longer cancel 
keys, something around which we could have a discussion about protocol 
versioning.


The patch adds a protocol extension called _pq_.wait_for_lsn as well as 
a libpq connection option wait_for_lsn to activate the same.  (Use e.g., 
psql -d 'wait_for_lsn=1'.)


With this protocol extension, two things are changed:

- The ReadyForQuery message sends back the current LSN.


+1

- The Query message sends an LSN to wait for.  (This doesn't handle the 
extended query protocol yet.)


I'd suggest adding a new message type for this, so that it works the 
same with simple and extended query. Or if you just want to wait without 
issuing any query.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: protocol-level wait-for-LSN

2024-10-28 Thread Jelte Fennema-Nio
On Mon, 28 Oct 2024 at 16:51, Peter Eisentraut  wrote:
> The idea is to do the equivalent of pg_wal_replay_wait() on the protocol
> level, so that it is ideally fully transparent to the application code.
> The application just issues queries, and they might be serviced by a
> primary or a standby, but there is always a correct ordering of reads
> after writes.

Sounds super useful. This came up in the Unconference session about
protocols on PGConf.dev too. I'll

> There might be other ways to slice this.  Instead of using a
> hypothetical middleware, the application would use two connections, one
> for writing, one for reading, and the LSN would be communicated between
> the two.  I imagine in this case, at least the one half of the protocol,
> shipping the current LSN with ReadyForQuery, could be useful, instead of
> requiring application code to issue pg_current_wal_insert_lsn() explicitly.

I think this usecase is already super useful by itself. And having
both directions would still be preferred I think.




Should we support casting between ARRAYs and JSON(B)?

2024-10-28 Thread Aleksander Alekseev
Hi hackers,

While reviewing another patch [1] I came to an idea to try something stupid:

=# select '{1,2,3}' :: int[];
  int4
-
 {1,2,3}
=# select '{1,2,3}' :: int[] :: jsonb[];
ERROR:  cannot cast type integer[] to jsonb[]

=# select '[1,2,3]' :: jsonb;
   jsonb
---
 [1, 2, 3]
=# select '[1,2,3]' :: jsonb :: int[];
ERROR:  cannot cast type jsonb to integer[]

Does anyone believe that this should work and/or would be convenient
if it worked? I can imagine cases when one would like to use array_*
functions for JSON(B) although personally I didn't encounter such a
case (yet?).

Thoughts?

[1]: 
https://postgr.es/m/CAEG8a3J41a4dpw_-F94fF-JPRXYxw-GfsgoGotKcjs9LVfEEvw%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev




Re: New "raw" COPY format

2024-10-28 Thread Masahiko Sawada
On Mon, Oct 28, 2024 at 3:21 AM Joel Jacobson  wrote:
>
> On Mon, Oct 28, 2024, at 10:30, Joel Jacobson wrote:
> > On Mon, Oct 28, 2024, at 08:56, jian he wrote:
> >>   /* Check force_quote */
> >> - if (!opts_out->csv_mode && (opts_out->force_quote ||
> >> opts_out->force_quote_all))
> >> + if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote ||
> >> + opts_out->force_quote_all))
> >>   ereport(ERROR,
> >>   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> >>   /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
> >>
> >> maybe this has a code indentation issue.
> >> since "if" and "opts_out" in the same column position.
> >
> > Thanks for review.
> >
> > I've fixed the indentation issues.
>
> I've now installed pgindent, and will use it from hereon, to avoid this class 
> of problems.
>
> New version where all three patches are now indented using pgindent.

Thank you for updating the patch. Here are review comments on the v15
0002 patch:

When testing the patch with an empty delimiter, I got the following failure:

postgres(1:903898)=# copy hoge from '/tmp/tmp.raw' with (format 'raw',
delimiter '');
TRAP: failed Assert("delim_len > 0"), File: "copyfromparse.c", Line:
1173, PID: 903898

---
-   else
+   else if (cstate->opts.format == COPY_FORMAT_TEXT)
fldct = CopyReadAttributesText(cstate);
+   else
+   {
+   elog(ERROR, "unexpected COPY format: %d", cstate->opts.format);
+   pg_unreachable();
+   }

Since we already check the incompatible options with COPY_FORMAT_RAW
and default_print, I think it's better to add an assertion to make
sure the format is either COPY_FORMAT_CSV or COPY_FORMAT_TEXT, instead
of using elog(ERROR).

---
+/*
+ * CopyReadLineRawText - inner loop of CopyReadLine for raw text mode
+ */
+static bool
+CopyReadLineRawText(CopyFromState cstate)

This function has a lot of duplication with CopyReadLineText(). I
think it's better to modify CopyReadLineText() to support 'raw'
format, rather than adding a separate function.

---
+   boolread_entire_file = (cstate->opts.delim == NULL);
+   int delim_len = cstate->opts.delim ? strlen(cstate->opts.delim) : 0;

I think we can use 'delim_len == 0' instead of read_entire_file.

---
+   if (read_entire_file)
+   {
+   /* Continue until EOF if reading entire file */
+   input_buf_ptr++;
+   continue;
+   }

In the case where we're reading the entire file as a single tuple, we
don't need to advance the input_buf_ptr one by one. Instead,
input_buf_ptr can jump to copy_buf_len, which is faster.

---
+   /* Check for delimiter, possibly multi-byte */
+   IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(delim_len - 1);
+   if (strncmp(©_input_buf[input_buf_ptr], cstate->opts.delim,
+   delim_len) == 0)
+   {
+   cstate->eol_type = EOL_CUSTOM;
+   input_buf_ptr += delim_len;
+   break;
+   }
+   input_buf_ptr++;

Similar to the above comment, I think we don't need to check the char
one by one. I guess that it would be faster if we locate the delimiter
string in the input_buf (e.g. using strstr()), and then move
input_buf_ptr to the detected position.
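
As a rough sketch of that idea (standard C only, with an illustrative helper
name; this is not what copyfromparse.c does today, and the real code would
also have to handle a delimiter that straddles a buffer refill), the search
could look something like this:

#include <stddef.h>
#include <string.h>

/*
 * Sketch only: locate a multi-byte delimiter in the buffered input in one
 * pass instead of advancing one character at a time.  Returns the offset
 * of the delimiter, or -1 if it is not (fully) present in the buffer.
 */
static long
find_delimiter(const char *buf, size_t buf_len,
               const char *delim, size_t delim_len)
{
    const char *p = buf;
    const char *end = buf + buf_len;

    while (delim_len > 0 && (size_t) (end - p) >= delim_len)
    {
        /* only search positions that leave room for the whole delimiter */
        const char *hit = memchr(p, delim[0],
                                 (size_t) (end - p) - delim_len + 1);

        if (hit == NULL)
            break;
        if (memcmp(hit, delim, delim_len) == 0)
            return (long) (hit - buf);
        p = hit + 1;
    }
    return -1;
}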

---
+   /* Copy the entire line into attribute_buf */
+   memcpy(cstate->attribute_buf.data, cstate->line_buf.data,
+  cstate->line_buf.len);
+   cstate->attribute_buf.data[cstate->line_buf.len] = '\0';
+   cstate->attribute_buf.len = cstate->line_buf.len;

The CopyReadAttributesRaw() just copies line_buf data to
attirbute_buf, which seems to be a waste. I think we can have
attribute_buf point to the line_buf. That way, we can skip the whole
step 4 that is described in the comment on top o f copyfromparse.c:

* [data source] --> raw_buf --> input_buf --> line_buf --> attribute_buf
*1.  2.3.   4.

---
+static int
+CopyReadAttributesRaw(CopyFromState cstate)
+{
+   /* Enforce single column requirement */
+   if (cstate->max_fields != 1)
+   {
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("COPY with format 'raw' must specify exactly
one column")));
+   }

This check should have already been done in BeginCopyFrom(). Is there
any case where max_fields gets to != 1 during reading the input?

---
It's a bit odd to me to use the delimiter as an EOL marker in raw
format, but probably it's okay.

---
-   if (cstate->opts.format != COPY_FORMAT_BINARY)
+   if (cstate->opts.format == COPY_FORMAT_RAW &&
+   cstate->opts.delim != NULL)
+   {
+   /* Output the user-specified delimiter between rows */
+   CopySendString(cstate, cstate->opts.delim);
+   }
+   else if (cstate->opts.format == COPY_FORMAT_TEXT ||
+cstate->opts.format == COPY_FORMAT_CSV)

Since it sends the delimiter 

Re: Inconsistent output handling in 002_pg_upgrade.pl test logs

2024-10-28 Thread Daniel Gustafsson
> On 28 Oct 2024, at 13:38, Joel Jacobson  wrote:
> 
> Hi hackers,
> 
> I've noticed some inconsistency in 002_pg_upgrade.pl in how it handles output
> during test failures. Currently, it uses note to print the header:
> 
>note "=== contents of $log ===\n";
> 
> but print for the log content and footer:
> 
>print slurp_file($log);
>print "=== EOF ===\n";

Ugh, nice catch.

> Option 1: Keep output together using note

> Option 2: Adjust header message for separate logs

> Thoughts on these options?

I would prefer to output this to the log only and not the TAP output, to avoid
the risk of not seeing the test output for all the log output on the screen.

--
Daniel Gustafsson





Re: Avoid possible overflow (src/port/bsearch_arg.c)

2024-10-28 Thread Ranier Vilela
Em seg., 28 de out. de 2024 às 09:13, Heikki Linnakangas 
escreveu:

> On 09/10/2024 19:16, Ranier Vilela wrote:
> > Em ter., 8 de out. de 2024 às 18:28, Nathan Bossart
> > mailto:nathandboss...@gmail.com>> escreveu:
> >
> > On Tue, Oct 08, 2024 at 04:09:00PM -0300, Ranier Vilela wrote:
> >  > The port function *bsearch_arg* mimics the C function
> >  > *bsearch*.
> >  >
> >  > The API signature is:
> >  > void *
> >  > bsearch_arg(const void *key, const void *base0,
> >  > size_t nmemb, size_t size,
> >  > int (*compar) (const void *, const void *, void *),
> >  > void *arg)
> >  >
> >  > So, the parameter *nmemb* is size_t.
> >  > Therefore, a call with nmemb greater than INT_MAX is possible.
> >  >
> >  > Internally the code uses the *int* type to iterate through the
> > number of
> >  > members, which makes overflow possible.
> >
> > I traced this back to commit bfa2cee (v14), which both moved
> > bsearch_arg()
> > to its current location and adjusted the style a bit.  Your patch
> looks
> > reasonable to me.
> >
> > Thanks for looking.
>
> Committed, thanks.
>
Thank you.
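
For the archives, the overflow-safe shape being discussed is roughly the
following; this is a sketch of the pattern with size_t arithmetic throughout,
not the committed patch.

#include <stddef.h>

static void *
bsearch_arg_sketch(const void *key, const void *base0,
                   size_t nmemb, size_t size,
                   int (*compar) (const void *, const void *, void *),
                   void *arg)
{
    const char *base = (const char *) base0;

    while (nmemb > 0)
    {
        size_t      half = nmemb / 2;           /* no int truncation */
        const char *mid = base + half * size;
        int         cmp = compar(key, mid, arg);

        if (cmp == 0)
            return (void *) mid;
        if (cmp > 0)
        {
            base = mid + size;                  /* search the upper half */
            nmemb -= half + 1;
        }
        else
            nmemb = half;                       /* search the lower half */
    }
    return NULL;
}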


>
> Based on the original discussion on bfa2cee, I couldn't figure out where
> exactly this new bsearch implementation originated from, but googling
> around, probably *BSD or libiberty. Tomas, do you remember? Not that it
> matters, but I'm curious.
>
> Some of those other implementations have fixed this, others have not.
> And they all seem to also have the "involes" typo in the comment that we
> fixed in commit 7ef8b52cf07 :-). Ranier, you might want to submit this
> fix to those other projects too.
>
Sure, I can try.

best regards,
Ranier Vilela


Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Shubham Khanna
On Mon, Oct 28, 2024 at 8:47 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shubham,
>
> Thanks for updating the patch! I resumed reviewing the patch set.
> Here are only cosmetic comments as my rehabilitation.
>
> 01. getPublications()
>
> I feel we could follow the notation like getSubscriptions(), because number of
> parameters became larger. How do you feel like attached?
>

I will handle this comment in a later set of patches.

> 02. fetch_remote_table_info()
>
> ```
>   "SELECT DISTINCT"
> - "  (CASE WHEN (array_length(gpt.attrs, 1) = 
> c.relnatts)"
> - "   THEN NULL ELSE gpt.attrs END)"
> + "  (gpt.attrs)"
> ```
>
> I think no need to separate lines and add bracket. How about like below?
>
> ```
>  "SELECT DISTINCT gpt.attrs"
> ```
>
Fixed this.

The v44 version patches attached at [1] have the changes for the same.
[1] - 
https://www.postgresql.org/message-id/CAHv8RjLvr8ZxX-1TcaxrZns1nwgrVUTO_2jhDdOPys0WgrDyKQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Alias of VALUES RTE in explain plan

2024-10-28 Thread Ashutosh Bapat
On Mon, Oct 28, 2024 at 10:17 AM Andrei Lepikhov  wrote:
>
> On 10/28/24 03:15, Yasir wrote:
> > By design of my solution, I was not taking it as a bug. But now, I agree
> > with your opinion.
> I think the case provided by Ashutosh was initially correct, and nothing
> needs to change. Look at the similar case:
>
> EXPLAIN SELECT x,y FROM (
>SELECT oid,relname FROM pg_class WHERE relname = 'pg_index') AS
> c(x,y) WHERE c.y = 'pg_index';
>
>   QUERY PLAN
>
> 
>   Index Scan using pg_class_relname_nsp_index on pg_class
> (cost=0.27..8.29 rows=1 width=68)
> Index Cond: (relname = 'pg_index'::name)
> (2 rows)
>
> I don't see any reference to the alias c(x,y) in this explain.
> Similarly, the flattened VALUES clause shouldn't be referenced under the
> alias t(a,b).

The reason you don't see c(x, y) is because the subquery gets pulled
up and the subquery with c(x, y) no longer exists. If the subquery
doesn't get pulled up, you would see c(x, y) in the EXPLAIN plan.

Our syntax doesn't allow an alias to be attached to VALUES(). E.g.
select * from values (1), (2) x(a) is not allowed. Instead we allow
(values (1), (2)) x(a) where values (1), (2) is treated as a subquery.
Since there is no way to attach an alias to VALUES() itself, I think
it's fair to consider the outer alias as the alias of the VALUES
relation. That's what Tom's patch does. The result is useful as well.

The patch looks good to me, except the name of the new member.

  CommonTableExpr *p_parent_cte; /* this query's containing CTE */
+ Alias*p_parent_alias; /* parent's alias for this query */

the two "parent"s here mean different things and that might lead one
to assume that p_parent_alias refers to the alias of the CTE. The comment
adds to the confusion since it mentions "parent". How about renaming it
to p_outer_alias, or something which indicates the alias of the outer
query?

-- 
Best Wishes,
Ashutosh Bapat




Re: [PoC] Federated Authn/z with OAUTHBEARER

2024-10-28 Thread Daniel Gustafsson
> On 25 Oct 2024, at 20:22, Jacob Champion  
> wrote:

> I have combed almost all of Daniel's feedback backwards into the main
> patch (just the new bzero code remains, with the open question
> upthread),

Re-reading I can't see a vector there, I guess I am just scarred from what
seemed to be harmless leaks in auth codepaths and treat every bit as
potentially important.  Feel free to drop from the patchset for now.

> Next up is, hopefully, url-encoding. I hadn't realized what an
> absolute mess that would be [1].

Everything and anything involving urls is a hot mess =/

Looking more at the patchset I think we need to apply conditional compilation
of the backend for oauth like how we do with other opt-in schemes in configure
and meson.  The attached .txt has a diff for making --with-oauth a requirement
for compiling support into backend libpq.

--
Daniel Gustafsson

diff --git a/configure b/configure
index 39fe5a0542..b40cd836f1 100755
--- a/configure
+++ b/configure
@@ -8439,9 +8439,6 @@ fi
 
 if test x"$with_oauth" = x"curl"; then
 
-$as_echo "#define USE_OAUTH 1" >>confdefs.h
-
-
 $as_echo "#define USE_OAUTH_CURL 1" >>confdefs.h
 
   # OAuth requires python for testing
diff --git a/configure.ac b/configure.ac
index f7e1400b6e..82217e652e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -927,8 +927,7 @@ if test x"$with_oauth" = x"" ; then
 fi
 
 if test x"$with_oauth" = x"curl"; then
-  AC_DEFINE([USE_OAUTH], 1, [Define to 1 to build with OAuth 2.0 support. 
(--with-oauth)])
-  AC_DEFINE([USE_OAUTH_CURL], 1, [Define to 1 to use libcurl for OAuth 
support.])
+  AC_DEFINE([USE_OAUTH_CURL], 1, [Define to 1 to build with OAuth 2.0 support. 
(--with-oauth=curl)])
   # OAuth requires python for testing
   if test "$with_python" != yes; then
 AC_MSG_WARN([*** OAuth support tests requires --with-python to run])
diff --git a/src/backend/libpq/Makefile b/src/backend/libpq/Makefile
index 98eb2a8242..ba502b0442 100644
--- a/src/backend/libpq/Makefile
+++ b/src/backend/libpq/Makefile
@@ -15,7 +15,6 @@ include $(top_builddir)/src/Makefile.global
 # be-fsstubs is here for historical reasons, probably belongs elsewhere
 
 OBJS = \
-   auth-oauth.o \
auth-sasl.o \
auth-scram.o \
auth.o \
@@ -30,6 +29,10 @@ OBJS = \
pqmq.o \
pqsignal.o
 
+ifneq ($(with_oauth),)
+OBJS += auth-oauth.o
+endif
+
 ifeq ($(with_ssl),openssl)
 OBJS += be-secure-openssl.o
 endif
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 0cf3e31c9f..57638f922e 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -29,7 +29,6 @@
 #include "libpq/auth.h"
 #include "libpq/crypt.h"
 #include "libpq/libpq.h"
-#include "libpq/oauth.h"
 #include "libpq/pqformat.h"
 #include "libpq/sasl.h"
 #include "libpq/scram.h"
@@ -201,6 +200,14 @@ static int CheckRADIUSAuth(Port *port);
 static int PerformRadiusTransaction(const char *server, const char 
*secret, const char *portstr, const char *identifier, const char *user_name, 
const char *passwd);
 
 
+/*
+ * OAuth Authentication
+ *
+ */
+#ifdef USE_OAUTH
+#include "libpq/oauth.h"
+#endif
+
 /*
  * Global authentication functions
  *
@@ -614,8 +621,13 @@ ClientAuthentication(Port *port)
case uaTrust:
status = STATUS_OK;
break;
+
case uaOAuth:
+#ifdef USE_OAUTH
status = CheckSASLAuth(&pg_be_oauth_mech, port, NULL, 
NULL);
+#else
+   Assert(false);
+#endif
break;
}
 
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index c623b8463d..719c7a881e 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -1749,7 +1749,11 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
else if (strcmp(token->string, "radius") == 0)
parsedline->auth_method = uaRADIUS;
else if (strcmp(token->string, "oauth") == 0)
+#ifdef USE_OAUTH
parsedline->auth_method = uaOAuth;
+#else
+   unsupauth = "oauth";
+#endif
else
{
ereport(elevel,
diff --git a/src/backend/libpq/meson.build b/src/backend/libpq/meson.build
index c85527fb01..1c76dd80cc 100644
--- a/src/backend/libpq/meson.build
+++ b/src/backend/libpq/meson.build
@@ -1,7 +1,6 @@
 # Copyright (c) 2022-2024, PostgreSQL Global Development Group
 
 backend_sources += files(
-  'auth-oauth.c',
   'auth-sasl.c',
   'auth-scram.c',
   'auth.c',
@@ -17,6 +16,10 @@ backend_sources += files(
   'pqsignal.c',
 )
 
+if oauth.found()
+  backend_sources += files('auth-oauth.c')
+endif
+
 if ssl.found()
   backend_sources += files('be-secure-openssl.c')
 endif
diff --git a/src/include/pg_config.h.i

Re: Inconsistent output handling in 002_pg_upgrade.pl test logs

2024-10-28 Thread Joel Jacobson
On Mon, Oct 28, 2024, at 13:44, Daniel Gustafsson wrote:
>> On 28 Oct 2024, at 13:38, Joel Jacobson  wrote:
>> 
>> Hi hackers,
>> 
>> I've noticed some inconsistency in 002_pg_upgrade.pl in how it handles output
>> during test failures. Currently, it uses note to print the header:
>> 
>>note "=== contents of $log ===\n";
>> 
>> but print for the log content and footer:
>> 
>>print slurp_file($log);
>>print "=== EOF ===\n";
>
> Ugh, nice catch.
>
>> Option 1: Keep output together using note
>
>> Option 2: Adjust header message for separate logs
>
>> Thoughts on these options?
>
> I would prefer to output this to the log only and not the TAP output, to avoid
> the risk of not seeing the test output for all the log output on the screen.

I also think that's best, and in line with what we do in other parts of the 
same script.

Patch attached.

/Joel

0001-Fix-inconsistent-output-handling-in-002_pg_upgrade.p.patch
Description: Binary data


protocol-level wait-for-LSN

2024-10-28 Thread Peter Eisentraut
This is something I hacked together on the way back from pgconf.eu. 
It's highly experimental.


The idea is to do the equivalent of pg_wal_replay_wait() on the protocol 
level, so that it is ideally fully transparent to the application code. 
The application just issues queries, and they might be serviced by a 
primary or a standby, but there is always a correct ordering of reads 
after writes.


Additionally, I'm exploring whether this is an idea for a protocol 
extension that might be a bit more complex than, say, longer cancel 
keys, something around which we could have a discussion about protocol 
versioning.


The patch adds a protocol extension called _pq_.wait_for_lsn as well as 
a libpq connection option wait_for_lsn to activate the same.  (Use e.g., 
psql -d 'wait_for_lsn=1'.)


With this protocol extension, two things are changed:

- The ReadyForQuery message sends back the current LSN.

- The Query message sends an LSN to wait for.  (This doesn't handle the 
extended query protocol yet.)


To make any real use of this, you'd need some middleware, like a hacked 
pgbouncer, that transparently redirects queries among primaries and 
standbys, which doesn't exist yet.  But if it did, I imagine it could be 
pretty useful.


There might be other ways to slice this.  Instead of using a 
hypothetical middleware, the application would use two connections, one 
for writing, one for reading, and the LSN would be communicated between 
the two.  I imagine in this case, at least the one half of the protocol, 
shipping the current LSN with ReadyForQuery, could be useful, instead of 
requiring application code to issue pg_current_wal_insert_lsn() explicitly.


Thoughts?From 44b6354429847e3b3aeac21ee5712879b97d7877 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sat, 26 Oct 2024 01:11:57 +0200
Subject: [PATCH v0] wait_for_lsn protocol option

---
 src/backend/tcop/backend_startup.c  | 15 +--
 src/backend/tcop/dest.c | 12 
 src/backend/tcop/postgres.c | 23 +++
 src/include/libpq/libpq-be.h|  1 +
 src/interfaces/libpq/fe-connect.c   | 26 ++
 src/interfaces/libpq/fe-exec.c  |  1 +
 src/interfaces/libpq/fe-protocol3.c | 20 
 src/interfaces/libpq/fe-trace.c |  2 ++
 src/interfaces/libpq/libpq-int.h|  3 +++
 9 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/src/backend/tcop/backend_startup.c 
b/src/backend/tcop/backend_startup.c
index 2a96c81f925..bd3b91d01eb 100644
--- a/src/backend/tcop/backend_startup.c
+++ b/src/backend/tcop/backend_startup.c
@@ -768,12 +768,23 @@ ProcessStartupPacket(Port *port, bool ssl_done, bool 
gss_done)
valptr),
 errhint("Valid values 
are: \"false\", 0, \"true\", 1, \"database\".")));
}
+   else if (strcmp(nameptr, "_pq_.wait_for_lsn") == 0)
+   {
+   if (strcmp(valptr, "1") == 0)
+   port->wait_for_lsn_enabled = true;
+   else
+   ereport(FATAL,
+   
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("invalid value 
for parameter \"%s\": \"%s\"",
+   
"wait_for_lsn",
+   valptr),
+errhint("Valid values 
are: 1.")));
+   }
else if (strncmp(nameptr, "_pq_.", 5) == 0)
{
/*
 * Any option beginning with _pq_. is reserved 
for use as a
-* protocol-level option, but at present no 
such options are
-* defined.
+* protocol-level option.
 */
unrecognized_protocol_options =
lappend(unrecognized_protocol_options, 
pstrdup(nameptr));
diff --git a/src/backend/tcop/dest.c b/src/backend/tcop/dest.c
index 96f80b30463..bb9910b12d5 100644
--- a/src/backend/tcop/dest.c
+++ b/src/backend/tcop/dest.c
@@ -31,6 +31,8 @@
 #include "access/printsimple.h"
 #include "access/printtup.h"
 #include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
 #include "commands/copy.h"
 #include "commands/createas.h"
 #include "commands/explain.h"
@@ -40,6 +42,7 @@
 #include "executor/tstoreReceiver.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
+#include "miscadmin.h"
 
 
 /* 
@@ -265,6 +268,15 @@ Re

Re: protocol-level wait-for-LSN

2024-10-28 Thread Jelte Fennema-Nio
On Mon, 28 Oct 2024 at 17:58, Heikki Linnakangas  wrote:
> > - The Query message sends an LSN to wait for.  (This doesn't handle the
> > extended query protocol yet.)
>
> I'd suggest adding a new message type for this, so that it works the
> same with simple and extended query. Or if you just want to wait without
> issuing any query.

I imagine a libpq interface like this.

lsn = PQcurrentLSN(primaryConn)
PQsendWaitLSN(secondaryConn, lsn)
PQsendQuery(secondaryConn, ...)

One thing I'm wondering is if the current lsn could be a read-only GUC
that is reported through ParameterStatus. Because a downside of making
it part of ReadyForQuery is that you only get a ReadyForQuery at the
end of a pipeline, while a pipeline can contain multiple commits if
you use explicit BEGIN/COMMIT in your pipeline. It might be nice to be
able to wait on those commits before you've received ReadyForQuery. On
the other hand, that seems like a rather exotic usecase that maybe is
not worth thinking about too much.
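
Spelled out a little more, the flow with such an interface might look like
this; PQcurrentLSN() and PQsendWaitLSN() are the hypothetical calls imagined
above and do not exist in libpq today.

#include <libpq-fe.h>

/*
 * Sketch of a read-your-writes pattern with the *hypothetical*
 * PQcurrentLSN()/PQsendWaitLSN() interface; neither function exists in
 * libpq today, and error handling is omitted for brevity.
 */
static void
read_your_writes(PGconn *primary, PGconn *standby)
{
    PGresult   *res;
    const char *lsn;

    /* Write on the primary. */
    res = PQexec(primary, "INSERT INTO t VALUES (1)");
    PQclear(res);

    /* LSN the primary reported via ReadyForQuery (hypothetical accessor). */
    lsn = PQcurrentLSN(primary);

    /* Ask the standby to wait for that LSN before running the read. */
    PQsendWaitLSN(standby, lsn);                /* hypothetical call */
    res = PQexec(standby, "SELECT * FROM t");   /* now sees the insert */
    PQclear(res);
}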




Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

2024-10-28 Thread Jingtang Zhang
Oh, another comment for the v24-0001 patch: we are in heap AM now, so should
we use something like HEAP_INSERT_BAS_BULKWRITE instead of using a table AM
option, just like other heap AM options do?

> + if ((state->options & TABLE_INSERT_BAS_BULKWRITE) != 0)
> + istate->bistate = GetBulkInsertState();


—

Regards, Jingtang





Re: [PATCH] Add sortsupport for range types and btree_gist

2024-10-28 Thread Bernd Helmle
Am Samstag, dem 26.10.2024 um 11:40 +0300 schrieb Andrey M. Borodin:
> First patch:
> 0. We have PG_FREE_IF_COPY(), does it suits your needs?

PG_FREE_IF_COPY() requires FunctionCallInfo (the macro uses
PG_GETARG_POINTER() to reference the pointer value to compare), which
is not usable in the context we need it.

> 1. Tests do not check what actual build method is used. You can add
> INJECTION_POINT("gist-sort-build") and expect a notice there or
> something like that.

Hmm, good idea, I'll give it a try.

> 2. "Per default" -> "By default", "indexe" -> "index", "index
> quality" -> NULL (in 14 quality was bad, since 15 same "quality")
> 
> Second patch:
> 0. PG_FREE_IF_COPY in range_gist_cmp? :)

See above.

> 1. Some test of this new functionality?

Right, but I am unsure where this belongs. Perhaps
src/test/regress/sql/rangetypes.sql? We can extend the GiST tests
there.




Re: Should we support casting between ARRAYs and JSON(B)?

2024-10-28 Thread Marcos Pegoraro
Em seg., 28 de out. de 2024 às 14:06, Aleksander Alekseev <
aleksan...@timescale.com> escreveu:

> =# select '[1,2,3]' :: jsonb :: int[];
>

I think it would be useful; casting int[] to json is not hard

select to_json('{1,5,9,12}'::int[]);

but casting a json array to int[] is not that easy:

select js,
   js->'items',
   translate(js->>'items','[]','{}')::int[],
   5 = any(translate(js->>'items','[]','{}')::int[])

   --This one would be cool, doesn't need translate or any other trick
   --5 = any(js->'items'::int[])

from (select jsonb_build_object('items','{1,5,9,12}'::int[])) x(js);

So, if you can cast one way, you should be able to cast both ways.

regards
Marcos


Re: general purpose array_sort

2024-10-28 Thread Aleksander Alekseev
Hi,

> Based on the previous discussion, I split it into two patches in V8.
>
> 0001 is the general sort part without `is_ascending` or `nulls_first`,
> the sort order is determined by the "<" operator of the element type.
> It also cached the type entry of both eletyp and the corresponding
> array type.
>
> 0002 adds the `is_ascending` and `nulls_first` part, it now uses
> two boolean parameters instead of parsing one text parameter.

Thanks for the updated patch set. Here are some comments.

0001:

> +{ oid => '8810', descr => 'sort array',
> +  proname => 'array_sort', provolatile => 'v', prorettype => 'anyarray',
> +  proargtypes => 'anyarray', prosrc => 'array_sort'},

I would expect that array_sort() should be IMMUTABLE. Is there a
reason for it to be VOLATILE?

> +array_sort ( anyarray  
> COLLATE collation_name )
> +anyarray

It seems to me that the part about using COLLATE should be moved
below, to the description / examples section, since it's not part of
the function signature.

Also, the description should be more specific about how NULLs are
sorted. NULLs should also be covered by tests.

0002:

> is_ascending

I really believe this name is not the best one. I suggest using
`reverse => true`. `nulls_first` is OK.

> +Datum
> +array_sort_order(PG_FUNCTION_ARGS)
> +{
> +return array_sort(fcinfo);
> +}
> +
> +Datum
> +array_sort_order_nulls_first(PG_FUNCTION_ARGS)
> +{
> +return array_sort(fcinfo);
> +}

Any reason not to specify array_sort in pg_proc.dat?

The tests cover is_ascending => true | false, which is OK, but only the
combinations (is_ascending => true, nulls_first => true) and
(is_ascending => false, nulls_first => false). For the case when both
optional arguments are specified, you have to test at least 4 combinations.

-- 
Best regards,
Aleksander Alekseev




Re: Avoid possible overflow (src/port/bsearch_arg.c)

2024-10-28 Thread Heikki Linnakangas

On 28/10/2024 17:30, Tomas Vondra wrote:

Some of those other implementations have fixed this, others have not.
And they all seem to also have the "involes" typo in the comment that we
fixed in commit 7ef8b52cf07 :-). Ranier, you might want to submit this
fix to those other projects too.


Thanks for fixing this, although I wonder if we can actually hit this,
as we don't really allocate more than 1GB in most places. But it's
possible, and the pre-bfa2cee code handled it fine.


Yeah, I didn't check closely, but I'm pretty sure none of the current
call sites can pass anything near INT_MAX elements.


While we're at it, there's this in dicts/spell.h:


/*
 * Structure to store Hunspell options. Flag representation depends on flag
 * type. These flags are about support of compound words.
 */
typedef struct CompoundAffixFlag
{
union
{
/* Flag name if flagMode is FM_CHAR or FM_LONG */
const char *s;
/* Flag name if flagMode is FM_NUM */
uint32  i;
}   flag;
/* we don't have a bsearch_arg version, so, copy FlagMode */
FlagModeflagMode;
uint32  value;
} CompoundAffixFlag;


We have bsearch_arg() now, so we could switch to that and remove 
'flagMode' from here.


--
Heikki Linnakangas
Neon (https://neon.tech)





Inconsistent output handling in 002_pg_upgrade.pl test logs

2024-10-28 Thread Joel Jacobson
Hi hackers,

I've noticed some inconsistency in 002_pg_upgrade.pl in how it handles output
during test failures. Currently, it uses note to print the header:

note "=== contents of $log ===\n";

but print for the log content and footer:

print slurp_file($log);
print "=== EOF ===\n";

This results in the header appearing in the TAP test output, while the actual
log content and the EOF-footer are directed to
./build/testrun/pg_upgrade/002_pg_upgrade/log/regress_log_002_pg_upgrade

This split means the log content is separate from its header and footer,
making it harder to interpret during test failures.

Example of current behavior:

In the console output:
# executing test in ./build/testrun/pg_upgrade/002_pg_upgrade group pg_upgrade test 002_pg_upgrade
...omitted...
not ok 15 - pg_upgrade_output.d/ removed after pg_upgrade success
# === contents of ./build/testrun/pg_upgrade/002_pg_upgrade/data/t_002_pg_upgrade_new_node_data/pgdata/pg_upgrade_output.d/20241028T140145.267/log/pg_upgrade_dump_5.log ===
# === contents of ./build/testrun/pg_upgrade/002_pg_upgrade/data/t_002_pg_upgrade_new_node_data/pgdata/pg_upgrade_output.d/20241028T140145.267/log/pg_upgrade_dump_1.log ===

But the actual log content and footers are only visible in 
regress_log_002_pg_upgrade.

Suggested solutions:

Option 1: Keep output together using note

If we want all messages (header, content, and footer) to be in the TAP output
for easier readability, we could replace print with note for both the content
and footer:

   foreach my $log (@log_files)
   {
   note "=== contents of $log ===\n";
   note slurp_file($log);
   note "=== EOF ===\n";
   }

Option 2: Adjust header message for separate logs

If we intentionally want the log content to remain in the separate regression
log for brevity, we could clarify the header message to indicate where
the actual content can be found, and ensure the "=== contents of $log ===\n"
message goes into the same file as the content:

   foreach my $log (@log_files)
   {
   note "=== contents of $log logged to $path_to_regress_log_002_pg_upgrade 
===\n";
   print "=== contents of $log ===\n";
   print slurp_file($log);
   print "=== EOF ===\n";
   }

Thoughts on these options?

/Joel




Re: Vacuum statistics

2024-10-28 Thread Alexander Korotkov
On Sun, Sep 29, 2024 at 12:22 AM Alena Rybakina
 wrote:
> Hi! Thank you for your interest in this patch!
>
> I took a very brief look at this and was wondering if it was worth
> having a way to make the per-table vacuum statistics opt-in (like a
> table storage parameter) in order to decrease the shared memory
> footprint of storing the stats.
>
> I'm not sure how users can select tables that enable vacuum statistics
> as I think they basically want to have statistics for all tables, but
> I see your point. Since the size of PgStat_TableCounts approximately
> tripled by this patch (112 bytes to 320 bytes), it might be worth
> considering ways to reduce the number of entries or reducing the size
> of vacuum statistics.
>
> The main purpose of these statistics is to see abnormal behavior of vacuum in 
> relation to a table or the database as a whole.
>
> For example, there may be a situation where vacuum has started to run more 
> often and spends a lot of resources on processing a certain index, but the 
> size of the index does not change significantly. Moreover, the table in which 
> this index is located can be much smaller in size. This may be because the 
> index is bloated and needs to be reindexed.
>
> This is exactly what vacuum statistics can show - we will see that compared 
> to other objects, vacuum processed more blocks and spent more time on this 
> index.
>
> Perhaps the vacuum parameters for the index should be set more aggressively 
> to avoid this in the future.
>
> I suppose that if we turn off statistics collection for a certain object, we 
> can miss it. In addition, the user may not enable the parameter for the 
> object in time, because he will forget about it.

I agree with this point.  Additionally, in order to benefit from
gathering vacuum statistics only for some relations in terms of
space, we need to handle variable-size stat entries.  That would
greatly increase the complexity.

> As for the second option, now I cannot say which statistics can be removed, 
> to be honest. So far, they all seem necessary.

Yes, but as Masahiko-san pointed out, PgStat_TableCounts is almost
tripled in size.  That's a huge change from having no statistics on
vacuum to having them in much more detail than everything else we
currently have.  I think the feasible way might be to introduce the
most-demanded statistics first and then see how it goes.

--
Regards,
Alexander Korotkov
Supabase




Re: Better error reporting from extension scripts (Was: Extend ALTER OPERATOR)

2024-10-28 Thread Alexander Lakhin

28.10.2024 19:06, Tom Lane wrote:

I've also dumped buf in read_whole_file() and found that in both
PG_BINARY_R and "r" modes the 0d 0a ending is preserved. But it changed
to 0a with the "rt" mode (see [1]), and it makes the test (and the whole
`meson test`) pass for me.

Interesting.  I believe we decided years ago that we didn't need to
use "rt" mode because that was the default on Windows, but was that
a misreading of the documentation?  The link you provided doesn't
give any hint that there are more than two behaviors.

However ... the link you provided also mentions that text mode
includes treating control-Z as EOF; which I'd forgotten, but it
does make it less safe than I thought to use text mode for
reading script files.


I think this other behavior can be explained by the pgwin32_fopen()/
pgwin32_open() coding (O_TEXT is assumed implicitly only #ifdef FRONTEND).

Anyway, as you noticed, \x1A injected into test_extsql really leads to
truncation of the file contents on read (with "rt"), so I agree that using
the text/translation mode here is not an improvement.


What I'm now thinking is that we should revert 924e03917 after
all (that is, go back to using PG_BINARY_R) and instead make
read_whole_file manually squash \r\n to \n if we're on Windows.
Ugly, but I have yet to find anything about that platform that
isn't.


Yes, I think this should help.

Best regards,
Alexander
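
To illustrate the \r\n squashing Tom describes, here is a minimal standalone
sketch. The function name, signature, and buffer handling are assumptions for
this example only; the real read_whole_file() in src/backend/commands/extension.c
presumably does this in place on its own palloc'd buffer.

#include <stddef.h>

/*
 * Collapse every \r\n pair in buf to a single \n, in place, and return the
 * possibly shorter new length.  Lone \r bytes are left untouched.
 */
static size_t
squash_crlf(char *buf, size_t len)
{
	char	   *src = buf;
	char	   *dst = buf;

	while (src < buf + len)
	{
		if (src[0] == '\r' && src + 1 < buf + len && src[1] == '\n')
			src++;				/* drop the \r; the \n is copied below */
		*dst++ = *src++;
	}

	return dst - buf;
}

Guarding the call with #ifdef WIN32 would keep the non-Windows read path
byte-for-byte identical to today's behavior.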




Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails

2024-10-28 Thread Tender Wang
On Mon, Oct 28, 2024 at 17:16, Alvaro Herrera wrote:

> On 2024-Oct-27, Tender Wang wrote:
>
> > BTW, while reviewing the v2 patch, I found the parentConTup in
> > foreach(cell, fks) block
> > didn't need it anymore. We can remove the related codes.
>
> True.  Done so in this v3.
>
> I noticed another problem here: we're grabbing the wrong lock type on
> the referenced rel (AccessShareLock) during detach.  (What's more: we
> release it afterwards, which is the wrong thing to do.  We need to keep
> such locks until end of transaction).  I didn't try to construct a case
> where this would be a problem, but if I change AccessShare to NoLock,
> the assertion that says we don't hold _any_ lock on that relation fires,
> which means that we're not taking any locks on those rels before this
> point.  So this lock strength choice is definitely wrong.  I changed it
> to ShareRowExclusive, which is what we're suppose to use when adding a
> trigger.


In CloneFKReferencing(), constrForm->confrelid uses the same lock type.
I think you're right. I didn't find any other problems.


-- 
Thanks,
Tender Wang


RE: Pgoutput not capturing the generated columns

2024-10-28 Thread Zhijie Hou (Fujitsu)
On Monday, October 28, 2024 1:34 PM Amit Kapila  wrote:
> 
> On Mon, Oct 28, 2024 at 7:43 AM Peter Smith 
> wrote:
> 
> >
> > 4. pgoutput_column_list_init
> >
> > - if (att->attisdropped || att->attgenerated)
> > + if (att->attisdropped)
> >   continue;
> >
> > + if (att->attgenerated)
> > + {
> > + if (bms_is_member(att->attnum, cols)) gencolpresent = true;
> > +
> > + continue;
> > + }
> > +
> >   nliveatts++;
> >   }
> >
> >   /*
> > - * If column list includes all the columns of the table,
> > - * set it to NULL.
> > + * If column list includes all the columns of the table
> > + * and there are no generated columns, set it to NULL.
> >   */
> > - if (bms_num_members(cols) == nliveatts)
> > + if (bms_num_members(cols) == nliveatts && !gencolpresent)
> >   {
> >
> > Something seems not quite right (or maybe redundant) with this logic.
> > For example, because you unconditionally 'continue' for generated
> > columns, then AFAICT it is just not possible for bms_num_members(cols)
> > == nliveatts and at the same time 'gencolpresent' to be true. So you
> > could've just Asserted (!gencolpresent) instead of checking it in the
> > condition and mentioning it in the comment.

I think it's possible for the condition you mentioned to happen.

For example:
 
CREATE TABLE test_mix_4 (a int primary key, b int, d int GENERATED ALWAYS AS (a + 1) STORED);
CREATE PUBLICATION pub FOR TABLE test_mix_4(a, d);

> >
> 
> It seems part of the logic is redundant. I propose to change something along 
> the
> lines of the attached. I haven't tested the attached change as it shows how we
> can improve this part of code.

Thanks for the changes. I tried them and hit an unexpected behavior:
the walsender reports the error "cannot use different column lists fo.."
in the following case:

Pub:
CREATE TABLE test_mix_4 (a int PRIMARY KEY, b int, c int, d int GENERATED ALWAYS AS (a + 1) STORED);
ALTER TABLE test_mix_4 DROP COLUMN c;
CREATE PUBLICATION pub_mix_7 FOR TABLE test_mix_4 (a, b);
CREATE PUBLICATION pub_mix_8 FOR TABLE test_mix_4;
Sub:
CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_mix_7, pub_mix_8;

pub_mix_7 publishes only columns a and b, so its column list should be
converted to NULL in pgoutput, but it was not, due to the check of
att_gen_present.

Based on the above, I feel we can keep the original code as it is.

Best Regards,
Hou zj


Re: ActiveState Perl is not valid anymore to build PG17 on the Windows 10/11 platforms, So Documentation still suggesting it should be updated

2024-10-28 Thread Daniel Gustafsson
> On 17 Oct 2024, at 00:35, Bruce Momjian  wrote:
> On Tue, Jul 16, 2024 at 08:23:11AM -0400, Andrew Dunstan wrote:

>> See  https://postgr.es/m/4acddcd4-1c08-44e7-ba60-cab102259...@dunslane.net
>> 
>> I agree we should fix the docco.
> 
> I have written the attached patch to make these changes.

+1 for applying this, backpatched to at least 17 but possibly further
down, judging by the linked threads.

--
Daniel Gustafsson





Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Shubham Khanna
On Mon, Oct 28, 2024 at 7:43 AM Peter Smith  wrote:
>
> Hi, here are my review comments for patch v43-0001.
>
> ==
>
> 1. Missing docs update?
>
> The CREATE PUBLICATION docs currently says:
> When a column list is specified, only the named columns are
> replicated. If no column list is specified, all columns of the table
> are replicated through this publication, including any columns added
> later.
>
> ~
>
> For this patch, should that be updated to say "... all columns (except
> generated columns) of the table are replicated..."
>
> ==
> src/backend/replication/logical/proto.c
>
> 2.
> +static bool
> +should_publish_column(Form_pg_attribute att, Bitmapset *columns)
> +{
> + if (att->attisdropped)
> + return false;
> +
> + /*
> + * Skip publishing generated columns if they are not included in the
> + * column list.
> + */
> + if (att->attgenerated && !columns)
> + return false;
> +
> + if (!column_in_column_list(att->attnum, columns))
> + return false;
> +
> + return true;
> +}
>
> Here, I wanted to suggest that the whole "Skip publishing generated
> columns" if-part is unnecessary because the next check
> (!column_in_column_list) is going to return false for the same
> scenario anyhow.
>
> But, unfortunately, the "column_in_column_list" function has some
> special NULL handling logic in it; this means none of this code is
> quite what it seems to be (e.g. the function name
> column_in_column_list is somewhat misleading)
>
> IMO it would be better to change the column_in_column_list signature
> -- add another boolean param to say if a NULL column list is allowed
> or not. That will remove any subtle behaviour and then you can remove
> the "if (att->attgenerated && !columns)" part.
>
> ==
> src/backend/replication/pgoutput/pgoutput.c
>
> 3. send_relation_and_attrs
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
>   if (att->atttypid < FirstGenbkiObjectId)
>   continue;
>
> + /*
> + * Skip publishing generated columns if they are not included in the
> + * column list.
> + */
> + if (att->attgenerated && !columns)
> + continue;
> +
>   /* Skip this attribute if it's not present in the column list */
>   if (columns != NULL && !bms_is_member(att->attnum, columns))
>   continue;
> ~
>
> Most of that code above looks to be doing the very same thing as the
> new 'should_publish_column' in proto.c. Won't it be better to expose
> the other function and share the common logic?
>
> ~~~
>
> 4. pgoutput_column_list_init
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
> + if (att->attgenerated)
> + {
> + if (bms_is_member(att->attnum, cols))
> + gencolpresent = true;
> +
> + continue;
> + }
> +
>   nliveatts++;
>   }
>
>   /*
> - * If column list includes all the columns of the table,
> - * set it to NULL.
> + * If column list includes all the columns of the table
> + * and there are no generated columns, set it to NULL.
>   */
> - if (bms_num_members(cols) == nliveatts)
> + if (bms_num_members(cols) == nliveatts && !gencolpresent)
>   {
>
> Something seems not quite right (or maybe redundant) with this logic.
> For example, because you unconditionally 'continue' for generated
> columns, then AFAICT it is just not possible for bms_num_members(cols)
> == nliveatts and at the same time 'gencolpresent' to be true. So you
> could've just Asserted (!gencolpresent) instead of checking it in the
> condition and mentioning it in the comment.
>
> ==
> src/test/regress/expected/publication.out
>
> 5.
> --- error: generated column "d" can't be in list
> +-- ok: generated column "d" can be in the list too
>  ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);
> -ERROR:  cannot use generated column "d" in publication column list
>
> By allowing the above to work without giving ERROR, I think you've
> broken many subsequent test expected results. e.g. I don't trust these
> "expected" results anymore because I didn't think these next test
> errors should have been affected, right?
>
>  -- error: system attributes "ctid" not allowed in column list
>  ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, ctid);
> -ERROR:  cannot use system column "ctid" in publication column list
> +ERROR:  relation "testpub_tbl5" is already member of publication
> "testpub_fortable"
>
> Hmm - looks like a wrong expected result to me.
>
> ~
>
>  -- error: duplicates not allowed in column list
>  ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, a);
> -ERROR:  duplicate column "a" in publication column list
> +ERROR:  relation "testpub_tbl5" is already member of publication
> "testpub_fortable"
>
> Hmm - looks like a wrong expected result to me.
>
> probably more like this...
>
> ==
> src/test/subscription/t/031_column_list.pl
>
> 6.
> +$node_subscriber->safe_psql(
> + 'postgres', qq(
> + CREATE TABLE test_gen (a int PRIMARY KEY, b int);
> +));
> +
> +$node_subscriber->safe_psql(
> + 'postgres', qq(
> + CREATE 

Re: Assertion failure when autovacuum drops orphan temp indexes.

2024-10-28 Thread Nathan Bossart
On Sun, Oct 27, 2024 at 09:36:38PM -0700, Masahiko Sawada wrote:
> Thank you for your suggestion. IMHO I'm not sure we need to have a
> regression test for this bug as it's just an oversight of recently
> committed code.

Agreed.  FWIW cfbot seems to catch this already (CTRL+F for "index_drop" in
the highlights page [0]), and I saw it on my laptop shortly after the issue
was reported.

[0] http://cfbot.cputube.org/highlights/all.html

-- 
nathan




Re: Alias of VALUES RTE in explain plan

2024-10-28 Thread Tom Lane
Ashutosh Bapat  writes:
> The patch looks good to me, except the name of the new member.

>   CommonTableExpr *p_parent_cte; /* this query's containing CTE */
> + Alias*p_parent_alias; /* parent's alias for this query */

> the two "parent"s here mean different things and that might lead one
> to assume that the p_parent_alias refers to alias of CTE. The comment
> adds to the confusion since it mentions parent. How about renaming it
> as p_outer_alias? or something which indicates alias of the outer
> query?

Hmm, I figured the two "parent" references do mean the same thing,
ie the immediately surrounding syntactic construct.  While I won't
fight hard about it, I don't see an advantage in naming the new
field differently.  We could make the comment be

/* outer level's alias for this query */

if that helps any.

regards, tom lane




Proper object locking for GRANT/REVOKE

2024-10-28 Thread Peter Eisentraut
This patch started out as a refactoring: objectNamesToOids() in
aclchk.c should really be mostly a loop around get_object_address().
This is mostly true, with a few special cases because the node
representations are a bit different in some cases, and
OBJECT_PARAMETER_ACL, which is obviously very different.  This saves a
bunch of duplicative code, which is nice.


Additionally, get_object_address() handles locking, which 
objectNamesToOids() somewhat famously does not do, and there is a code 
comment about it.  With this refactoring, we get the locking pretty much 
for free.


Interestingly, this changes the output of the intra-grant-inplace 
isolation test, which is recent work.  It might be good to get some 
review from those who worked on that, to make sure that the new behavior 
is still correct and/or to check whether those test cases are still 
applicable.


Also, it would be worth thinking about what the correct lock mode should 
be here.
From fb910c6af0fb0691448680ec4a4cc1eb1857cd5b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 28 Oct 2024 16:07:01 +0100
Subject: [PATCH] Proper object locking for GRANT/REVOKE

Refactor objectNamesToOids() to use get_object_address() internally if
possible.  Not only does this save a lot of code, it also allows us to
use the object locking provided by get_object_address() for
GRANT/REVOKE.  There was previously a code comment that complained
about the lack of locking in objectNamesToOids(), which is now fixed.

The check in ExecGrant_Type_check() is obsolete because
get_object_address_type() already does the same check.
---
 src/backend/catalog/aclchk.c  | 162 +-
 .../expected/intra-grant-inplace.out  |   2 +-
 2 files changed, 41 insertions(+), 123 deletions(-)

diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 95eb0b12277..2f91aa865a3 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -659,147 +659,76 @@ ExecGrantStmt_oids(InternalGrant *istmt)
  * objectNamesToOids
  *
  * Turn a list of object names of a given type into an Oid list.
- *
- * XXX: This function doesn't take any sort of locks on the objects whose
- * names it looks up.  In the face of concurrent DDL, we might easily latch
- * onto an old version of an object, causing the GRANT or REVOKE statement
- * to fail.
  */
 static List *
 objectNamesToOids(ObjectType objtype, List *objnames, bool is_grant)
 {
List   *objects = NIL;
ListCell   *cell;
+   const LOCKMODE lockmode = AccessShareLock;
 
Assert(objnames != NIL);
 
switch (objtype)
{
+   default:
+
+   /*
+* For most object types, we use get_object_address() directly.
+*/
+   foreach(cell, objnames)
+   {
+   ObjectAddress address;
+   Relationrelation;
+
+   address = get_object_address(objtype, lfirst(cell), &relation, lockmode, false);
+   Assert(relation == NULL);
+   objects = lappend_oid(objects, address.objectId);
+   }
+   break;
case OBJECT_TABLE:
case OBJECT_SEQUENCE:
+
+   /*
+* Here, we don't use get_object_address().  It requires that the
+* specified object type match the actual type of the object, but
+* in GRANT/REVOKE, all table-like things are addressed as TABLE.
+*/
foreach(cell, objnames)
{
RangeVar   *relvar = (RangeVar *) lfirst(cell);
Oid relOid;
 
-   relOid = RangeVarGetRelid(relvar, NoLock, false);
+   relOid = RangeVarGetRelid(relvar, lockmode, false);
objects = lappend_oid(objects, relOid);
}
break;
-   case OBJECT_DATABASE:
-   foreach(cell, objnames)
-   {
-   char   *dbname = strVal(lfirst(cell));
-   Oid dbid;
-
-   dbid = get_database_oid(dbname, false);
-   objects = lappend_oid(objects, dbid);
-   }
-   break;
case OBJECT_DOMAIN:
case OBJECT_TYPE:
+
+   /*
+* The parse representation of types and domains in privilege
+* targets is different from that expected by get_object_address()
+ 

Reduce one comparison in binaryheap's sift down

2024-10-28 Thread cca5507
Hi,

I think we can reduce one comparison in binaryheap's sift down, right?


Here is a patch to fix it.


--
Regards,
ChangAo Chen

v1-0001-Reduce-one-comparison-in-binaryheap-s-sift-down.patch
Description: Binary data
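
For readers without the attachment: the usual way to save one comparison per
level in a sift-down is to pick the larger child first and only then compare
it against the parent. Below is a generic, standalone max-heap sketch of that
idea; it is not the attached patch and not PostgreSQL's binaryheap.c (which
goes through a caller-supplied comparator), so take it purely as an
illustration of the technique.

#include <stddef.h>

static void
sift_down(int *heap, size_t n, size_t i)
{
	for (;;)
	{
		size_t		left = 2 * i + 1;
		size_t		right = left + 1;
		size_t		largest;
		int			tmp;

		if (left >= n)
			break;				/* no children, we're done */

		/* one comparison to pick the larger child ... */
		largest = (right < n && heap[right] > heap[left]) ? right : left;

		/* ... and one comparison against the parent */
		if (heap[i] >= heap[largest])
			break;

		/* swap and continue one level down */
		tmp = heap[i];
		heap[i] = heap[largest];
		heap[largest] = tmp;
		i = largest;
	}
}

The naive form instead compares the parent against each child separately,
which can cost up to three comparisons per level.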


Re: Avoid possible overflow (src/port/bsearch_arg.c)

2024-10-28 Thread Tomas Vondra



On 10/28/24 13:13, Heikki Linnakangas wrote:
> On 09/10/2024 19:16, Ranier Vilela wrote:
>> On Tue, Oct 8, 2024 at 18:28, Nathan Bossart <nathandboss...@gmail.com> wrote:
>>
>>     On Tue, Oct 08, 2024 at 04:09:00PM -0300, Ranier Vilela wrote:
>>  > The port function *bsearch_arg* mimics the C function
>>  > *bsearch*.
>>  >
>>  > The API signature is:
>>  > void *
>>  > bsearch_arg(const void *key, const void *base0,
>>  > size_t nmemb, size_t size,
>>  > int (*compar) (const void *, const void *, void *),
>>  > void *arg)
>>  >
>>  > So, the parameter *nmemb* is size_t.
>>  > Therefore, a call with nmemb greater than INT_MAX is possible.
>>  >
>>  > Internally the code uses the *int* type to iterate through the
>>     number of
>>  > members, which makes overflow possible.
>>
>>     I traced this back to commit bfa2cee (v14), which both moved
>>     bsearch_arg()
>>     to its current location and adjusted the style a bit.  Your patch
>> looks
>>     reasonable to me.
>>
>> Thanks for looking.
> 
> Committed, thanks.
> 
> Based on the original discussion on bfa2cee, I couldn't figure out where
> exactly this new bsearch implementation originated from, but googling
> around, probably *BSD or libiberty. Tomas, do you remember? Not that it
> matters, but I'm curious.
> 

I don't remember, unfortunately :-( I think it was one of the *BSDs,
because of license, but I'm not quite sure why I changed the code at all
during the move.

> Some of those other implementations have fixed this, others have not.
> And they all seem to also have the "involes" typo in the comment that we
> fixed in commit 7ef8b52cf07 :-). Ranier, you might want to submit this
> fix to those other projects too.
> 

Thanks for fixing this, although I wonder if we can actually hit this,
as we don't really allocate more than 1GB in most places. But it's
possible, and the pre-bfa2cee code handled it fine.


regards

-- 
Tomas Vondra





Re: Fix C23 compiler warning

2024-10-28 Thread Tom Lane
Dagfinn Ilmari Mannsåker  writes:
> Tom Lane  writes:
>> So maybe we should revive that idea, though I'd definitely target
>> autoconf 2.72 not 2.71.

> Just a data point: autoconf 2.72 came out under a year ago, so the most
> recent Debian Stable (12) and Ubuntu LTS (24.04) only have 2.71.

I don't think we care, except to the extent that usage of 2.72 in
widely-used distros would increase confidence in it (which is far
from a trivial consideration).  For many years, we've had a policy
that committers should use autoconf-built-from-GNU-sources rather
than distro packages.  The distros tend to stick in local changes
that affect the output, but we need uniform output so that there's
not random churn in the committed version of the configure script.

Still, we're in wait-and-see mode about C23, so maybe wait-and-see
for awhile longer about autoconf 2.72 as well.

> They both have m4 1.4.19, though.

That's good news anyway.  Per the older thread, building m4 from
source is no fun at all.

regards, tom lane




Re: define pg_structiszero(addr, s, r)

2024-10-28 Thread Heikki Linnakangas

On 18/09/2024 21:57, Bertrand Drouvot wrote:

On Wed, Sep 18, 2024 at 10:03:21AM +0200, Peter Eisentraut wrote:

On 18.09.24 06:16, Bertrand Drouvot wrote:

+#define pg_structiszero(addr, s, r)		\
+   do {		\
+   /* We assume this initializes to zeroes */		\
+   static const s all_zeroes;		\
+   r = (memcmp(addr, &all_zeroes, sizeof(all_zeroes)) == 0);   \
+   } while (0)


Not new with this patch, but do we guarantee padding bytes to be zeros?

How about this instead:

static inline bool
pg_is_all_zeros(const char *p, size_t len)
{
  for (size_t i = 0; i < len; i++)
  {
if (p[i] != 0)
return false;
  }
  return true;
}

Is there a de facto standard name for that function? I was surprised
that I couldn't find one with a quick google search. That seems like the
kind of small utility function that every C program needs.


How performance-sensitive is this? If it's not, then the above seems
like the most straightforward way to do it, which is good. If it is
performance-sensitive, it's still good, because the compiler can
optimize it well: https://godbolt.org/z/x9hPWjheq.


--
Heikki Linnakangas
Neon (https://neon.tech)
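
A small usage sketch follows, with the struct and call site invented purely
for illustration (where such a helper would live is not settled in this
thread), assuming the pg_is_all_zeros() definition above is in scope. Note
Peter's point about padding: since padding bytes are not guaranteed to be
zero, this is only reliable on memory that was explicitly zeroed, e.g. with
palloc0() or memset(), or on structs without padding.

#include "postgres.h"

typedef struct DemoCounters
{
	uint64		inserts;
	uint64		updates;
	uint64		deletes;
} DemoCounters;

/* returns true if the counters were never touched after palloc0() */
static bool
demo_counters_untouched(const DemoCounters *c)
{
	return pg_is_all_zeros((const char *) c, sizeof(DemoCounters));
}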





Re: In-place persistence change of a relation

2024-10-28 Thread Heikki Linnakangas

On 31/08/2024 19:09, Kyotaro Horiguchi wrote:

Subject: [PATCH v34 03/16] Remove function for retaining files on outer
 transaction aborts

The function RelationPreserveStorage() was initially created to keep
storage files committed in a subtransaction on the abort of outer
transactions. It was introduced by commit b9b8831ad6 in 2010, but no
use case for this behavior has emerged since then. If we move the
at-commit removal feature of storage files from pendingDeletes to the
UNDO log system, the UNDO system would need to accept the cancellation
of already logged entries, which makes the system overly complex with
no benefit. Therefore, remove the feature.


I don't think that's quite right. I don't think this was meant for 
subtransaction aborts, but to make sure that if the top-transaction 
aborts after AtEOXact_RelationMap() has already been called, we don't 
remove the new relation. AtEOXact_RelationMap() is called very late in 
the commit process to keep the window as small as possible, but if it 
nevertheless happens, the consequences are pretty bad if you remove a 
relation file that is in fact needed.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Avoid possible overflow (src/port/bsearch_arg.c)

2024-10-28 Thread Heikki Linnakangas

On 09/10/2024 19:16, Ranier Vilela wrote:
On Tue, Oct 8, 2024 at 18:28, Nathan Bossart <nathandboss...@gmail.com> wrote:


On Tue, Oct 08, 2024 at 04:09:00PM -0300, Ranier Vilela wrote:
 > The port function *bsearch_arg* mimics the C function
 > *bsearch*.
 >
 > The API signature is:
 > void *
 > bsearch_arg(const void *key, const void *base0,
 > size_t nmemb, size_t size,
 > int (*compar) (const void *, const void *, void *),
 > void *arg)
 >
 > So, the parameter *nmemb* is size_t.
 > Therefore, a call with nmemb greater than INT_MAX is possible.
 >
 > Internally the code uses the *int* type to iterate through the
number of
 > members, which makes overflow possible.

I traced this back to commit bfa2cee (v14), which both moved
bsearch_arg()
to its current location and adjusted the style a bit.  Your patch looks
reasonable to me.

Thanks for looking.


Committed, thanks.

Based on the original discussion on bfa2cee, I couldn't figure out where 
exactly this new bsearch implementation originated from, but googling 
around, probably *BSD or libiberty. Tomas, do you remember? Not that it 
matters, but I'm curious.


Some of those other implementations have fixed this, others have not. 
And they all seem to also have the "involes" typo in the comment that we 
fixed in commit 7ef8b52cf07 :-). Ranier, you might want to submit this 
fix to those other projects too.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: ActiveState Perl is not valid anymore to build PG17 on the Windows 10/11 platforms, So Documentation still suggesting it should be updated

2024-10-28 Thread Umar Hayat
Hi Bruce,
The patch looks good. In my opinion it should be committed, as more and
more people trying to build PG 17 on Windows are facing similar issues.
The latest issue was reported yesterday [0].

Regards
Umar Hayat

[0] https://www.postgresql.org/message-id/flat/CA%2BOCxozXN%3DGFPWU8vBjG6ch%3Di2q46FrQbTMjpsERGaSXQj-%2BtQ%40mail.gmail.com#8c5cefdb5881f4ed49e254d4ef9dbe09


On Thu, 17 Oct 2024 at 07:35, Bruce Momjian  wrote:
>
> On Tue, Jul 16, 2024 at 08:23:11AM -0400, Andrew Dunstan wrote:
> >
> > On 2024-07-16 Tu 7:46 AM, Yasir wrote:
> >
> > Hi Hackers,
> >
> > Recently, I compiled PG17 on the windows. Till PG16 "ActiveState Perl", 
> > as
> > instructed in the documentation, was being used successfully on the 
> > Windows
> > 10/11 to compile PG.
> > However, it looks like that "ActiveState Perl" is not valid anymore to
> > compile PG17 on Windows 10/11 but documentation still suggests it.
> > Therefore, I think documentation needs to be updated.
> > Moreover, I had to install "strawberry's perl" in order to compile PG17 
> > on
> > Windows 10/11. Please check out the thread "errors building on windows
> > using meson" highlighting the issue.
> >
> >
> >
> > See  https://postgr.es/m/4acddcd4-1c08-44e7-ba60-cab102259...@dunslane.net
> >
> > I agree we should fix the docco.
>
> I have written the attached patch to make these changes.
>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   When a patient asks the doctor, "Am I going to die?", he means
>   "Am I going to die soon?"



