date:20240920

Re: Clock-skew management in logical replication

2024-09-20 Thread shihao zhong

Nisha Moond  writes:
> Thoughts? Looking forward to hearing others' opinions!

Had a productive conversation with Amit Kaplia today about time skew
in distributed systems, and wanted to share some thoughts.
Essentially, we're grappling with the classic distributed snapshot
problem. In a multi-active environment, where multiple nodes can
independently process transactions,  it becomes crucial to determine
the visibility of these transactions across the system.  Time skew,
where different machines have different timestamps make it a hard
problem. How can we ensure consistent transaction ordering and
visibility when time itself is unreliable?

As you mentioned, there are several ways to tackle the time skew
problem in distributed systems. These approaches generally fall into
three main categories:

1. Centralized Timestamps (Timestamp Oracle)

Mechanism: A dedicated server acts as a single source of truth for
time, eliminating skew by providing timestamps to all nodes. Google
Percolator and TiDB use this approach.
Consistency level: Serializable
Pros: Simple to implement.
Cons: High latency for cross-geo transactions due to reliance on a
central server. Can become a bottleneck.

2. Atomic Clocks (True Time)

Mechanism: Utilizes highly accurate atomic clocks to provide a
globally consistent view of time, as seen in Google Spanner.
Consistency level: External Serializable
Pros: Very high consistency level (externally consistent).
Cons: Requires specialized and expensive hardware. Adds some latency
to transactions, though less than centralized timestamps.

3. Hybrid Logical Clocks

Mechanism: CombinesNTP for rough time synchronization with logical
clocks for finer-grained ordering. Yugabyte and CockroachDB employ
this strategy.
Consistency level: Serializable
Pros: Avoids the need for specialized hardware.
Cons: Can introduce significant latency to transactions.

4 Local Clocks

Mechanism: Just use logical clock
Consistency level: Eventual Consistency
Pros: Simple implementation
Cons: The consistency level is very low

Of the four implementations considered, only local clocks and the HLC
approach offer a 'pure database' solution. Given PostgreSQL's
practical use cases, I recommend starting with a local clock
implementation. However, recognizing the increasing prevalence of
distributed clock services, we should also implement a pluggable time
access method. This allows users to integrate with different time
services as needed.

In the mid-term, implementing the HLC approach would provide highly
consistent snapshot reads. This offers a significant advantage for
many use cases.

Long-term, we should consider integrating with a distributed time
service like AWS Time Sync Service. This ensures high accuracy and
scalability for demanding applications.

Thanks,
Shihao

Re: pg_checksums: Reorder headers in alphabetical order

2024-09-20 Thread Fujii Masao





On 2024/09/21 12:09, Tom Lane wrote:

Fujii Masao  writes:

I don’t have any objections to this commit, but I’d like to confirm
whether we really want to proactively reorder #include directives,
even for standard C library headers.


I'm hesitant to do that.  We can afford to insist that our own header
files be inclusion-order-independent, because we have the ability to
fix any problems that might arise.  We have no ability to do something
about it if the system headers on $random_platform have inclusion
order dependencies.  (In fact, I'm fairly sure there are already
places in plperl and plpython where we know we have to be careful
about inclusion order around those languages' headers.)

So I would tread pretty carefully around making changes of this
type, especially in long-established code.  I have no reason to
think that the committed patch will cause any problems, but
I do think it's mostly asking for trouble.


Sounds reasonable to me.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: pg_checksums: Reorder headers in alphabetical order

2024-09-20 Thread Tom Lane

Fujii Masao  writes:
> I don’t have any objections to this commit, but I’d like to confirm
> whether we really want to proactively reorder #include directives,
> even for standard C library headers.

I'm hesitant to do that.  We can afford to insist that our own header
files be inclusion-order-independent, because we have the ability to
fix any problems that might arise.  We have no ability to do something
about it if the system headers on $random_platform have inclusion
order dependencies.  (In fact, I'm fairly sure there are already
places in plperl and plpython where we know we have to be careful
about inclusion order around those languages' headers.)

So I would tread pretty carefully around making changes of this
type, especially in long-established code.  I have no reason to
think that the committed patch will cause any problems, but
I do think it's mostly asking for trouble.

regards, tom lane

Re: pg_checksums: Reorder headers in alphabetical order

2024-09-20 Thread Fujii Masao





On 2024/09/21 5:20, Nathan Bossart wrote:

On Fri, Sep 20, 2024 at 01:56:16PM -0500, Nathan Bossart wrote:

On Fri, Sep 20, 2024 at 07:20:15PM +0200, Michael Banck wrote:

I noticed two headers are not in alphabetical order in pg_checkums.c,
patch attached.


This appears to be commit 280e5f1's fault.  Will fix.


Committed, thanks!


I don’t have any objections to this commit, but I’d like to confirm
whether we really want to proactively reorder #include directives,
even for standard C library headers. I’m asking because I know there are
several source files, like xlog.c and syslogger.c, where such #include
directives aren't in alphabetical order. I understand we usually reorder
#include directives for PostgreSQL header files, though.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: meson and check-tests

2024-09-20 Thread jian he

On Fri, Sep 20, 2024 at 6:25 PM Nazir Bilal Yavuz  wrote:
>
> Hi,
>
> I’ve been working on a patch and wanted to share my approach, which
> might be helpful for others. The patch removes the '--schedule' and
> '${schedule_file}' options from the regress/regress test command when
> the TESTS environment variable is set. Instead, it appends the TESTS
> variable to the end of the command.
>
> Please note that setup suite tests (at least tmp_install and
> initdb_cache) should be executed before running these tests. One
> drawback is that while the Meson logs will still show the test command
> with the '--schedule' and '${schedule_file}' options, the actual
> command used will be changed.
>
> Some examples after the patched build:
>
> $ meson test --suite regress -> fails
> $ TESTS="create_table copy jsonb" meson test --suite regress -> fails
> ### run required setup suite tests
> $ meson test tmp_install
> $ meson test initdb_cache
> ###
> $ meson test --suite regress -> passes (12s)
> $ TESTS="copy" meson test --suite regress -> passes (0.35s)
> $ TESTS="copy jsonb" meson test --suite regress -> passes (0.52s)
> $ TESTS='select_into' meson test --suite regress -> fails
> $ TESTS='test_setup select_into' meson test --suite regress -> passes (0.52s)
> $ TESTS='rangetypes multirangetypes' meson test --suite regress -> fails
> $ TESTS='test_setup multirangetypes rangetypes' meson test --suite
> regres -> fails
> $ TESTS='test_setup rangetypes multirangetypes' meson test --suite
> regress -> passes (0.91s)
>
> Any feedback would be appreciated.
>

hi. Thanks for your work!
I do find some issues.

TESTS="copy jsonb jsonb" meson test --suite regress
one will fail. not sure this is expected?

in [1] you mentioned "setup", but that "setup" is more or less like
"meson test  --suite setup --suite regress"
but originally, I thought was about "src/test/regress/sql/test_setup.sql".
for example, now you cannot run src/test/regress/sql/stats_ext.sql
without first running test_setup.sql, because some functions (like fipshash)
live in test_setup.sql.

so
TESTS="copy jsonb stats_ext" meson test --suite regress
will fail.

to make it work we need change it to
TESTS="test_setup copy jsonb stats_ext" meson test --suite regress

Many tests depend on test_setup.sql, maybe we can implicitly prepend it.
Another dependency issue. alter_table depending on create_index.

TESTS="test_setup alter_table" meson test --suite regress
will fail.
TESTS="test_setup create_index alter_table" meson test --suite regress
will work.

[1] 
https://www.postgresql.org/message-id/CAN55FZ3t%2BeDgKtsDoyi0UYwzbMkKDfqJgvsbamar9CvY_6qWPw%40mail.gmail.com

Re: meson and check-tests

2024-09-20 Thread Nazir Bilal Yavuz

Hi,

I’ve been working on a patch and wanted to share my approach, which
might be helpful for others. The patch removes the '--schedule' and
'${schedule_file}' options from the regress/regress test command when
the TESTS environment variable is set. Instead, it appends the TESTS
variable to the end of the command.

Please note that setup suite tests (at least tmp_install and
initdb_cache) should be executed before running these tests. One
drawback is that while the Meson logs will still show the test command
with the '--schedule' and '${schedule_file}' options, the actual
command used will be changed.

Some examples after the patched build:

$ meson test --suite regress -> fails
$ TESTS="create_table copy jsonb" meson test --suite regress -> fails
### run required setup suite tests
$ meson test tmp_install
$ meson test initdb_cache
###
$ meson test --suite regress -> passes (12s)
$ TESTS="copy" meson test --suite regress -> passes (0.35s)
$ TESTS="copy jsonb" meson test --suite regress -> passes (0.52s)
$ TESTS='select_into' meson test --suite regress -> fails
$ TESTS='test_setup select_into' meson test --suite regress -> passes (0.52s)
$ TESTS='rangetypes multirangetypes' meson test --suite regress -> fails
$ TESTS='test_setup multirangetypes rangetypes' meson test --suite
regres -> fails
$ TESTS='test_setup rangetypes multirangetypes' meson test --suite
regress -> passes (0.91s)

Any feedback would be appreciated.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft
From 7c94889b553ffc294ddf9eba7c595ea629d24e91 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Fri, 20 Sep 2024 11:39:20 +0300
Subject: [PATCH v1] Add 'make check-tests' approach to the meson based builds

---
 src/tools/testwrap | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/tools/testwrap b/src/tools/testwrap
index 9a270beb72d..9180727b6ff 100755
--- a/src/tools/testwrap
+++ b/src/tools/testwrap
@@ -41,6 +41,17 @@ env_dict = {**os.environ,
 'TESTDATADIR': os.path.join(testdir, 'data'),
 'TESTLOGDIR': os.path.join(testdir, 'log')}
 
+# Symmetric behaviour with make check-tests. If TESTS environment variable is
+# set, only run these regression tests in regress/regress test. Note that setup
+# suite tests (at least tmp_install and initdb_cache tests) need to be run
+# before running these tests.
+if "TESTS" in env_dict and args.testgroup == 'regress' and args.testname == 'regress':
+elem = '--schedule'
+schedule_index = args.test_command.index(elem) if elem in args.test_command else -1
+if schedule_index >= 0:
+del args.test_command[schedule_index : schedule_index + 2]
+args.test_command.extend(env_dict["TESTS"].split(' '))
+
 sp = subprocess.Popen(args.test_command, env=env_dict, stdout=subprocess.PIPE)
 # Meson categorizes a passing TODO test point as bad
 # (https://github.com/mesonbuild/meson/issues/13183).  Remove the TODO
-- 
2.45.2

Clock-skew management in logical replication

2024-09-20 Thread Nisha Moond

Hello Hackers,
(CC people involved in the earlier discussion)

While considering the implementation of timestamp-based conflict
resolution (last_update_wins) in logical replication (see [1]), there
was a feedback at [2] and the discussion on whether or not to manage
clock-skew at database level. We tried to research the history of
clock-skew related discussions in Postgres itself and summarized that
at [3].

We also analyzed how other databases deal with it. Based on our
research, the other classic RDBMS like Oracle and IBM, using similar
timestamp-based resolution methods, do not address clock-skew at the
database level. Instead, they recommend using external time
synchronization solutions, such as NTP.

- Oracle while handling conflicts[2] assumes clocks are synchronized
and relies on external tools like NTP for time synchronization between
nodes[4].
- IBM Informix, similarly, recommends using their network commands to
ensure clock synchronization across nodes[5].

Other postgres dependent databases like EDB-BDR and YugabyteDB provide
GUC parameters to manage clock-skew within the database:

- EDB-BDR allows configuration of parameters like
bdr.maximum_clock_skew and bdr.maximum_clock_skew_action to define
acceptable skew and actions when it exceeds[6].
- YugabyteDB offers a GUC max_clock_skew_usec setting, which causes
the node to crash if the clock-skew exceeds the specified value[7].

There are, of course, other approaches to managing clock-skew used by
distributed systems, such as NTP daemons, centralized logical clocks,
atomic clocks (as in Google Spanner), and time sync services like
AWS[4].

Implementing any of these time-sync services for CDR seems quite a bit
of deviation and a big project in itself, which we are not sure is
really needed. At best, for users' aid, we should provide some GUCs
based implementation to handle clock-skew in logical replication. The
idea is that users should be able to handle clock-skew outside of the
database. But in worst case scenarios, users can rely on these GUCs.

We have attempted to implement a patch which manages clock-skew in
logical replication. It works based on these new GUCs: (see [10] for
detailed discussion)

- max_logical_rep_clock_skew: Defines the tolerable limit for clock-skew.
- max_logical_rep_clock_skew_action: Configures the action when
clock-skew exceeds the limit.
- max_logical_rep_clock_skew_wait: Limits the maximum wait time if the
action is configured as "wait."

The proposed idea is implemented in attached patch v1. Thank you
Shveta for implementing it.
Thanks Kuroda-san for assisting in the research.

Thoughts? Looking forward to hearing others' opinions!

[1]:
https://www.postgresql.org/message-id/CAJpy0uD0-DpYVMtsxK5R%3DzszXauZBayQMAYET9sWr_w0CNWXxQ%40mail.gmail.com
[2]:
https://www.postgresql.org/message-id/CAFiTN-uTycjZWdp1kEpN9w7b7SQpoGL5zyg_qZzjpY_vr2%2BKsg%40mail.gmail.com
[3]:
https://www.postgresql.org/message-id/CAA4eK1Jn4r-y%2BbkW%3DJaKCbxEz%3DjawzQAS1Z4wAd8jT%2B1B0RL2w%40mail.gmail.com
[4]:
https://www.oracle.com/cn/a/tech/docs/technical-resources/wp-oracle-goldengate-activeactive-final2-1.pdf
[5]:
https://docs.oracle.com/en/operating-systems/oracle-linux/8/network/network-ConfiguringNetworkTime.html
[6]:
https://www.ibm.com/docs/en/informix-servers/14.10?topic=environment-time-synchronization
[7]:
https://www.enterprisedb.com/docs/pgd/latest/reference/pgd-settings/#bdrmaximum_clock_skew
[8]:
https://support.yugabyte.com/hc/en-us/articles/4403707404173-Too-big-clock-skew-leading-to-error-messages-or-tserver-crashes
[9]:
https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-time-sync-service-microsecond-accurate-time/
[10]:
https://www.postgresql.org/message-id/CAJpy0uDCW%2BvrBoUZWrBWPjsM%3D9wwpwbpZuZa8Raj3VqeVYs3PQ%40mail.gmail.com

74 matches

Mail list logo