On Wed, Jul 23, 2025, at 04:44, Thomas Munro wrote: > On Wed, Jul 23, 2025 at 1:39 PM Joel Jacobson <j...@compiler.org> wrote: >> In their patch, in asyn.c's SignalBackends(), they do >> SendInterrupt(INTERRUPT_ASYNC_NOTIFY, procno) instead of >> SendProcSignal(pid, PROCSIG_NOTIFY_INTERRUPT, procnos[i]). They don't >> seem to check if the backend is already signalled or not, but maybe >> SendInterrupt() has signal coalescing built-in so it would be a noop >> with almost no cost? > > Yeah: > > + old_pending = pg_atomic_fetch_or_u32(&proc->pendingInterrupts, > interruptMask); > + > + /* > + * If the process is currently blocked waiting for an interrupt to arrive, > + * and the interrupt wasn't already pending, wake it up. > + */ > + if ((old_pending & (interruptMask | SLEEPING_ON_INTERRUPTS)) == > SLEEPING_ON_INTERRUPTS) > + WakeupOtherProc(proc);
Thanks for confirming the coalescing logic in SendInterrupt. That's a great low-level optimization. It's clear we're both targeting the same problem of redundant wake-ups under contention, but approaching it from different architectural levels. The core difference, as I see it, is *where* the state management resides. The "Interrupts vs signals" patch set creates a unified machinery where the 'pending' state for all subsystems is combined into a single atomic bitmask. This is a valid approach. However, I've been exploring an alternative pattern that decouples the state management from the signaling machinery, allowing each subsystem to manage its own state independently. I believe this leads to a simpler, more modular migration path. I've developed a two-patch series for `async.c` to demonstrate this concept. 1. The first patch introduces a lock-free, atomic finite state machine (FSM) entirely within async.c. By using a subsystem-specific atomic integer and CAS operations, async.c can now robustly manage its own listener states (IDLE, SIGNALLED, PROCESSING). This solves the redundant signal problem at the source, as notifiers can now observe a listener's state and refrain from sending a wakeup if one is already pending. 2. The second patch demonstrates that once state is managed locally, the wakeup mechanism becomes trivial.** The expensive `SendProcSignal` call is replaced with a direct `SetLatch`. This leverages the existing, highly-optimized `WaitEventSet` infrastructure as a simple, efficient "poke." This suggests a powerful, incremental migration pattern: first, fix a subsystem's state management internally; second, replace its wakeup mechanism. This vertical, module-by-module approach seems complementary to the horizontal, layer-by-layer refactoring in the "Interrupts vs signals" thread. I'll post a more detailed follow-up in that thread to discuss the broader architectural implications. Attached are the two patches, reframed to better illustrate this two-step pattern. /Joel
#!/bin/bash # Configuration for the PostgreSQL instances using absolute paths. # This script does NOT modify the shell's PATH variable. # --- Master Config --- MASTER_NAME="master" MASTER_PORT=5432 MASTER_BIN_PATH="$HOME/pg-master/bin" MASTER_DATA="$HOME/pg-master-data" MASTER_LOG="/tmp/pg-master.log" # --- Patch v1 Config --- PATCH_NAME="patch-v1" PATCH_PORT=5432 PATCH_BIN_PATH="$HOME/pg-patch-v1/bin" PATCH_DATA="$HOME/pg-patch-v1-data" PATCH_LOG="/tmp/pg-patch-v1.log" # --- Patch v2 Config --- PATCH_V2_NAME="patch-v2" PATCH_V2_PORT=5432 PATCH_V2_BIN_PATH="$HOME/pg-patch-v2/bin" PATCH_V2_DATA="$HOME/pg-patch-v2-data" PATCH_V2_LOG="/tmp/pg-patch-v2.log" # Benchmark settings CHANNEL_NAME="mychannel" CONNECTIONS=(64 128) DURATION=10 # Benchmark duration in seconds for each run MEASUREMENTS=3 # Number of measurements per configuration # CSV output file CSV_OUTPUT="benchmark_results.csv" # Temporary files PGBENCH_SCRIPT=$(mktemp) # --- Cleanup Function --- # Ensures that servers are stopped and temp files are removed on script exit. cleanup() { echo "" echo "Cleaning up..." # Ensure all servers are stopped, silencing errors if they are not running. # Use absolute paths and explicit data directories. "$MASTER_BIN_PATH/pg_ctl" -D "$MASTER_DATA" -m fast stop &> /dev/null "$PATCH_BIN_PATH/pg_ctl" -D "$PATCH_DATA" -m fast stop &> /dev/null "$PATCH_V2_BIN_PATH/pg_ctl" -D "$PATCH_V2_DATA" -m fast stop &> /dev/null rm -f "$PGBENCH_SCRIPT" echo "Cleanup complete." } # Trap the script's exit (normal or interrupted) to run the cleanup function trap cleanup EXIT # Initialize CSV file with headers # echo "version,connections,jobs,tps,run" > "$CSV_OUTPUT" # --- Benchmark Function --- # A generic function to run the benchmark for a given configuration. # It starts, benchmarks, and then stops the specified server instance. run_benchmark() { local name=$1 local port=$2 local bin_path=$3 local data_path=$4 local log_file=$5 echo "--- Starting benchmark for: $name ---" # Set PGPORT for client tools (pgbench, psql) for this run export PGPORT=$port # 1. Start the server using absolute path and explicit data directory echo "Starting $name server on port $port..." "$bin_path/pg_ctl" -D "$data_path" -l "$log_file" -o "-p $port" start sleep 2 # Give server a moment to become available # Create the pgbench script content cat > "$PGBENCH_SCRIPT" << EOF NOTIFY ${CHANNEL_NAME}; EOF # 2. Start the listener in the background for this server (echo "LISTEN ${CHANNEL_NAME};"; sleep 100000) | "$bin_path/psql" -d postgres &> /dev/null & local listener_pid=$! # 3. Run the benchmark loop echo "Running pgbench for connection counts: ${CONNECTIONS[*]}" for c in "${CONNECTIONS[@]}"; do echo " Testing with $c connections ($MEASUREMENTS measurements per run)..." # Run multiple measurements for each connection count for m in $(seq 1 $MEASUREMENTS); do # Run pgbench and extract TPS value tps=$("$bin_path/pgbench" -d postgres -f "$PGBENCH_SCRIPT" -c "$c" -j "$c" -T "$DURATION" -n \ | grep -E '^tps' \ | awk '{printf "%.0f", $3}') # Write to CSV: version,connections,jobs,tps,run echo "$name,$c,$c,$tps,$m" >> "$CSV_OUTPUT" done done # 4. Stop the listener and the server kill "$listener_pid" &> /dev/null echo "Stopping $name server..." "$bin_path/pg_ctl" -D "$data_path" -m fast stop &> /dev/null echo "--- Benchmark for $name complete ---" echo "" } # --- Main Execution --- # 1. Run benchmark for master # run_benchmark "$MASTER_NAME" "$MASTER_PORT" "$MASTER_BIN_PATH" "$MASTER_DATA" "$MASTER_LOG" # 2. Run benchmark for patch-v1 # run_benchmark "$PATCH_NAME" "$PATCH_PORT" "$PATCH_BIN_PATH" "$PATCH_DATA" "$PATCH_LOG" # 3. Run benchmark for patch-v2 run_benchmark "$PATCH_V2_NAME" "$PATCH_V2_PORT" "$PATCH_V2_BIN_PATH" "$PATCH_V2_DATA" "$PATCH_V2_LOG" # 4. Generate report using PostgreSQL echo "" echo "# BENCHMARK" echo "" echo "## TPS" # Start the master server to run the analysis export PGPORT=$MASTER_PORT "$MASTER_BIN_PATH/pg_ctl" -D "$MASTER_DATA" -l "$MASTER_LOG" -o "-p $MASTER_PORT" start &> /dev/null sleep 2 # Create analysis database and load data "$MASTER_BIN_PATH/psql" -d postgres -q << EOF -- Create a temporary database for analysis DROP DATABASE IF EXISTS bench_analysis; CREATE DATABASE bench_analysis; \c bench_analysis -- Create table for benchmark results CREATE TABLE benchmark_results ( version TEXT, connections INT, jobs INT, tps NUMERIC, run INT ); -- Load CSV data \COPY benchmark_results FROM '$CSV_OUTPUT' CSV HEADER -- Generate comparison report WITH avg_results AS ( SELECT version, connections, AVG(tps) AS avg_tps, STDDEV(tps) AS stddev_tps, COUNT(*) AS runs FROM benchmark_results GROUP BY version, connections ), comparison AS ( SELECT m.connections, m.avg_tps AS master_tps, p1.avg_tps AS patch_v1_tps, p2.avg_tps AS patch_v2_tps, CASE WHEN m.avg_tps > 0 THEN ((p1.avg_tps - m.avg_tps) / m.avg_tps * 100) ELSE 0 END AS relative_diff_patch_v1_pct, CASE WHEN m.avg_tps > 0 THEN ((p2.avg_tps - m.avg_tps) / m.avg_tps * 100) ELSE 0 END AS relative_diff_patch_v2_pct FROM avg_results m JOIN avg_results p1 ON m.connections = p1.connections JOIN avg_results p2 ON m.connections = p2.connections WHERE m.version = 'master' AND p1.version = 'patch-v1' AND p2.version = 'patch-v2' ORDER BY m.connections ) SELECT connections AS "N backends", ROUND(master_tps) AS "master", ROUND(patch_v1_tps) AS "patch-v1", ROUND(patch_v2_tps) AS "patch-v2" FROM comparison ORDER BY connections; EOF echo "" echo "## TPS speed-up vs master" "$MASTER_BIN_PATH/psql" -d bench_analysis -q << EOF SELECT connections AS "N backends", CASE WHEN relative_diff_patch_v1_pct >= 0 THEN '+' ELSE '' END || ROUND(relative_diff_patch_v1_pct) || '%' AS "patch-v1", CASE WHEN relative_diff_patch_v2_pct >= 0 THEN '+' ELSE '' END || ROUND(relative_diff_patch_v2_pct) || '%' AS "patch-v2" FROM ( WITH avg_results AS ( SELECT version, connections, AVG(tps) AS avg_tps FROM benchmark_results GROUP BY version, connections ) SELECT m.connections, CASE WHEN m.avg_tps > 0 THEN ((p1.avg_tps - m.avg_tps) / m.avg_tps * 100) ELSE 0 END AS relative_diff_patch_v1_pct, CASE WHEN m.avg_tps > 0 THEN ((p2.avg_tps - m.avg_tps) / m.avg_tps * 100) ELSE 0 END AS relative_diff_patch_v2_pct FROM avg_results m JOIN avg_results p1 ON m.connections = p1.connections JOIN avg_results p2 ON m.connections = p2.connections WHERE m.version = 'master' AND p1.version = 'patch-v1' AND p2.version = 'patch-v2' ) AS comparison ORDER BY connections; EOF # Stop the server "$MASTER_BIN_PATH/pg_ctl" -D "$MASTER_DATA" -m fast stop &> /dev/null echo "" echo "CSV results saved to: $CSV_OUTPUT"
# BENCHMARK A single backend does `LISTEN mychannel;` and stays idle, then pgbench is run 3 times for each <N backends>. script.sql: NOTIFY mychannel; % pgbench" -f script.sql -c <N backends> -j <N backends> -T 10 -n ## TPS N backends | master | patch-v1 | patch-v2 ------------+--------+----------+---------- 1 | 117343 | 151422 | 150735 2 | 158427 | 236705 | 239004 4 | 177454 | 250783 | 250782 8 | 116521 | 155466 | 180418 16 | 45627 | 144740 | 163491 32 | 37281 | 135602 | 146659 64 | 36608 | 123870 | 131202 128 | 34798 | 120302 | 119041 (8 rows) ## TPS speed-up vs master N backends | patch-v1 | patch-v2 ------------+----------+---------- 1 | +29% | +28% 2 | +49% | +51% 4 | +41% | +41% 8 | +33% | +55% 16 | +217% | +258% 32 | +264% | +293% 64 | +238% | +258% 128 | +246% | +242% (8 rows)
0001-Optimize-LISTEN-NOTIFY-signaling-with-a-lock-free-at.patch
Description: Binary data
0002-Optimize-LISTEN-NOTIFY-wakeup-by-replacing-signal-wi.patch
Description: Binary data