Hi Hackers,

I propose adding trackable information in postgres_fdw, in order to track 
remote query correctly.

## Background and motivation

Currently postgres_fdw connects remote servers by using connect_pg_server(). 
However the function just calls PQconnectdbParams() with 
fallback_application_name = "postgres_fdw."
Therefore, if two or more servers connect to one data server and two queries 
arrive at the data server, the database administrator cannot determine which 
queries came from which server.
This problem prevents some workload analysis because it cannot track the flow 
of queries.

## Implementation

I just added local backend's pid to fallback_application_name. This is the key 
for seaching and matching two logs.
In order to use the feature and track remote transactions, user must add 
backend-pid and application_name to log_line_prefix, like

```
log_line_prefix = '%m [%p] [%a] '
```

Here is the output example. Assume that remote server has a table "foo," and 
local server imports the schema.
When local server executes foregin scan, the following line was output in the 
local's logfile.

```
2021-07-29 03:18:50.630 UTC [21572] [psql] LOG:  duration: 23.366 ms  
statement: select * from foo;
```

And in the remote's one, the following lines were appered.

```
2021-07-29 03:18:50.628 UTC [21573] [postgres_fdw for remote PID: 21572] LOG:  
duration: 0.615 ms  parse <unnamed>: DECLARE c1 CURSOR FOR
        SELECT id FROM public.foo
```

Two lines have same pid, so we can track the log and analyze workloads 
correctly.
I will write docs later, but now I want you to review the motivation and 
implementation.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment: add_pid.patch
Description: add_pid.patch

Reply via email to