Currently in the PgOutput messages for Insert statements, there is no way
to differentiate if the column was explicitly set to null in the insert
statement, or whether that column was omitted in the insert statement and
thus set to null (or default value)

For example, consider the below table:




*CREATE TABLE test_table (    id int primary key,    text_col_1 text
default 'default_text_1',    text_col_2 text default 'default_text_2');*

Now if we insert a row using:

*INSERT INTO test_table (id, text_col_1) VALUES (1, null);*
In the above query I am explicitly setting *text_col_1* to null, and I am
omitting *text_col_2* which will be populated with the default value.
The PgOutput replication message classifies both *text_col_1* and
*text_col_2* values as NULL. (Refer Logical Replication Message Format
<https://www.postgresql.org/docs/current/protocol-logicalrep-message-formats.html>
)
Thus we are not able to identify which column was explicitly set to null
and which column was set to the default value.
This causes issues when we are parsing the message, especially in libraries
such as Debezium, where it returns the default value for the column, even
though in the actual SQL query we set that column to be explicitly null.
Thus if there was a way to differentiate between the two cases in the
replication message itself, then the appropriate action can be taken in
downstream pipelines (set to null or set to default value)

Was wondering if there is a way to overcome this issue in the current
pgoutput plugin or the code needs to be changed to account for this case.
Any help would be appreciated

Reply via email to