Hi hackers, While I was writing a test for PSQL, I faced a weird scenario. Depending on how I build PSQL (enabling or not debug options), I saw different results for the following query.
Steps to reproduce: - OS: Ubuntu 20.04 - PSQL version 14.4 CREATE TABLE test (single_byte_col "char"); INSERT INTO test (single_byte_col) VALUES ('🀆'); In case where the following query runs in debug mode, I got this result: SELECT left(single_byte_col, 1) l FROM test; l ----------- ~\x7F\x7F (1 row) Once I turn off debug mode, I got: SELECT left(single_byte_col, 1) l FROM test; l --- (1 row) That triggered me to use Valgrind, which reported the following error: ==00:00:03:22.867 1171== VALGRINDERROR-BEGIN ==00:00:03:22.867 1171== Invalid read of size 1 ==00:00:03:22.867 1171== at 0x4C41209: memmove (vg_replace_strmem.c:1382) ==00:00:03:22.868 1171== by 0xC8D8D0: text_substring (../src/backend/utils/adt/varlena.c:1050) ==00:00:03:22.868 1171== by 0xC9AC78: text_left (../src/backend/utils/adt/varlena.c:5614) ==00:00:03:22.868 1171== by 0x7FBB75: ExecInterpExpr (../src/backend/executor/execExprInterp.c:749) ==00:00:03:22.868 1171== by 0x7FADA6: ExecInterpExprStillValid (../src/backend/executor/execExprInterp.c:1824) ==00:00:03:22.868 1171== by 0x81C0AA: ExecEvalExprSwitchContext (../src/include/executor/executor.h:339) ==00:00:03:22.868 1171== by 0x81BD96: ExecProject (../src/include/executor/executor.h:373) ==00:00:03:22.868 1171== by 0x81B8BE: ExecScan (../src/backend/executor/execScan.c:238) ==00:00:03:22.868 1171== by 0x8638BA: ExecSeqScan (../src/backend/executor/nodeSeqscan.c:112) ==00:00:03:22.868 1171== by 0x8170D4: ExecProcNodeFirst (../src/backend/executor/execProcnode.c:465) ==00:00:03:22.868 1171== by 0x80F291: ExecProcNode (../src/include/executor/executor.h:257) ==00:00:03:22.868 1171== by 0x80A8F0: ExecutePlan (../src/backend/executor/execMain.c:1551) ==00:00:03:22.868 1171== by 0x80A7A3: standard_ExecutorRun (../src/backend/executor/execMain.c:361) ==00:00:03:22.868 1171== by 0x136231BC: pgss_ExecutorRun (../contrib/pg_stat_statements/pg_stat_statements.c:1001) ==00:00:03:22.868 1171== by 0x80A506: ExecutorRun (../src/backend/executor/execMain.c:303) ==00:00:03:22.868 1171== by 0xAD71B0: PortalRunSelect (../src/backend/tcop/pquery.c:921) ==00:00:03:22.868 1171== by 0xAD6B75: PortalRun (../src/backend/tcop/pquery.c:765) ==00:00:03:22.868 1171== by 0xAD1B3C: exec_simple_query (../src/backend/tcop/postgres.c:1214) ==00:00:03:22.868 1171== by 0xAD0D7F: PostgresMain (../src/backend/tcop/postgres.c:4496) ==00:00:03:22.868 1171== by 0x9D6C79: BackendRun (../src/backend/postmaster/postmaster.c:4530) ==00:00:03:22.868 1171== by 0x9D61A3: BackendStartup (../src/backend/postmaster/postmaster.c:4252) ==00:00:03:22.868 1171== by 0x9D4F56: ServerLoop (../src/backend/postmaster/postmaster.c:1745) ==00:00:03:22.868 1171== by 0x9D23A4: PostmasterMain (../src/backend/postmaster/postmaster.c:1417) ==00:00:03:22.868 1171== by 0x8AA0D2: main (../src/backend/main/main.c:209) ==00:00:03:22.868 1171== Address 0x3ed1f3c5 is 293 bytes inside a block of size 8,192 alloc'd ==00:00:03:22.868 1171== at 0x4C37135: malloc (vg_replace_malloc.c:381) ==00:00:03:22.868 1171== by 0xD1C3C0: AllocSetContextCreateInternal (../src/backend/utils/mmgr/aset.c:469) ==00:00:03:22.868 1171== by 0x821CB4: CreateExprContextInternal (../src/backend/executor/execUtils.c:253) ==00:00:03:22.868 1171== by 0x821BE2: CreateExprContext (../src/backend/executor/execUtils.c:303) ==00:00:03:22.868 1171== by 0x822078: ExecAssignExprContext (../src/backend/executor/execUtils.c:482) ==00:00:03:22.868 1171== by 0x8637D6: ExecInitSeqScan (../src/backend/executor/nodeSeqscan.c:147) ==00:00:03:22.868 1171== by 0x816B2D: ExecInitNode (../src/backend/executor/execProcnode.c:211) ==00:00:03:22.868 1171== by 0x80A336: InitPlan (../src/backend/executor/execMain.c:936) ==00:00:03:22.868 1171== by 0x809BE7: standard_ExecutorStart (../src/backend/executor/execMain.c:263) ==00:00:03:22.868 1171== by 0x1362303C: pgss_ExecutorStart (../contrib/pg_stat_statements/pg_stat_statements.c:963) ==00:00:03:22.868 1171== by 0x809881: ExecutorStart (../src/backend/executor/execMain.c:141) ==00:00:03:22.868 1171== by 0xAD636A: PortalStart (../src/backend/tcop/pquery.c:514) ==00:00:03:22.868 1171== by 0xAD1A31: exec_simple_query (../src/backend/tcop/postgres.c:1175) ==00:00:03:22.868 1171== by 0xAD0D7F: PostgresMain (../src/backend/tcop/postgres.c:4496) ==00:00:03:22.868 1171== by 0x9D6C79: BackendRun (../src/backend/postmaster/postmaster.c:4530) ==00:00:03:22.868 1171== by 0x9D61A3: BackendStartup (../src/backend/postmaster/postmaster.c:4252) ==00:00:03:22.868 1171== by 0x9D4F56: ServerLoop (../src/backend/postmaster/postmaster.c:1745) ==00:00:03:22.868 1171== by 0x9D23A4: PostmasterMain (../src/backend/postmaster/postmaster.c:1417) ==00:00:03:22.868 1171== by 0x8AA0D2: main (../src/backend/main/main.c:209) ==00:00:03:22.868 1171== ==00:00:03:22.868 1171== VALGRINDERROR-END Then I attached a debugger to inspect the variables involved in the error reported by Valgrind for file /src/backend/utils/adt/varlena.c: 1045 for (i = S1; i < E1; i++) 1046 p += pg_mblen(p); 1047 1048 ret = (text *) palloc(VARHDRSZ + (p - s)); 1049 SET_VARSIZE(ret, VARHDRSZ + (p - s)); 1050 memcpy(VARDATA(ret), s, (p - s)); The column "single_byte_col" is supposed to store only 1 byte. Nevertheless, the INSERT command implicitly casts the '🀆' text into "char". This means that only the first byte of '🀆' ends up stored in the column. gdb reports that "pg_mblen(p) = 4" (line 1046), which is expected since the pg_mblen('🀆') is indeed 4. Later at line 1050, the memcpy will copy 4 bytes instead of 1, hence an out of bounds memory read happens for pointer 's', which effectively copies random bytes. -- Spiros (ServiceNow)