Hi Hackers. I have a test table with multiple (10) columns defined as TIME WITHOUT TIME ZONE.
When loading this table with a lot of data (e.g. "COPY tbl FROM /my/path/2GB.csv WITH (FORMAT CSV)") I observed it was spending an excessive amount of time within the function GetCurrentDateTime. IIUC the code is calling GetCurrentDateTime only to acquire the current TX timestamp as a struct pg_tm in order to derive some timezone information. My test table has 10 x TIME columns. My test data has 22.5 million rows (~ 2GB) So that's 225 million times the GetCurrentDateTime function is called to populate the struct with the same values. I have attached a patch which caches this struct, so now those 225 million calls are reduced to just 1 call. ~ Test Results: Copy 22.5 million rows data (~ 2GB) BEFORE Run 1 = 4m 36s Run 2 = 4m 30s Run 3 = 4m 32s perf showed 20.95% time in GetCurrentDateTime AFTER (cached struct) Run 1 = 3m 44s Run 2 = 3m 44s Run 3 = 3m 45s perf shows no time in GetCurrentDateTime ~17% performance improvement in my environment. YMMV. ~ Thoughts? Kind Regards Peter Smith. Fujitsu Australia.
DecodeTimeOnly_GetCurrentDateTime.patch
Description: Binary data