Hi Arrow Friends, On the Arrow call today, I brought up an issue of the TIME type not supporting 24:00:00.
I'm doing compatibility checking against various databases (but mostly DuckDB) and this complicates the ability to round-trip data through IPC and back while preserving full fidelity at the edges of the time type. If a database contains a value of `24:00:00` and that value cannot be represented in Arrow, serializing to IPC and reading it back will necessarily lose information. My bias here is simple: users should be able to move their data through Arrow without alteration. I'm thinking we might want to revisit the decision to exclude 24:00:00. Because if databases have data at 24:00:00 (i.e. an event happened at a leap second) that information would be lost if they serialized it to Arrow then read the data back. 24:00:00 is supported by these databases: PostgreSQL, DB2, Cockroach, Redshift, DuckDB, MySQL, MariaDB, TiDB, ClickHouse It is not supported by: SQL Server, Snowflake, BigQuery, Firebird, H2. You can find my analysis and links to the docs here: https://rusty.today/blog/time-data-type-compatibility-across-databases/ There have been previous discussions (and votes) about this ~2021: "Clarifying interpretation of Time32/Time64 past 24 hours" - https://lists.apache.org/thread/ckms85lx5649z84jqhtowl7zvr71kxkr And there was a vote to exclude 24:00:00. I don’t have a strong opinion about leap seconds specifically, but I do care deeply about data fidelity and avoiding avoidable precision loss in IPC. Given the real-world database landscape, it may be time to reconsider whether excluding `24:00:00` best serves us all now. Cheers, Rusty https://query.farm
