Max Gekk created SPARK-57838:
--------------------------------

             Summary: Harden overflow and calendar-range handling for nanosecond-precision timestamps
                 Key: SPARK-57838
                 URL: https://issues.apache.org/jira/browse/SPARK-57838
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk


This sub-task is part of the umbrella SPARK-56822 (timestamps with nanosecond 
precision).

h2. Problem
The SPIP flags overflow/range handling as a top risk, and the audit confirmed
residual gaps:
* {{TimestampNanosVal}} validates {{nanosWithinMicro}} but does not normalize
carries: {{fromParts}} throws {{INTERNAL_ERROR}} on denormalized input.
* Parse/cast overflow is either swallowed as {{None}} or surfaced as
{{CAST_INVALID_INPUT}} rather than {{DATETIME_FIELD_OUT_OF_BOUNDS}}.
* There is no explicit validation of the 0001-9999 year range on the cast,
parse, or interval-add paths.
* {{timestampNanosAddDayTime}} has no overflow wrapper (unlike
{{timestampAdd}}).
* A single int64 epoch-nanos value cannot represent the full 0001-9999 range
(only about 1677-2262). Parquet/Arrow/Avro sinks fail loudly on out-of-range
values (good), but this composite-vs-int64 split is under-documented.
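The carry normalization that {{fromParts}} currently rejects could look roughly
like the following. This is a minimal Java sketch, not Spark's code: the class
{{MicrosNanos}}, its fields, and {{normalize}} are hypothetical names, and it
assumes the composite stores whole microseconds since the epoch plus a
nanos-within-micro remainder in [0, 999].

```java
/** Hypothetical (epochMicros, nanosWithinMicro) pair; not Spark's TimestampNanosVal. */
final class MicrosNanos {
    static final long NANOS_PER_MICROS = 1000L;

    final long epochMicros;
    final int nanosWithinMicro; // always in [0, 999] after normalization

    private MicrosNanos(long epochMicros, int nanosWithinMicro) {
        this.epochMicros = epochMicros;
        this.nanosWithinMicro = nanosWithinMicro;
    }

    /**
     * Normalizes a possibly denormalized pair by carrying whole microseconds out
     * of the nanos component. floorDiv/floorMod keep the remainder non-negative
     * for negative input, and addExact throws ArithmeticException instead of
     * silently wrapping epochMicros.
     */
    static MicrosNanos normalize(long epochMicros, long nanos) {
        long carry = Math.floorDiv(nanos, NANOS_PER_MICROS);
        int remainder = (int) Math.floorMod(nanos, NANOS_PER_MICROS);
        return new MicrosNanos(Math.addExact(epochMicros, carry), remainder);
    }
}
```

For example, {{normalize(10, 1500)}} yields 11 micros plus 500 nanos, and
{{normalize(10, -1)}} borrows to 9 micros plus 999 nanos. Floor division
matters for pre-1970 values: Java's truncating {{/}} and {{%}} would leave a
negative remainder there.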

h2. Goal
Consistent, well-typed overflow/range behavior with correct error classes, plus 
boundary test coverage.

h2. Scope
* Add explicit representable-range validation on the parse, cast, and
interval-add paths, raising {{DATETIME_FIELD_OUT_OF_BOUNDS}} /
{{ARITHMETIC_OVERFLOW}}.
* Audit the remaining non-exact (unchecked, wrap-on-overflow)
{{* NANOS_PER_MICROS}} multiplications on unbounded {{epochMicros}}.
* Add min ({{0001-01-01T00:00:00.000000000}}) and max
({{9999-12-31T23:59:59.999999999}}) boundary tests across parse, format, cast,
and arithmetic.
* Document the int64-epoch-nanos vs {{(micros, nanos)}} composite range split
for format consumers.
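The int64-epoch-nanos vs composite range split can be made concrete with
{{java.time}}. This is an illustrative sketch only: {{NanosRange}},
{{checkSupported}}, and {{toEpochNanosExact}} are hypothetical names,
{{DateTimeException}} stands in for Spark's {{DATETIME_FIELD_OUT_OF_BOUNDS}}
error, and it assumes the supported range is 0001-01-01T00:00:00Z to
9999-12-31T23:59:59.999999999Z in UTC on the proleptic Gregorian calendar (which
{{java.time}} uses).

```java
import java.time.DateTimeException;
import java.time.Instant;

/** Hypothetical helpers contrasting the 0001-9999 range with int64 epoch nanos. */
final class NanosRange {
    static final long NANOS_PER_SECOND = 1_000_000_000L;
    static final Instant MIN_SUPPORTED = Instant.parse("0001-01-01T00:00:00Z");
    static final Instant MAX_SUPPORTED = Instant.parse("9999-12-31T23:59:59.999999999Z");

    /** Rejects values outside 0001-9999 (stand-in for DATETIME_FIELD_OUT_OF_BOUNDS). */
    static Instant checkSupported(Instant t) {
        if (t.isBefore(MIN_SUPPORTED) || t.isAfter(MAX_SUPPORTED)) {
            throw new DateTimeException("DATETIME_FIELD_OUT_OF_BOUNDS: " + t);
        }
        return t;
    }

    /**
     * Converts t to one int64 count of nanoseconds since the epoch. Throws
     * ArithmeticException (never wraps) outside roughly 1677-09-21..2262-04-11.
     */
    static long toEpochNanosExact(Instant t) {
        long secs = t.getEpochSecond();
        long nanos = t.getNano();
        // Borrow one second for negative instants so the intermediate product
        // stays in range all the way down to Long.MIN_VALUE.
        if (secs < 0 && nanos > 0) {
            secs += 1;
            nanos -= NANOS_PER_SECOND;
        }
        return Math.addExact(Math.multiplyExact(secs, NANOS_PER_SECOND), nanos);
    }
}
```

Every value in the 0001-9999 range passes {{checkSupported}}, but only values
between 1677-09-21T00:12:43.145224192Z and 2262-04-11T23:47:16.854775807Z
survive {{toEpochNanosExact}}. That gap is what an int64 epoch-nanos sink has to
reject loudly.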

h2. Acceptance criteria
* Out-of-range parse/cast raises the datetime bounds error
({{DATETIME_FIELD_OUT_OF_BOUNDS}}), not a generic invalid-input error.
* Arithmetic near the boundaries raises an overflow error; nothing silently
wraps.
* The min/max boundary tests pass.
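An overflow-checked day-time interval add, of the kind
{{timestampNanosAddDayTime}} currently lacks, might look like the sketch below.
The class {{NanosIntervalAdd}}, the method {{addDayTime}}, and the
{{MIN_MICROS}}/{{MAX_MICROS}} constants are hypothetical; the sketch assumes the
interval is a single count of microseconds, that {{nanosWithinMicro}} is
unaffected by the add, and that leaving the 0001-9999 range on this path is
reported as overflow ({{ArithmeticException}} standing in for
{{ARITHMETIC_OVERFLOW}}).

```java
/** Hypothetical overflow-checked day-time interval add on the micros component. */
final class NanosIntervalAdd {
    // 0001-01-01T00:00:00Z and 9999-12-31T23:59:59.999999Z as epoch microseconds (UTC).
    static final long MIN_MICROS = -62_135_596_800_000_000L;
    static final long MAX_MICROS = 253_402_300_799_999_999L;

    /**
     * Adds intervalMicros to epochMicros and returns the new epochMicros. Both
     * int64 overflow and leaving the 0001-9999 range throw ArithmeticException,
     * so the result never silently wraps.
     */
    static long addDayTime(long epochMicros, long intervalMicros) {
        long result = Math.addExact(epochMicros, intervalMicros);
        if (result < MIN_MICROS || result > MAX_MICROS) {
            throw new ArithmeticException("ARITHMETIC_OVERFLOW: timestamp outside 0001-9999");
        }
        return result;
    }
}
```

For example, adding one microsecond to {{MAX_MICROS - 1}} lands exactly on the
last representable microsecond, while adding one more throws instead of
wrapping.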

h2. Testing
{{DateTimeUtilsSuite}}, {{CastSuiteBase}}, {{DateExpressionsSuite}}, 
{{TimestampNanosParseSuite}}.

h2. Dependencies
Cross-cutting: coordinate with the timestampadd/timestampdiff, to_timestamp*,
sequence, and timestamp-subtraction sub-tasks, since this sub-task hardens the
arithmetic/parse paths they touch. There is no hard blocker.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
