[
https://issues.apache.org/jira/browse/CALCITE-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023341#comment-18023341
]
Mihai Budiu commented on CALCITE-7204:
--------------------------------------
I want to point out a possible problem (which also exists, at least in theory,
in the previous PR about redundant casts between integer types): declaring a
certain type in SQL does not require the compiler to actually use that exact
type at runtime. For example, SQL allows arbitrary precision and scale for
DOUBLE, but in practice there is only one DOUBLE type, matching the IEEE 754
standard. (I think this is because SQL predates the IEEE standard.)
The same problem exists elsewhere; for example, I believe that Calcite's
runtime cannot handle TIMESTAMP values with a precision larger than 3. Runtime
values for INTERVAL types have problems representing sub-second values:
https://issues.apache.org/jira/browse/CALCITE-6752.
In our compiler we allow users to write TIMESTAMP(N) but only implement
TIMESTAMP(3) and warn users that their N value is ignored.
I am not aware of any rewriting pass which would rewrite plans to reflect such
limitations.
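To illustrate the mismatch between declared and runtime precision, here is a
minimal sketch. It assumes a runtime that keeps timestamps as epoch
milliseconds; the class and method names are hypothetical and are not Calcite
code. A value that fits the declared TIMESTAMP(6) loses its microseconds as
soon as it passes through such a runtime, so reasoning on the declared
precision alone may not describe what actually happens.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration only; not part of Calcite.
final class MillisBackedTimestamp {
  /** Stores a timestamp as a millisecond-based runtime would: epoch millis only. */
  static long toRuntimeValue(LocalDateTime declared) {
    return declared.toInstant(ZoneOffset.UTC).toEpochMilli();
  }

  /** Rebuilds the timestamp from the runtime value; sub-millisecond digits are gone. */
  static LocalDateTime fromRuntimeValue(long epochMillis) {
    long seconds = Math.floorDiv(epochMillis, 1000L);
    int nanos = (int) Math.floorMod(epochMillis, 1000L) * 1_000_000;
    return LocalDateTime.ofEpochSecond(seconds, nanos, ZoneOffset.UTC);
  }

  /** True if the value is unchanged after a trip through the runtime representation. */
  static boolean survivesRuntime(LocalDateTime declared) {
    return fromRuntimeValue(toRuntimeValue(declared)).equals(declared);
  }

  public static void main(String[] args) {
    // A value that needs TIMESTAMP(6): 123456 microseconds.
    LocalDateTime micros = LocalDateTime.of(2024, 1, 1, 12, 0, 0, 123_456_000);
    // The same value restricted to TIMESTAMP(3).
    LocalDateTime millis = micros.truncatedTo(ChronoUnit.MILLIS);

    System.out.println(survivesRuntime(millis)); // true
    System.out.println(survivesRuntime(micros)); // false
  }
}
```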
Consequently, if the optimization rules you describe are implemented, I think
there should be a way to opt out of having them applied.
In general, I think the validator and RelBuilder should abstain from optimizing
the program; that should be left to optimization rules, which are opt-in.
> Add support for lossless cast detection for DATETIME types
> ----------------------------------------------------------
>
> Key: CALCITE-7204
> URL: https://issues.apache.org/jira/browse/CALCITE-7204
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.40.0
> Reporter: Alessandro Solimando
> Priority: Major
>
> _RexUtil.isLosslessCast_ doesn't currently support date/time types at all and
> defaults to considering casts always lossy, leading to missed opportunities
> and potentially suboptimal planning.
> The current ticket aims at adding support for the DATETIME family types.
> A proposal of what to handle, which should be re-verified precisely by the
> implementer:
> * *TIME(p) -> TIME(p')*
> lossless iff p' >= p (widening fractional-second precision)
> (Reverse is not guaranteed: narrowing can round away sub-second units)
> * *TIMESTAMP(p) → TIMESTAMP(p')* (without time zone): lossless iff p' >= p
> * *DATE → TIMESTAMP(p)* (without time zone)
> lossless (round-trip {{DATE -> TIMESTAMP -> DATE}} always recovers the
> original date, the first cast adds padding like {{00:00:00[.000…]}} which can
> be truncated in the second cast)
> * *TIME(p) → TIMESTAMP(p')* (without time zone)
> lossless iff p' >= p (round-trip {{TIME -> TIMESTAMP -> TIME}} preserves the
> time component, the synthetic date part is discarded on the second cast)
> * *TIMESTAMP WITH LOCAL TIME ZONE (TSLTZ)*
> ** {*}TSLTZ(p) -> TSLTZ(p'){*}: lossless iff p' >= p
> ** {*}TSLTZ <=> TIMESTAMP (without TZ){*}: *conservatively not lossless*
> (semantics differ: instant vs local wall-time, the DST/offset transitions can
> change wall-time on round-trip)
> * Optional / out-of-scope (separate ticket/tickets):
> ** *DATE/TIME/TIMESTAMP <=> CHARACTER*
> ** *INTERVAL* types:
> *** YEAR-MONTH intervals: widening fields/precision is lossless
> *** DAY-SECOND intervals: widening fractional-second precision and/or field
> range is lossless
> Type precision: for integer types we had surprising effects (see
> [here|https://github.com/apache/calcite/pull/4557#discussion_r2379703479]),
> so take special care in handling it and verify assumptions precisely; when in
> doubt, be conservative, as it is critical that the method never returns false
> positives, since these immediately affect correctness.
> Tests and impact:
> * The newly supported cases must (at least) be covered appropriately in
> _RexLosslessCastTest_ (positive and negative tests, see CALCITE-7174 for an
> example)
> * When existing plans change due to further simplifications/rule firing,
> group changes by "patterns" and provide a justification for non-trivial cases
> Note: if implementing all this at once is too much, we can break it into
> multiple tickets (for instance, TZ-aware cases can become a separate ticket,
> in case it's fine to detect them and return false for now)
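
The in-scope rules proposed in the quoted ticket could be sketched roughly as
follows. This is only an illustration over a hypothetical Kind enum and
plain integer precisions; it is not the RexUtil.isLosslessCast implementation,
it reasons on declared precision only, and it does not account for the runtime
limitations raised in the comment above. Anything not listed in the ticket is
treated conservatively as lossy, except the trivial DATE to DATE identity.

```java
// Hypothetical sketch of the proposed rules; names are not Calcite API.
final class DatetimeLosslessSketch {
  enum Kind { DATE, TIME, TIMESTAMP, TIMESTAMP_LTZ }

  /**
   * Returns true only when the cast is lossless under the proposal.
   * Precision arguments are ignored for DATE.
   */
  static boolean isLossless(Kind from, int fromPrecision, Kind to, int toPrecision) {
    switch (from) {
      case DATE:
        // DATE -> TIMESTAMP(p) pads with 00:00:00, which the reverse cast drops.
        // DATE -> DATE is the identity (an addition not listed in the ticket).
        return to == Kind.DATE || to == Kind.TIMESTAMP;
      case TIME:
        // TIME(p) -> TIME(p') and TIME(p) -> TIMESTAMP(p'): lossless iff p' >= p.
        return (to == Kind.TIME || to == Kind.TIMESTAMP)
            && toPrecision >= fromPrecision;
      case TIMESTAMP:
        // TIMESTAMP(p) -> TIMESTAMP(p'): lossless iff p' >= p.
        return to == Kind.TIMESTAMP && toPrecision >= fromPrecision;
      case TIMESTAMP_LTZ:
        // TSLTZ(p) -> TSLTZ(p'): lossless iff p' >= p.
        // TSLTZ <=> TIMESTAMP is conservatively treated as lossy.
        return to == Kind.TIMESTAMP_LTZ && toPrecision >= fromPrecision;
      default:
        return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isLossless(Kind.TIME, 3, Kind.TIMESTAMP, 6));          // true
    System.out.println(isLossless(Kind.TIMESTAMP, 6, Kind.TIMESTAMP, 3));     // false
    System.out.println(isLossless(Kind.TIMESTAMP_LTZ, 3, Kind.TIMESTAMP, 3)); // false
  }
}
```

Returning false for every unlisted combination follows the ticket's request to
avoid false positives.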