[
https://issues.apache.org/jira/browse/CALCITE-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023525#comment-18023525
]
Alessandro Solimando commented on CALCITE-7204:
-----------------------------------------------
Thanks [~mbudiu] for pointing that out, I wasn't aware of this situation. Just
to make sure I understand it fully, when you say Calcite's runtime you mean the
_Enumerable_ implementation or there is more to it like _RexExecutor_ or other
components?
At the moment this _RexUtil.isLosslessCast_ method is used for NDV estimation
(due to CALCITE-7173) which wasn't accurate and might still not be due to this
issue, but I don't consider it critical.
However, but the method is also [in
use|https://github.com/search?q=repo%3Aapache/calcite%20isLosslessCast&type=code]
in other places like {_}RexSimplify{_}, and while that's neither the validator
nor the _RelBuilder_ it will affect semantics and possibly correctness, from
what you say.
It's scary that the type system and the type information shipped with the data
types aren't reflecting the real values might not be accurate, if we can't
trust this "oracle" components, then there is very little we can do safely in
terms of simplification/constant folding etc.
Should we, instead, have this handled in the type factory adjusting types so
that when you try to create a _TIMESTAMP(9)_ it does create a _TIMESTAMP(3)_ so
that you can rely on the data types info?
Or even explicitly cast the sought types to the real runtime types, like
_CAST($time AS TIME(9))_ wherever _$time_ of type _TIME_ occurs, so that the
simplification will understand the implicit context?
This last option aligns with your general suggestion I saw a few times that
there should be a mode where type coercion is made fully explicit, which we
don't have at the moment (btw, having such a mode, would have made my life a
lot easier in a few occasions).
The most dangerous place where this method is plugged-in now is
{_}RexSimplify{_}, shall we add a new parameter to the [internal
constructor|https://github.com/apache/calcite/blob/83882017b188222a06916145b0f4867b6a7e2b1f/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L106]
to prevent using loss-less cast detection? Do you have other suggestions on
how to handle/limit the blast radius?
> Add support for lossless cast detection for DATETIME types
> ----------------------------------------------------------
>
> Key: CALCITE-7204
> URL: https://issues.apache.org/jira/browse/CALCITE-7204
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.40.0
> Reporter: Alessandro Solimando
> Priority: Major
>
> _RexUtil.isLosslessCast_ doesn't currently support date/time types at all and
> defaults to considering casts always lossy, leading to missed opportunities
> and potential suboptimal planning.
> The current ticket aims at adding support for the DATETIME family types.
> A proposal of what to handle, which should be re-verified precisely by the
> implementer:
> * *TIME(p) -> TIME(p')*
> lossless iff p' >= p (widening fractional-second precision)
> (Reverse is not guaranteed: narrowing can round away sub-second units)
> * *TIMESTAMP(p) → TIMESTAMP(p')* (without time zone): lossless iff p' >= p
> * *DATE → TIMESTAMP(p)* (without time zone)
> lossless (round-trip {{DATE -> TIMESTAMP -> DATE}} always recovers the
> original date, the first cast adds padding like {{00:00:00[.000…]}} which can
> be truncated in the second cast)
> * *TIME(p) → TIMESTAMP(p')* (without time zone)
> lossless iff p' >= p (round-trip {{TIME -> TIMESTAMP -> TIME}} preserves the
> time component, the synthetic date part is discarded on the second cast)
> * *TIMESTAMP WITH LOCAL TIME ZONE (TSLTZ)*
> ** {*}TSLTZ(p) -> TSLTZ(p'){*}: lossless iff p' >= p
> ** {*}TSLTZ <=> TIMESTAMP (without TZ){*}: *conservatively not lossless*
> (semantics differ: instant vs local wall-time, the DST/offset transitions can
> change wall-time on round-trip)
> * Optional / out-of-scope (separate ticket/tickets):
> ** *DATE/TIME/TIMESTAMP <=> CHARACTER*
> ** *INTERVAL* types:
> *** YEAR-MONTH intervals: widening fields/precision is lossless
> *** DAY-SECOND intervals: widening fractional-second precision and/or field
> range is lossless
> Type precision: for integers types we had surprising effects (see
> [here|https://github.com/apache/calcite/pull/4557#discussion_r2379703479]),
> take special care in its handling and verify precisely assumptions, in doubt
> be conservative as it's critical for the method to not return false positives
> as it immediately affects correctness.
> Tests and impact:
> * The newly supported cases must (at least) be covered appropriately in
> _RexLosslessCastTest_ (positive and negative tests, see CALCITE-7174 for an
> example)
> * When existing plans change due to further simplifications/rule firing,
> group changes by "patterns" and a justification for non-trivial cases
> Note: if implementing all this at once is too much, we can break it into
> multiple tickets (for instance, TZ-aware cases can become a separate ticket,
> in case it's fine to detect them and return false for now)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)