tobixdev opened a new pull request, #14617:
URL: https://github.com/apache/datafusion/pull/14617

   ## Which issue does this PR close?
   
   This change is related to #12622. 
   
   While the PR is not yet complete, I won't be able to work on it for at least 
a few days. Therefore, I'd like to gather some feedback on the current 
direction and spark some further discussions in #12622.
   
   ## Rationale for this change
   
   In #12622, we discussed how can decouple `ScalarValue` from the physical 
`DataType`.
   While there is already work on the `logical-types` branch, this PR explores  
a different approach that may be easier to integrate into the current `main` 
branch.
   
   This PR introduces the enum `LogicalScalar` - a scalar value that is 
decoupled from the physical arrow `DataType`.
   By doing this, `ScalarValue` can remain tightly integrated with `DataType` 
and no existing code breaks while still allowing us to use `LogicalScalar` in 
situations where a coupling to `DataType` is unwanted.
   
   Some design considerations of `LogicalScalar`:
   - Mirrors variants from `NativeType`. There is currently no support for 
extension types. However, adding  a `LogicalScalar::Extension` variant should 
be possible.
   - Non-trivial values (e.g., `Timestamp` have a corresponding 
`LogicalTimestamp` concept). For some variants this prevents creating invalid 
values. Furthermore, it allows us to attach logic to these values (e.g., 
transforming `LogicalTimestamp` to `chrono::NaiveDateTime`). 
   - I tried to translate most of the equality/printing/ordering behavior from 
`ScalarValue`. However, some fixes may still be necessary.
   
   `LogicalScalar` is supposed to be used in at least two ways:
   - Providing a *logical view* on a physical `ScalarValue` 
(`LogicalScalar::from(scalar_value)`). The resulting `LogicalScalar` can be 
more ergonomic to work with. To name a few "benefits": no working with arrow 
arrays, no dictionaries, no multiple encodings of `Utf8` strings and hopefully 
more. I have adapted the code in `datafusion/sql/src/unparser/expr.rs` to 
demonstrate such a use case.
   - Migration path for code using `ScalarValue`: As we want to decouple 
logical from physical types, the new `LogicalScalar` can provide a vehicle for 
replacing `ScalarValue` in `LogicalPlan` etc.
   
   The goal of this approach in the foreseeable future is to support the 
migration of `LogicalPlan` to logical types.
   To do this, we must replace the variant `Expr::Literal(ScalarValue)` with 
`Expr::Literal(LogicalScalar)`.
   However, as this breaks quite a few things, doing this will be done in other 
PRs (I think #14609 is exploring this impact)
   
   ## What changes are included in this PR?
   
   - Adds `LogicalScalar` enum and the contained structs and enums.
   - Changes to `datafusion/sql/src/unparser/expr.rs` to demonstrate the use of 
`LogicalScalar`.
   
   ## Are these changes tested?
   
   Some tests via the changes in `datafusion/sql/src/unparser/expr.rs`.
   I would like to improve upon that.
   
   ## Are there any user-facing changes?
   
   Yes Public APIs (`LogicalScalar`) have been added.
   Documentation on `LogicalScalar` is still incomplete (see Open Issues).
   
   ## Open Issues
   
   * [ ] Currently, transforming a physical scalar into a logical scalar does 
not return a `Result<LogicalScalar>`. I think, in most situations, extracting a 
logical value should succeed. However, there may be some special cases that I 
did not consider and there are quite some `expect` calls in 
`scalar/logical/from_scalar_value.rs`. Basically this boils down to ensuring 
that we can always extract a `ScalarValue` from the arrays contained in, for 
example, `ScalarValue::List`. What is your opinion on that?
   * [ ] Breaking change in unparser: Is `DATETIME` equivalent to `TIMESTAMP` 
is this context? See changes in the test suite.
   * [ ] Improve Documentation
   * [ ] Add tests for converting `ScalarValue` to `LogicalScalar`
   * [ ] I am a bit unsure about the value in `LogicalDecimal`. I think it 
makes sense to use a dedicated decimal library here for working with decimal 
values and only convert to the arrow decimals once we need them. Any opinions?  
   * [ ] Polish the implementation
   
   cc @jayzhan211 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to