jayzhan211 commented on issue #12622:
URL: https://github.com/apache/datafusion/issues/12622#issuecomment-2663050052

   > I am relying here that we can extract the necessary type information from 
the schema or record batches
   
   This is probably not true. A scalar is a free-standing constant, unlike a 
column, whose DataType is defined by the table schema.
   
   
   > However, these are probably subjective and some of them are certainly 
underexplored so I can only make a guess that they would result in a net 
benefit.
   
   A question I have is: When is `LogicalScalar` actually helpful?  
   
   `LogicalScalar` is used in `Expr::Literal` and later converted into 
`ScalarValue` during physical plan execution. This adds a slight overhead, 
but for most types, `LogicalScalar` and `ScalarValue` are nearly identical. 
The key differences arise with `Dictionary`, `Utf8` variants, or 
potentially `REE` in the future.  
   
   Even if we can convert to `ScalarValue`, since we don't have a `DataType`, 
`LogicalScalar::String` can only be converted to `ScalarValue::Utf8`, not to 
`ScalarValue::Utf8View` or `ScalarValue::Dictionary(_, Utf8)`. Given that type 
coercion resolves the `DataType` of the `Expr`, not only its `LogicalType`, it 
is problematic if we can't convert the scalar string to the specific array 
type, e.g. `StringViewArray`.
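
   To make the problem concrete, here is a minimal sketch. All types below 
(`PhysicalType`, `LogicalScalar`, `PhysicalScalar`) are simplified, hypothetical 
stand-ins written for illustration, not DataFusion's or Arrow's actual 
definitions. It shows that without a target type there is only one fixed 
mapping per logical type, while a target type lets the caller pick the encoding.

```rust
// Hypothetical, simplified stand-ins; NOT the real DataFusion/Arrow types.
#[derive(Debug, Clone, PartialEq)]
enum PhysicalType {
    Utf8,
    Utf8View,
    DictionaryUtf8,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
enum LogicalScalar {
    String(String),
    Int64(i64),
}

#[derive(Debug, Clone, PartialEq)]
enum PhysicalScalar {
    Utf8(String),
    Utf8View(String),
    DictionaryUtf8(String),
    Int64(i64),
}

/// Without a target type, each logical type has exactly one default encoding,
/// so a logical string always becomes `Utf8`.
fn to_physical_default(s: &LogicalScalar) -> PhysicalScalar {
    match s {
        LogicalScalar::String(v) => PhysicalScalar::Utf8(v.clone()),
        LogicalScalar::Int64(v) => PhysicalScalar::Int64(*v),
    }
}

/// With a target type (e.g. one resolved by type coercion), the same logical
/// string can be materialized in whichever encoding is requested.
/// Returns `None` when the logical and physical types are incompatible.
fn to_physical_as(s: &LogicalScalar, target: &PhysicalType) -> Option<PhysicalScalar> {
    match (s, target) {
        (LogicalScalar::String(v), PhysicalType::Utf8) => Some(PhysicalScalar::Utf8(v.clone())),
        (LogicalScalar::String(v), PhysicalType::Utf8View) => {
            Some(PhysicalScalar::Utf8View(v.clone()))
        }
        (LogicalScalar::String(v), PhysicalType::DictionaryUtf8) => {
            Some(PhysicalScalar::DictionaryUtf8(v.clone()))
        }
        (LogicalScalar::Int64(v), PhysicalType::Int64) => Some(PhysicalScalar::Int64(*v)),
        _ => None,
    }
}
```

   In this sketch, `to_physical_default` is the only thing available when the 
literal carries no `DataType`; `to_physical_as` is what the physical layer 
would need, and it requires the extra type argument.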
   
   A potential direction would be to remove the need to know the `DataType` 
for `Scalar`.
   The reason we need a `DataType` for `Scalar` is to be able to convert it to 
the correct ArrayRef for `arrow::kernel` execution. There are two possible ways 
to avoid this.
   
   1) Support kernel functions that take an Array and a Scalar, like 
`comparison(StringViewArray, rust::String)` or `add(Dictionary(k, I32Array), 
i64)`.
   2) Add a casting mechanism before calling the kernel function in the 
physical layer.
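
   A rough sketch of both options, using a hypothetical `StringColumn` enum 
in place of real Arrow arrays (the names `StringColumn`, `eq_scalar`, and 
`cast_scalar_like` are made up for illustration and are not existing Arrow or 
DataFusion APIs):

```rust
// Hypothetical stand-in for two physical encodings of one logical string column.
#[derive(Debug, Clone, PartialEq)]
enum StringColumn {
    Utf8(Vec<String>),
    // `keys[i]` indexes into `values`; keys are assumed to be in range.
    Dictionary { keys: Vec<usize>, values: Vec<String> },
}

/// Option 1: the kernel accepts a plain Rust string, so the scalar never
/// needs a physical type of its own. Each encoding handles the comparison
/// in its own way.
fn eq_scalar(col: &StringColumn, needle: &str) -> Vec<bool> {
    match col {
        StringColumn::Utf8(vals) => vals.iter().map(|v| v.as_str() == needle).collect(),
        StringColumn::Dictionary { keys, values } => {
            // Compare each distinct value once, then look results up by key.
            let hits: Vec<bool> = values.iter().map(|v| v.as_str() == needle).collect();
            keys.iter().map(|&k| hits[k]).collect()
        }
    }
}

/// Option 2: cast the scalar into the column's own encoding in the physical
/// layer, right before a kernel that expects matching encodings is called.
fn cast_scalar_like(col: &StringColumn, needle: &str) -> StringColumn {
    match col {
        StringColumn::Utf8(_) => StringColumn::Utf8(vec![needle.to_string()]),
        StringColumn::Dictionary { .. } => StringColumn::Dictionary {
            keys: vec![0],
            values: vec![needle.to_string()],
        },
    }
}
```

   In both variants the literal itself stays a plain string; the encoding is 
taken from the array it is evaluated against rather than from the scalar.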
   
   I guess these are close to what #12720 describes.
   
   If we don't need a `DataType` for `ScalarValue`, then `LogicalScalar` will 
be easy to bring into DataFusion.
   

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


