JNSimba commented on issue #341:
URL: https://github.com/apache/doris-spark-connector/issues/341#issuecomment-4550744476

   > Hi [@JNSimba](https://github.com/JNSimba), I'm planning to take this on, 
because this requires casting in Spark and is inconvenient to use. For example, 
you cannot directly use `size > 0` in SQL.
   > 
   > However, there's no easy way to preserve backward compatibility, since `Array` 
is currently converted to a `String` for querying. Moreover, the real fix 
should ideally come from upstream, by including the element type instead of 
us inferring it.
   > 
   > Do you already have any ideas or suggestions on how to go about solving 
this issue? No worries if not; I'll come back with a few suggestions before I 
begin implementing.
   > 
   > On a side note, would it be worth enabling GitHub Discussions for 
conversations like this?
   
   @addu390 Great, thanks for picking this up!
   
   A couple of things I think are worth considering:
   
   1. **How to obtain the array element type.** I'm not sure whether the Arrow 
result returned from Doris carries the child type directly. If it does, we can 
use it as-is; if not, we'd have to fall back to inferring the element type from 
the data. (The ideal approach would be to rely on the `/schema` API from FE to 
get the precise type, but that ties this fix to a Doris-side change/upgrade, 
which probably isn't great for iteration speed on the connector side.)
   
   2. **Backward compatibility via a config option.** Since arrays are 
currently exposed as `String`, switching to a real `ArrayType` is a behavior 
change for existing users (their downstream SQL/casting logic could break). I 
think we should gate the new behavior behind a config (e.g. something like 
`doris.read.array.as-string`, defaulting to the old behavior for now, or with a 
clear deprecation path) so users can opt in.
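   A rough, hypothetical Scala sketch of how the two points could fit together. None of these names exist in the connector: `SparkLikeType` is a stand-in for Spark's `DataType` (to keep the snippet dependency-free), `ArrayTypeSketch`, `resolveArrayType`, and the widening rules are illustrative only, and the option key is just the example name from point 2. The logic: keep the old `String` behavior unless the user opts out; otherwise prefer a declared child type if Doris provides one, and only then infer from sample values, falling back to string when types conflict.

   ```scala
   // Hypothetical stand-in for Spark's DataType hierarchy (not a real API).
   sealed trait SparkLikeType
   case object StringT extends SparkLikeType
   case object LongT extends SparkLikeType
   case object DoubleT extends SparkLikeType
   case object BooleanT extends SparkLikeType
   final case class ArrayT(element: SparkLikeType) extends SparkLikeType

   object ArrayTypeSketch {
     // Assumed option name from the proposal; absent means "keep old behavior".
     val ArrayAsStringKey = "doris.read.array.as-string"

     // Defaults to true (arrays as String); throws on non-boolean values.
     def arrayAsString(options: Map[String, String]): Boolean =
       options.get(ArrayAsStringKey).forall(_.trim.toBoolean)

     // Map a single runtime value to a coarse element type.
     private def typeOf(v: Any): SparkLikeType = v match {
       case _: Int | _: Long     => LongT
       case _: Float | _: Double => DoubleT
       case _: Boolean           => BooleanT
       case _                    => StringT
     }

     // Widen two element types; conflicting types fall back to string.
     private def widen(a: SparkLikeType, b: SparkLikeType): SparkLikeType = (a, b) match {
       case (x, y) if x == y                    => x
       case (LongT, DoubleT) | (DoubleT, LongT) => DoubleT
       case _                                   => StringT
     }

     // Infer an element type from sampled (non-null) values; empty -> string.
     def inferElementType(sample: Seq[Any]): SparkLikeType = {
       val types = sample.filter(_ != null).map(typeOf)
       if (types.isEmpty) StringT else types.reduce((a, b) => widen(a, b))
     }

     // Config gate first, then declared child type, then inference.
     def resolveArrayType(
         options: Map[String, String],
         declaredChild: Option[SparkLikeType],
         sample: => Seq[Any]): SparkLikeType =
       if (arrayAsString(options)) StringT
       else ArrayT(declaredChild.getOrElse(inferElementType(sample)))
   }
   ```

   Passing `sample` by name means data is only scanned when neither the config nor a declared child type settles the question, which keeps the common paths cheap.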
   
   Happy to discuss further, either by opening GitHub Discussions on the Doris 
repo or by creating a dedicated design issue, whichever you prefer.

