JNSimba commented on issue #341: URL: https://github.com/apache/doris-spark-connector/issues/341#issuecomment-4550744476
> Hi [@JNSimba](https://github.com/JNSimba), I'm planning take this on, because this requires casting in spark and is inconvenient to use. For example, cannot directly use `size > 0` in SQL as an example.
>
> However, there's no easy way to allow backward compatibility since `Array` is currently converted to a `String` for querying. Not to mention, the real fix should ideally come from up-stream to include the element-type instead of inferring it.
>
> Do you already have any ideas or suggestions on how to go about solving this issue? No worries, if not, I'll get back with a few suggestions before I even begin to implement.
>
> On a side note, maybe worth opening up github discussions for such conversations?

@addu390 Great, thanks for picking this up! A couple of things I think are worth considering:

1. **How to obtain the array element type.** I'm not sure whether the Arrow result returned from Doris carries the child type directly. If it does, we can use it as-is; if not, we'd have to fall back to inferring the element type from the data. (The best approach would be to rely on the `/schema` API from FE to get the precise type, but that ties this fix to a Doris-side change/upgrade, which probably isn't great for iteration speed on the connector side.)
2. **Backward compatibility via a config option.** Since arrays are currently exposed as `String`, switching to a real `ArrayType` is a behavior change for existing users (their downstream SQL/casting logic could break). I think we should gate the new behavior behind a config (e.g. something like `doris.read.array.as-string`, defaulting to the old behavior for now, or with a clear deprecation path) so users can opt in.

Happy to discuss further, either by opening GitHub Discussions on the Doris repo or by creating a dedicated design issue, whichever you prefer.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
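
For reference, the config gate suggested in the comment could look roughly like the sketch below. It is only an illustration, not connector code: the option name `doris.read.array.as-string` is the example floated in the comment and does not exist yet, the class and method names are hypothetical, and falling back to `string` when the element type is unknown is an assumption. Types are returned as Spark DDL strings so the sketch needs no Spark dependency.

```java
import java.util.Map;

// Hypothetical helper; none of these names exist in the connector today.
final class ArrayReadMode {
    // Option name taken from the suggestion in the comment (an assumption, not a real option).
    static final String ARRAY_AS_STRING = "doris.read.array.as-string";

    // Defaults to true so existing users keep the legacy String representation.
    static boolean readArrayAsString(Map<String, String> options) {
        return Boolean.parseBoolean(options.getOrDefault(ARRAY_AS_STRING, "true"));
    }

    // Returns a Spark DDL type string for a Doris ARRAY column.
    // elementType is the Spark DDL name of the element type, or null if it could not be determined.
    static String sparkTypeFor(String elementType, Map<String, String> options) {
        if (readArrayAsString(options) || elementType == null) {
            return "string"; // legacy behavior, also the assumed fallback for unknown element types
        }
        return "array<" + elementType + ">";
    }
}
```

With this shape, a user would opt in by setting the option to `false`; everyone else keeps reading arrays as strings until the default is changed along a deprecation path.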
