Hello Spark,

While I was trying to implement an improvement in Delta Lake by changing the signature of a command so that it returns some data (see https://github.com/delta-io/delta/pull/1240/files#diff-f993636fe0ceb5f15de09ed4a0a71ea405a5e89ad66d36047099fcc1387a375aR123), the code produced a runtime exception because I didn't override the output here:
https://github.com/delta-io/delta/pull/1240/files#diff-f993636fe0ceb5f15de09ed4a0a71ea405a5e89ad66d36047099fcc1387a375aR123
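To make the problem concrete, here is a minimal sketch of the kind of change I mean (the command and column names below are made up for illustration, they are not the actual Delta command):

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference}
import org.apache.spark.sql.execution.command.LeafRunnableCommand
import org.apache.spark.sql.types.{LongType, StringType}

// A command that used to return nothing and now returns data.
case class MyDeltaCommand(tableName: String) extends LeafRunnableCommand {

  // Without this override the command silently keeps the empty default from
  // Command#output, and the mismatch with the rows returned by run() only
  // shows up as an exception at runtime.
  override val output: Seq[Attribute] = Seq(
    AttributeReference("table", StringType, nullable = false)(),
    AttributeReference("num_affected_rows", LongType, nullable = false)())

  override def run(sparkSession: SparkSession): Seq[Row] = {
    // ... the actual work of the command goes here ...
    Seq(Row(tableName, 42L))
  }
}

Forgetting the output override is exactly the mistake I made, and nothing at compile time points you to it.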
The loop touched by https://github.com/apache/spark/pull/37778/files makes assumptions about the length of the two collections involved and uses the index of the first collection to iterate over the second as well. A more defensive approach would have been to zip the two collections, or at least to check that this prerequisite is met.

I understand it might not be a big problem, but if you think this is worth addressing or improving, besides the "throw a more meaningful error" approach suggested in a PR, other suitable approaches could be:

- removing the default value for the output collection and forcing a clear decision from the developer upon override, providing a mixable trait "NoOutputCommand" (see the sketch after my signature)
- adding ScalaDoc to org.apache.spark.sql.catalyst.plans.logical.Command#output
- overriding the default output instance in LeafCommand and making it return a ???

What do you think?

Edmondo
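P.S. Roughly what I have in mind for the "NoOutputCommand" idea; this is only a sketch, and both the trait name and the choice to make Command#output abstract are my proposal, not anything that exists today:

import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.Command

// The idea: Command#output would lose its Seq.empty default (i.e. become
// abstract), and only commands that genuinely return no rows would mix this in.
trait NoOutputCommand extends Command {
  final override def output: Seq[Attribute] = Seq.empty
}

Every other command would then be forced to declare its output explicitly, instead of silently inheriting an empty sequence that no longer matches the rows it returns.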