Hello everyone,
I had a look at CALCITE-2489
<https://issues.apache.org/jira/browse/CALCITE-2489> (Order of fields in
JavaTypeFactoryImpl#createStructType is unstable) since it was linked to
few PRs I worked on in the last months.

In a nutshell, the output of "Class.getFields()" is JVM
implementation-specific without any guarantees on the field ordering (even
if, at date, the implementation seems rather consistent), we should replace
it with a deterministic and stable ordering instead.

I have created two branches with a proposed implementation for that, one
for Calcite
<https://github.com/asolimando/calcite/tree/master-CALCITE-2489-createStructTypeUnstable>,
and one for Avatica
<https://github.com/asolimando/calcite-avatica/tree/master-CALCITE-2489-createStructTypeUnstable>,
all tests are passing (Calcite needs the changes in Avatica, though). You
can check the diff against the branch and master for both Calcite
<https://github.com/asolimando/calcite/compare/master...asolimando:master-CALCITE-2489-createStructTypeUnstable>
and Avatica
<https://github.com/asolimando/calcite-avatica/compare/master...asolimando:master-CALCITE-2489-createStructTypeUnstable#diff-651ecd95723b9508c17cfd021ec3894dfbdc81ea3ce9384a5d57cf209a214a97>
.

The only rough edge I see with the current proposal is that ordering on the
fields in the Sql type derived from a class "C" must match exactly the
order in the constructor(s) of "C" itself.

This comes from the code generation for instantiating classes (that is,
"new Class(...)" in code generated via Janino), which relies on the order
of the fields (see NewExpression.java#L63
<https://github.com/apache/calcite/blob/c475124ef8c61eb398ee119f27dfe21a14eca32f/linq4j/src/main/java/org/apache/calcite/linq4j/tree/NewExpression.java#L63>
).

The code generation part did not change, and the same "limitation" applies
to original implementation, where the constructor signature had to match
the order of the fields declaration, and in turn this order had to be
respected by the "Class.getFields()" implementation, which apparently is
consistently the case, otherwise the existing implementation would not have
worked.
To test this, I have modified a few classes from ReflectiveSchemaTest.java,
changing the order of either the field declarations, or the constructor,
and the code generation was failing.

What's worrying me with the proposed solution is that now the ordering
logic is a bit more complicated: annotations can dictate the order
explicitly for all the classes involved in the hierarchy, or rely on the
default (from parent to descendants, alphabetical order for fields declared
within the same class). Since the user must write the constructor knowing
exactly the final order the fields will get, it can rapidly become
complicated with a rich class hierarchy, especially if no annotations are
used.

So, in summary, relying on the JVM implementation-specific behaviour of
"Class.getFields()" is risky, but on the other hand I see some potential
friction for the final user of Calcite if we adopt a safer approach. I'd
like to stress that the downside is not linked to my specific choices nor
to the  suggestions in the ticket description, since the constructor
parameters cannot be known even via reflection, there is no way we can
automatically match the constructor, the burden will always fall on the
user.

I'd like to hear your thoughts before eventually moving forward with
opening the two PRs.

Best regards,
Alessandro

Reply via email to