Hello everyone, I had a look at CALCITE-2489 <https://issues.apache.org/jira/browse/CALCITE-2489> (Order of fields in JavaTypeFactoryImpl#createStructType is unstable) since it was linked to few PRs I worked on in the last months.
In a nutshell, the output of "Class.getFields()" is JVM implementation-specific without any guarantees on the field ordering (even if, at date, the implementation seems rather consistent), we should replace it with a deterministic and stable ordering instead. I have created two branches with a proposed implementation for that, one for Calcite <https://github.com/asolimando/calcite/tree/master-CALCITE-2489-createStructTypeUnstable>, and one for Avatica <https://github.com/asolimando/calcite-avatica/tree/master-CALCITE-2489-createStructTypeUnstable>, all tests are passing (Calcite needs the changes in Avatica, though). You can check the diff against the branch and master for both Calcite <https://github.com/asolimando/calcite/compare/master...asolimando:master-CALCITE-2489-createStructTypeUnstable> and Avatica <https://github.com/asolimando/calcite-avatica/compare/master...asolimando:master-CALCITE-2489-createStructTypeUnstable#diff-651ecd95723b9508c17cfd021ec3894dfbdc81ea3ce9384a5d57cf209a214a97> . The only rough edge I see with the current proposal is that ordering on the fields in the Sql type derived from a class "C" must match exactly the order in the constructor(s) of "C" itself. This comes from the code generation for instantiating classes (that is, "new Class(...)" in code generated via Janino), which relies on the order of the fields (see NewExpression.java#L63 <https://github.com/apache/calcite/blob/c475124ef8c61eb398ee119f27dfe21a14eca32f/linq4j/src/main/java/org/apache/calcite/linq4j/tree/NewExpression.java#L63> ). The code generation part did not change, and the same "limitation" applies to original implementation, where the constructor signature had to match the order of the fields declaration, and in turn this order had to be respected by the "Class.getFields()" implementation, which apparently is consistently the case, otherwise the existing implementation would not have worked. To test this, I have modified a few classes from ReflectiveSchemaTest.java, changing the order of either the field declarations, or the constructor, and the code generation was failing. What's worrying me with the proposed solution is that now the ordering logic is a bit more complicated: annotations can dictate the order explicitly for all the classes involved in the hierarchy, or rely on the default (from parent to descendants, alphabetical order for fields declared within the same class). Since the user must write the constructor knowing exactly the final order the fields will get, it can rapidly become complicated with a rich class hierarchy, especially if no annotations are used. So, in summary, relying on the JVM implementation-specific behaviour of "Class.getFields()" is risky, but on the other hand I see some potential friction for the final user of Calcite if we adopt a safer approach. I'd like to stress that the downside is not linked to my specific choices nor to the suggestions in the ticket description, since the constructor parameters cannot be known even via reflection, there is no way we can automatically match the constructor, the burden will always fall on the user. I'd like to hear your thoughts before eventually moving forward with opening the two PRs. Best regards, Alessandro
