[ https://issues.apache.org/jira/browse/CALCITE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938748#comment-17938748 ]
Chris Dennis commented on CALCITE-6912:
---------------------------------------

I realize that the initial report here was probably a little vague, and described a (subtly) different issue from the one I really care about. I've tried to dig a little more today and I think I've got a tighter grip on it now.

In my failing test I have the query: {{INSERT INTO "terracotta"."dataset" (KEY, "cell") values (?, ?)}}

The logical rel-graph and resultant plan look like this:
{noformat}
Table :: RecordType(JavaType(long) KEY, JavaType(class [B) cell)
LogicalTableModify :: RecordType(BIGINT ROWCOUNT)
  LogicalProject :: RecordType(JavaType(class java.lang.Long) KEY, JavaType(class [B) cell)
    LogicalValues :: RecordType(INTEGER ZERO)

Table :: RecordType(JavaType(long) KEY, JavaType(class [B) cell)
TerracottaIterativeTableModify :: RecordType(BIGINT ROWCOUNT)
  EnumerableCalc :: RecordType(JavaType(class java.lang.Long) KEY, JavaType(class [B) cell)
    EnumerableValues :: RecordType(INTEGER ZERO)
{noformat}
To my untrained eye the type inference there looks wrong. The code appears to infer the types of the parameters by looking only at the target type (the table row type), without taking into account the eventual source of the values (the Avatica prepared statement logic).

When using a literal byte string in the query instead of a parameter ({{INSERT INTO "terracotta"."dataset" (KEY, "cell") values (1, X'00')}}) the types are:
{noformat}
Table :: RecordType(JavaType(long) KEY, JavaType(class [B) cell)
LogicalTableModify :: RecordType(BIGINT ROWCOUNT)
  LogicalValues :: RecordType(BIGINT KEY, VARBINARY cell)
{noformat}
To me this makes sense: the typing of the LogicalValues is derived from the types of the SQL literals.
The resultant Linq4j code then correctly reports the field type as ByteString, and I can inject the correct type conversion (the {{getBytes()}} call on the parameter to {{cells.add(...)}}):
{code:java}
public Enumerable bind(final DataContext root) {
  return Linq4j.singletonEnumerable(((Number) Primitive.of(long.class).numberValueRoundDown((Linq4j.asEnumerable(new Object[] {
      new Object[] {
        1L,
        new ByteString(new byte[] {(byte)0})}}).count(new Predicate1(){
      public boolean apply(Object arg0) {
        final CellSet cells = new CellSet();
        cells.add(_Cell.__cell("cell", ((ByteString) ((Object[]) arg0)[1]).getBytes()));
        return ((TerracottaTable) root.getRootSchema().getSubSchema("terracotta").getTable("dataset")).insert(SqlFunctions.toLong(((Object[]) arg0)[0]), cells);
      }
    })))).longValue());
}
{code}
In the prepared statement version the enumerator keeps the unmodified inferred type and reports the wrong typing during code generation. The generated enumerator has the following {{current()}} implementation:
{code:java}
public Object current() {
  final Number value_dynamic_param = (Number) root.get("?0");
  final Object value_dynamic_param0 = root.get("?1");
  return new Object[] {
      value_dynamic_param,
      value_dynamic_param0};
}
{code}
Given Avatica's wrapping of all binary types as ByteString, this leaves me with an object array containing a Long and a ByteString. When my adapter code asks the EnumerableCalc result {{PhysType}} for a field reference to the second field, I get a hard cast to {{byte[]}} because of the strange type inference, which then fails at runtime.
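To show the failure mode in isolation, here is a minimal sketch that does not depend on Calcite or Avatica. {{FakeByteString}} is a hypothetical stand-in for Avatica's ByteString, and {{hardCast}} and {{toBytes}} are illustrative names of my own, not anything the code generator emits. It assumes the row holds a wrapped value in the second slot, as described above.

```java
import java.util.Arrays;

final class ByteStringCastDemo {
  /** Hypothetical stand-in for Avatica's ByteString; keeps the sketch dependency-free. */
  static final class FakeByteString {
    private final byte[] bytes;

    FakeByteString(byte[] bytes) {
      this.bytes = bytes.clone();
    }

    byte[] getBytes() {
      return bytes.clone();
    }
  }

  /** Mirrors the hard cast: throws ClassCastException when the value is wrapped. */
  static byte[] hardCast(Object field) {
    return (byte[]) field;
  }

  /** Illustrative tolerant conversion: accepts either representation, passes null through. */
  static byte[] toBytes(Object field) {
    if (field == null) {
      return null;
    }
    if (field instanceof FakeByteString) {
      return ((FakeByteString) field).getBytes();
    }
    return (byte[]) field;
  }

  public static void main(String[] args) {
    // Row shape described above: a Long key and a wrapped binary value.
    Object[] row = {1L, new FakeByteString(new byte[] {(byte) 0})};
    try {
      hardCast(row[1]);
    } catch (ClassCastException e) {
      System.out.println("hard cast failed: " + e.getClass().getSimpleName());
    }
    System.out.println(Arrays.toString(toBytes(row[1])));
  }
}
```

A runtime check like {{toBytes}} would only paper over the mismatch in my adapter; the reported field type would still be wrong, which is why I would rather see the type inference itself corrected.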
The generated code containing the hard cast:
{code:java}
return Linq4j.singletonEnumerable(((Number) Primitive.of(long.class).numberValueRoundDown((input.count(new Predicate1(){
    public boolean apply(Object arg0) {
      final CellSet cells = new CellSet();
      final byte[] value0 = (byte[]) ((Object[]) arg0)[1];
      if (value0 != null) {
        cells.add(_Cell.__cell("cell", value0));
      }
      return ((TerracottaTable) root.getRootSchema().getSubSchema("terracotta").getTable("dataset")).insert(((Long) ((Object[]) arg0)[0]).longValue(), cells);
    }
  })))).longValue());
{code}
I'm wondering if an appropriate fix here would be to enhance the type inference to account for the way that Avatica translates the parameters passed to it. That way we could still infer the types from the target table row type, but then translate {{byte[]}} to ByteString to account for the behavior of the Avatica code. This would result in the correct type being reported by the Enumerable result, and would allow me to correctly map back to {{byte[]}} at code generation time.

> Confusion around typing of byte[] versus ByteString in Enumerable code
> ----------------------------------------------------------------------
>
> Key: CALCITE-6912
> URL: https://issues.apache.org/jira/browse/CALCITE-6912
> Project: Calcite
> Issue Type: Bug
> Components: avatica, core
> Affects Versions: 1.35.0, 1.40.0
> Reporter: Chris Dennis
> Priority: Major
> Labels: pull-request-available
>
> The Enumerable convention code generation seems to be at odds with both
> Avatica and at other times with itself when trying to handle {{byte[]}} field
> types. This leads to a variety of failures, either during code generation
> or at query execution time. I will push an example of a failing query in a PR
> shortly after filing this... but I suspect most any query that attempts to
> project a byte[] typed column has a good chance of tripping over itself.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)