[jira] [Commented] (CALCITE-6912) Confusion around typing of byte[] versus ByteString in Enumerable code

Chris Dennis (Jira) Wed, 26 Mar 2025 14:18:21 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938748#comment-17938748
 ]


Chris Dennis commented on CALCITE-6912:
---------------------------------------

I realize that the initial report here was probably a little vague, and 
described a (subtly) different issue from the one I really care about. I've 
dug a little more today and I think I have a tighter grip on it now.

In my failing test I have the query: {{INSERT INTO "terracotta"."dataset" (KEY, 
"cell") values (?, ?)}}

The logical rel-graph and resultant plan look like this:

{noformat}
Table :: RecordType(JavaType(long) KEY, JavaType(class [B) cell)
  LogicalTableModify :: RecordType(BIGINT ROWCOUNT)
    LogicalProject :: RecordType(JavaType(class java.lang.Long) KEY, JavaType(class [B) cell)
      LogicalValues :: RecordType(INTEGER ZERO)

Table :: RecordType(JavaType(long) KEY, JavaType(class [B) cell)
  TerracottaIterativeTableModify :: RecordType(BIGINT ROWCOUNT)
    EnumerableCalc :: RecordType(JavaType(class java.lang.Long) KEY, JavaType(class [B) cell)
      EnumerableValues :: RecordType(INTEGER ZERO)
{noformat}

To my untrained eye the type inference there looks wrong. The code appears to 
be inferring the types of the parameters by looking only at the target type 
(the table row type) and not taking into account the eventual source of the 
values (the Avatica prepared statement logic). When using a literal byte string 
in the query instead of a parameter ({{INSERT INTO "terracotta"."dataset" (KEY, 
"cell") values (1, X'00')}}) the types are:
{noformat}
Table :: RecordType(JavaType(long) KEY, JavaType(class [B) cell)
  LogicalTableModify :: RecordType(BIGINT ROWCOUNT)
    LogicalValues :: RecordType(BIGINT KEY, VARBINARY cell)
{noformat}
To me this makes sense: the typing of the LogicalValues is derived from the 
types of the SQL literals. The resultant Linq4j code then correctly reports the 
field type as ByteString, and I can inject the correct type conversion (the 
{{getBytes()}} call on the parameter to {{cells.add(...)}}):
{code:java}
public Enumerable bind(final DataContext root) {
  return Linq4j.singletonEnumerable(((Number) 
Primitive.of(long.class).numberValueRoundDown((Linq4j.asEnumerable(new Object[] 
{
      new Object[] {
        1L,
        new ByteString(new byte[] {(byte)0})}}).count(new Predicate1(){
      public boolean apply(Object arg0) {
        final CellSet cells = new CellSet();
        cells.add(_Cell.__cell("cell", ((ByteString) ((Object[]) 
arg0)[1]).getBytes()));
        return ((TerracottaTable) 
root.getRootSchema().getSubSchema("terracotta").getTable("dataset")).insert(SqlFunctions.toLong(((Object[])
 arg0)[0]), cells);
      }
    })))).longValue());
}
{code}

In the prepared statement version the enumerator keeps the unmodified inferred 
type and so reports the wrong type during code generation.
The generated enumerator has the following {{current()}} implementation:
{code:java}
public Object current() {
  final Number value_dynamic_param = (Number) root.get("?0");
  final Object value_dynamic_param0 = root.get("?1");
  return new Object[] {
    value_dynamic_param,
    value_dynamic_param0};
}
{code}
Given Avatica's wrapping of all binary values as ByteString, this leaves me with 
an object array containing a Long and a ByteString. When my adapter code asks 
the EnumerableCalc result PhysType for a field reference to the second field, I 
get a hard cast to byte[] because of the strange type inference, which then 
fails at runtime:
{code:java}
return Linq4j.singletonEnumerable(((Number) 
Primitive.of(long.class).numberValueRoundDown((input.count(new Predicate1(){
    public boolean apply(Object arg0) {
      final CellSet cells = new CellSet();
      final byte[] value0 = (byte[]) ((Object[]) arg0)[1];
      if (value0 != null) {
        cells.add(_Cell.__cell("cell", value0));
      }
      return ((TerracottaTable) 
root.getRootSchema().getSubSchema("terracotta").getTable("dataset")).insert(((Long)
 ((Object[]) arg0)[0]).longValue(), cells);
    }
  })))).longValue());
{code}
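
As a minimal, self-contained sketch of why that cast fails while the literal-path conversion works: the nested {{ByteStringStandIn}} class below is a hypothetical stand-in for Avatica's ByteString (so the example compiles without Calcite on the classpath), and the class and method names are illustrative only, not anything Calcite generates.

```java
import java.util.Arrays;

class BinaryFieldCastDemo {
  /** Hypothetical stand-in for Avatica's ByteString wrapper (assumption, not the real class). */
  static final class ByteStringStandIn {
    private final byte[] bytes;

    ByteStringStandIn(byte[] bytes) {
      this.bytes = bytes.clone();
    }

    byte[] getBytes() {
      return bytes.clone();
    }
  }

  /** Mirrors the hard cast emitted in the prepared statement case. */
  static byte[] hardCast(Object[] row) {
    return (byte[]) row[1];
  }

  /** Mirrors the conversion injected in the literal case: unwrap, then getBytes(). */
  static byte[] viaWrapper(Object[] row) {
    return ((ByteStringStandIn) row[1]).getBytes();
  }

  public static void main(String[] args) {
    // The row as it arrives at runtime: a Long and a wrapped binary value.
    Object[] row = {1L, new ByteStringStandIn(new byte[] {0})};
    try {
      hardCast(row);
    } catch (ClassCastException e) {
      System.out.println("hard cast to byte[] failed: " + e.getClass().getSimpleName());
    }
    System.out.println("unwrapped: " + Arrays.toString(viaWrapper(row)));
  }
}
```

The only difference between the two paths is the static type reported for the second field; the runtime value is the same wrapper object in both cases.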

I'm wondering if an appropriate fix here would be to enhance the type inference 
to account for the way that Avatica translates the parameters passed to it. 
That way we could infer the types from the target table row type, but then 
translate byte[] to ByteString to account for the behavior of the Avatica 
code. This would result in the correct type being reported by the Enumerable 
result and allow me to correctly map back to byte[] at code generation time.
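
Roughly, the idea could be sketched as follows. This is only an illustration under my own assumptions: {{deliveredClassFor}} and {{toTargetValue}} are hypothetical helpers, not existing Calcite or Avatica APIs, and {{ByteStringStandIn}} again stands in for Avatica's ByteString.

```java
import java.util.Arrays;

class ParameterTypeMappingSketch {
  /** Hypothetical stand-in for Avatica's ByteString wrapper (assumption, not the real class). */
  static final class ByteStringStandIn {
    private final byte[] bytes;

    ByteStringStandIn(byte[] bytes) {
      this.bytes = bytes.clone();
    }

    byte[] getBytes() {
      return bytes.clone();
    }
  }

  /**
   * Maps the class inferred from the target row type to the class that is
   * assumed to arrive at runtime for a dynamic parameter.
   */
  static Class<?> deliveredClassFor(Class<?> targetClass) {
    return targetClass == byte[].class ? ByteStringStandIn.class : targetClass;
  }

  /** Converts a delivered value back to the representation the target column expects. */
  static Object toTargetValue(Object delivered, Class<?> targetClass) {
    if (targetClass == byte[].class && delivered instanceof ByteStringStandIn) {
      return ((ByteStringStandIn) delivered).getBytes();
    }
    return delivered;
  }

  public static void main(String[] args) {
    Object delivered = new ByteStringStandIn(new byte[] {0});
    // Type reported for the parameter: the wrapper, not byte[].
    System.out.println(deliveredClassFor(byte[].class).getSimpleName());
    // Value handed to the table: mapped back to byte[].
    System.out.println(Arrays.toString((byte[]) toTargetValue(delivered, byte[].class)));
  }
}
```

With the reported type and the runtime value in agreement, an adapter could apply the {{getBytes()}} conversion itself instead of receiving a cast that cannot succeed.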

> Confusion around typing of byte[] versus ByteString in Enumerable code
> ----------------------------------------------------------------------
>
>                 Key: CALCITE-6912
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6912
>             Project: Calcite
>          Issue Type: Bug
>          Components: avatica, core
>    Affects Versions: 1.35.0, 1.40.0
>            Reporter: Chris Dennis
>            Priority: Major
>              Labels: pull-request-available
>
> The Enumerable convention code generation seems to be at odds with Avatica, 
> and at other times with itself, when trying to handle {{byte[]}} field 
> types. This leads to a variety of failures, either during code generation 
> or at query execution time. I will push an example of a failing query in a PR 
> shortly after filing this... but I suspect almost any query that attempts to 
> project a {{byte[]}}-typed column has a good chance of tripping over itself.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
