Re: Two newbie questions regarding Calcite Cassandra Adapter

Luis Fernando Kauer Mon, 04 Sep 2017 11:15:06 -0700

Hi,

I'll try to answer your questions since nobody else did.
But I'm still fairly new to Calcite and I'm still learning it.


I'll answer your questions in reverse order because the answer to the second 
question is important to answer the first question.

2. ConverterRules: how Calcite knows which LogicalProject conversion rule to 
apply
Calcite has the concept of "Convention", which determines the calling 
convention.  Relational expressions start with Convention.NONE and each adapter 
sets its own convention when implementing TranslatableTable.toRel.
CassandraRel implements a Convention for Cassandra.
MongoRel implements a Convention for MongoDB.
CassandraTable.toRel returns a CassandraTableScan with a traitSet of 
CassandraRel.CONVENTION.
If you check the rules, there are 2 types basically:
- Normal rule (extends RelOptRule): rule defined by "operands"
- Conversion rule (extends ConverterRule): defined by a class and a traitSet 
(convention)
The converter rule uses the convention of the input to match the rule.
CassandraProjectRule is a ConverterRule so it matches only when the input has 
CassandraRel.CONVENTION. So it won't match a LogicalProject with 
MongoRel.CONVENTION.
The adapters also have a rule to convert between its convention and the 
EnumerableConvention.INSTANCE (CassandraToEnumerableConverter).
So the convention is the key to know which conversion rule to match and apply.
1. Different plans with different "Limit" clause

Looking into calcite-cassandra adapter I noticed that some of its rules are not 
convertion rules and they try to convert/apply to nodes already converted to 
Enumerable.
Take a look at CassandraLimitRule:
  super(operand(EnumerableLimit.class, 
operand(CassandraToEnumerableConverter.class, any())), "CassandraLimitRule");
The problem with this approach (IMHO) is that it depends heavily on the overall 
costs to have the CassandraLimit matched and chosen, but this kind of node 
should always be pushed.
That's what is happening when the limit is 1 in your query. 
LogicalSort is used for sorting but also for limit (fetch) and offset.
Cassandra adapter has a conversion rule for Sort but not when offset or limit 
is set.
I don't know the details why this approach was chosen, maybe it was because the 
jdbc adapter is a reference and it does not implement limit and offset 
correctly.
But I think CassandraSortRule could use the limit (fetch) and offset 
information and set it in the implement method, with no need for CassandraLimit.
Trying to solve this by only changing the costs is hard because the planner 
generates many different plans and you'll have to make sure the costs will 
always be smaller and keeping the enumerable nodes.

Hope it helps.

Best regards,

Luis Fernando

Em Sexta-feira, 1 de Setembro de 2017 1:13, Junwei Li 
<[email protected]> escreveu:



1. I happen to notice that the value in limit could change the query plan.

For example, if I do explain plan for "select \"tweet_id\" from

\"userline\" limit 1", the plan is:

PLAN=EnumerableCalc(expr#0..2=[{inputs}], tweet_id=[$t2])

  EnumerableLimit(fetch=[1])

    CassandraToEnumerableConverter

      CassandraTableScan(table=[[twissandra, userline]])


But if limit value no smaller than 8, such as for "select \"tweet_id\" from

\"userline\" limit 8", the plan is:

PLAN=CassandraToEnumerableConverter

  CassandraLimit(fetch=[8])

    CassandraProject(tweet_id=[$2])

      CassandraTableScan(table=[[twissandra, userline]])


Since computeSelfCost() depends on the row count, I guess that is why the

query plans are different. But I want to make sure the query plan stay the

same as the second one even the limit value is smaller than 8. How should I

tweak the computeSelfCost()?




2. If I do a inner join query that spans across cassandra and another

database, such as mongodb, an example for illustration purpose:

                "with\n" +

                "tweet as (\n" +

                "    select \"tweet_id\" as id from

\"twissandra\".\"userline\" where "\"username\"='!PUBLIC!' order by

\"time\")\n," +

                "metrics as (\n" +

                "    select \"id\" as id, \"metricName\", \"value\"\n" +

                "    from \"mongodb\".METRIC\n" +

                "    where \"metricName\" = 'cpu.usage.average'\n" +

                ")\n" +

                "select tweet.id, metrics.\"value\" from tweet\n" +

                "inner join metrics using(id)\n" +

                "where metrics.\"value\" > 70"


According to CassandraProjectRule, super(LogicalProject.class,

"CassandraProjectRule"), seems like as long as there is a

LogicalProject.class, this CassandraProjectRule  will get triggered. Does

this mean LogicalProject that belongs to mongodb will also get matched with

CassandraProjectRule? But when I do explain plan, the generated plan does

separate the projections correctly to Cassandra and mongodb. How can

Calcite tell which projection belong to which schema? Thanks.



Sincerely,

Junwei Li

Re: Two newbie questions regarding Calcite Cassandra Adapter

Reply via email to