Thanks!
> I'm a bit assuming that the end goal is like hasura to avoid resolvers and
> generate one big "sql" query for the whole graphql query.
Yeah -- there are a few benefits to using Calcite, but the biggest one (IMO)
is that Calcite can handle planning and executing distributed/"federated"
queries across multiple data sources.
I'm unsure whether the best/most efficient way to handle resolvers would
be to:
A) Side-step them altogether like you mention, and just intercept the
GraphQL
query + convert it to a single huge Relational Expression and then execute
it +
process the output so it's in the same JSON shape as the query.
B) Auto-generate individual resolvers that just delegate to Calcite. So for a
table like "user", generating a resolver function that is pre-bound to
something like:

RelBuilder
    .scan("user")
    .filter(query.where)
    .project(query.selectedFields)

Or whatnot. My guess would be that A) is more performant, but I will have to
test
test
it to know for sure, haha.
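To illustrate option B a bit more concretely, here is a rough Calcite-free
sketch in Java of what a pre-bound, auto-generated resolver might look like.
The TableQuery type and the plan-string output are made up for illustration; a
real version would close over a Calcite RelBuilder and emit a RelNode instead
of a string:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ResolverSketch {
    // Hypothetical stand-in for the parsed GraphQL query arguments
    record TableQuery(String where, List<String> selectedFields) {}

    // Auto-generate one resolver per table, pre-bound to the table name.
    // A real implementation would call RelBuilder.scan(table) here.
    static Function<TableQuery, String> resolverFor(String table) {
        return query -> "SCAN(" + table + ")"
                + " -> FILTER(" + query.where() + ")"
                + " -> PROJECT(" + query.selectedFields() + ")";
    }

    public static void main(String[] args) {
        Map<String, Function<TableQuery, String>> resolvers = new HashMap<>();
        for (String table : List.of("users", "todo")) {
            resolvers.put(table, resolverFor(table));
        }
        String plan = resolvers.get("users")
                .apply(new TableQuery("account_confirmed = true", List.of("id")));
        System.out.println(plan);
        // SCAN(users) -> FILTER(account_confirmed = true) -> PROJECT([id])
    }
}
```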
(I am benchmarking every small piece of this with JMH to make sure it'll
actually be feasible as an end-product if everything works)
(Current tests show ~2ms on my personal laptop for end-to-end query parsing
and resolving of the data from a local JDBC H2 database, with Calcite
execution)
Benchmark                                 Mode  Cnt  Score   Error  Units
calciteSchemaToGraphQLSchema              avgt   10  0.277 ± 0.013  ms/op
parse_GraphQL_query                       avgt   10  0.055 ± 0.002  ms/op
convert_GraphQL_query_AST_to_TableQuery   avgt   10  0.011 ± 0.001  ms/op
run_graphql_query_end_to_end              avgt   10  2.049 ± 0.145  ms/op
> How are you planning on dealing with relationships? Like in hasura with
> relationship metadata?
So I have modeled the conversion of a GQL query with nested selections as a
JOIN. The internal structure a GQL query is translated to currently looks
like
this:
TableSelection(
    table = "users",
    where = WhereClause(children = [
        AND(children = [
            EQUALS(column = "account_confirmed", value = true),
            OR(children = [
                GREATER_THAN(column = "balance", value = 100),
                LESS_THAN(column = "balance", value = 200)
            ])
        ])
    ]),
    select = SELECT(
        columns = [
            Column("id")
        ],
        join = [
            TableSelection(
                table = "todo",
                where = WhereClause(children = [
                    AND(children = [
                        EQUALS(column = "is_completed", value = false),
                        IN(column = "name", value = ["a", "b", "c"])
                    ])
                ]),
                select = SELECT(
                    columns = [
                        Column("text"),
                        Column("is_completed")
                    ]
                )
            )
        ]
    )
)
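For what it's worth, the WHERE part of this intermediate structure maps pretty
naturally onto Java records. A minimal sketch, with names taken from the
pseudo-structure above; the little in-process evaluator is only there to show
the tree shape is sound, since in practice the tree would be translated to Rex
expressions rather than evaluated directly:

```java
import java.util.List;
import java.util.Map;

public class WhereTreeSketch {
    sealed interface Predicate permits And, Or, Equals, GreaterThan, LessThan {}
    record And(List<Predicate> children) implements Predicate {}
    record Or(List<Predicate> children) implements Predicate {}
    record Equals(String column, Object value) implements Predicate {}
    record GreaterThan(String column, long value) implements Predicate {}
    record LessThan(String column, long value) implements Predicate {}

    // Toy evaluator against a row represented as a column -> value map.
    static boolean eval(Predicate p, Map<String, Object> row) {
        return switch (p) {
            case And a -> a.children().stream().allMatch(c -> eval(c, row));
            case Or o -> o.children().stream().anyMatch(c -> eval(c, row));
            case Equals e -> e.value().equals(row.get(e.column()));
            case GreaterThan g -> ((Number) row.get(g.column())).longValue() > g.value();
            case LessThan l -> ((Number) row.get(l.column())).longValue() < l.value();
        };
    }

    public static void main(String[] args) {
        // The "users" WHERE clause from the example above
        Predicate where = new And(List.of(
                new Equals("account_confirmed", true),
                new Or(List.of(
                        new GreaterThan("balance", 100),
                        new LessThan("balance", 200)))));
        Map<String, Object> row = Map.of("account_confirmed", true, "balance", 150L);
        System.out.println(eval(where, row)); // true
    }
}
```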
My thought is roughly this, as a pseudocode algorithm:

for table_selection in table_query.select.join:
    base_query_relbuilder
        .scan(table_selection.table)
        .join(
            INNER,
            get_foreign_key_for_tables(table_selection.table, table_query.table)
        )
Basically, walk through every nested JOIN in the GQL query and convert it to a
RelBuilder join. Not sure if it'll work out -- maybe this logic is flawed.
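As a sanity check of that walk, here is a Calcite-free Java sketch that
recursively flattens nested TableSelections into (child, parent) join steps.
Everything here is hypothetical scaffolding: getForeignKeyForTables is stubbed
out, and a real version would emit RelBuilder.join calls against schema
metadata instead of strings:

```java
import java.util.ArrayList;
import java.util.List;

public class JoinWalkSketch {
    // Minimal stand-in for the intermediate structure; "select" is reduced
    // to just the nested joins for brevity.
    record TableSelection(String table, List<TableSelection> joins) {}

    record JoinStep(String childTable, String parentTable, String condition) {}

    // Hypothetical FK lookup -- in reality this comes from schema metadata.
    static String getForeignKeyForTables(String child, String parent) {
        return child + "." + parent + "_id = " + parent + ".id";
    }

    // Depth-first walk: every nested selection becomes one INNER join
    // against its parent table.
    static void collectJoins(TableSelection parent, List<JoinStep> out) {
        for (TableSelection child : parent.joins()) {
            out.add(new JoinStep(child.table(), parent.table(),
                    getForeignKeyForTables(child.table(), parent.table())));
            collectJoins(child, out);
        }
    }

    public static void main(String[] args) {
        TableSelection query = new TableSelection("users",
                List.of(new TableSelection("todo", List.of())));
        List<JoinStep> steps = new ArrayList<>();
        collectJoins(query, steps);
        System.out.println(steps);
        // one step: todo joined to users on todo.users_id = users.id
    }
}
```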
> How are you planning to handle mutations? Classic resolvers?
I haven't got that far yet, to be honest haha. Calcite is a massive API,
and I'm
not familiar with Database/Relational Algebra stuff at all.
So it's taken me all this time just to work out the querying part. Maybe I
should have seen if everything was possible first, whoops.
I saw that Calcite has support for INSERT/UPDATE/DELETE statements though,
so I
just sort of assumed there would be some way of doing it + eventually I
could
figure it out. Maybe there isn't though.
> Calcite sql syntax is a bit more restricted than postgres. No returning on
> insert, modifiable CTE, etc. So inserts from one big json blob seems not to
> be feasible.
>
> For me, select queries work well to convert to graphql json output but not
> inserts/updates because of the restrictions in calcite sql syntax. Maybe
> the calcite query plan is more permissive. I have not tried that.
That's unfortunate, but thanks for telling me. This is useful info to have.
To answer why I am targeting RelNodes and not SQL -- one of the biggest
features/goals of this project for me is to allow new data sources to be
added
over HTTP (or gRPC, whatever).
A real world use case: let's say I wanted to be able to query Airtable's
HTTP API,
with all the power of GraphQL and SQL operators.
Currently my thought is to parse + generate the parts of the query
(possibly the
whole query) that should be resolved from the Airtable data source, and then
send across the RelNode serialized as JSON to the other backend.
That backend is then responsible for reading the JSON RelNode query plan and
returning the right data.
This idea is basically a shitty version of the Substrait project that
Jacques
Nadeau is working on (which has a fledgling Calcite backend)
https://substrait.io https://github.com/substrait-io/substrait
On Mon, Jan 3, 2022 at 7:31 AM Martin Jonsson
<[email protected]> wrote:
> Hello Gavin.
> First of all, thanks for some nice work. Also interesting that you work
> for hasura which is a very interesting tool in the postgres ecosystem.
> I'm a bit assuming that the end goal is like hasura to avoid resolvers and
> generate one big "sql" query for the whole graphql query. Some questions:
>
> How are you planning on dealing with relationships? Like in hasura with
> relationship metadata? Separate data model?
>
> How are you planning to handle mutations? Classic resolvers?
>
> Calcite sql syntax is a bit more restricted than postgres. No returning on
> insert, modifiable CTE, etc. So inserts from one big json blob seems not to
> be feasible.
> I've been tinkering with calcite for the same use case, however with sql
> as transformed query output. For me, select queries work well to convert to
> graphql json output but not inserts/updates because of the restrictions in
> calcite sql syntax. Maybe the calcite query plan is more permissive. I have
> not tried that.
> /Martin
>
>
> On Sunday, January 2, 2022, 19:04:54 UTC+1, Gavin Ray <
> [email protected]> wrote:
>
> I wanted to officially share the project I've been putting all my spare
> time
> into for a bit over a month. It's a tool to take Calcite data source
> schemas, and
> auto-generate a queryable GraphQL API from them.
>
> - https://github.com/GavinRay97/GraphQLCalcite
>
> (If you want to skip to "What does this do exactly?" illustrated visually,
> jump
> to the section: "Walkthrough of Current Progress")
>
> For some brief background -- my day job is at an open-source tool that does
> this same thing (GraphQL API generation for data sources).
> - https://github.com/hasura/graphql-engine
> - https://hasura.io/
>
> I'm very familiar with GraphQL, and Calcite seemed brilliant
> and interesting, so I thought it could be a fun project to try to
> re-write the core of Hasura's functionality with Calcite.
>
> My hunch is that there's potentially a significant performance increase to
> be
> had by using Calcite for the query planning and optimization. Particularly
> when
> querying data from multiple data sources + joining across them is involved.
>
> The "Roadmap" section lists a checklist of features currently implemented,
> and
> what's left to implement. I am not certain if all of it is possible. I am
> not
> certain Calcite will be faster (that would be a bummer).
>
> Right now it only implements queries, but I want/need to implement
> Mutations
> (insert/update/delete) and I have to figure out whether Calcite is capable
> of
> handling Subscriptions (live/streaming queries). I also have yet to figure
> out
> how to handle queries with JOINs in them.
>
> But so far, I've had a lot of fun learning and building this. If anyone is
> interested in following along, I plan to build in public =) Also: Calcite
> is a
> large, complicated API. If anyone sees something stupid I'm doing, or has
> advice, I would be grateful for it.
>
> ===================================================================
> As a closing remark, I wanted to take a moment to thank some folks:
>
> - Julian, for answering a bunch of dumb questions and the BOSS workshop
> - Stamatis, for the Youtube videos he has on Calcite tutorials
> - Vladimir Ozerov, for the QuerifyLabs blogposts about Calcite, and the
> resources on Notion
> - Gerard Nico, for https://datacadamia.com/db/calcite/getting_started and
> https://github.com/gerardnico/calcite
> - Jacques, for the conversation he had with me
>