@James Duong <jam...@bitquilltech.com>

You are absolutely right. I realized the same thing and double-checked with
Jacques that it would be possible.
It would amount to what I might call "dollar-store Substrait." It's not an
elegant or good solution, but it is definitely a workable duct-tape hack
and a crafty idea.

I agree with Jacques -- when you think about FlightSQL, what you are
attempting with a query isn't necessarily SQL, but a general data-compute
operation.
SQL just so happens to be a fairly universal way to express such operations,
with an ANSI standard, but FlightSQL doesn't recognize any particular subset
of it, and for all intents and purposes it doesn't matter what the operation
string contains.
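
To make the duct-tape idea concrete, here is a minimal Python sketch of a
handler that treats the incoming query string as JSON instead of SQL. It does
not use any real Flight SQL API: the function name, the in-memory TABLES
dict, and the sample rows are all hypothetical, and the JSON shape is simply
the one from my original question further down the thread.

```python
import json
from typing import Any, Dict, List

# Hypothetical in-memory "backend": table name -> list of row dicts.
TABLES: Dict[str, List[Dict[str, Any]]] = {
    "user": [
        {"id": 1, "name": "alice"},
        {"id": 2, "name": "bob"},
    ],
}


def execute_opaque_query(query: str) -> List[Dict[str, Any]]:
    """Interpret the opaque query string as a JSON operation (assumption:
    the transport hands the string to the server without parsing it)."""
    op = json.loads(query)
    if op.get("op") != "query":
        raise ValueError(f"unsupported op: {op.get('op')!r}")
    rows = TABLES[op["schema"]]
    columns = op["project"]
    # Apply the projection: keep only the requested columns from each row.
    return [{col: row[col] for col in columns} for row in rows]


# The client sends a JSON document where a SQL string would normally go.
payload = json.dumps({"op": "query", "schema": "user", "project": ["name"]})
result = execute_opaque_query(payload)
```

The obvious downside is that both sides have to agree on this ad-hoc JSON
shape out of band, which is exactly the gap a real specification would fill.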

Substrait would make a fantastic, logical next feature because it's targeted
as a specification for expressing relational algebra and data-compute
operations.
This more or less equates to SQL strings (in my mind at least), but with a
much better toolkit and dev UX. If there is anything I can do to help move
this forward, please let me know, because I am extremely motivated to do so.
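
As a rough illustration of the dev-UX difference, here is a small Python
sketch of a typed, structured query object that can be built up
programmatically and only rendered to a string at the edge. This is not
Substrait and not any existing library: the Query class and its methods are
hypothetical names of my own, and it covers nothing beyond a plain
projection.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical structured stand-in for 'SELECT <cols> FROM <schema>'."""

    schema: str
    project: Tuple[str, ...] = ()

    def select(self, *columns: str) -> "Query":
        # Return a new immutable Query with the extra columns appended.
        return Query(self.schema, self.project + columns)

    def to_dict(self) -> Dict[str, Any]:
        # Structured form, matching the JSON shape from earlier in the thread.
        return {"op": "query", "schema": self.schema, "project": list(self.project)}

    def to_sql(self) -> str:
        # Naive rendering for illustration only: no identifier quoting or escaping.
        cols = ", ".join(self.project) if self.project else "*"
        return f"SELECT {cols} FROM {self.schema}"


plan = Query("user").select("name")
sql_text = plan.to_sql()
structured = plan.to_dict()
```

The point is only that the structured form can be inspected, validated, and
composed before anything is serialized, which is awkward to do with a raw
SQL string.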

@David Li <git...@lidavidm.me>

Also agreed. Substrait is put together by folks much smarter than myself,
and if I had to place a bet, I'd put money on it being the future of
data-compute interop.
I would love nothing more than to adopt this technology and push it along.

> Your project does sound interesting - basically, it sounds like a tabular
> data storage service with query pushdown?
>

Yeah, that is more or less the gist of it (my personal email, with
discretion assumed, is always open).

Imagine an environment where each backend wants to advertise some kind of
schema/data catalog.

A central service then introspects these backends and dynamically
generates an API from their data catalogs/schemas. Requests to that API get
proxied to the underlying backend service that owns each schema, where they
are actually executed.

In text, the flow would look something like:


                                                     <----> Data Provider Backend 0
Client <-----> Central Service <---> Generated API   <----> Data Provider Backend 1
                                                     <----> Data Provider Backend 2
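
The flow above can be sketched in a few lines of Python. This is only a toy
under stated assumptions: Backend, CentralService, describe, execute, and
the sample tables are hypothetical names, everything runs in-process rather
than over Flight, and the "generated API" is reduced to a routing table
built from each backend's advertised catalog.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


class Backend:
    """Hypothetical data provider: can describe itself and run a projection."""

    def __init__(self, name: str, tables: Dict[str, List[Row]]) -> None:
        self.name = name
        self._tables = tables

    def describe(self) -> Dict[str, List[str]]:
        # Advertise the catalog: table name -> sorted column names
        # (columns are read off the first row; empty tables report no columns).
        return {
            table: sorted(rows[0].keys()) if rows else []
            for table, rows in self._tables.items()
        }

    def execute(self, table: str, columns: List[str]) -> List[Row]:
        # Run the operation locally: project the requested columns.
        return [{c: row[c] for c in columns} for row in self._tables[table]]


class CentralService:
    """Introspects backends once, then proxies each request to the owner."""

    def __init__(self, backends: List[Backend]) -> None:
        self._routes: Dict[str, Backend] = {}
        self.catalog: Dict[str, List[str]] = {}
        for backend in backends:
            for table, columns in backend.describe().items():
                self._routes[table] = backend
                self.catalog[table] = columns

    def query(self, table: str, columns: List[str]) -> List[Row]:
        # Proxy the request to whichever backend advertised this table.
        return self._routes[table].execute(table, columns)


service = CentralService([
    Backend("backend0", {"user": [{"id": 1, "name": "alice"},
                                  {"id": 2, "name": "bob"}]}),
    Backend("backend1", {"orders": [{"user_id": 1, "total": 9.5}]}),
])
rows = service.query("user", ["name"])
```

In the real thing, describe and execute would be the two endpoints every
implementing service exposes, and the operation passed to execute is where a
structured plan would replace the SQL string.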



On Thu, Mar 3, 2022 at 5:52 PM David Li <lidav...@apache.org> wrote:

> Gavin, thanks for sharing. I'm not so sure you'll find an alternative to
> Substrait, at least one that isn't even more nascent or one that's very
> tied to a particular language, so perhaps it might be better to get
> involved in Substrait and see if it suits your needs? Convincing a team to
> try something new can be hard, though, and it is somewhat of a moving
> target - but Flight SQL is in a similar spot, I think, as it's still
> getting enhancements.
>
> Your project does sound interesting - basically, it sounds like a tabular
> data storage service with query pushdown?
>
> On Thu, Mar 3, 2022, at 19:58, Jacques Nadeau wrote:
> > James, I agree that you could use JSON but that feels a bit hacky
> > (misuse of the paradigm). Instead, I'd really like to do something like
> > David is suggesting: support Substrait as an alternative to a SQL string.
> > Something like this:
> >
> > https://github.com/jacques-n/arrow/commit/e22674fa882e77c2889cf95f69f6e3701db362bc
> >
> > It would be great if someone wanted to pick this up. It would be a nice
> > enhancement to FlightSQL (and provide a structured way to express
> > operations).
> >
> >
> >
> > On Thu, Mar 3, 2022 at 4:56 PM James Duong
> > <jam...@bitquilltech.com.invalid> wrote:
> >
> >> In the same way that you could write an ODBC driver that takes in text
> >> that's not SQL, you could write a Flight SQL server that takes in text
> >> that's JSON.
> >> Flight SQL doesn't parse the query, so you could create commands
> >> that are just JSON text.
> >>
> >> Is that the only bit you need, Gavin?
> >>
> >> On Thu, Mar 3, 2022 at 4:26 PM Gavin Ray <ray.gavi...@gmail.com> wrote:
> >>
> >> > I am enthusiastic about Substrait and have followed its progress
> >> > eagerly =D
> >> >
> >> > When I presented it as a tentative option, there were reservations
> >> > because of the project/spec being young and the functionality still
> >> > being fleshed out.
> >> > I think if I were having this conversation in, say, 8-16 months, it
> >> > would have been an easy choice, no doubt.
> >> >
> >> > On a public mailing list (and I can share more details in private if
> >> > you're curious), the gist of it is this:
> >> >
> >> > Some well-defined/backed-by-mature tech solution for expressing data
> >> > compute operations between services would be a useful thing to have
> >> > (Especially if it's language-agnostic)
> >> >
> >> > The goal is for an "implementing service" to have:
> >> > - An introspectable schema (IE, "describe yourself to me")
> >> > - A query/operation execution endpoint (IE: "perform this operation
> >> >   on your data")
> >> >
> >> > With FlightSQL this is possible I believe, but it requires the
> >> > operation to be expressed as a SQL string which isn't ideal.
> >> >
> >> > Working with some programmatic, structured object that has the same
> >> > semantics ("Logical Plan", or whatnot) as a SQL query would have,
> >> > would be a better experience
> >> > (Jacques is on to something here!)
> >> >
> >> > This interface between services would be somewhat the equivalent of an
> >> > "SDK", so it would be nice to have a strongly-typed library for
> >> > expressing and building-up query/data-compute ops.
> >> >
> >> >
> >> > On Thu, Mar 3, 2022 at 3:17 PM David Li <lidav...@apache.org> wrote:
> >> >
> >> > > You probably want Substrait: https://substrait.io/
> >> > >
> >> > > Which is being worked on by several people, including Arrow
> >> > > community members.
> >> > >
> >> > > It might be interesting to generalize Flight SQL to include support
> >> > > for Substrait. I'm curious what your application is, if you're able
> >> > > to share more.
> >> > >
> >> > > -David
> >> > >
> >> > > On Thu, Mar 3, 2022, at 18:05, Gavin Ray wrote:
> >> > > > Hiya,
> >> > > >
> >> > > > I am drafting a proposal for a way to enable services to express
> >> > > > data compute operations to each other.
> >> > > >
> >> > > > However I think it'll be difficult to get buy-in if the only
> >> > > > representation for queries is as SQL strings.
> >> > > >
> >> > > > Is there any kind of lower-level API that can be used to express
> >> > > > operations?
> >> > > >
> >> > > > IE instead of "SELECT name FROM user"
> >> > > >
> >> > > > A structured representation like:
> >> > > > {
> >> > > >   "op": "query",
> >> > > >   "schema": "user",
> >> > > >   "project": ["name"]
> >> > > > }
> >> > > >
> >> > > > Or maybe this is a bad idea/doesn't make sense?
> >> > > >
> >> > > > Thank you =)
> >> > >
> >> >
> >>
> >>
> >> --
> >>
> >> *James Duong*
> >> Lead Software Developer
> >> Bit Quill Technologies Inc.
> >> Direct: +1.604.562.6082 | jam...@bitquilltech.com
> >> https://www.bitquilltech.com
> >>
>
