Forking this thread to a new topic; I think it has strayed quite a bit from
the original discussion, which was about server-side Java implementations.


On Tue, Mar 15, 2022 at 9:14 AM James Duong <jam...@bitquilltech.com.invalid>
wrote:

> I could also see extensions to ODBC/JDBC being a point of confusion for app
> developers too.
>
> For example, if we were to add hooks in the JDBC driver to report endpoints
> so that
> applications can call getStream() directly, what would happen if the user
> started getting
> a stream then went back and tried to use the regular ResultSet interface? A
> stream
> would be consumed, but the driver wouldn't know it.
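To make the hazard concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: SingleUseResult, getStream() and next() are illustration names, and the Iterator of strings is only a stand-in for a Flight stream of record batches. It assumes the hand-off goes through the driver, which is the only case where the driver can record that the stream was claimed. If the application instead calls the Flight client directly with endpoints the driver reported, the driver has no such hook.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: a result whose underlying stream may be claimed only once. */
final class SingleUseResult {
    // Stand-in for a Flight stream of record batches (assumption, not a real Flight type).
    private final Iterator<String> stream;
    private boolean claimedDirectly = false;

    SingleUseResult(List<String> batches) {
        this.stream = batches.iterator();
    }

    /** Hypothetical "native" path: hands the raw stream to the application. */
    Iterator<String> getStream() {
        claimedDirectly = true; // the driver can only record this if it does the hand-off
        return stream;
    }

    /** ResultSet-style path: advances the same underlying stream. */
    boolean next() {
        if (claimedDirectly) {
            throw new IllegalStateException(
                    "stream was handed out via getStream(); row access is no longer valid");
        }
        if (!stream.hasNext()) {
            return false;
        }
        stream.next();
        return true;
    }

    public static void main(String[] args) {
        SingleUseResult result = new SingleUseResult(List.of("batch-1", "batch-2"));
        System.out.println("first row-style read ok: " + result.next());
        result.getStream(); // application switches to the native path
        try {
            result.next(); // going back to the row-style path is rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Without the flag, the second call to next() would quietly read from (or find empty) a stream the application had already been draining, which is the confusing behaviour described above.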
>
> On Tue, Mar 15, 2022 at 9:07 AM Kyle Porter <ky...@bitquilltech.com
> .invalid>
> wrote:
>
> > In general, I have problems with attempting to expose other extensions
> > through existing standards such as ODBC/JDBC. What it feels like we're
> > saying is: use the standard so you don't have to change any code, except
> > for this part where you must write custom code to take advantage of the
> > non-standard portions.
> >
> > At that point, why not just write something fully custom and take
> advantage
> > of the underlying interface?
> >
> > The higher level clients are meant to ease adoption and may be all that
> > existing applications use, but new applications can have a choice to use
> > the higher level clients or the lower level interface.
> >
> > *Kyle Porter*
> > CEO
> > Bit Quill Technologies Inc.
> > Office: +1.778.331.3355 | Direct: +1.604.441.7318 |
> ky...@bitquilltech.com
> > https://www.bitquill.com
> >
> > This email message is for the sole use of the intended recipient(s) and
> may
> > contain confidential and privileged information.  Any unauthorized
> review,
> > use, disclosure, or distribution is prohibited.  If you are not the
> > intended recipient, please contact the sender by reply email and destroy
> > all copies of the original message.  Thank you.
> >
> >
> > On Tue, Mar 15, 2022 at 7:55 AM David Li <lidav...@apache.org> wrote:
> >
> > > Aren't we getting a few things mixed up here?
> > >
> > > 1) As Micah says, the original proposal is about adapting Java types to
> > > Arrow. This can be used independently of Flight SQL. I don't think this
> > was
> > > being pitched as a standard itself unless I'm mistaken?
> > >
> > > 2) Flight SQL the protocol, which _is_ a language agnostic standard,
> > > though maybe not the one applications will generally choose to consume.
> > >
> > > 3) Idiomatic/standard per-language APIs that build on Flight SQL, which
> > > will include JDBC/ODBC (there is a reference JDBC driver in the works
> > [1]),
> > > but I agree there's room for something that uses Arrow types, supports
> > > partitioning, etc. as well. (And I agree there's room for something
> that
> > > supports these features but is _not_ Flight SQL underneath.)
> > >
> > > ---
> > >
> > > I'm not super experienced with JDBC/ODBC - would extending them
> basically
> > > mean something like (in JDBC) providing interfaces that Connections,
> > > ResultSets, etc. could be cast to to access the "Arrow-native" bits?
> And
> > in
> > > ODBC, using something like the SQL_C_BINARY type to 'tunnel' Arrow data
> > > through ODBC buffers, and/or providing a set of C API functions that
> > could
> > > convert between (say) an ODBC statement handle and an Arrow C Data
> > > Interface ArrowArrayStream?
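On the JDBC half of that question: JDBC already has a standard hook for this, since Connection, Statement and ResultSet all extend java.sql.Wrapper, which defines isWrapperFor() and unwrap(). Below is a rough sketch of how an "Arrow-native" extension could hang off it. ArrowBatchSource and its method are purely hypothetical (neither JDBC nor Arrow defines them), and the ResultSet is a dynamic-proxy stub so the example runs without any driver.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;

/** Hypothetical extension interface; not part of JDBC or Arrow. */
interface ArrowBatchSource {
    List<String> nextBatchColumnNames();
}

final class UnwrapSketch {
    /** Builds a stub ResultSet (dynamic proxy) that also implements ArrowBatchSource. */
    static ResultSet stubResultSet() {
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "isWrapperFor":
                    return ((Class<?>) args[0]).isInstance(proxy);
                case "unwrap":
                    Class<?> iface = (Class<?>) args[0];
                    if (iface.isInstance(proxy)) {
                        return proxy;
                    }
                    throw new SQLException("Not a wrapper for " + iface.getName());
                case "nextBatchColumnNames":
                    return List.of("id", "name"); // canned data for the sketch
                default:
                    throw new UnsupportedOperationException(method.getName());
            }
        };
        return (ResultSet) Proxy.newProxyInstance(
                UnwrapSketch.class.getClassLoader(),
                new Class<?>[] {ResultSet.class, ArrowBatchSource.class},
                handler);
    }

    /** Uses the extension when the driver offers it, otherwise reports nothing. */
    static List<String> columnsIfArrowNative(ResultSet rs) throws SQLException {
        if (rs.isWrapperFor(ArrowBatchSource.class)) {
            return rs.unwrap(ArrowBatchSource.class).nextBatchColumnNames();
        }
        return List.of();
    }

    public static void main(String[] args) throws SQLException {
        System.out.println(columnsIfArrowNative(stubResultSet()));
    }
}
```

Application code written this way still compiles against plain JDBC and only takes the non-standard path when isWrapperFor() says it is available.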
> > >
> > > [1]: https://github.com/apache/arrow/pull/12254
> > >
> > > -David
> > >
> > > On Tue, Mar 15, 2022, at 01:06, Micah Kornfield wrote:
> > > > Hi Julian,
> > > >
> > > >
> > > >> I like Gavin’s idea of a data-frame API. But Gavin, if you want to
> > make
> > > it
> > > >> successful, build it on top of the leading API in each language
> (which
> > > in
> > > >> Java would be FlightSQL’s JDBC driver). I don’t see a good reason to
> > > expose
> > > >> through your API the fact that FlightSQL is underneath.
> > > >
> > > >
> > > > > My understanding is that this thread is all about implementing a Flight
> > > > > server and making those ergonomics easier.  On the client side, I think
> > > > > the power of Flight/FlightSQL is twofold:
> > > > > 1.  Reference ODBC/JDBC drivers that can consume the wire format (and I
> > > > > think many clients will go this route).  I think these are in the
> > > > > process of being contributed already.  As you noted, there is power in
> > > > > standards, so I expect this avenue to see heavy use.
> > > > > 2.  For clients that can handle it and want to go through the trouble,
> > > > > consuming the data directly as Arrow for efficiency purposes.  I don't
> > > > > think we've discussed canonical APIs by extending ODBC/JDBC but I like
> > > > > that idea.  That seems like a discussion for after we have a working
> > > > > JDBC/ODBC reference implementation though?
> > > >
> > > > I might have missed it but I don't think either approach on the
> client
> > > side
> > > > has been discussed on this thread.  I also think this is why
> Dataframe
> > > > might not be the best name for the adapter because it comes with all
> > > sorts
> > > > of assumptions about usage both on a client and a server.
> > > >
> > > > Cheers,
> > > > Micah
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Mar 14, 2022 at 9:38 PM Julian Hyde <jhyde.apa...@gmail.com>
> > > wrote:
> > > >
> > > >> When I read “language-agnostic standard for data access” I cringed a
> > > >> little. (See [1].)
> > > >>
> > > > >> Sure, it’s fun to create a new standard. But if your standard is
> > > > >> successful, there will need to be a huge amount of work changing
> > > > >> existing code to use your standard. That effort might even be the
> > > > >> difference between success and failure for a small project, and
> > > > >> therefore you have helped protect the incumbents.
> > > >>
> > > >> My solution?
> > > >>
> > > >> I would like the FlightSQL authors to make clear that it is a wire
> > > >> protocol, and only a protocol.
> > > >>
> > > >> Rather than creating new APIs, I would like people to spend their
> > effort
> > > >> implementing existing APIs (such as ODBC and JDBC) on top of
> > FlightSQL.
> > > >>
> > > > >> If those APIs are inadequate (e.g. they don’t provide access to the
> > > > >> raw Arrow data, or don’t support INSERT or SELECT that are partitioned
> > > > >> across several clients/servers), then add extensions to those APIs. But
> > > > >> still implement the core APIs. When I describe a table from Java, I want
> > > > >> to get a result set that exactly matches JDBC’s getTables [2].
> > > >>
> > > >> I like Gavin’s idea of a data-frame API. But Gavin, if you want to
> > make
> > > it
> > > >> successful, build it on top of the leading API in each language
> (which
> > > in
> > > >> Java would be FlightSQL’s JDBC driver). I don’t see a good reason to
> > > expose
> > > >> through your API the fact that FlightSQL is underneath.
> > > >>
> > > >> Julian
> > > >>
> > > >> [1] https://xkcd.com/927/ <https://xkcd.com/927/>
> > > >>
> > > >> [2]
> > > >>
> > >
> >
> https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html#getTables-java.lang.String-java.lang.String-java.lang.String-java.lang.String:A-
> > > >> <
> > > >>
> > >
> >
> https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html#getTables-java.lang.String-java.lang.String-java.lang.String-java.lang.String:A-
> > > >
> > > >>
> > > >>
> > > >>
> > > >> > On Mar 12, 2022, at 12:14 PM, Gavin Ray <ray.gavi...@gmail.com>
> > > wrote:
> > > >> >
> > > >> > While trying to implement and introduce the idea of adopting
> > > FlightSQL,
> > > >> the
> > > >> > largest challenge was the API itself
> > > >> >
> > > > >> > I know it's meant to be low-level. But I found that most of the
> > > > >> > development time was in code to convert between row-based data
> > > > >> > (i.e. Map<String, Object>) and Java types on one side, and columnar
> > > > >> > data + Arrow types on the other.
> > > >> >
> > > >> > I'm likely in the minority position here -- I know that Arrow and
> > > >> FlightSQL
> > > >> > users are largely looking at transferring large volumes of data
> and
> > > >> > servicing OLAP-type workloads
> > > >> > But the thing that excites me most about FlightSQL, isn't its
> > > performance
> > > >> > (always nice to have), but that it's a language-agnostic standard
> > for
> > > >> data
> > > >> > access.
> > > >> >
> > > >> > That has broad implications -- for all kinds of data-access
> > workloads
> > > and
> > > >> > business usecases.
> > > >> >
> > > >> > The challenge is that in trying to advocate for it, when
> presenting
> > a
> > > >> > proof-of-concept,
> > > >> > rather than what a developer might expect to see, something like:
> > > >> >
> > > >> > // FlightSQL handler code
> > > >> > List<Map<String, Object>> results = ....;
> > > > >> > results.add(Map.of("id", 1, "name", "Person 1"));
> > > >> > return results;
> > > >> >
> > > >> > A significant portion of the code is in Arrow-specific
> > implementation
> > > >> > details:
> > > >> > creating a VectorSchemaRoot, FieldVector, de-serializing the
> results
> > > on
> > > >> the
> > > >> > client, etc.
> > > >> >
> > > >> > Just curious whether there is any interest/intention of possibly
> > > making a
> > > >> > higher level API around the basic FlightSQL one?
> > > >> > Maybe something closer to the traditional notion of a row-based
> > > >> "DataFrame"
> > > >> > or "Table", like:
> > > >> >
> > > >> > DataFrame df = new DataFrame();
> > > >> > df.addColumn("id", ArrowTypes.Int);
> > > >> > df.addColumn("name", ArrowTypes.VarChar);
> > > >> > df.addRow(Map.of("id", 1, "name", "Person 1"));
> > > >> > VectorSchemaRoot root = df.toVectorSchemaRoot();
> > > >> > listener.setVectorSchemaRoot(root);
> > > >> > listener.sendVectorSchemaRootContents();
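For what the conversion step inside such a helper might look like, here is a minimal sketch of the row-to-columnar pivot, written in plain Java so it runs without the Arrow libraries. RowsToColumns and pivot() are hypothetical names, and the per-column lists are only stand-ins: a real implementation would presumably write each value into the matching FieldVector of a VectorSchemaRoot rather than into a List.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: pivots row-based maps into one value list per column. */
final class RowsToColumns {
    /**
     * Returns the columns in the requested order. A row that lacks a key
     * contributes null for that column, so all columns keep the same length.
     */
    static Map<String, List<Object>> pivot(List<String> columns,
                                           List<Map<String, Object>> rows) {
        Map<String, List<Object>> out = new LinkedHashMap<>();
        for (String column : columns) {
            out.put(column, new ArrayList<>(rows.size()));
        }
        for (Map<String, Object> row : rows) {
            for (String column : columns) {
                out.get(column).add(row.get(column));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(Map.of("id", 1, "name", "Person 1"));
        rows.add(Map.of("id", 2, "name", "Person 2"));
        System.out.println(pivot(List.of("id", "name"), rows));
    }
}
```

The column list is passed explicitly because a Map-based row carries no schema of its own, which is roughly the role the addColumn() calls play in the DataFrame idea above.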
> > > >>
> > > >>
> > >
> >
>
>
> --
>
> *James Duong*
> Lead Software Developer
> Bit Quill Technologies Inc.
> Direct: +1.604.562.6082 | jam...@bitquilltech.com
> https://www.bitquilltech.com
>
>
