Thank you both for pushing this forward On Mon, Dec 12, 2022 at 1:14 PM James Duong <james.du...@improving.com.invalid> wrote:
> Hi David, > > I've written up the URI parsing in C++ and started adding session > management messages. I'm also planning on having the > ClientCookieMiddlewareFactory be able to report if sessions are enabled on > the server. > > I (or another developer) will send an update once those features are ready > for demo. > ________________________________ > From: David Li <lidav...@apache.org> > Sent: December 12, 2022 10:07 AM > To: dev@arrow.apache.org <dev@arrow.apache.org> > Subject: Re: DISCUSS: [FlightSQL] Catalog support > > Following up here, James are you interested in putting up a draft PR for > the Flight SQL URI format and for session management? > > The Flight SQL URI format would then also cover Andrew's use case. And if > someone wants to draw up a PR to the JDBC driver to enable arbitrary > properties, I can review that too. > > On Sat, Dec 3, 2022, at 05:38, Andrew Lamb wrote: > >> Andrew, do we need to look into adding more metadata to indicate > > different query languages? (It's quite a shame that we named this Flight > > SQL at this point...) > > > > TDLR is I don't think trying to explicitly support languages other than > SQL > > in FlightSQL is a good idea. Among other reasons, the JDBC / ODBC > drivers, > > which mostly assume SQL, are one of the key features of FlightSQL, and > they > > are likely not as useful for non SQL. I can see the argument to support > for > > substrait plans, and it will be interesting to see what use cases benefit > > from that support. > > > > What would make our life easier would be some standard way to pass > > application specific key/value pairs from the JDBC driver to a flight SQL > > backend with each request (perhaps via gRPC headers). This would allow > > passing configuration parameters that were not envisioned in the spec, > from > > end user (of the JDBC driver) all the way to our backend. > > > > Thanks again for driving this forward, > > Andrew > > > > On Thu, Dec 1, 2022 at 7:11 PM David Li <lidav...@apache.org> wrote: > > > >> Hey James, thanks for putting this up. > >> > >> Inline: > >> > >> > The suggestion is to make this part of Flight as an > >> > optional feature, rather than Flight SQL due to its applicability > outside > >> > of just database access. > >> > >> Which uses do you see? I see statefulness as a general antipattern here, > >> so I'm wary of introducing it beyond where we need it. > >> > >> > - The Flight client supplies a New-Session header which has key-value > >> pairs > >> > for initial session options. This header can be applied to any RPC > call, > >> > but logically should be the first one the client makes. > >> > >> Handshake already effectively serves as this RPC - maybe we could extend > >> it? (I also see Handshake as an antipattern because it's a stateful auth > >> mechanism.) > >> > >> Should the session timeout/be on a lease? (gRPC doesn't really give the > >> server a way to track the persistence of a particular client > connection.) > >> > >> > It's a bit asymmetric that creating a new session is done by applying > a > >> > header, but closing a session is an RPC call. This was so that session > >> > creation doesn't introduce another round trip before the first real > data > >> > request. If there's a way to batch RPC calls it might be better to > make > >> > session creation an RPC call. > >> > >> Is this a worrisome amount of overhead? > >> > >> Unfortunately gRPC doesn't batch RPC calls, but RPC calls on the same > >> client generally share the same TCP connection (modulo load balancing > >> behavior, but presumably that is not enabled if you want persistent > >> sessions). > >> > >> On the implementation side, I'd like to avoid baking this in too deeply > if > >> at all possible. Ideally it'd be implemented entirely as middleware, > >> possibly making use of an interface so applications can override the > >> session storage (hashtable, Redis, etcd, etc.) > >> > >> > Just to chime in on this, one thing I'm curious about is whether there > >> > will be support for user-defined catalog/schema hierarchy depth? > >> > >> Gavin, for ADBC we discussed adding a delimiter to the catalog name to > >> handle this case - maybe we can handle this by adding a property for the > >> delimiter to SqlInfo? > >> > >> > https://github.com/influxdata/influxdb_iox/issues/6102 > >> > >> Andrew, do we need to look into adding more metadata to indicate > different > >> query languages? (It's quite a shame that we named this Flight SQL at > this > >> point...) > >> > >> On Thu, Dec 1, 2022, at 08:49, Andrew Lamb wrote: > >> > Sorry for the late reply -- thank you James and David for this > >> discussion. > >> > > >> > I agree that adding Catalog support would be a valuable addition to > >> Flight > >> > SQL, and it recently came up as we begin to implement Flight SQL in > >> > InfluxDB IOx [1]. > >> > > >> >> - A standard URI scheme for Flight SQL that can be used by multiple > >> > client APIs (JDBC, ADBC, etc.) > >> > > >> > I agree this would be very valuable, along with a standard way > (ideally > >> > with HTTP headers) to send this information as part of the FlightSQL > gRPC > >> > requests. > >> > > >> >> I'd suggest we define session management features explicitly in > Flight > >> > (while being optional). > >> > > >> > I agree it is critical that server-side state is not required to > >> implement > >> > FlightSQL. Stateful connections would likely complicate deploying > >> FlightSQL > >> > in distributed systems. I suggest it should be possible to implement > any > >> > session management features by sending the entire session state with > the > >> > request, if desired. > >> > > >> > I don't have a strong opinion about the merits of including explicit > >> > session management features in FlightSQL. It seems to me that keeping > the > >> > API surface of FlightSQL minimal and implementation flexibility > maximal > >> > should be the default. However, if JDBC/ODBC driver compatibility > would > >> be > >> > improved with explicit state management APIs, then adding them to > >> FlightSQL > >> > seems like a good idea to me. > >> > > >> > Thanks again -- it is amazing to hit some issue in design and then > find > >> out > >> > the Arrow community is already hard at work on a solution. > >> > > >> > Andrew > >> > > >> > [1] https://github.com/influxdata/influxdb_iox/issues/6102 > >> > > >> > On Wed, Nov 30, 2022 at 7:17 PM Gavin Ray <ray.gavi...@gmail.com> > wrote: > >> > > >> >> Just to chime in on this, one thing I'm curious about is whether > there > >> >> will be support for user-defined catalog/schema hierarchy depth? > >> >> > >> >> This comment that James made does seem reasonable to me > >> >> > scheme://<host>:<port>/path-1/path-2/.../path-n > >> >> > >> >> Trino/Presto does a similar thing > (jdbc:trino://localhost:8080/tpch/sf1) > >> >> > >> >> At Hasura, what we do is have an alias "FullyQualifiedName" which is > >> >> just "Array<String>" > >> >> and the identifier to some element in a data source is always > >> >> fully-qualified: > >> >> > >> >> > https://github.com/hasura/graphql-engine/tree/master/dc-agents#schema > >> >> > >> >> ["postgres_1", "db1", "schema2", "my_table", "col_a"] > >> >> ["mongo", "db1", "collection_a", "field_a"] > >> >> ["csv_adapter", "myfile.csv", "col_x"] > >> >> > >> >> On Wed, Nov 30, 2022 at 6:31 PM James Duong > >> >> <jam...@bitquilltech.com.invalid> wrote: > >> >> > > >> >> > Our current convention of sending connection properties as headers > >> with > >> >> > every request has the benefit of making statefulness optional, but > has > >> >> the > >> >> > drawback of sending redundant, unused properties on requests after > the > >> >> > first, which increases the payload size unnecessarily. > >> >> > > >> >> > I'd suggest we define session management features explicitly in > Flight > >> >> > (while being optional). The suggestion is to make this part of > Flight > >> as > >> >> an > >> >> > optional feature, rather than Flight SQL due to its applicability > >> outside > >> >> > of just database access. > >> >> > > >> >> > Creating a session: > >> >> > - The Flight client supplies a New-Session header which has > key-value > >> >> pairs > >> >> > for initial session options. This header can be applied to any RPC > >> call, > >> >> > but logically should be the first one the client makes. > >> >> > - The server should send a Set-Cookie header back containing some > >> >> > server-side representation of the session that the client can use > in > >> >> > subsequent requests. > >> >> > - The path specified in the URI is sent as a "Catalog" session > option. > >> >> > > >> >> > Modifying session options: > >> >> > - A separate RPC call that takes in a Stream<string, string> > >> representing > >> >> > each session option that is being modified and returns a stream of > >> >> statuses > >> >> > to indicate if the setting change was accepted. > >> >> > - This RPC call is only valid when the Cookie header is used. > >> >> > - It is up to the server to define if a failed session property > >> change is > >> >> > fatal or if other properties can continue to be set. > >> >> > > >> >> > Closing a session: > >> >> > - A separate RPC call that tells the server to drop the session > >> specified > >> >> > by the Cookie header. > >> >> > > >> >> > Notes: > >> >> > A Flight SQL client would check if session management RPCs are > >> supported > >> >> > through a new GetSqlInfo property. A Flight client doesn't have a > way > >> to > >> >> do > >> >> > this generically, but there could be an application-specific RPC or > >> >> header > >> >> > that reports this metadata. > >> >> > > >> >> > The O/JDBC and ADBC drivers would need to be updated to > >> programmatically > >> >> > check for session management RPCs. If unsupported, then use the old > >> >> > behavior of sending all properties as headers with each request. If > >> >> > supported, make use of the New-Session header and drop the session > >> when > >> >> > closing the client-side connection. > >> >> > > >> >> > It's a bit asymmetric that creating a new session is done by > applying > >> a > >> >> > header, but closing a session is an RPC call. This was so that > session > >> >> > creation doesn't introduce another round trip before the first real > >> data > >> >> > request. If there's a way to batch RPC calls it might be better to > >> make > >> >> > session creation an RPC call. > >> >> > > >> >> > On Tue, Nov 22, 2022 at 3:16 PM David Li <lidav...@apache.org> > wrote: > >> >> > > >> >> > > It sounds reasonable - then there are three points: > >> >> > > > >> >> > > - A standard URI scheme for Flight SQL that can be used by > multiple > >> >> client > >> >> > > APIs (JDBC, ADBC, etc.) > >> >> > > - A standard scheme for session data (likely header/cookie-based) > >> >> > > - A mapping from URI parameters and fields to session data > >> >> > > > >> >> > > > >> >> > > > >> >> > > On Tue, Nov 22, 2022, at 17:45, James Duong wrote: > >> >> > > > Just following up on this and if there are any thoughts. > >> >> > > > > >> >> > > > The purpose would be to standardize how we specify access to > some > >> >> named > >> >> > > > logical grouping of data. This would make it easy to model > >> >> catalog/schema > >> >> > > > semantics in Flight SQL. > >> >> > > > > >> >> > > > Having this be part of the connection URI makes it similar to > >> >> specifying > >> >> > > a > >> >> > > > resource in an HTTP URL (ie an endpoint) which should make it > easy > >> >> for > >> >> > > end > >> >> > > > users to work with and modify. > >> >> > > > > >> >> > > > On Fri, Nov 18, 2022 at 3:17 PM James Duong < > >> jam...@bitquilltech.com > >> >> > > >> >> > > wrote: > >> >> > > > > >> >> > > >> As for surfacing catalogs itself, perhaps we allow the URI > take > >> in a > >> >> > > path > >> >> > > >> and treat that as a way of specifying a multi-level resource > that > >> >> which > >> >> > > the > >> >> > > >> FlightClient is connecting to: > >> >> > > >> > >> >> > > >> eg a connection URI of the form: > >> >> > > >> scheme://<host>:<port>/path-1/path-2/.../path-n > >> >> > > >> > >> >> > > >> The FlightClient could send this path as either a header or a > >> >> session > >> >> > > >> property (with a neutral name like 'resource-path'). Flight > SQL > >> >> > > Producers > >> >> > > >> could interpret this as a catalog or schema. > >> >> > > >> eg > >> >> > > >> grpc://<host>:<port>/catalog/schema > >> >> > > >> > >> >> > > >> On Fri, Nov 11, 2022 at 2:07 AM James Henderson <j...@juxt.pro > > > >> >> wrote: > >> >> > > >> > >> >> > > >>> Sounds good to me. > >> >> > > >>> > >> >> > > >>> > Are you interested in writing up a (sketch of a) proposal? > >> >> > > >>> > >> >> > > >>> Yep, can do - I'm OoO over the next couple of weeks so might > be > >> a > >> >> bit > >> >> > > >>> intermittent. > >> >> > > >>> > >> >> > > >>> On Thu, 10 Nov 2022 at 15:28, David Li <lidav...@apache.org> > >> >> wrote: > >> >> > > >>> > >> >> > > >>> > Hey James H., > >> >> > > >>> > > >> >> > > >>> > That would make sense to me. So it sounds like we'd want > >> >> > > >>> > > >> >> > > >>> > - Formal specification of using cookies/headers to mark a > >> >> 'session' > >> >> > > (I > >> >> > > >>> > guess this will be a little inconsistent with transactions, > >> >> though) > >> >> > > >>> > - Adding RPCs to query session values > >> >> > > >>> > - Adding RPCs to set session values > >> >> > > >>> > - Listing standard values and types > >> >> > > >>> > > >> >> > > >>> > Some things may require more consideration, e.g. > transaction > >> >> > > isolation > >> >> > > >>> > might be better off as part of the transaction RPCs than an > >> >> ambient > >> >> > > >>> > property. Are you interested in writing up a (sketch of a) > >> >> proposal? > >> >> > > >>> > > >> >> > > >>> > -David > >> >> > > >>> > > >> >> > > >>> > On Thu, Nov 10, 2022, at 10:09, James Henderson wrote: > >> >> > > >>> > > Similarly, we're also currently considering how best to > >> >> implement > >> >> > > >>> some of > >> >> > > >>> > > the SQL standard session variables in our Flight SQL > server > >> - > >> >> > > things > >> >> > > >>> like > >> >> > > >>> > > current transaction isolation level, access mode, time > zone > >> >> etc, > >> >> > > which > >> >> > > >>> > seem > >> >> > > >>> > > to have similar properties to the (traditional) > connection's > >> >> > > current > >> >> > > >>> > > catalog. We'd (perhaps naively) looked at solutions > >> involving > >> >> the > >> >> > > >>> Flight > >> >> > > >>> > > client's `ClientCookieMiddleware`, but might these have > >> >> > > standardised > >> >> > > >>> > > support within Flight SQL itself eventually? > >> >> > > >>> > > > >> >> > > >>> > > Cheers, > >> >> > > >>> > > > >> >> > > >>> > > James > >> >> > > >>> > > > >> >> > > >>> > > On Wed, 9 Nov 2022 at 21:51, David Li < > lidav...@apache.org> > >> >> wrote: > >> >> > > >>> > > > >> >> > > >>> > >> I think having better support for this makes sense, but > >> >> perhaps we > >> >> > > >>> can > >> >> > > >>> > >> find a way to make it not tied to the connection itself? > >> For > >> >> > > >>> instance, > >> >> > > >>> > in > >> >> > > >>> > >> the same way transactions were implemented (as a > handle). > >> Or > >> >> > > rather, > >> >> > > >>> > >> instead of adding connection statefulness to Flight RPC, > >> I'd > >> >> > > rather > >> >> > > >>> try > >> >> > > >>> > to > >> >> > > >>> > >> work within the gRPC/RPC paradigm. > >> >> > > >>> > >> > >> >> > > >>> > >> On Wed, Nov 9, 2022, at 16:47, James Duong wrote: > >> >> > > >>> > >> > Databases such as SQL Server and PostgreSQL have the > >> >> concept of > >> >> > > >>> > catalogs > >> >> > > >>> > >> as > >> >> > > >>> > >> > containers of database schemas. Users can usually > >> specify an > >> >> > > >>> initial > >> >> > > >>> > >> > catalog during the connection process, list catalogs, > and > >> >> > > sometimes > >> >> > > >>> > >> change > >> >> > > >>> > >> > catalogs during the session. > >> >> > > >>> > >> > > >> >> > > >>> > >> > Currently catalog support in Flight SQL is somewhat > >> >> limited. The > >> >> > > >>> > protocol > >> >> > > >>> > >> > provides a way to list catalogs as well as metadata in > >> >> > > SqlTypeInfo > >> >> > > >>> for > >> >> > > >>> > >> > reporting how catalogs are supported from a syntax > >> >> perspective. > >> >> > > >>> > >> > > >> >> > > >>> > >> > ODBC and JDBC provide API calls for changing the > catalog. > >> >> > > >>> > Additionally, > >> >> > > >>> > >> > Flight SQL doesn't really provide the concept of > >> "initial" > >> >> > > >>> connection > >> >> > > >>> > >> > properties (such as a starting catalog) since Flight > >> itself > >> >> is > >> >> > > >>> > stateless > >> >> > > >>> > >> > from a connection perspective. > >> >> > > >>> > >> > > >> >> > > >>> > >> > To support catalogs properly, I'd imagine we need to > make > >> >> some > >> >> > > >>> > changes to > >> >> > > >>> > >> > the Flight SQL protocol: > >> >> > > >>> > >> > - Introduce the concept of connection-time properties > >> >> (perhaps > >> >> > > an > >> >> > > >>> > >> optional > >> >> > > >>> > >> > RPC for Flight SQL applications that need this) > >> >> > > >>> > >> > - Related to the above, expand the connection URL and > >> Java > >> >> > > builder > >> >> > > >>> to > >> >> > > >>> > >> allow > >> >> > > >>> > >> > arbitrary application-specific properties. > >> >> > > >>> > >> > - Add optional RPCs for changing the catalog and > relevant > >> >> error > >> >> > > >>> codes > >> >> > > >>> > if > >> >> > > >>> > >> > this is not permitted. > >> >> > > >>> > >> > > >> >> > > >>> > >> > > >> >> > > >>> > >> > -- > >> >> > > >>> > >> > > >> >> > > >>> > >> > *James Duong* > >> >> > > >>> > >> > Lead Software Developer > >> >> > > >>> > >> > Bit Quill Technologies Inc. > >> >> > > >>> > >> > Direct: +1.604.562.6082 | jam...@bitquilltech.com > >> >> > > >>> > >> > https://www.bitquilltech.com > >> >> > > >>> > >> > > >> >> > > >>> > >> > This email message is for the sole use of the intended > >> >> > > recipient(s) > >> >> > > >>> > and > >> >> > > >>> > >> may > >> >> > > >>> > >> > contain confidential and privileged information. Any > >> >> > > unauthorized > >> >> > > >>> > >> review, > >> >> > > >>> > >> > use, disclosure, or distribution is prohibited. If > you > >> are > >> >> not > >> >> > > the > >> >> > > >>> > >> > intended recipient, please contact the sender by reply > >> >> email and > >> >> > > >>> > destroy > >> >> > > >>> > >> > all copies of the original message. Thank you. > >> >> > > >>> > >> > >> >> > > >>> > > > >> >> > > >>> > > > >> >> > > >>> > > -- > >> >> > > >>> > > *James Henderson* > >> >> > > >>> > > XTDB Developer at *JUXT* > >> >> > > >>> > > Email j...@juxt.pro > >> >> > > >>> > > Website https://juxt.pro > >> >> > > >>> > > > >> >> > > >>> > > [image: photo] > >> >> > > >>> > > >> >> > > >>> > >> >> > > >>> > >> >> > > >>> -- > >> >> > > >>> *James Henderson* > >> >> > > >>> XTDB Developer at *JUXT* > >> >> > > >>> Email j...@juxt.pro > >> >> > > >>> Website https://juxt.pro > >> >> > > >>> > >> >> > > >>> [image: photo] > >> >> > > >>> > >> >> > > >> > >> >> > > >> > >> >> > > >> -- > >> >> > > >> > >> >> > > >> *James Duong* > >> >> > > >> Lead Software Developer > >> >> > > >> Bit Quill Technologies Inc. > >> >> > > >> Direct: +1.604.562.6082 | jam...@bitquilltech.com > >> >> > > >> https://www.bitquilltech.com > >> >> > > >> > >> >> > > >> This email message is for the sole use of the intended > >> recipient(s) > >> >> and > >> >> > > >> may contain confidential and privileged information. Any > >> >> unauthorized > >> >> > > >> review, use, disclosure, or distribution is prohibited. If > you > >> are > >> >> not > >> >> > > the > >> >> > > >> intended recipient, please contact the sender by reply email > and > >> >> destroy > >> >> > > >> all copies of the original message. Thank you. > >> >> > > >> > >> >> > > > > >> >> > > > > >> >> > > > -- > >> >> > > > > >> >> > > > *James Duong* > >> >> > > > Lead Software Developer > >> >> > > > Bit Quill Technologies Inc. > >> >> > > > Direct: +1.604.562.6082 | jam...@bitquilltech.com > >> >> > > > https://www.bitquilltech.com > >> >> > > > > >> >> > > > This email message is for the sole use of the intended > >> recipient(s) > >> >> and > >> >> > > may > >> >> > > > contain confidential and privileged information. Any > unauthorized > >> >> > > review, > >> >> > > > use, disclosure, or distribution is prohibited. If you are not > >> the > >> >> > > > intended recipient, please contact the sender by reply email > and > >> >> destroy > >> >> > > > all copies of the original message. Thank you. > >> >> > > > >> >> > > >> >> > > >> >> > -- > >> >> > > >> >> > *James Duong* > >> >> > Lead Software Developer > >> >> > Bit Quill Technologies Inc. > >> >> > Direct: +1.604.562.6082 | jam...@bitquilltech.com > >> >> > https://www.bitquilltech.com > >> >> > > >> >> > This email message is for the sole use of the intended recipient(s) > >> and > >> >> may > >> >> > contain confidential and privileged information. Any unauthorized > >> >> review, > >> >> > use, disclosure, or distribution is prohibited. If you are not the > >> >> > intended recipient, please contact the sender by reply email and > >> destroy > >> >> > all copies of the original message. Thank you. > >> >> > >> >