Hi Sutou, I saw your PostgreSQL project and thought it was quite interesting, especially given the number of PostgreSQL-compatible databases.
Paul Nienaber will be picking up implementation of the catalog feature going forward and can provide an update Get Outlook for Android<https://aka.ms/AAb9ysg> ________________________________ From: Sutou Kouhei <k...@clear-code.com> Sent: Thursday, February 9, 2023, 22:25 To: dev@arrow.apache.org <dev@arrow.apache.org> Subject: Re: DISCUSS: [FlightSQL] Catalog support Hi James Is there any progress of this? I'm developing a Flight SQL adapter for PostgreSQL: https://github.com/apache/arrow-flight-sql-postgresql I want to implement session feature for it because opening a session in PostgreSQL is expensive. PostgreSQL uses one process per session. If we open and close a session for each Flight SQL call, we need to start one process for each Flight SQL call. I noticed that the current Flight SQL specification doesn't provide the standard session support. So I'm interesting in this discussion. Background: https://github.com/apache/arrow-flight-sql-postgresql/issues/13 Thanks, -- kou In <ch2pr13mb3831b71269c257469631fb5de3...@ch2pr13mb3831.namprd13.prod.outlook.com> "Re: DISCUSS: [FlightSQL] Catalog support" on Mon, 12 Dec 2022 18:12:06 +0000, James Duong <james.du...@improving.com.INVALID> wrote: > Hi David, > > I've written up the URI parsing in C++ and started adding session management > messages. I'm also planning on having the ClientCookieMiddlewareFactory be > able to report if sessions are enabled on the server. > > I (or another developer) will send an update once those features are ready > for demo. > ________________________________ > From: David Li <lidav...@apache.org> > Sent: December 12, 2022 10:07 AM > To: dev@arrow.apache.org <dev@arrow.apache.org> > Subject: Re: DISCUSS: [FlightSQL] Catalog support > > Following up here, James are you interested in putting up a draft PR for the > Flight SQL URI format and for session management? > > The Flight SQL URI format would then also cover Andrew's use case. And if > someone wants to draw up a PR to the JDBC driver to enable arbitrary > properties, I can review that too. > > On Sat, Dec 3, 2022, at 05:38, Andrew Lamb wrote: >>> Andrew, do we need to look into adding more metadata to indicate >> different query languages? (It's quite a shame that we named this Flight >> SQL at this point...) >> >> TDLR is I don't think trying to explicitly support languages other than SQL >> in FlightSQL is a good idea. Among other reasons, the JDBC / ODBC drivers, >> which mostly assume SQL, are one of the key features of FlightSQL, and they >> are likely not as useful for non SQL. I can see the argument to support for >> substrait plans, and it will be interesting to see what use cases benefit >> from that support. >> >> What would make our life easier would be some standard way to pass >> application specific key/value pairs from the JDBC driver to a flight SQL >> backend with each request (perhaps via gRPC headers). This would allow >> passing configuration parameters that were not envisioned in the spec, from >> end user (of the JDBC driver) all the way to our backend. >> >> Thanks again for driving this forward, >> Andrew >> >> On Thu, Dec 1, 2022 at 7:11 PM David Li <lidav...@apache.org> wrote: >> >>> Hey James, thanks for putting this up. >>> >>> Inline: >>> >>> > The suggestion is to make this part of Flight as an >>> > optional feature, rather than Flight SQL due to its applicability outside >>> > of just database access. >>> >>> Which uses do you see? I see statefulness as a general antipattern here, >>> so I'm wary of introducing it beyond where we need it. >>> >>> > - The Flight client supplies a New-Session header which has key-value >>> pairs >>> > for initial session options. This header can be applied to any RPC call, >>> > but logically should be the first one the client makes. >>> >>> Handshake already effectively serves as this RPC - maybe we could extend >>> it? (I also see Handshake as an antipattern because it's a stateful auth >>> mechanism.) >>> >>> Should the session timeout/be on a lease? (gRPC doesn't really give the >>> server a way to track the persistence of a particular client connection.) >>> >>> > It's a bit asymmetric that creating a new session is done by applying a >>> > header, but closing a session is an RPC call. This was so that session >>> > creation doesn't introduce another round trip before the first real data >>> > request. If there's a way to batch RPC calls it might be better to make >>> > session creation an RPC call. >>> >>> Is this a worrisome amount of overhead? >>> >>> Unfortunately gRPC doesn't batch RPC calls, but RPC calls on the same >>> client generally share the same TCP connection (modulo load balancing >>> behavior, but presumably that is not enabled if you want persistent >>> sessions). >>> >>> On the implementation side, I'd like to avoid baking this in too deeply if >>> at all possible. Ideally it'd be implemented entirely as middleware, >>> possibly making use of an interface so applications can override the >>> session storage (hashtable, Redis, etcd, etc.) >>> >>> > Just to chime in on this, one thing I'm curious about is whether there >>> > will be support for user-defined catalog/schema hierarchy depth? >>> >>> Gavin, for ADBC we discussed adding a delimiter to the catalog name to >>> handle this case - maybe we can handle this by adding a property for the >>> delimiter to SqlInfo? >>> >>> > https://github.com/influxdata/influxdb_iox/issues/6102 >>> >>> Andrew, do we need to look into adding more metadata to indicate different >>> query languages? (It's quite a shame that we named this Flight SQL at this >>> point...) >>> >>> On Thu, Dec 1, 2022, at 08:49, Andrew Lamb wrote: >>> > Sorry for the late reply -- thank you James and David for this >>> discussion. >>> > >>> > I agree that adding Catalog support would be a valuable addition to >>> Flight >>> > SQL, and it recently came up as we begin to implement Flight SQL in >>> > InfluxDB IOx [1]. >>> > >>> >> - A standard URI scheme for Flight SQL that can be used by multiple >>> > client APIs (JDBC, ADBC, etc.) >>> > >>> > I agree this would be very valuable, along with a standard way (ideally >>> > with HTTP headers) to send this information as part of the FlightSQL gRPC >>> > requests. >>> > >>> >> I'd suggest we define session management features explicitly in Flight >>> > (while being optional). >>> > >>> > I agree it is critical that server-side state is not required to >>> implement >>> > FlightSQL. Stateful connections would likely complicate deploying >>> FlightSQL >>> > in distributed systems. I suggest it should be possible to implement any >>> > session management features by sending the entire session state with the >>> > request, if desired. >>> > >>> > I don't have a strong opinion about the merits of including explicit >>> > session management features in FlightSQL. It seems to me that keeping the >>> > API surface of FlightSQL minimal and implementation flexibility maximal >>> > should be the default. However, if JDBC/ODBC driver compatibility would >>> be >>> > improved with explicit state management APIs, then adding them to >>> FlightSQL >>> > seems like a good idea to me. >>> > >>> > Thanks again -- it is amazing to hit some issue in design and then find >>> out >>> > the Arrow community is already hard at work on a solution. >>> > >>> > Andrew >>> > >>> > [1] https://github.com/influxdata/influxdb_iox/issues/6102 >>> > >>> > On Wed, Nov 30, 2022 at 7:17 PM Gavin Ray <ray.gavi...@gmail.com> wrote: >>> > >>> >> Just to chime in on this, one thing I'm curious about is whether there >>> >> will be support for user-defined catalog/schema hierarchy depth? >>> >> >>> >> This comment that James made does seem reasonable to me >>> >> > scheme://<host>:<port>/path-1/path-2/.../path-n >>> >> >>> >> Trino/Presto does a similar thing (jdbc:trino://localhost:8080/tpch/sf1) >>> >> >>> >> At Hasura, what we do is have an alias "FullyQualifiedName" which is >>> >> just "Array<String>" >>> >> and the identifier to some element in a data source is always >>> >> fully-qualified: >>> >> >>> >> https://github.com/hasura/graphql-engine/tree/master/dc-agents#schema >>> >> >>> >> ["postgres_1", "db1", "schema2", "my_table", "col_a"] >>> >> ["mongo", "db1", "collection_a", "field_a"] >>> >> ["csv_adapter", "myfile.csv", "col_x"] >>> >> >>> >> On Wed, Nov 30, 2022 at 6:31 PM James Duong >>> >> <jam...@bitquilltech.com.invalid> wrote: >>> >> > >>> >> > Our current convention of sending connection properties as headers >>> with >>> >> > every request has the benefit of making statefulness optional, but has >>> >> the >>> >> > drawback of sending redundant, unused properties on requests after the >>> >> > first, which increases the payload size unnecessarily. >>> >> > >>> >> > I'd suggest we define session management features explicitly in Flight >>> >> > (while being optional). The suggestion is to make this part of Flight >>> as >>> >> an >>> >> > optional feature, rather than Flight SQL due to its applicability >>> outside >>> >> > of just database access. >>> >> > >>> >> > Creating a session: >>> >> > - The Flight client supplies a New-Session header which has key-value >>> >> pairs >>> >> > for initial session options. This header can be applied to any RPC >>> call, >>> >> > but logically should be the first one the client makes. >>> >> > - The server should send a Set-Cookie header back containing some >>> >> > server-side representation of the session that the client can use in >>> >> > subsequent requests. >>> >> > - The path specified in the URI is sent as a "Catalog" session option. >>> >> > >>> >> > Modifying session options: >>> >> > - A separate RPC call that takes in a Stream<string, string> >>> representing >>> >> > each session option that is being modified and returns a stream of >>> >> statuses >>> >> > to indicate if the setting change was accepted. >>> >> > - This RPC call is only valid when the Cookie header is used. >>> >> > - It is up to the server to define if a failed session property >>> change is >>> >> > fatal or if other properties can continue to be set. >>> >> > >>> >> > Closing a session: >>> >> > - A separate RPC call that tells the server to drop the session >>> specified >>> >> > by the Cookie header. >>> >> > >>> >> > Notes: >>> >> > A Flight SQL client would check if session management RPCs are >>> supported >>> >> > through a new GetSqlInfo property. A Flight client doesn't have a way >>> to >>> >> do >>> >> > this generically, but there could be an application-specific RPC or >>> >> header >>> >> > that reports this metadata. >>> >> > >>> >> > The O/JDBC and ADBC drivers would need to be updated to >>> programmatically >>> >> > check for session management RPCs. If unsupported, then use the old >>> >> > behavior of sending all properties as headers with each request. If >>> >> > supported, make use of the New-Session header and drop the session >>> when >>> >> > closing the client-side connection. >>> >> > >>> >> > It's a bit asymmetric that creating a new session is done by applying >>> a >>> >> > header, but closing a session is an RPC call. This was so that session >>> >> > creation doesn't introduce another round trip before the first real >>> data >>> >> > request. If there's a way to batch RPC calls it might be better to >>> make >>> >> > session creation an RPC call. >>> >> > >>> >> > On Tue, Nov 22, 2022 at 3:16 PM David Li <lidav...@apache.org> wrote: >>> >> > >>> >> > > It sounds reasonable - then there are three points: >>> >> > > >>> >> > > - A standard URI scheme for Flight SQL that can be used by multiple >>> >> client >>> >> > > APIs (JDBC, ADBC, etc.) >>> >> > > - A standard scheme for session data (likely header/cookie-based) >>> >> > > - A mapping from URI parameters and fields to session data >>> >> > > >>> >> > > >>> >> > > >>> >> > > On Tue, Nov 22, 2022, at 17:45, James Duong wrote: >>> >> > > > Just following up on this and if there are any thoughts. >>> >> > > > >>> >> > > > The purpose would be to standardize how we specify access to some >>> >> named >>> >> > > > logical grouping of data. This would make it easy to model >>> >> catalog/schema >>> >> > > > semantics in Flight SQL. >>> >> > > > >>> >> > > > Having this be part of the connection URI makes it similar to >>> >> specifying >>> >> > > a >>> >> > > > resource in an HTTP URL (ie an endpoint) which should make it easy >>> >> for >>> >> > > end >>> >> > > > users to work with and modify. >>> >> > > > >>> >> > > > On Fri, Nov 18, 2022 at 3:17 PM James Duong < >>> jam...@bitquilltech.com >>> >> > >>> >> > > wrote: >>> >> > > > >>> >> > > >> As for surfacing catalogs itself, perhaps we allow the URI take >>> in a >>> >> > > path >>> >> > > >> and treat that as a way of specifying a multi-level resource that >>> >> which >>> >> > > the >>> >> > > >> FlightClient is connecting to: >>> >> > > >> >>> >> > > >> eg a connection URI of the form: >>> >> > > >> scheme://<host>:<port>/path-1/path-2/.../path-n >>> >> > > >> >>> >> > > >> The FlightClient could send this path as either a header or a >>> >> session >>> >> > > >> property (with a neutral name like 'resource-path'). Flight SQL >>> >> > > Producers >>> >> > > >> could interpret this as a catalog or schema. >>> >> > > >> eg >>> >> > > >> grpc://<host>:<port>/catalog/schema >>> >> > > >> >>> >> > > >> On Fri, Nov 11, 2022 at 2:07 AM James Henderson <j...@juxt.pro> >>> >> wrote: >>> >> > > >> >>> >> > > >>> Sounds good to me. >>> >> > > >>> >>> >> > > >>> > Are you interested in writing up a (sketch of a) proposal? >>> >> > > >>> >>> >> > > >>> Yep, can do - I'm OoO over the next couple of weeks so might be >>> a >>> >> bit >>> >> > > >>> intermittent. >>> >> > > >>> >>> >> > > >>> On Thu, 10 Nov 2022 at 15:28, David Li <lidav...@apache.org> >>> >> wrote: >>> >> > > >>> >>> >> > > >>> > Hey James H., >>> >> > > >>> > >>> >> > > >>> > That would make sense to me. So it sounds like we'd want >>> >> > > >>> > >>> >> > > >>> > - Formal specification of using cookies/headers to mark a >>> >> 'session' >>> >> > > (I >>> >> > > >>> > guess this will be a little inconsistent with transactions, >>> >> though) >>> >> > > >>> > - Adding RPCs to query session values >>> >> > > >>> > - Adding RPCs to set session values >>> >> > > >>> > - Listing standard values and types >>> >> > > >>> > >>> >> > > >>> > Some things may require more consideration, e.g. transaction >>> >> > > isolation >>> >> > > >>> > might be better off as part of the transaction RPCs than an >>> >> ambient >>> >> > > >>> > property. Are you interested in writing up a (sketch of a) >>> >> proposal? >>> >> > > >>> > >>> >> > > >>> > -David >>> >> > > >>> > >>> >> > > >>> > On Thu, Nov 10, 2022, at 10:09, James Henderson wrote: >>> >> > > >>> > > Similarly, we're also currently considering how best to >>> >> implement >>> >> > > >>> some of >>> >> > > >>> > > the SQL standard session variables in our Flight SQL server >>> - >>> >> > > things >>> >> > > >>> like >>> >> > > >>> > > current transaction isolation level, access mode, time zone >>> >> etc, >>> >> > > which >>> >> > > >>> > seem >>> >> > > >>> > > to have similar properties to the (traditional) connection's >>> >> > > current >>> >> > > >>> > > catalog. We'd (perhaps naively) looked at solutions >>> involving >>> >> the >>> >> > > >>> Flight >>> >> > > >>> > > client's `ClientCookieMiddleware`, but might these have >>> >> > > standardised >>> >> > > >>> > > support within Flight SQL itself eventually? >>> >> > > >>> > > >>> >> > > >>> > > Cheers, >>> >> > > >>> > > >>> >> > > >>> > > James >>> >> > > >>> > > >>> >> > > >>> > > On Wed, 9 Nov 2022 at 21:51, David Li <lidav...@apache.org> >>> >> wrote: >>> >> > > >>> > > >>> >> > > >>> > >> I think having better support for this makes sense, but >>> >> perhaps we >>> >> > > >>> can >>> >> > > >>> > >> find a way to make it not tied to the connection itself? >>> For >>> >> > > >>> instance, >>> >> > > >>> > in >>> >> > > >>> > >> the same way transactions were implemented (as a handle). >>> Or >>> >> > > rather, >>> >> > > >>> > >> instead of adding connection statefulness to Flight RPC, >>> I'd >>> >> > > rather >>> >> > > >>> try >>> >> > > >>> > to >>> >> > > >>> > >> work within the gRPC/RPC paradigm. >>> >> > > >>> > >> >>> >> > > >>> > >> On Wed, Nov 9, 2022, at 16:47, James Duong wrote: >>> >> > > >>> > >> > Databases such as SQL Server and PostgreSQL have the >>> >> concept of >>> >> > > >>> > catalogs >>> >> > > >>> > >> as >>> >> > > >>> > >> > containers of database schemas. Users can usually >>> specify an >>> >> > > >>> initial >>> >> > > >>> > >> > catalog during the connection process, list catalogs, and >>> >> > > sometimes >>> >> > > >>> > >> change >>> >> > > >>> > >> > catalogs during the session. >>> >> > > >>> > >> > >>> >> > > >>> > >> > Currently catalog support in Flight SQL is somewhat >>> >> limited. The >>> >> > > >>> > protocol >>> >> > > >>> > >> > provides a way to list catalogs as well as metadata in >>> >> > > SqlTypeInfo >>> >> > > >>> for >>> >> > > >>> > >> > reporting how catalogs are supported from a syntax >>> >> perspective. >>> >> > > >>> > >> > >>> >> > > >>> > >> > ODBC and JDBC provide API calls for changing the catalog. >>> >> > > >>> > Additionally, >>> >> > > >>> > >> > Flight SQL doesn't really provide the concept of >>> "initial" >>> >> > > >>> connection >>> >> > > >>> > >> > properties (such as a starting catalog) since Flight >>> itself >>> >> is >>> >> > > >>> > stateless >>> >> > > >>> > >> > from a connection perspective. >>> >> > > >>> > >> > >>> >> > > >>> > >> > To support catalogs properly, I'd imagine we need to make >>> >> some >>> >> > > >>> > changes to >>> >> > > >>> > >> > the Flight SQL protocol: >>> >> > > >>> > >> > - Introduce the concept of connection-time properties >>> >> (perhaps >>> >> > > an >>> >> > > >>> > >> optional >>> >> > > >>> > >> > RPC for Flight SQL applications that need this) >>> >> > > >>> > >> > - Related to the above, expand the connection URL and >>> Java >>> >> > > builder >>> >> > > >>> to >>> >> > > >>> > >> allow >>> >> > > >>> > >> > arbitrary application-specific properties. >>> >> > > >>> > >> > - Add optional RPCs for changing the catalog and relevant >>> >> error >>> >> > > >>> codes >>> >> > > >>> > if >>> >> > > >>> > >> > this is not permitted. >>> >> > > >>> > >> > >>> >> > > >>> > >> > >>> >> > > >>> > >> > -- >>> >> > > >>> > >> > >>> >> > > >>> > >> > *James Duong* >>> >> > > >>> > >> > Lead Software Developer >>> >> > > >>> > >> > Bit Quill Technologies Inc. >>> >> > > >>> > >> > Direct: +1.604.562.6082 | jam...@bitquilltech.com >>> >> > > >>> > >> > https://www.bitquilltech.com >>> >> > > >>> > >> > >>> >> > > >>> > >> > This email message is for the sole use of the intended >>> >> > > recipient(s) >>> >> > > >>> > and >>> >> > > >>> > >> may >>> >> > > >>> > >> > contain confidential and privileged information. Any >>> >> > > unauthorized >>> >> > > >>> > >> review, >>> >> > > >>> > >> > use, disclosure, or distribution is prohibited. If you >>> are >>> >> not >>> >> > > the >>> >> > > >>> > >> > intended recipient, please contact the sender by reply >>> >> email and >>> >> > > >>> > destroy >>> >> > > >>> > >> > all copies of the original message. Thank you. >>> >> > > >>> > >> >>> >> > > >>> > > >>> >> > > >>> > > >>> >> > > >>> > > -- >>> >> > > >>> > > *James Henderson* >>> >> > > >>> > > XTDB Developer at *JUXT* >>> >> > > >>> > > Email j...@juxt.pro >>> >> > > >>> > > Website https://juxt.pro >>> >> > > >>> > > >>> >> > > >>> > > [image: photo] >>> >> > > >>> > >>> >> > > >>> >>> >> > > >>> >>> >> > > >>> -- >>> >> > > >>> *James Henderson* >>> >> > > >>> XTDB Developer at *JUXT* >>> >> > > >>> Email j...@juxt.pro >>> >> > > >>> Website https://juxt.pro >>> >> > > >>> >>> >> > > >>> [image: photo] >>> >> > > >>> >>> >> > > >> >>> >> > > >> >>> >> > > >> -- >>> >> > > >> >>> >> > > >> *James Duong* >>> >> > > >> Lead Software Developer >>> >> > > >> Bit Quill Technologies Inc. >>> >> > > >> Direct: +1.604.562.6082 | jam...@bitquilltech.com >>> >> > > >> https://www.bitquilltech.com >>> >> > > >> >>> >> > > >> This email message is for the sole use of the intended >>> recipient(s) >>> >> and >>> >> > > >> may contain confidential and privileged information. Any >>> >> unauthorized >>> >> > > >> review, use, disclosure, or distribution is prohibited. If you >>> are >>> >> not >>> >> > > the >>> >> > > >> intended recipient, please contact the sender by reply email and >>> >> destroy >>> >> > > >> all copies of the original message. Thank you. >>> >> > > >> >>> >> > > > >>> >> > > > >>> >> > > > -- >>> >> > > > >>> >> > > > *James Duong* >>> >> > > > Lead Software Developer >>> >> > > > Bit Quill Technologies Inc. >>> >> > > > Direct: +1.604.562.6082 | jam...@bitquilltech.com >>> >> > > > https://www.bitquilltech.com >>> >> > > > >>> >> > > > This email message is for the sole use of the intended >>> recipient(s) >>> >> and >>> >> > > may >>> >> > > > contain confidential and privileged information. Any unauthorized >>> >> > > review, >>> >> > > > use, disclosure, or distribution is prohibited. If you are not >>> the >>> >> > > > intended recipient, please contact the sender by reply email and >>> >> destroy >>> >> > > > all copies of the original message. Thank you. >>> >> > > >>> >> > >>> >> > >>> >> > -- >>> >> > >>> >> > *James Duong* >>> >> > Lead Software Developer >>> >> > Bit Quill Technologies Inc. >>> >> > Direct: +1.604.562.6082 | jam...@bitquilltech.com >>> >> > https://www.bitquilltech.com >>> >> > >>> >> > This email message is for the sole use of the intended recipient(s) >>> and >>> >> may >>> >> > contain confidential and privileged information. Any unauthorized >>> >> review, >>> >> > use, disclosure, or distribution is prohibited. If you are not the >>> >> > intended recipient, please contact the sender by reply email and >>> destroy >>> >> > all copies of the original message. Thank you. >>> >> >>>