Hi Sutou,

I saw your PostgreSQL project and thought it was quite interesting, especially 
given the number of PostgreSQL-compatible databases.

Paul Nienaber will be picking up implementation of the catalog feature going 
forward and can provide an update

Get Outlook for Android<https://aka.ms/AAb9ysg>

________________________________
From: Sutou Kouhei <k...@clear-code.com>
Sent: Thursday, February 9, 2023, 22:25
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: DISCUSS: [FlightSQL] Catalog support

Hi James

Is there any progress of this?

I'm developing a Flight SQL adapter for PostgreSQL:
https://github.com/apache/arrow-flight-sql-postgresql

I want to implement session feature for it because opening
a session in PostgreSQL is expensive. PostgreSQL uses one
process per session. If we open and close a session for
each Flight SQL call, we need to start one process for each
Flight SQL call.

I noticed that the current Flight SQL specification doesn't
provide the standard session support. So I'm interesting in
this discussion.

Background: https://github.com/apache/arrow-flight-sql-postgresql/issues/13


Thanks,
--
kou

In
 
<ch2pr13mb3831b71269c257469631fb5de3...@ch2pr13mb3831.namprd13.prod.outlook.com>
  "Re: DISCUSS: [FlightSQL] Catalog support" on Mon, 12 Dec 2022 18:12:06 +0000,
  James Duong <james.du...@improving.com.INVALID> wrote:

> Hi David,
>
> I've written up the URI parsing in C++ and started adding session management 
> messages. I'm also planning on having the ClientCookieMiddlewareFactory be 
> able to report if sessions are enabled on the server.
>
> I (or another developer) will send an update once those features are ready 
> for demo.
> ________________________________
> From: David Li <lidav...@apache.org>
> Sent: December 12, 2022 10:07 AM
> To: dev@arrow.apache.org <dev@arrow.apache.org>
> Subject: Re: DISCUSS: [FlightSQL] Catalog support
>
> Following up here, James are you interested in putting up a draft PR for the 
> Flight SQL URI format and for session management?
>
> The Flight SQL URI format would then also cover Andrew's use case. And if 
> someone wants to draw up a PR to the JDBC driver to enable arbitrary 
> properties, I can review that too.
>
> On Sat, Dec 3, 2022, at 05:38, Andrew Lamb wrote:
>>> Andrew, do we need to look into adding more metadata to indicate
>> different query languages? (It's quite a shame that we named this Flight
>> SQL at this point...)
>>
>> TDLR is I don't think trying to explicitly support languages other than SQL
>> in FlightSQL is a good idea. Among other reasons, the JDBC / ODBC drivers,
>> which mostly assume SQL, are one of the key features of FlightSQL, and they
>> are likely not as useful for non SQL. I can see the argument to support for
>> substrait plans, and it will be interesting to see what use cases benefit
>> from that support.
>>
>> What would make our life easier would be some standard way to pass
>> application specific key/value pairs from the JDBC driver to a flight SQL
>> backend with each request (perhaps via gRPC headers). This would allow
>> passing configuration parameters that were not envisioned in the spec, from
>> end user (of the JDBC driver) all the way to our backend.
>>
>> Thanks again for driving this forward,
>> Andrew
>>
>> On Thu, Dec 1, 2022 at 7:11 PM David Li <lidav...@apache.org> wrote:
>>
>>> Hey James, thanks for putting this up.
>>>
>>> Inline:
>>>
>>> > The suggestion is to make this part of Flight as an
>>> > optional feature, rather than Flight SQL due to its applicability outside
>>> > of just database access.
>>>
>>> Which uses do you see? I see statefulness as a general antipattern here,
>>> so I'm wary of introducing it beyond where we need it.
>>>
>>> > - The Flight client supplies a New-Session header which has key-value
>>> pairs
>>> > for initial session options. This header can be applied to any RPC call,
>>> > but logically should be the first one the client makes.
>>>
>>> Handshake already effectively serves as this RPC - maybe we could extend
>>> it? (I also see Handshake as an antipattern because it's a stateful auth
>>> mechanism.)
>>>
>>> Should the session timeout/be on a lease? (gRPC doesn't really give the
>>> server a way to track the persistence of a particular client connection.)
>>>
>>> > It's a bit asymmetric that creating a new session is done by applying a
>>> > header, but closing a session is an RPC call. This was so that session
>>> > creation doesn't introduce another round trip before the first real data
>>> > request. If there's a way to batch RPC calls it might be better to make
>>> > session creation an RPC call.
>>>
>>> Is this a worrisome amount of overhead?
>>>
>>> Unfortunately gRPC doesn't batch RPC calls, but RPC calls on the same
>>> client generally share the same TCP connection (modulo load balancing
>>> behavior, but presumably that is not enabled if you want persistent
>>> sessions).
>>>
>>> On the implementation side, I'd like to avoid baking this in too deeply if
>>> at all possible. Ideally it'd be implemented entirely as middleware,
>>> possibly making use of an interface so applications can override the
>>> session storage (hashtable, Redis, etcd, etc.)
>>>
>>> > Just to chime in on this, one thing I'm curious about is whether there
>>> > will be support for user-defined catalog/schema hierarchy depth?
>>>
>>> Gavin, for ADBC we discussed adding a delimiter to the catalog name to
>>> handle this case - maybe we can handle this by adding a property for the
>>> delimiter to SqlInfo?
>>>
>>> > https://github.com/influxdata/influxdb_iox/issues/6102
>>>
>>> Andrew, do we need to look into adding more metadata to indicate different
>>> query languages? (It's quite a shame that we named this Flight SQL at this
>>> point...)
>>>
>>> On Thu, Dec 1, 2022, at 08:49, Andrew Lamb wrote:
>>> > Sorry for the late reply -- thank you James and David for this
>>> discussion.
>>> >
>>> > I agree that adding Catalog support would be a valuable addition to
>>> Flight
>>> > SQL, and it recently came up as we begin to implement Flight SQL in
>>> > InfluxDB IOx [1].
>>> >
>>> >> - A standard URI scheme for Flight SQL that can be used by multiple
>>> > client APIs (JDBC, ADBC, etc.)
>>> >
>>> > I agree this would be very valuable, along with a standard way (ideally
>>> > with HTTP headers) to send this information as part of the FlightSQL gRPC
>>> > requests.
>>> >
>>> >> I'd suggest we define session management features explicitly in Flight
>>> > (while being optional).
>>> >
>>> > I agree it is critical that server-side state is not required to
>>> implement
>>> > FlightSQL. Stateful connections would likely complicate deploying
>>> FlightSQL
>>> > in distributed systems. I suggest it should be possible to implement any
>>> > session management features by sending the entire session state with the
>>> > request, if desired.
>>> >
>>> > I don't have a strong opinion about the merits of including explicit
>>> > session management features in FlightSQL. It seems to me that keeping the
>>> > API surface of FlightSQL minimal and implementation flexibility maximal
>>> > should be the default. However, if JDBC/ODBC driver compatibility would
>>> be
>>> > improved with explicit state management APIs, then adding them to
>>> FlightSQL
>>> > seems like a good idea to me.
>>> >
>>> > Thanks again -- it is amazing to hit some issue in design and then find
>>> out
>>> > the Arrow community is already hard at work on a solution.
>>> >
>>> > Andrew
>>> >
>>> > [1] https://github.com/influxdata/influxdb_iox/issues/6102
>>> >
>>> > On Wed, Nov 30, 2022 at 7:17 PM Gavin Ray <ray.gavi...@gmail.com> wrote:
>>> >
>>> >> Just to chime in on this, one thing I'm curious about is whether there
>>> >> will be support for user-defined catalog/schema hierarchy depth?
>>> >>
>>> >> This comment that James made does seem reasonable to me
>>> >> > scheme://<host>:<port>/path-1/path-2/.../path-n
>>> >>
>>> >> Trino/Presto does a similar thing (jdbc:trino://localhost:8080/tpch/sf1)
>>> >>
>>> >> At Hasura, what we do is have an alias "FullyQualifiedName" which is
>>> >> just "Array<String>"
>>> >> and the identifier to some element in a data source is always
>>> >> fully-qualified:
>>> >>
>>> >> https://github.com/hasura/graphql-engine/tree/master/dc-agents#schema
>>> >>
>>> >> ["postgres_1", "db1", "schema2", "my_table", "col_a"]
>>> >> ["mongo", "db1",  "collection_a", "field_a"]
>>> >> ["csv_adapter", "myfile.csv", "col_x"]
>>> >>
>>> >> On Wed, Nov 30, 2022 at 6:31 PM James Duong
>>> >> <jam...@bitquilltech.com.invalid> wrote:
>>> >> >
>>> >> > Our current convention of sending connection properties as headers
>>> with
>>> >> > every request has the benefit of making statefulness optional, but has
>>> >> the
>>> >> > drawback of sending redundant, unused properties on requests after the
>>> >> > first, which increases the payload size unnecessarily.
>>> >> >
>>> >> > I'd suggest we define session management features explicitly in Flight
>>> >> > (while being optional). The suggestion is to make this part of Flight
>>> as
>>> >> an
>>> >> > optional feature, rather than Flight SQL due to its applicability
>>> outside
>>> >> > of just database access.
>>> >> >
>>> >> > Creating a session:
>>> >> > - The Flight client supplies a New-Session header which has key-value
>>> >> pairs
>>> >> > for initial session options. This header can be applied to any RPC
>>> call,
>>> >> > but logically should be the first one the client makes.
>>> >> > - The server should send a Set-Cookie header back containing some
>>> >> > server-side representation of the session that the client can use in
>>> >> > subsequent requests.
>>> >> > - The path specified in the URI is sent as a "Catalog" session option.
>>> >> >
>>> >> > Modifying session options:
>>> >> > - A separate RPC call that takes in a Stream<string, string>
>>> representing
>>> >> > each session option that is being modified and returns a stream of
>>> >> statuses
>>> >> > to indicate if the setting change was accepted.
>>> >> > - This RPC call is only valid when the Cookie header is used.
>>> >> > - It is up to the server to define if a failed session property
>>> change is
>>> >> > fatal or if other properties can continue to be set.
>>> >> >
>>> >> > Closing a session:
>>> >> > - A separate RPC call that tells the server to drop the session
>>> specified
>>> >> > by the Cookie header.
>>> >> >
>>> >> > Notes:
>>> >> > A Flight SQL client would check if session management RPCs are
>>> supported
>>> >> > through a new GetSqlInfo property. A Flight client doesn't have a way
>>> to
>>> >> do
>>> >> > this generically, but there could be an application-specific RPC or
>>> >> header
>>> >> > that reports this metadata.
>>> >> >
>>> >> > The O/JDBC and ADBC drivers would need to be updated to
>>> programmatically
>>> >> > check for session management RPCs. If unsupported, then use the old
>>> >> > behavior of sending all properties as headers with each request. If
>>> >> > supported, make use of the New-Session header and drop the session
>>> when
>>> >> > closing the client-side connection.
>>> >> >
>>> >> > It's a bit asymmetric that creating a new session is done by applying
>>> a
>>> >> > header, but closing a session is an RPC call. This was so that session
>>> >> > creation doesn't introduce another round trip before the first real
>>> data
>>> >> > request. If there's a way to batch RPC calls it might be better to
>>> make
>>> >> > session creation an RPC call.
>>> >> >
>>> >> > On Tue, Nov 22, 2022 at 3:16 PM David Li <lidav...@apache.org> wrote:
>>> >> >
>>> >> > > It sounds reasonable - then there are three points:
>>> >> > >
>>> >> > > - A standard URI scheme for Flight SQL that can be used by multiple
>>> >> client
>>> >> > > APIs (JDBC, ADBC, etc.)
>>> >> > > - A standard scheme for session data (likely header/cookie-based)
>>> >> > > - A mapping from URI parameters and fields to session data
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > > On Tue, Nov 22, 2022, at 17:45, James Duong wrote:
>>> >> > > > Just following up on this and if there are any thoughts.
>>> >> > > >
>>> >> > > > The purpose would be to standardize how we specify access to some
>>> >> named
>>> >> > > > logical grouping of data. This would make it easy to model
>>> >> catalog/schema
>>> >> > > > semantics in Flight SQL.
>>> >> > > >
>>> >> > > > Having this be part of the connection URI makes it similar to
>>> >> specifying
>>> >> > > a
>>> >> > > > resource in an HTTP URL (ie an endpoint) which should make it easy
>>> >> for
>>> >> > > end
>>> >> > > > users to work with and modify.
>>> >> > > >
>>> >> > > > On Fri, Nov 18, 2022 at 3:17 PM James Duong <
>>> jam...@bitquilltech.com
>>> >> >
>>> >> > > wrote:
>>> >> > > >
>>> >> > > >> As for surfacing catalogs itself, perhaps we allow the URI take
>>> in a
>>> >> > > path
>>> >> > > >> and treat that as a way of specifying a multi-level resource that
>>> >> which
>>> >> > > the
>>> >> > > >> FlightClient is connecting to:
>>> >> > > >>
>>> >> > > >> eg a connection URI of the form:
>>> >> > > >> scheme://<host>:<port>/path-1/path-2/.../path-n
>>> >> > > >>
>>> >> > > >> The FlightClient could send this path as either a header or a
>>> >> session
>>> >> > > >> property (with a neutral name like 'resource-path'). Flight SQL
>>> >> > > Producers
>>> >> > > >> could interpret this as a catalog or schema.
>>> >> > > >> eg
>>> >> > > >> grpc://<host>:<port>/catalog/schema
>>> >> > > >>
>>> >> > > >> On Fri, Nov 11, 2022 at 2:07 AM James Henderson <j...@juxt.pro>
>>> >> wrote:
>>> >> > > >>
>>> >> > > >>> Sounds good to me.
>>> >> > > >>>
>>> >> > > >>> > Are you interested in writing up a (sketch of a) proposal?
>>> >> > > >>>
>>> >> > > >>> Yep, can do - I'm OoO over the next couple of weeks so might be
>>> a
>>> >> bit
>>> >> > > >>> intermittent.
>>> >> > > >>>
>>> >> > > >>> On Thu, 10 Nov 2022 at 15:28, David Li <lidav...@apache.org>
>>> >> wrote:
>>> >> > > >>>
>>> >> > > >>> > Hey James H.,
>>> >> > > >>> >
>>> >> > > >>> > That would make sense to me. So it sounds like we'd want
>>> >> > > >>> >
>>> >> > > >>> > - Formal specification of using cookies/headers to mark a
>>> >> 'session'
>>> >> > > (I
>>> >> > > >>> > guess this will be a little inconsistent with transactions,
>>> >> though)
>>> >> > > >>> > - Adding RPCs to query session values
>>> >> > > >>> > - Adding RPCs to set session values
>>> >> > > >>> > - Listing standard values and types
>>> >> > > >>> >
>>> >> > > >>> > Some things may require more consideration, e.g. transaction
>>> >> > > isolation
>>> >> > > >>> > might be better off as part of the transaction RPCs than an
>>> >> ambient
>>> >> > > >>> > property. Are you interested in writing up a (sketch of a)
>>> >> proposal?
>>> >> > > >>> >
>>> >> > > >>> > -David
>>> >> > > >>> >
>>> >> > > >>> > On Thu, Nov 10, 2022, at 10:09, James Henderson wrote:
>>> >> > > >>> > > Similarly, we're also currently considering how best to
>>> >> implement
>>> >> > > >>> some of
>>> >> > > >>> > > the SQL standard session variables in our Flight SQL server
>>> -
>>> >> > > things
>>> >> > > >>> like
>>> >> > > >>> > > current transaction isolation level, access mode, time zone
>>> >> etc,
>>> >> > > which
>>> >> > > >>> > seem
>>> >> > > >>> > > to have similar properties to the (traditional) connection's
>>> >> > > current
>>> >> > > >>> > > catalog. We'd (perhaps naively) looked at solutions
>>> involving
>>> >> the
>>> >> > > >>> Flight
>>> >> > > >>> > > client's `ClientCookieMiddleware`, but might these have
>>> >> > > standardised
>>> >> > > >>> > > support within Flight SQL itself eventually?
>>> >> > > >>> > >
>>> >> > > >>> > > Cheers,
>>> >> > > >>> > >
>>> >> > > >>> > > James
>>> >> > > >>> > >
>>> >> > > >>> > > On Wed, 9 Nov 2022 at 21:51, David Li <lidav...@apache.org>
>>> >> wrote:
>>> >> > > >>> > >
>>> >> > > >>> > >> I think having better support for this makes sense, but
>>> >> perhaps we
>>> >> > > >>> can
>>> >> > > >>> > >> find a way to make it not tied to the connection itself?
>>> For
>>> >> > > >>> instance,
>>> >> > > >>> > in
>>> >> > > >>> > >> the same way transactions were implemented (as a handle).
>>> Or
>>> >> > > rather,
>>> >> > > >>> > >> instead of adding connection statefulness to Flight RPC,
>>> I'd
>>> >> > > rather
>>> >> > > >>> try
>>> >> > > >>> > to
>>> >> > > >>> > >> work within the gRPC/RPC paradigm.
>>> >> > > >>> > >>
>>> >> > > >>> > >> On Wed, Nov 9, 2022, at 16:47, James Duong wrote:
>>> >> > > >>> > >> > Databases such as SQL Server and PostgreSQL have the
>>> >> concept of
>>> >> > > >>> > catalogs
>>> >> > > >>> > >> as
>>> >> > > >>> > >> > containers of database schemas. Users can usually
>>> specify an
>>> >> > > >>> initial
>>> >> > > >>> > >> > catalog during the connection process, list catalogs, and
>>> >> > > sometimes
>>> >> > > >>> > >> change
>>> >> > > >>> > >> > catalogs during the session.
>>> >> > > >>> > >> >
>>> >> > > >>> > >> > Currently catalog support in Flight SQL is somewhat
>>> >> limited. The
>>> >> > > >>> > protocol
>>> >> > > >>> > >> > provides a way to list catalogs as well as metadata in
>>> >> > > SqlTypeInfo
>>> >> > > >>> for
>>> >> > > >>> > >> > reporting how catalogs are supported from a syntax
>>> >> perspective.
>>> >> > > >>> > >> >
>>> >> > > >>> > >> > ODBC and JDBC provide API calls for changing the catalog.
>>> >> > > >>> > Additionally,
>>> >> > > >>> > >> > Flight SQL doesn't really provide the concept of
>>> "initial"
>>> >> > > >>> connection
>>> >> > > >>> > >> > properties (such as a starting catalog) since Flight
>>> itself
>>> >> is
>>> >> > > >>> > stateless
>>> >> > > >>> > >> > from a connection perspective.
>>> >> > > >>> > >> >
>>> >> > > >>> > >> > To support catalogs properly, I'd imagine we need to make
>>> >> some
>>> >> > > >>> > changes to
>>> >> > > >>> > >> > the Flight SQL protocol:
>>> >> > > >>> > >> > - Introduce the concept of connection-time properties
>>> >> (perhaps
>>> >> > > an
>>> >> > > >>> > >> optional
>>> >> > > >>> > >> > RPC for Flight SQL applications that need this)
>>> >> > > >>> > >> > - Related to the above, expand the connection URL and
>>> Java
>>> >> > > builder
>>> >> > > >>> to
>>> >> > > >>> > >> allow
>>> >> > > >>> > >> > arbitrary application-specific properties.
>>> >> > > >>> > >> > - Add optional RPCs for changing the catalog and relevant
>>> >> error
>>> >> > > >>> codes
>>> >> > > >>> > if
>>> >> > > >>> > >> > this is not permitted.
>>> >> > > >>> > >> >
>>> >> > > >>> > >> >
>>> >> > > >>> > >> > --
>>> >> > > >>> > >> >
>>> >> > > >>> > >> > *James Duong*
>>> >> > > >>> > >> > Lead Software Developer
>>> >> > > >>> > >> > Bit Quill Technologies Inc.
>>> >> > > >>> > >> > Direct: +1.604.562.6082 | jam...@bitquilltech.com
>>> >> > > >>> > >> > https://www.bitquilltech.com
>>> >> > > >>> > >> >
>>> >> > > >>> > >> > This email message is for the sole use of the intended
>>> >> > > recipient(s)
>>> >> > > >>> > and
>>> >> > > >>> > >> may
>>> >> > > >>> > >> > contain confidential and privileged information.  Any
>>> >> > > unauthorized
>>> >> > > >>> > >> review,
>>> >> > > >>> > >> > use, disclosure, or distribution is prohibited.  If you
>>> are
>>> >> not
>>> >> > > the
>>> >> > > >>> > >> > intended recipient, please contact the sender by reply
>>> >> email and
>>> >> > > >>> > destroy
>>> >> > > >>> > >> > all copies of the original message.  Thank you.
>>> >> > > >>> > >>
>>> >> > > >>> > >
>>> >> > > >>> > >
>>> >> > > >>> > > --
>>> >> > > >>> > > *James Henderson*
>>> >> > > >>> > > XTDB Developer at *JUXT*
>>> >> > > >>> > > Email j...@juxt.pro
>>> >> > > >>> > > Website https://juxt.pro
>>> >> > > >>> > >
>>> >> > > >>> > > [image: photo]
>>> >> > > >>> >
>>> >> > > >>>
>>> >> > > >>>
>>> >> > > >>> --
>>> >> > > >>> *James Henderson*
>>> >> > > >>> XTDB Developer at *JUXT*
>>> >> > > >>> Email j...@juxt.pro
>>> >> > > >>> Website https://juxt.pro
>>> >> > > >>>
>>> >> > > >>> [image: photo]
>>> >> > > >>>
>>> >> > > >>
>>> >> > > >>
>>> >> > > >> --
>>> >> > > >>
>>> >> > > >> *James Duong*
>>> >> > > >> Lead Software Developer
>>> >> > > >> Bit Quill Technologies Inc.
>>> >> > > >> Direct: +1.604.562.6082 | jam...@bitquilltech.com
>>> >> > > >> https://www.bitquilltech.com
>>> >> > > >>
>>> >> > > >> This email message is for the sole use of the intended
>>> recipient(s)
>>> >> and
>>> >> > > >> may contain confidential and privileged information.  Any
>>> >> unauthorized
>>> >> > > >> review, use, disclosure, or distribution is prohibited.  If you
>>> are
>>> >> not
>>> >> > > the
>>> >> > > >> intended recipient, please contact the sender by reply email and
>>> >> destroy
>>> >> > > >> all copies of the original message.  Thank you.
>>> >> > > >>
>>> >> > > >
>>> >> > > >
>>> >> > > > --
>>> >> > > >
>>> >> > > > *James Duong*
>>> >> > > > Lead Software Developer
>>> >> > > > Bit Quill Technologies Inc.
>>> >> > > > Direct: +1.604.562.6082 | jam...@bitquilltech.com
>>> >> > > > https://www.bitquilltech.com
>>> >> > > >
>>> >> > > > This email message is for the sole use of the intended
>>> recipient(s)
>>> >> and
>>> >> > > may
>>> >> > > > contain confidential and privileged information.  Any unauthorized
>>> >> > > review,
>>> >> > > > use, disclosure, or distribution is prohibited.  If you are not
>>> the
>>> >> > > > intended recipient, please contact the sender by reply email and
>>> >> destroy
>>> >> > > > all copies of the original message.  Thank you.
>>> >> > >
>>> >> >
>>> >> >
>>> >> > --
>>> >> >
>>> >> > *James Duong*
>>> >> > Lead Software Developer
>>> >> > Bit Quill Technologies Inc.
>>> >> > Direct: +1.604.562.6082 | jam...@bitquilltech.com
>>> >> > https://www.bitquilltech.com
>>> >> >
>>> >> > This email message is for the sole use of the intended recipient(s)
>>> and
>>> >> may
>>> >> > contain confidential and privileged information.  Any unauthorized
>>> >> review,
>>> >> > use, disclosure, or distribution is prohibited.  If you are not the
>>> >> > intended recipient, please contact the sender by reply email and
>>> destroy
>>> >> > all copies of the original message.  Thank you.
>>> >>
>>>

Reply via email to