This is an automated email from the ASF dual-hosted git repository.
kenhuuu pushed a commit to branch 3.7-dev
in repository https://gitbox.apache.org/repos/asf/tinkerpop.git
The following commit(s) were added to refs/heads/3.7-dev by this push:
new 8e286c5d97 Update session reuse and Python set deserialization
documentation CTR
8e286c5d97 is described below
commit 8e286c5d971c3f6d0a7e32d9a50df97f04ba509f
Author: Ken Hu <[email protected]>
AuthorDate: Tue Mar 31 13:38:38 2026 -0700
Update session reuse and Python set deserialization documentation CTR
---
docs/src/dev/provider/index.asciidoc | 10 ++
docs/src/reference/gremlin-applications.asciidoc | 2 +-
docs/src/reference/gremlin-variants.asciidoc | 122 ++++++++++++++++++++--
docs/src/upgrade/release-3.7.x.asciidoc | 127 ++++++++++++++++++++++-
4 files changed, 250 insertions(+), 11 deletions(-)
diff --git a/docs/src/dev/provider/index.asciidoc
b/docs/src/dev/provider/index.asciidoc
index d12335388b..c77d67c2bb 100644
--- a/docs/src/dev/provider/index.asciidoc
+++ b/docs/src/dev/provider/index.asciidoc
@@ -1191,6 +1191,16 @@ in 3.5.0, but has been added back as of 3.5.2. Servers
wishing to be compatible
this message (which is what Gremlin Server does as of 3.5.0). Drivers wishing
to be compatible with servers prior to
3.3.11 may continue to send the message on calls to `close()`, otherwise such
code can be removed.
+NOTE: As of 3.7.6/3.8.1, the session lifecycle contract described above, in
which sessions are cleaned up only when the
+underlying connection closes, can be modified by the `closeSessionPostGraphOp`
server setting. When enabled, the
+server will close a session after a successful TX_COMMIT or TX_ROLLBACK
bytecode request, independent of the
+connection state. This allows multiple short-lived transaction sessions to
share a single WebSocket connection over
+time, which is required by the Java GLV's `reuseConnectionsForSessions`
option. This setting defaults to `false`
+to preserve the established 3.5.0 behavior. Providers implementing session
support should be aware that when this
+setting is enabled, sessions and connections no longer have a one-to-one
lifecycle relationship, because a single
+connection may host many sessions sequentially. Providers wishing to support
this capability are encouraged to use the
+same `closeSessionPostGraphOp` configuration name for consistency across the
TinkerPop ecosystem.
+
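The lifecycle change described in the note can be illustrated with a toy model. The Python sketch below is not Gremlin Server code: `ToyServer`, its methods, and its bookkeeping are hypothetical names invented for illustration, and only the `close_session_post_graph_op` flag mirrors the `closeSessionPostGraphOp` setting. It assumes every commit succeeds and ignores session timeouts.

```python
class ToyServer:
    """Hypothetical in-memory model of session bookkeeping; not Gremlin Server code."""

    def __init__(self, close_session_post_graph_op=False):
        # Mirrors the closeSessionPostGraphOp setting, which defaults to false.
        self.close_session_post_graph_op = close_session_post_graph_op
        self.open_sessions = {}  # session id -> id of the connection hosting it

    def begin(self, connection_id, session_id):
        self.open_sessions[session_id] = connection_id

    def graph_op(self, session_id, op):
        # The toy assumes the TX_COMMIT or TX_ROLLBACK request always succeeds.
        if op not in ("TX_COMMIT", "TX_ROLLBACK"):
            raise ValueError(f"unsupported graph operation: {op}")
        if self.close_session_post_graph_op:
            self.open_sessions.pop(session_id, None)

    def close_connection(self, connection_id):
        # Established contract: closing a connection cleans up its sessions.
        self.open_sessions = {
            sid: cid for sid, cid in self.open_sessions.items() if cid != connection_id
        }


def run_three_transactions(server):
    """Run three sequential transactions over one reused connection."""
    for i in range(3):
        session_id = f"tx-{i}"
        server.begin("conn-1", session_id)
        server.graph_op(session_id, "TX_COMMIT")
    return len(server.open_sessions)


if __name__ == "__main__":
    print(run_three_transactions(ToyServer(close_session_post_graph_op=False)))
    print(run_three_transactions(ToyServer(close_session_post_graph_op=True)))
```

With the flag off, the sessions in this model stay open until `close_connection` is called. With the flag on, each session ends at commit, so one connection hosts several sessions in sequence.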
**`authentication` operation arguments**
[width="100%",cols="2,2,9",options="header"]
diff --git a/docs/src/reference/gremlin-applications.asciidoc
b/docs/src/reference/gremlin-applications.asciidoc
index e76b718e81..338467bafb 100644
--- a/docs/src/reference/gremlin-applications.asciidoc
+++ b/docs/src/reference/gremlin-applications.asciidoc
@@ -1017,7 +1017,7 @@ The following table describes the various YAML
configuration options that Gremli
|authorization.authorizer |The fully qualified classname of an `Authorizer`
implementation to use. |_none_
|authorization.config |A `Map` of configuration settings to be passed to the
`Authorizer` when it is constructed. The settings available are dependent on
the implementation. |_none_
|channelizer |The fully qualified classname of the `Channelizer`
implementation to use. A `Channelizer` is a "channel initializer" which
Gremlin Server uses to define the type of processing pipeline to use. By
allowing different `Channelizer` implementations, Gremlin Server can support
different communication protocols (e.g. WebSocket). |`WebSocketChannelizer`
-|closeSessionPostGraphOp |Controls whether a `Session` will be closed by the
server after a successful TX_COMMIT or TX_ROLLBACK bytecode request. |_false_
+|closeSessionPostGraphOp |Controls whether a `Session` will be closed by the
server after a successful TX_COMMIT or TX_ROLLBACK bytecode request. This
setting should be enabled when clients use the `reuseConnectionsForSessions`
option (see <<gremlin-java-connection-reuse>>), which allows transaction
sessions to share pooled connections. Without this setting, sessions opened by
`reuseConnectionsForSessions` will not be cleaned up after commit or rollback
and will remain open on the server [...]
|enableAuditLog |The `AuthenticationHandler`, `AuthorizationHandler` and
processors can issue audit logging messages with the authenticated user, remote
socket address and requests with a gremlin query. For privacy reasons, the
default value of this setting is false. The audit logging messages are logged
at the INFO level via the `audit.org.apache.tinkerpop.gremlin.server` logger,
which can be configured using the `logback.xml` file. |_false_
|graphManager |The fully qualified classname of the `GraphManager`
implementation to use. A `GraphManager` is a class that adheres to the
TinkerPop `GraphManager` interface, allowing custom implementations for storing
and managing graph references, as well as defining custom methods to open and
close graphs instantiations. To prevent Gremlin Server from starting when all
graphs fails, the `CheckedGraphManager` can be used.|`DefaultGraphManager`
|graphs |A `Map` of `Graph` configuration files where the key of the `Map`
becomes the name to which the `Graph` will be bound and the value is the file
name of a `Graph` configuration file. |_none_
diff --git a/docs/src/reference/gremlin-variants.asciidoc
b/docs/src/reference/gremlin-variants.asciidoc
index 254d1f1e23..cbb7ab9369 100644
--- a/docs/src/reference/gremlin-variants.asciidoc
+++ b/docs/src/reference/gremlin-variants.asciidoc
@@ -883,6 +883,112 @@ Please see the
link:https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/
Transactions with Java are best described in <<transactions,The Traversal -
Transactions>> section of this
documentation as Java covers both embedded and remote use cases.
+[[gremlin-java-connection-reuse]]
+=== Connection Reuse for Transactions
+
+By default, each call to `g.tx()` opens a new dedicated WebSocket connection
for the session that backs the
+transaction. For workloads that issue many short-lived transactions, the
overhead of repeatedly establishing and
+tearing down WebSocket connections can become significant, particularly when
the client and server are separated by
+network latency.
+
+The `reuseConnectionsForSessions` option on `Cluster.Builder` changes this
behavior so that transaction sessions
+borrow connections from the existing connection pool instead of creating
dedicated ones. When a transaction commits or
+rolls back, the borrowed connection is returned to the pool and becomes
available for the next transaction.
+
+==== Enabling Connection Reuse
+
+This feature requires configuration on both the client and the server (if
using Gremlin Server).
+
+On the client, enable `reuseConnectionsForSessions` when building the
`Cluster`:
+
+[source,java]
+----
+Cluster cluster = Cluster.build("localhost")
+ .reuseConnectionsForSessions(true)
+ .create();
+GraphTraversalSource g =
traversal().withRemote(DriverRemoteConnection.using(cluster));
+----
+
+The same setting can be specified in a YAML configuration file used with
`Cluster.open()`:
+
+[source,yaml]
+----
+reuseConnectionsForSessions: true
+----
+
+On servers based on Gremlin Server, enable `closeSessionPostGraphOp` so
that sessions are closed immediately after
+a commit or rollback completes:
+
+[source,yaml]
+----
+# gremlin-server.yaml
+closeSessionPostGraphOp: true
+----
+
+IMPORTANT: Both settings must be configured together. If
`reuseConnectionsForSessions` is enabled on the client but
+`closeSessionPostGraphOp` is not enabled on the server, sessions will not be
cleaned up after commit or rollback.
+These leaked sessions will accumulate on the server until the configured
session timeout is reached, consuming server
+resources unnecessarily.
+
+NOTE: Some Remote Gremlin Providers may handle session cleanup automatically
and may not require explicit
+`closeSessionPostGraphOp` configuration. Consult the provider's documentation
to determine whether this behavior is
+enabled by default, requires explicit configuration, or is unsupported.
+
+==== Usage
+
+The transaction API itself does not change when connection reuse is enabled.
The standard pattern of `begin`, mutate,
+and `commit` or `rollback` applies:
+
+[source,java]
+----
+Cluster cluster = Cluster.build("localhost")
+ .reuseConnectionsForSessions(true)
+ .create();
+GraphTraversalSource g =
traversal().withRemote(DriverRemoteConnection.using(cluster));
+
+GraphTraversalSource gtx = g.tx().begin();
+gtx.addV("person").property("name", "marko").iterate();
+gtx.addV("software").property("name", "lop").iterate();
+gtx.tx().commit();
+----
+
+After `commit()` or `rollback()`, the connection is returned to the pool. A
subsequent call to `g.tx().begin()` will
+borrow a connection from the pool again, potentially reusing the same
underlying WebSocket connection.
+
+A `GraphTraversalSource` obtained from `begin()` cannot be reused after its
transaction has been committed or rolled
+back. Attempting to do so will result in an exception. A fresh call to
`g.tx().begin()` is required for each new
+transaction.
+
+==== Concurrent Transactions
+
+Multiple transactions can be open simultaneously. Each transaction gets its
own server-side session regardless of
+whether the underlying connections are shared. Because connections are
borrowed from the pool rather than created, other
+`Cluster` settings such as `minConnectionPoolSize` and
`maxSimultaneousUsagePerConnection` affect how connections
+are borrowed. These settings may need to be tuned if there are many concurrent
transactions.
+
+==== Restrictions
+
+Connection reuse for transactions has the following restrictions:
+
+* It is designed for short-lived transaction sessions that follow the
begin/mutate/commit-or-rollback pattern. It
+ should not be used for classic long-running sessions such as those used with
a remote console. For long-running
+ sessions, use the standard `cluster.connect(sessionId)` approach described
in the
+ <<sessions,Considering Sessions>> section.
+* It is not compatible with `HttpChannelizer`. Attempting to call `tx()` when
the driver is configured with
+ `HttpChannelizer` will throw an `IllegalStateException`. This restriction
applies regardless of the
+ `reuseConnectionsForSessions` setting.
+
+==== When to Use
+
+Connection reuse provides the greatest benefit when:
+
+* Network latency between client and server is significant (e.g. cross-region
deployments).
+* Transactions are lightweight (few operations per transaction).
+* Many short-lived transactions are issued in sequence or concurrently.
+
+For local deployments or transactions that perform substantial graph
mutations, the connection setup overhead is a
+smaller proportion of the total transaction time and the benefit is
correspondingly smaller.
+
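The reasoning above can be made concrete with a back-of-the-envelope cost model. The Python sketch below is an illustration only: it assumes a transaction's time is a fixed connection setup cost plus a per-operation cost, and that reuse removes the setup cost entirely. The function names and all millisecond values are hypothetical, not measurements.

```python
def tx_time_ms(setup_ms, op_ms, ops_per_tx):
    """Modelled time for one transaction: connection setup plus its operations."""
    return setup_ms + op_ms * ops_per_tx


def reuse_speedup(setup_ms, op_ms, ops_per_tx):
    """Ratio of modelled time without reuse to modelled time with reuse.

    Assumes a borrowed connection has zero setup cost.
    """
    without_reuse = tx_time_ms(setup_ms, op_ms, ops_per_tx)
    with_reuse = tx_time_ms(0.0, op_ms, ops_per_tx)
    return without_reuse / with_reuse


if __name__ == "__main__":
    # Hypothetical costs: setup is expensive relative to a single operation.
    setup_ms, op_ms = 100.0, 50.0
    for label, ops in (("light", 2), ("heavy", 10)):
        print(f"{label}: {reuse_speedup(setup_ms, op_ms, ops):.2f}x")
```

In this model the setup cost is amortized over more operations as a transaction grows, so the ratio approaches 1 for heavy transactions and is largest when setup dominates.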
[[gremlin-java-serialization]]
=== Serialization
@@ -2849,15 +2955,13 @@ therefore cardinality functions that take a value like
`list()`, `set()`, and `s
[[gremlin-python-limitations]]
=== Limitations
-* Traversals that return a `Set` *might* be coerced to a `List` in Python. In
the case of Python, number equality
-is different from JVM languages which produces different `Set` results when
those types are in use. When this case
-is detected during deserialization, the `Set` is coerced to a `List` so that
traversals return consistent
-results within a collection across different languages. If a `Set` is needed
then convert `List` results
-to `Set` manually.
-* Traversals that return a `Set` containing non-hashable items, such as
`Dictionary`, `Set` and `List`, will be coerced
-into a `List` during deserialization. Python requires set elements to be
hashable, for which Gremlin does not. If a
-`Set` is needed, convert elements to hashable equivalents manually (e.g.
`dict` to `HashableDict`, `list` to `tuple`,
-`set` to `frozenset`).
+* Traversals that return a `Set` may be coerced to a `List` in Python in two
cases. First, when the `Set` contains
+mixed numeric types (e.g. `int` and `float`), because Python number equality
differs from the JVM: a Java `Set` of
+`[1, 1.0d]` has two elements, but Python considers `1 == 1.0` and would
collapse them to one, so the `Set` is coerced to
+a `List` to preserve all elements consistently across languages. Second, when
the `Set` contains non-hashable items such
+as `Dictionary`, `Set`, or `List`, the `Set` is also coerced to a `List`
because Python requires set elements to be
+hashable while Gremlin does not. For this case, if a `Set` is needed, convert
elements to hashable equivalents manually
+(e.g. `dict` to `HashableDict`, `list` to `tuple`, `set` to `frozenset`).
* Gremlin is capable of returning `Dictionary` results that use non-hashable
keys (e.g. Dictionary as a key) and Python
does not support that at a language level. Using GraphSON 3.0 or GraphBinary
(after 3.5.0) makes it possible to return
such results. In all other cases, Gremlin that returns such results will need
to be re-written to avoid that sort of
diff --git a/docs/src/upgrade/release-3.7.x.asciidoc
b/docs/src/upgrade/release-3.7.x.asciidoc
index 57fb43be18..17258b50d9 100644
--- a/docs/src/upgrade/release-3.7.x.asciidoc
+++ b/docs/src/upgrade/release-3.7.x.asciidoc
@@ -61,7 +61,7 @@ Gremlin Javascript now supports Node 22 and 24 alongside Node
20.
Gremlin Go has been upgraded to Go version 1.25.
-==== Python Set Deserialization with Non-Hashable Elements
+==== Python Set-to-List Fallback
Traversals that return a `Set` containing non-hashable items (such as
`Dictionary`, `Set`, or `List`) previously caused
a `TypeError` during deserialization in Gremlin-Python. These results are now
coerced to a `List` to avoid errors. This
@@ -70,10 +70,135 @@ Python hashable types manually (e.g. `dict` to
`HashableDict`, `list` to `tuple`
See: link:https://issues.apache.org/jira/browse/TINKERPOP-3232[TINKERPOP-3232]
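Both situations that trigger the fallback (mixed numeric types and non-hashable elements) can be reproduced with plain Python, independent of Gremlin-Python. The sketch below is illustrative only: `to_hashable` is a hypothetical helper, and it uses `frozenset` of key/value pairs as a stand-in for `HashableDict` so that no driver import is needed.

```python
def to_hashable(value):
    """Hypothetical helper: recursively convert dicts, lists and sets to hashable forms."""
    if isinstance(value, dict):
        # Stand-in for gremlin_python's HashableDict, to keep the sketch import-free.
        return frozenset((to_hashable(k), to_hashable(v)) for k, v in value.items())
    if isinstance(value, list):
        return tuple(to_hashable(v) for v in value)
    if isinstance(value, (set, frozenset)):
        return frozenset(to_hashable(v) for v in value)
    return value


# Case 1: Python treats 1 and 1.0 as equal, so a set keeps only one of them.
mixed_numbers = [1, 1.0]
collapsed = set(mixed_numbers)

# Case 2: non-hashable elements cannot be placed in a set directly.
coerced = [{"name": "marko"}, ["a", "b"], {"x"}]  # shape a coerced List might have
try:
    set(coerced)
    direct_set_failed = False
except TypeError:
    direct_set_failed = True

# Converting each element first makes building a set possible.
as_set = {to_hashable(item) for item in coerced}

if __name__ == "__main__":
    print(len(mixed_numbers), len(collapsed))
    print(direct_set_failed, len(as_set))
```

The first case shows why a `List` is needed to keep both numbers, and the second shows the manual conversion step for callers who still want a `set`.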
+==== Remote Transaction Improvements
+
+The Java driver now supports reusing existing pooled WebSocket connections for
session-based requests rather than
+establishing a dedicated connection per session. This behavior is controlled
by the `Cluster.Builder` option
+`reuseConnectionsForSessions`, which defaults to `false`.
+
+When enabled, a `Client.SessionedChildClient` will attempt to borrow a
connection from the connection pool of a standard
+`Client` rather than opening its own WebSocket connection. This avoids the
overhead of the TCP handshake and WebSocket
+upgrade for each session, which can be significant when issuing many
short-lived transactions.
+
+[source,java]
+----
+// Enable connection reuse for sessions
+Cluster cluster = Cluster.build(host)
+ .reuseConnectionsForSessions(true)
+ .create();
+----
+
+This feature was designed specifically for use with remote transactions, where
sessions are short-lived and terminate
+after a `commit()` or `rollback()`. It should not be used for classic
long-running session use cases where a session
+is used for purposes other than transactions, such as the remote console.
+
+===== Server Configuration
+
+When using `reuseConnectionsForSessions`, the server should be configured to
close sessions immediately after a graph
+operation such as `commit()` or `rollback()` completes. Without this behavior,
sessions may remain open until the session
+timeout expires, potentially leading to a buildup of idle sessions on the
server side.
+
+Some remote graph providers handle this automatically and require no
additional configuration. For the reference Gremlin
+Server, this is controlled by the `closeSessionPostGraphOp` setting, which
should be set to `true`. Users of other graph
+providers should consult their provider's documentation to determine whether
this behavior is enabled by default,
+requires explicit configuration, or is unsupported.
+
+[source,yaml]
+----
+# gremlin-server.yaml
+closeSessionPostGraphOp: true
+----
+
+IMPORTANT: Failing to enable `closeSessionPostGraphOp` on the server when
using `reuseConnectionsForSessions` on the
+client will result in sessions that are not properly cleaned up. These leaked
sessions will accumulate until the
+configured `sessionLifetimeTimeout` is reached, consuming server resources
unnecessarily.
+
+===== Performance
+
+Performance was measured with an ad-hoc benchmark application. The application
executes a configurable number of
+complete transaction lifecycles (begin, mutate, commit) and reports throughput
and latency percentiles. Each transaction
+opens a session, submits one or more `addV()` operations, commits, and closes
the session.
+
+The benchmark varies the following parameters:
+
+* *Concurrent clients* (`threads`): The number of threads issuing transactions
simultaneously. A value of 1 means
+ transactions are executed sequentially by a single client. Higher values
simulate multiple application threads or
+ service instances issuing transactions concurrently against the same server.
+* *Connection pool size* (`pool`): The number of WebSocket connections
maintained in the pool when
+ `reuseConnectionsForSessions` is enabled. When reuse is disabled, each
session creates its own dedicated connection
+ and this parameter does not apply (shown as `n/a`).
+* *Transaction weight* (`weight`): "light" transactions perform a single
`addV()` plus commit. "heavy" transactions
+ perform ten `addV()` operations plus commit, simulating a more substantial
unit of work per transaction.
+
+Tests were conducted both locally (client and server on the same machine) and
remotely (client on the US west coast,
+server on the US east coast) to isolate the effect of network latency on
connection setup overhead. Each scenario
+executed 1000 transactions after a warmup phase of 50 transactions.
+
+*Local Results (same machine)*
+
+[cols="3,1,1,1", options="header"]
+|=========================================================
+|Configuration |No-Reuse (tx/s) |Best-Reuse (tx/s) |Speedup
+|1 client, light |23.1 |26.7 |1.16x
+|8 clients, light |25.2 |28.5 |1.13x
+|16 clients, light |25.4 |27.9 |1.10x
+|1 client, heavy |26.0 |26.9 |1.03x
+|8 clients, heavy |26.4 |27.9 |1.06x
+|16 clients, heavy |25.8 |26.5 |1.03x
+|=========================================================
+
+*Remote Results (west coast to east coast)*
+
+[cols="3,1,1,1", options="header"]
+|=========================================================
+|Configuration |No-Reuse (tx/s) |Best-Reuse (tx/s) |Speedup
+|1 client, light |3.6 |7.6 |2.10x
+|8 clients, light |15.6 |23.0 |1.48x
+|16 clients, light |15.4 |25.3 |1.64x
+|1 client, heavy |1.4 |1.8 |1.26x
+|8 clients, heavy |9.2 |10.8 |1.17x
+|16 clients, heavy |14.5 |15.9 |1.10x
+|=========================================================
+
+The "Best-Reuse" column reflects the highest throughput observed across all
tested pool sizes (2, 4, and 8 connections)
+for each scenario.
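The Speedup column is the ratio of the two throughput columns. As a quick sanity check, the Python sketch below recomputes it from the rounded remote throughput values in the table; the variable and function names are illustrative. Because the inputs are already rounded to one decimal place, a recomputed ratio can differ from the published one in the second decimal place (presumably the published figures were derived from unrounded measurements, which is an assumption).

```python
# Remote results copied from the table: (configuration, no-reuse tx/s, best-reuse tx/s).
REMOTE_RESULTS = [
    ("1 client, light", 3.6, 7.6),
    ("8 clients, light", 15.6, 23.0),
    ("16 clients, light", 15.4, 25.3),
    ("1 client, heavy", 1.4, 1.8),
    ("8 clients, heavy", 9.2, 10.8),
    ("16 clients, heavy", 14.5, 15.9),
]


def speedup(no_reuse_tps, reuse_tps):
    """Throughput ratio of the reuse configuration over the no-reuse baseline."""
    return reuse_tps / no_reuse_tps


def improvement_percent(no_reuse_tps, reuse_tps):
    """The same ratio expressed as a percentage gain over the baseline."""
    return (speedup(no_reuse_tps, reuse_tps) - 1.0) * 100.0


if __name__ == "__main__":
    for name, base, reuse in REMOTE_RESULTS:
        gain = improvement_percent(base, reuse)
        print(f"{name}: {speedup(base, reuse):.2f}x ({gain:.0f}%)")
```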
+
+The benefit of connection reuse is most pronounced in remote scenarios with
light transactions. When the network
+round-trip cost is high and the transaction payload is small, the WebSocket
connection setup overhead represents a
+larger proportion of the total transaction time. In the single-client remote
light workload, connection reuse yielded a
+2.10x throughput improvement, as the connection handshake cost dominated the
per-transaction time. With 16 concurrent
+clients in the same remote light scenario, throughput improved from 15.4 tx/s
to 25.3 tx/s (1.64x), as the connection
+pool amortized the setup cost across many parallel sessions.
+
+As transaction weight increases, the relative benefit diminishes because the
graph operations themselves become the
+bottleneck rather than connection setup. In the local heavy workload
scenarios, the improvement was only 3-6%, as the
+connection overhead was already negligible relative to the cost of the graph
mutations. Even in the remote heavy
+scenarios, the improvement was limited to 10-26%, as the ten `addV()` operations
per transaction shifted the time
+distribution toward server-side processing.
+
+In summary, `reuseConnectionsForSessions` provides the greatest benefit when:
+
+* Network latency between client and server is significant (remote deployments)
+* Transactions are lightweight (few operations per transaction)
+* Many short-lived transactions are issued in sequence or concurrently
+
+See: link:https://issues.apache.org/jira/browse/TINKERPOP-3213[TINKERPOP-3213]
+
=== Upgrading for Providers
==== Graph System Providers
+===== Session Changes
+
+An option has been added to the Java GLV (`reuseConnectionsForSessions`) that
allows sessions to borrow open WebSocket
+connections. Its primary purpose is to reduce the overhead of setting up a new
connection per session, which can lead
+to large performance gains in remote transaction scenarios where there are
many small mutation traversals.
+
+This option is disabled by default on the driver, but providers may want to add
a server-side option that allows sessions to end
+on the successful completion of a graph operation (commit/rollback). Doing so
prevents a buildup of sessions when a user
+has enabled `reuseConnectionsForSessions`, because the driver will *not* close
the underlying WebSocket connection as a signal to end the
+session. Gremlin Server has added such an option, called
`closeSessionPostGraphOp`. Remote graph providers are
+encouraged to add the same functionality.
==== Graph Driver Providers