kenhuuu opened a new pull request, #3484:
URL: https://github.com/apache/tinkerpop/pull/3484
## Bound HTTP transaction lifetime: suspend idle timer while busy + add
`maxTransactionLifetime`
### What changed
Two related changes to how Gremlin Server bounds the lifetime of an HTTP
transaction (each open transaction owns a dedicated worker thread and a
`maxConcurrentTransactions` slot, so an unbounded transaction is a real resource
leak).
**Idle timer suspends while busy.** The `transactionTimeout` idle timer was
armed on request *arrival*, so a single operation running longer than the
timeout would trip it mid-execution and roll back a perfectly healthy
transaction. It now arms only when the transaction goes genuinely idle (no
operation running or queued) and is suspended while work is in flight — a long
operation is bounded by `evaluationTimeout`, not the idle timer. The
setting is renamed `transactionTimeout` → `idleTransactionTimeout` to
reflect what it actually means, and `0` now correctly disables it (the docs
always claimed this; the code never honored it).
**New `maxTransactionLifetime` absolute cap.** A new setting that bounds
*total* transaction age regardless of activity, closing the gap where a client
holds a transaction open indefinitely via one long operation or
keep-alive drips. When it fires it interrupts the running operation and
rolls the transaction back, and the in-flight client receives a
transaction-timeout (504) rather than a misleading "increase evaluationTimeout"
error.
### Why
The committed idle-timer behavior acted more like a per request timeout
instead of the described idle timeout and there was no ceiling on transaction
lifetime at all. Together these give three composable bounds —
per-operation (`evaluationTimeout`), between-operations
(`idleTransactionTimeout`), and whole-transaction (`maxTransactionLifetime`) —
mirroring PostgreSQL's `statement_timeout` /
`idle_in_transaction_session_timeout` /
`transaction_timeout`.
Notably, the server **does not validate** these settings or reject begins
when bounds are disabled. Instead it ships sane defaults (idle 1 min, lifetime
10 min): a transaction is bounded out of the box, disabling the bounds
is a deliberate operator choice, and a client's per-request `timeoutMs` is
always honored as sent rather than second-guessed.
### Review guide
- **`UnmanagedTransaction` — the executor swap.** The single-thread executor
is now a `ThreadPoolExecutor(1,1)` subclass purely to expose
`beforeExecute`/`afterExecute` + the queue, which drive the suspend-while-busy
logic.
The key invariant: **submitted tasks must not be wrapped** — `submit()`
returns the same `FutureTask` so the eval-timeout / cap `cancel(true)`
interrupts the real work.
- **Concurrency on the idle/cap timers.** `maybeScheduleIdleTimer` re-checks
`accepting` *after* arming (so a concurrent `close()` can't be raced into
re-arming a dying transaction); the in-flight op is tracked as a single
immutable `Running(future, context)` pair so the cap never flags one
operation's `Context` while interrupting another's future.
- **Ownership split (intentional asymmetry).** The idle timer lives in
`UnmanagedTransaction` (it must see the executor hooks); the lifetime cap is
scheduled/cancelled by `TransactionManager` (a fixed schedule tied to
registry membership). The cap is armed **after** `putIfAbsent` so it can
never fire into an unregistered transaction and leak a thread; `destroy()`
cancels it on every close path.
- **`close()` ordering is still load-bearing** — `manager.destroy()` before
`executor.shutdown()`, graceful `shutdown()` (not `shutdownNow()`). The cap
path reuses this exact path; verify it's unchanged.
- **Error mapping** — cap-kill → 504 `TransactionException` via a
`Context.closedByLifetimeCap` flag set before the interrupt; ordinary eval
timeout still → 500. Both the eval-timeout writer and
`formErrorResponseMessage` are
cap-aware so the code is correct regardless of which thread writes the
response first.
- **Tests** — deterministic timer behavior is unit-tested via a
virtual-clock `ManualScheduledExecutorService` (no `Thread.sleep` flakiness);
integration tests assert the guarantee actually made (transaction reclaimed /
subsequent 404), since timing can't reliably catch the mid-op interrupt.
<!--
Thanks for contributing! Reminders:
+ TARGET the earliest branch where you want the change
3.7-dev -> 3.7.7 (non-breaking only)
3.8-dev -> 3.8.2 (non-breaking only)
master -> 4.0.0
+ Committers will MERGE the PR forward to newer versions
+ ADD entry to the CHANGELOG.asciidoc for the targeted version
Do not reference a JIRA number there
+ ADD JIRA number to title and link in description
+ PRs requires 3 +1s from committers OR
1 +1 and 7 day wait to merge.
+ MORE details: https://s.apache.org/rtnal
-->
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]