[
https://issues.apache.org/jira/browse/TIKA-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18076598#comment-18076598
]
ASF GitHub Bot commented on TIKA-4721:
--------------------------------------
nddipiazza opened a new pull request, #2791:
URL: https://github.com/apache/tika/pull/2791
## Summary
Fixes the intermittent `testGracefulShutdown` failure in the **main jdk17
windows build (multi-locale)** CI job (tr_TR.UTF-8 / de_DE.UTF-8).
The failure was **not** locale-related despite appearing only in that job.
The root cause was a TOCTOU (time-of-check to time-of-use) race in
`SharedServerManager.startServer()`:
1. A `ServerSocket` was opened to find a free port → then **closed**
2. The child process was started and told to bind to that same port
3. Between steps 1 and 2, another process could grab the port, or the OS could
still be holding it in the TIME_WAIT state (far more common on slow Windows
runners)
When this happened, the child process failed to bind, never printed
`READY:{port}`, and the parse on the next iteration returned a non-success
result.
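A minimal Java sketch of the probe-then-bind race described above. It is not the actual `SharedServerManager` code: `TocTouDemo`, `probeFreePort` and `raceLost` are hypothetical names, everything runs in one JVM, and a second `ServerSocket` stands in for the other process that claims the port between steps 1 and 2. That makes the failure deterministic rather than intermittent.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical reproduction of the probe-then-bind (TOCTOU) pattern; not Tika code. */
public class TocTouDemo {

    /** Time of check: bind an ephemeral port, remember its number, then release it. */
    static int probeFreePort() throws IOException {
        try (ServerSocket probe = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            return probe.getLocalPort();
        } // the port is free again from here on, but the caller still "owns" the number
    }

    /** Returns true if the simulated child fails to bind the probed port. */
    static boolean raceLost() throws IOException {
        int port = probeFreePort();
        InetSocketAddress addr = new InetSocketAddress(InetAddress.getLoopbackAddress(), port);
        // Another socket claims the port inside the check/use window (step 3 above).
        try (ServerSocket intruder = new ServerSocket()) {
            intruder.bind(addr);
            // Time of use: the "child" binds the port it was told to use (step 2 above).
            try (ServerSocket child = new ServerSocket()) {
                child.bind(addr);
                return false; // bind succeeded, so the race was not lost this time
            } catch (BindException e) {
                return true; // a real child would exit here without printing READY:{port}
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("child lost the race: " + raceLost());
    }
}
```

In CI the "intruder" is whatever else on the runner happens to take the port, which is why the failure only shows up occasionally.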
## Changes
**`SharedServerManager`**
- Remove the TOCTOU port probe (open/close ServerSocket just to learn a port)
- Pass `TIKA_PIPES_PORT=0` to the child process, so the OS assigns a free
ephemeral port at the moment the child binds
- `waitForServerReady()` now returns the actual port read from the
`READY:{port}` signal and sets `serverPort` accordingly
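The parent side of the handshake can be sketched as follows, assuming the child writes `READY:{port}` on a line of its own on stdout. `ReadySignalParser` and `readReadyPort` are hypothetical names, not Tika's API, and a real implementation would also need a timeout and a liveness check on the child process, which this sketch omits.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/** Hypothetical sketch of reading a READY:{port} handshake; not Tika's actual code. */
public class ReadySignalParser {

    static final String READY_PREFIX = "READY:";

    /**
     * Reads the child's stdout until a READY:{port} line appears and returns that port.
     * Any other output (log lines, banners) is skipped.
     */
    static int readReadyPort(BufferedReader childStdout) throws IOException {
        String line;
        while ((line = childStdout.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.startsWith(READY_PREFIX)) {
                // Integer.parseInt is locale-independent, unlike NumberFormat.
                return Integer.parseInt(trimmed.substring(READY_PREFIX.length()));
            }
        }
        // End of stream without a READY line, e.g. the child failed to bind and exited.
        throw new IOException("child exited before printing " + READY_PREFIX + "{port}");
    }

    public static void main(String[] args) throws IOException {
        String fakeOutput = "INFO starting pipes server\nREADY:54321\n";
        int port = readReadyPort(new BufferedReader(new StringReader(fakeOutput)));
        System.out.println("server port = " + port);
    }
}
```

Returning the parsed port, rather than a port chosen up front, is what lets the parent stay correct when the child binds port 0.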
**`PipesServer`**
- In `runSharedMode()`, print `READY:{serverSocket.getLocalPort()}` instead
of `READY:{port}` so the actual ephemeral port assigned by the OS is reported
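On the child side, the fix amounts to binding port 0 and reporting `getLocalPort()`. A minimal sketch, assuming the requested port arrives as a `TIKA_PIPES_PORT` environment variable and that the server binds the loopback address; neither detail is confirmed by the PR text, and `EphemeralPortAnnounce` / `bindAndAnnounce` are hypothetical names.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.net.InetAddress;
import java.net.ServerSocket;

/** Hypothetical sketch of the child's bind-then-announce step; not Tika's PipesServer. */
public class EphemeralPortAnnounce {

    /**
     * Binds requestedPort (0 asks the OS for any free ephemeral port) and
     * announces the port that was actually bound.
     */
    static ServerSocket bindAndAnnounce(int requestedPort, PrintStream out) throws IOException {
        ServerSocket serverSocket =
                new ServerSocket(requestedPort, 50, InetAddress.getLoopbackAddress());
        // Report getLocalPort(), not requestedPort: when requestedPort is 0 it says nothing useful.
        out.println("READY:" + serverSocket.getLocalPort());
        out.flush();
        return serverSocket;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the parent passes TIKA_PIPES_PORT=0 through the environment.
        String env = System.getenv("TIKA_PIPES_PORT");
        int requested = env == null ? 0 : Integer.parseInt(env);
        try (ServerSocket serverSocket = bindAndAnnounce(requested, System.out)) {
            // A real server would accept client connections on serverSocket here.
        }
    }
}
```

Because the bind and the port lookup happen on the same open socket, there is no window in which another process can take the port.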
## Review Focus Areas
- `SharedServerManager.startServer()` — TOCTOU removal
- `SharedServerManager.waitForServerReady()` — now returns int (actual port)
- `PipesServer.runSharedMode()` — `getLocalPort()` instead of `port`
## Critical Files
- `tika-pipes/tika-pipes-core/src/main/java/org/apache/tika/pipes/core/SharedServerManager.java`
- `tika-pipes/tika-pipes-core/src/main/java/org/apache/tika/pipes/core/server/PipesServer.java`
## Testing Instructions
```bash
cd tika-pipes/tika-pipes-integration-tests
mvn test -Dtest=SharedServerModeTest#testGracefulShutdown
```
The existing CI job **main jdk17 windows build (multi-locale)** should stop
failing intermittently once this merges.
## Review Checklist
- [x] No functional behavior change — shared server still starts on a free
port, clients connect to the same port
- [x] Removes a latent race condition that was causing intermittent CI
failures
- [x] No new dependencies or test infrastructure changes needed
## Potential Concerns
The `READY:{port}` stdout protocol is internal — only `SharedServerManager`
reads it. No external consumers are affected by this change.
Closes TIKA-4721
> Fix intermittent locale-sensitive test failures in
> tika-pipes-integration-tests on Windows (tr_TR, de_DE)
> ---------------------------------------------------------------------------------------------------------
>
> Key: TIKA-4721
> URL: https://issues.apache.org/jira/browse/TIKA-4721
> Project: Tika
> Issue Type: Bug
> Reporter: Nicholas DiPiazza
> Priority: Minor
>
> h2. Summary
> The *main jdk17 windows build (multi-locale)* CI job fails intermittently on
> the {{main}} branch with test failures in the
> {{tika-pipes-integration-tests}} module ({{Apache Tika pipes core integration
> tests}}). The failures occur specifically under the Turkish ({{tr_TR.UTF-8}})
> and German ({{de_DE.UTF-8}}) locales on Windows runners.
> h2. Evidence
> The following workflow runs on {{main}} failed with this issue before any
> related PR changes:
> * [Run 24999911562|https://github.com/apache/tika/actions/runs/24999911562] -- job: build (17, tr_TR.UTF-8)
> * [Run 24942999949|https://github.com/apache/tika/actions/runs/24942999949] -- job: build (17, tr_TR.UTF-8)
> * [Run 24924376894|https://github.com/apache/tika/actions/runs/24924376894] -- job: build (17, tr_TR.UTF-8)
> * [Run 24887823286|https://github.com/apache/tika/actions/runs/24887823286] -- job: build (17, tr_TR.UTF-8)
> The Maven build output shows:
> {noformat}
> [INFO] Apache Tika pipes core integration tests ....... FAILURE [08:57 min]
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-surefire-plugin:3.5.5:test
> (default-test) on project tika-pipes-integration-tests: There are
> test failures.
> {noformat}
> h2. Root Cause (suspected)
> Locale-sensitive string operations (e.g. {{String.toLowerCase()}} /
> {{String.toUpperCase()}} called without an explicit {{Locale}} argument) are
> the most common cause of this class of failure. Under a Turkish default
> locale, {{"I".toLowerCase()}} produces {{"ı"}} (dotless i) instead of
> {{"i"}}, and {{"i".toUpperCase()}} produces {{"İ"}} (dotted capital I)
> instead of {{"I"}}. This breaks equality checks, map lookups, and assertions
> that assume English casing semantics.
> A {{MockUpperCaseFilter}} in the test suite already uses {{Locale.US}}
> correctly, suggesting awareness of this issue, but the root cause may be in
> production code or config parsing hit by the integration tests.
> h2. Steps to Reproduce
> # Run the {{main-jdk17-windows-build-multi-locale}} GitHub Actions workflow on any branch
> # Observe intermittent failures in the {{tika-pipes-integration-tests}} module under the {{tr_TR.UTF-8}} locale
> h2. Suggested Fix
> # Identify the failing test(s) from the surefire report in a failed CI run
> # Audit all {{toLowerCase()}} / {{toUpperCase()}} calls in {{tika-pipes}} and {{tika-pipes-integration-tests}} that do not pass an explicit {{Locale}}
> # Replace them with {{toLowerCase(Locale.ROOT)}} / {{toUpperCase(Locale.ROOT)}} where locale-independence is intended
> # Consider adding a static analysis rule (e.g. a Checkstyle {{RegexpSinglelineJava}} pattern or SpotBugs {{DM_CONVERT_CASE}}) to prevent regressions
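The casing pitfall the issue suspected (the PR above traced the actual failure to the port race instead) can be reproduced in a few lines of Java. `TurkishCasingDemo`, `normalizeWith` and the `SETTINGS` map are illustrative names only; `tr-TR` stands in for a runner whose default locale is Turkish, and `Locale.ROOT` is the locale-independent fix the issue proposes.

```java
import java.util.Locale;
import java.util.Map;

/** Illustrative demo of Turkish case mapping breaking a lookup; not Tika code. */
public class TurkishCasingDemo {

    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    /** A lookup table keyed by lower-case ASCII names, as config parsers often are. */
    static final Map<String, String> SETTINGS = Map.of("title", "value");

    /**
     * Stands in for key.toLowerCase(): the no-argument form uses the JVM default
     * locale, so on a tr_TR runner it behaves like normalizeWith(TURKISH, key).
     */
    static String normalizeWith(Locale locale, String key) {
        return key.toLowerCase(locale);
    }

    public static void main(String[] args) {
        // Turkish maps 'I' to dotless i (U+0131), so the lookup misses.
        String turkishKey = normalizeWith(TURKISH, "TITLE");
        System.out.println(turkishKey.equals("t\u0131tle")); // true
        System.out.println(SETTINGS.get(turkishKey));        // null

        // Locale.ROOT gives the same result on every machine.
        String rootKey = normalizeWith(Locale.ROOT, "TITLE");
        System.out.println(SETTINGS.get(rootKey));           // value

        // The reverse direction: 'i' upper-cases to dotted capital I (U+0130).
        System.out.println("i".toUpperCase(TURKISH).equals("\u0130")); // true
    }
}
```

`Locale.ROOT` is usually preferred over `Locale.US` for identifiers like these because it expresses "no particular locale" rather than picking one; both avoid the Turkish mappings shown here.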
--
This message was sent by Atlassian Jira
(v8.20.10#820010)