[
https://issues.apache.org/jira/browse/TIKA-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086390#comment-18086390
]
ASF GitHub Bot commented on TIKA-4753:
--------------------------------------
Copilot commented on code in PR #2870:
URL: https://github.com/apache/tika/pull/2870#discussion_r3363328003
##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/TikaServerIntegrationTest.java:
##########
@@ -212,11 +219,25 @@ public void testTimeout() throws Exception {
// Server should return 503 (Service Unavailable) for timeout
assertEquals(503, response.getStatus());
+ assertErrorResponseStatus(response, "TIMEOUT");
// Server should still be running - verify with a successful request
testBaseline();
}
+ /**
+ * Asserts that an error response body is JSON with a {@code status} field
matching
+ * {@code expectedStatus} (a {@code PipesResult.RESULT_STATUS} enum name).
+ */
+ private void assertErrorResponseStatus(Response response, String
expectedStatus) throws IOException {
+ try (InputStream is = (InputStream) response.getEntity()) {
+ String body = IOUtils.toString(is, StandardCharsets.UTF_8);
+ JsonNode node = new ObjectMapper().readTree(body);
+ assertEquals(expectedStatus, node.path("status").asText(null),
+ "Expected JSON error body with status=" + expectedStatus +
" but got: " + body);
+ }
+ }
Review Comment:
The new helper method is not indented consistently with the rest of the
class, and it can reuse the existing static `UTF_8` import rather than
`StandardCharsets.UTF_8`.
##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/TikaServerIntegrationTest.java:
##########
@@ -29,12 +29,15 @@
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
+import java.nio.charset.StandardCharsets;
Review Comment:
After switching the helper to use the existing `UTF_8` static import, this
`StandardCharsets` import becomes unused and should be removed to avoid
compiler/linter warnings.
##########
docs/modules/ROOT/pages/using-tika/server/index.adoc:
##########
@@ -156,6 +156,41 @@ curl -T document.pdf
http://localhost:9998/meta/Content-Type # single field
* `/translate/all/\{translator}/\{src}/\{dest}` — translation
* `/pipes`, `/async` — Pipes-based bulk processing
+== Error Responses
+
+When parsing fails due to a process-level problem — the forked child process
timed out,
+ran out of memory, or crashed unexpectedly — the server returns an HTTP error
with a
+JSON body whose shape matches the `PipesResult` status:
+
+[source,json]
+----
+{"status": "TIMEOUT", "message": "Task timed out after 60000ms"}
+----
+
+The `status` field is the `PipesResult.RESULT_STATUS` enum name. The `message`
field is
+present when Tika provided one, absent otherwise.
+
+[cols="1,1,3"]
+|===
+|HTTP status |`status` values |Meaning
+
+|`503 Service Unavailable`
+|`TIMEOUT`, `OOM`, `UNSPECIFIED_CRASH`, `CLIENT_UNAVAILABLE_WITHIN_MS`
+|The forked parse process failed. The server is still healthy; the client may
retry.
Review Comment:
The 503 row includes `CLIENT_UNAVAILABLE_WITHIN_MS`, which is not a
forked-process crash (it means no pipes client was available within the
configured wait time). The meaning text should be broadened so it accurately
covers all listed statuses.
##########
tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/resource/PipesParsingHelper.java:
##########
@@ -184,33 +187,54 @@ private String getSuffix(Metadata metadata) {
return ".tmp";
}
+ /**
+ * Builds a JSON error response whose shape matches PipesResult
serialization:
+ * {@code {"status": "TIMEOUT", "message": "..."}}
+ * <p>
+ * This allows clients to distinguish failure modes (TIMEOUT, OOM,
UNSPECIFIED_CRASH, …)
+ * without parsing plain-text bodies or inspecting custom headers.
+ */
+ private static Response buildProcessFailureResponse(PipesResult result) {
+ ObjectMapper mapper = new ObjectMapper();
+ ObjectNode node = mapper.createObjectNode();
+ node.put("status", result.status().name());
+ if (result.message() != null && !result.message().isBlank()) {
+ node.put("message", result.message());
+ }
+ String json;
+ try {
+ json = mapper.writeValueAsString(node);
+ } catch (Exception e) {
+ json = "{\"status\":\"" + result.status().name() + "\"}";
+ }
+ return Response.status(mapStatusToHttpResponse(result.status()))
+ .entity(json)
+ .type(MediaType.APPLICATION_JSON)
+ .build();
+ }
+
/**
* Processes the PipesResult and returns the metadata list.
*/
private List<Metadata> processResult(PipesResult result) {
if (result.isProcessCrash()) {
- // Process crashed (OOM, timeout, etc.) - return 503
+ // Process crashed (OOM, timeout, unspecified crash) — 503 with
JSON status body
LOG.warn("Parse process crashed: {}", result.status());
- throw new WebApplicationException(
- "Parse failed: " + result.status(),
- mapStatusToHttpResponse(result.status()));
+ throw new
WebApplicationException(buildProcessFailureResponse(result));
}
if (result.isFatal() || result.isInitializationFailure()) {
- // Fatal or initialization error - return 500
+ // Initialization/fatal error — JSON status body, HTTP status per
mapStatusToHttpResponse
+ // (500, or 503 for CLIENT_UNAVAILABLE_WITHIN_MS)
LOG.error("Parse initialization/fatal error: {} - {}",
result.status(), result.message());
Review Comment:
The comment says initialization failures are always "500", but
`mapStatusToHttpResponse` maps some initialization failures (e.g.,
`CLIENT_UNAVAILABLE_WITHIN_MS`) to `503`. Please reword the comment so it
matches the actual status mapping.
##########
tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/resource/PipesParsingHelper.java:
##########
@@ -184,33 +187,54 @@ private String getSuffix(Metadata metadata) {
return ".tmp";
}
+ /**
+ * Builds a JSON error response whose shape matches PipesResult
serialization:
+ * {@code {"status": "TIMEOUT", "message": "..."}}
+ * <p>
+ * This allows clients to distinguish failure modes (TIMEOUT, OOM,
UNSPECIFIED_CRASH, …)
+ * without parsing plain-text bodies or inspecting custom headers.
+ */
Review Comment:
This Javadoc says the response shape "matches PipesResult serialization",
but the implementation returns only a subset of fields (status + optional
message) and omits `emitData` (and omits `message` when blank). Please adjust
the wording to avoid misleading API/documentation for clients.
> Improve msg on oom/timeout in tika-server's /tika/json endpoint
> ---------------------------------------------------------------
>
> Key: TIKA-4753
> URL: https://issues.apache.org/jira/browse/TIKA-4753
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Trivial
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)