Copilot commented on code in PR #2870:
URL: https://github.com/apache/tika/pull/2870#discussion_r3363147660
##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/TikaServerIntegrationTest.java:
##########
@@ -212,11 +219,23 @@ public void testTimeout() throws Exception {
// Server should return 503 (Service Unavailable) for timeout
assertEquals(503, response.getStatus());
+ assertErrorResponseStatus(response, "TIMEOUT");
// Server should still be running - verify with a successful request
testBaseline();
}
+ /**
+ * Asserts that an error response body is JSON with a {@code status} field
matching
+ * {@code expectedStatus} (a {@code PipesResult.STATUS} enum name).
+ */
+ private void assertErrorResponseStatus(Response response, String
expectedStatus) throws IOException {
+ String body = IOUtils.toString((InputStream) response.getEntity(),
StandardCharsets.UTF_8);
+ JsonNode node = new ObjectMapper().readTree(body);
+ assertEquals(expectedStatus, node.get("status").asText(),
+ "Expected JSON error body with status=" + expectedStatus + "
but got: " + body);
+ }
Review Comment:
The helper currently assumes the response entity always contains a non-null
`status` field and will throw a NullPointerException on
`node.get("status").asText()` if the server returns a non-conforming body.
Also, the entity InputStream isn't closed here, which can leak resources during
the integration test run. Using `path("status")` avoids the NPE and
try-with-resources ensures the stream is closed.
##########
tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/resource/PipesParsingHelper.java:
##########
@@ -184,33 +187,53 @@ private String getSuffix(Metadata metadata) {
return ".tmp";
}
+ /**
+ * Builds a JSON error response whose shape matches PipesResult
serialization:
+ * {@code {"status": "TIMEOUT", "message": "..."}}
+ * <p>
+ * This allows clients to distinguish failure modes (TIMEOUT, OOM,
UNSPECIFIED_CRASH, …)
+ * without parsing plain-text bodies or inspecting custom headers.
+ */
+ private static Response buildProcessFailureResponse(PipesResult result) {
+ ObjectMapper mapper = new ObjectMapper();
+ ObjectNode node = mapper.createObjectNode();
+ node.put("status", result.status().name());
+ if (result.message() != null) {
+ node.put("message", result.message());
+ }
+ String json;
+ try {
+ json = mapper.writeValueAsString(node);
+ } catch (Exception e) {
+ json = "{\"status\":\"" + result.status().name() + "\"}";
+ }
+ return Response.status(mapStatusToHttpResponse(result.status()))
+ .entity(json)
+ .type(MediaType.APPLICATION_JSON)
+ .build();
+ }
+
/**
* Processes the PipesResult and returns the metadata list.
*/
private List<Metadata> processResult(PipesResult result) {
if (result.isProcessCrash()) {
- // Process crashed (OOM, timeout, etc.) - return 503
+ // Process crashed (OOM, timeout, unspecified crash) — 503 with
JSON status body
LOG.warn("Parse process crashed: {}", result.status());
- throw new WebApplicationException(
- "Parse failed: " + result.status(),
- mapStatusToHttpResponse(result.status()));
+ throw new
WebApplicationException(buildProcessFailureResponse(result));
}
if (result.isFatal() || result.isInitializationFailure()) {
- // Fatal or initialization error - return 500
+ // Server misconfiguration — 500 with JSON status body
LOG.error("Parse initialization/fatal error: {} - {}",
result.status(), result.message());
- throw new WebApplicationException(
- "Parse failed: " + result.status(),
- mapStatusToHttpResponse(result.status()));
+ throw new
WebApplicationException(buildProcessFailureResponse(result));
Review Comment:
This comment says initialization/fatal errors are "— 500", but
`mapStatusToHttpResponse` can return `503` for `CLIENT_UNAVAILABLE_WITHIN_MS`
(which is an initialization failure). Updating the comment to avoid hard-coding
500 will keep it accurate as mappings evolve.
##########
tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/resource/PipesParsingHelper.java:
##########
@@ -184,33 +187,53 @@ private String getSuffix(Metadata metadata) {
return ".tmp";
}
+ /**
+ * Builds a JSON error response whose shape matches PipesResult
serialization:
+ * {@code {"status": "TIMEOUT", "message": "..."}}
+ * <p>
+ * This allows clients to distinguish failure modes (TIMEOUT, OOM,
UNSPECIFIED_CRASH, …)
+ * without parsing plain-text bodies or inspecting custom headers.
+ */
+ private static Response buildProcessFailureResponse(PipesResult result) {
+ ObjectMapper mapper = new ObjectMapper();
+ ObjectNode node = mapper.createObjectNode();
+ node.put("status", result.status().name());
+ if (result.message() != null) {
+ node.put("message", result.message());
+ }
Review Comment:
`buildProcessFailureResponse` claims to match `PipesResult` serialization,
but it currently includes a `message` field whenever `result.message() != null`
(including empty/blank strings). `PipesResultSerializer` omits the message when
it's blank, and the docs added in this PR also say the field is absent when not
provided. Consider mirroring that behavior to keep responses stable for clients.
##########
docs/modules/ROOT/pages/using-tika/server/index.adoc:
##########
@@ -156,6 +156,41 @@ curl -T document.pdf
http://localhost:9998/meta/Content-Type # single field
* `/translate/all/\{translator}/\{src}/\{dest}` — translation
* `/pipes`, `/async` — Pipes-based bulk processing
+== Error Responses
+
+When parsing fails due to a process-level problem — the forked child process
timed out,
+ran out of memory, or crashed unexpectedly — the server returns an HTTP error
with a
+JSON body whose shape matches the `PipesResult` status:
+
+[source,json]
+----
+{"status": "TIMEOUT", "message": "Task timed out after 60000ms"}
+----
+
+The `status` field is the `PipesResult.STATUS` enum name. The `message` field
is
+present when Tika provided one, absent otherwise.
Review Comment:
The documentation refers to `PipesResult.STATUS`, but the actual enum is
`PipesResult.RESULT_STATUS`. Using the correct type name will help users locate
the enum and avoid confusion.
##########
tika-server/tika-server-core/src/test/java/org/apache/tika/server/core/TikaServerIntegrationTest.java:
##########
@@ -212,11 +219,23 @@ public void testTimeout() throws Exception {
// Server should return 503 (Service Unavailable) for timeout
assertEquals(503, response.getStatus());
+ assertErrorResponseStatus(response, "TIMEOUT");
// Server should still be running - verify with a successful request
testBaseline();
}
+ /**
+ * Asserts that an error response body is JSON with a {@code status} field
matching
+ * {@code expectedStatus} (a {@code PipesResult.STATUS} enum name).
+ */
Review Comment:
The Javadoc references a `PipesResult.STATUS` enum, but the actual enum type
is `PipesResult.RESULT_STATUS`. Aligning the comment with the real API name
avoids confusion when navigating the code.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]