[
https://issues.apache.org/jira/browse/TIKA-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086428#comment-18086428
]
ASF GitHub Bot commented on TIKA-4753:
--------------------------------------
Copilot commented on code in PR #2870:
URL: https://github.com/apache/tika/pull/2870#discussion_r3363959200
##########
docs/modules/ROOT/pages/migration-to-4x/migrating-tika-server-4x.adoc:
##########
@@ -89,6 +89,25 @@ The separate `/config` endpoints have been removed. Configuration is now handled
**Migration:** Use `POST /tika` or `POST /tika/json` with a `config` part in your multipart request.
+=== Error Response Bodies Are Now JSON
+
+In 3.x, error responses from `/tika`, `/rmeta`, and `/unpack` returned a plain-text
+body such as `"Parse failed: TIMEOUT"`. In 4.x these endpoints return a JSON body:
+
+[source,json]
+----
+{"status": "TIMEOUT", "message": "Task timed out after 60000ms"}
+----
Review Comment:
The migration guide example implies that the `message` field is always present
in JSON error bodies, but the implementation includes it only when
`returnStackTrace=true` and the message is non-blank; otherwise the body
contains only `status`. Clients migrating from 3.x could be misled into
expecting `message` to always exist.
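
A client-side sketch of the tolerant handling this implies is shown below. It is only an illustration: `TikaErrorBody` and `fromFields` are hypothetical names, not part of Tika, and the sketch assumes the client has already parsed the JSON body into a flat map of top-level string fields with whatever JSON library it uses.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical client-side view of a 4.x error body; not a Tika class. */
public final class TikaErrorBody {
    private final String status;
    private final Optional<String> message;

    private TikaErrorBody(String status, Optional<String> message) {
        this.status = status;
        this.message = message;
    }

    /**
     * Builds the view from already-parsed top-level JSON fields.
     * "status" is required; "message" is treated as optional because the
     * server omits it unless returnStackTrace is enabled.
     */
    public static TikaErrorBody fromFields(Map<String, String> fields) {
        String status = fields.get("status");
        if (status == null) {
            throw new IllegalArgumentException("error body has no status field");
        }
        String msg = fields.get("message");
        Optional<String> message =
                (msg == null || msg.isBlank()) ? Optional.empty() : Optional.of(msg);
        return new TikaErrorBody(status, message);
    }

    public String status() {
        return status;
    }

    public Optional<String> message() {
        return message;
    }

    /** Human-readable summary that works with or without a message. */
    public String describe() {
        return message.map(m -> status + ": " + m).orElse(status);
    }
}
```

With this shape, a status-only body such as `{"status": "TIMEOUT"}` and a body that also carries `message` go through the same code path, so the guide could show both forms without clients needing special cases.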
##########
tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/resource/PipesParsingHelper.java:
##########
@@ -184,33 +194,59 @@ private String getSuffix(Metadata metadata) {
return ".tmp";
}
+ /**
+ * Builds a JSON error response carrying a subset of the {@code PipesResult}
+ * serialization. By default the body is just {@code {"status": "TIMEOUT"}}. The
+ * {@code PipesResult} message frequently contains a server-side stack trace
+ * (e.g. for {@code *_EXCEPTION} statuses), so the {@code message} field is included
+ * only when {@code returnStackTrace} is enabled — matching the legacy
+ * {@code TikaServerParseExceptionMapper}, which gates stack traces the same way.
+ * Successful-parse fields such as {@code emitData} are never part of an error body.
+ * <p>
+ * This allows clients to distinguish failure modes (TIMEOUT, OOM, UNSPECIFIED_CRASH, …)
+ * without parsing plain-text bodies or inspecting custom headers.
+ */
+ private Response buildProcessFailureResponse(PipesResult result) {
+ ObjectMapper mapper = new ObjectMapper();
+ ObjectNode node = mapper.createObjectNode();
+ node.put("status", result.status().name());
+ if (returnStackTrace && result.message() != null && !result.message().isBlank()) {
+ node.put("message", result.message());
+ }
+ String json;
+ try {
+ json = mapper.writeValueAsString(node);
+ } catch (Exception e) {
+ json = "{\"status\":\"" + result.status().name() + "\"}";
+ }
Review Comment:
Exceptions during JSON serialization are swallowed silently here. If
serialization fails, we’ll return a fallback body without any indication in
logs, which makes diagnosing unexpected failures harder.
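
One possible shape for the fix is sketched below, as an assumption rather than the PR's actual change. `FailureBodyWriter` and `toJsonOrFallback` are hypothetical names, and `java.util.logging` is used only to keep the sketch dependency-free; the real class would presumably use the logger already present in `PipesParsingHelper`. The serializer is passed in as a `Callable` so the sketch does not depend on Jackson.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper illustrating a logged fallback; not a Tika class. */
public final class FailureBodyWriter {
    private static final Logger LOG = Logger.getLogger(FailureBodyWriter.class.getName());

    private FailureBodyWriter() {
    }

    /**
     * Runs the supplied serializer and returns its JSON. If serialization
     * throws, the failure is logged with its stack trace and a status-only
     * body is returned instead. statusName is assumed to be an enum constant
     * name, so it needs no JSON escaping.
     */
    public static String toJsonOrFallback(String statusName, Callable<String> serializer) {
        try {
            return serializer.call();
        } catch (Exception e) {
            LOG.log(Level.WARNING,
                    "Failed to serialize error body for status " + statusName
                            + "; returning status-only fallback", e);
            return "{\"status\":\"" + statusName + "\"}";
        }
    }
}
```

The response body is unchanged from the current fallback; the only difference is that the exception is recorded before it is discarded.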
> Improve msg on oom/timeout in tika-server's /tika/json endpoint
> ---------------------------------------------------------------
>
> Key: TIKA-4753
> URL: https://issues.apache.org/jira/browse/TIKA-4753
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Trivial
>