[ 
https://issues.apache.org/jira/browse/TIKA-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086428#comment-18086428
 ] 

ASF GitHub Bot commented on TIKA-4753:
--------------------------------------

Copilot commented on code in PR #2870:
URL: https://github.com/apache/tika/pull/2870#discussion_r3363959200


##########
docs/modules/ROOT/pages/migration-to-4x/migrating-tika-server-4x.adoc:
##########
@@ -89,6 +89,25 @@ The separate `/config` endpoints have been removed. Configuration is now handled
 
 **Migration:** Use `POST /tika` or `POST /tika/json` with a `config` part in your multipart request.
 
+=== Error Response Bodies Are Now JSON
+
+In 3.x, error responses from `/tika`, `/rmeta`, and `/unpack` returned a plain-text
+body such as `"Parse failed: TIMEOUT"`. In 4.x these endpoints return a JSON body:
+
+[source,json]
+----
+{"status": "TIMEOUT", "message": "Task timed out after 60000ms"}
+----

Review Comment:
   The migration guide example implies that the `message` field is always present in JSON error bodies, but the implementation includes it only when `returnStackTrace` is enabled and the message is non-blank; otherwise the body carries `status` alone. Clients migrating from 3.x could be misled into expecting `message` to exist.
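
A client can guard against this by treating `message` as optional. The following is a minimal sketch only: the class and method names are hypothetical, and the regex-based field extraction is a stand-in chosen to keep the example dependency-free (a real client would more likely use a JSON library). The two sample bodies mirror the shapes described in this PR.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical client-side helper; not part of Tika.
public class ErrorBodyClientSketch {
    // Simplified patterns for flat string fields; an assumption made for
    // illustration, not a general JSON parser.
    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"([^\"]*)\"");
    private static final Pattern MESSAGE =
            Pattern.compile("\"message\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Returns the first captured group if the field is present.
    static Optional<String> field(Pattern pattern, String body) {
        Matcher m = pattern.matcher(body);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Builds a description that works whether or not "message" is present.
    static String describe(String body) {
        String status = field(STATUS, body).orElse("UNKNOWN");
        return field(MESSAGE, body)
                .map(msg -> status + ": " + msg)
                .orElse(status);
    }

    public static void main(String[] args) {
        // Status-only body (no message field).
        System.out.println(describe("{\"status\": \"TIMEOUT\"}"));
        // Body with the optional message field.
        System.out.println(describe(
                "{\"status\": \"TIMEOUT\", \"message\": \"Task timed out after 60000ms\"}"));
    }
}
```

Branching on `status` and using `message` only for display keeps the client correct under either server setting.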



##########
tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/resource/PipesParsingHelper.java:
##########
@@ -184,33 +194,59 @@ private String getSuffix(Metadata metadata) {
         return ".tmp";
     }
 
+    /**
+     * Builds a JSON error response carrying a subset of the {@code PipesResult}
+     * serialization. By default the body is just {@code {"status": "TIMEOUT"}}. The
+     * {@code PipesResult} message frequently contains a server-side stack trace
+     * (e.g. for {@code *_EXCEPTION} statuses), so the {@code message} field is included
+     * only when {@code returnStackTrace} is enabled — matching the legacy
+     * {@code TikaServerParseExceptionMapper}, which gates stack traces the same way.
+     * Successful-parse fields such as {@code emitData} are never part of an error body.
+     * <p>
+     * This allows clients to distinguish failure modes (TIMEOUT, OOM, UNSPECIFIED_CRASH, …)
+     * without parsing plain-text bodies or inspecting custom headers.
+     */
+    private Response buildProcessFailureResponse(PipesResult result) {
+        ObjectMapper mapper = new ObjectMapper();
+        ObjectNode node = mapper.createObjectNode();
+        node.put("status", result.status().name());
+        if (returnStackTrace && result.message() != null && !result.message().isBlank()) {
+            node.put("message", result.message());
+        }
+        String json;
+        try {
+            json = mapper.writeValueAsString(node);
+        } catch (Exception e) {
+            json = "{\"status\":\"" + result.status().name() + "\"}";
+        }

Review Comment:
   Exceptions during JSON serialization are swallowed silently here. If serialization fails, we’ll return the status-only fallback body without any indication in the logs, which makes diagnosing unexpected failures harder.
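
One possible shape for the fix is to log the exception before building the fallback. This is a sketch under stated assumptions: it uses `java.util.logging` only to stay self-contained (the logging framework used by `PipesParsingHelper` is not shown in this diff), and `serializeOrFallback` with its `Callable` parameter is a hypothetical stand-in for the `mapper.writeValueAsString(node)` call.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration; not the code from the PR.
public class FallbackLoggingSketch {
    private static final Logger LOG =
            Logger.getLogger(FallbackLoggingSketch.class.getName());

    // 'serializer' stands in for mapper.writeValueAsString(node).
    static String serializeOrFallback(String statusName, Callable<String> serializer) {
        try {
            return serializer.call();
        } catch (Exception e) {
            // Record the failure so the fallback is visible in the server logs.
            LOG.log(Level.WARNING,
                    "Failed to serialize error body for status " + statusName
                            + "; returning status-only fallback", e);
            return "{\"status\":\"" + statusName + "\"}";
        }
    }

    public static void main(String[] args) {
        // Simulate a serialization failure to exercise the fallback path.
        String body = serializeOrFallback("TIMEOUT", () -> {
            throw new IllegalStateException("simulated serialization failure");
        });
        System.out.println(body);
    }
}
```

The response body is unchanged; the only difference is that the exception and the affected status now appear in the logs.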





> Improve msg on oom/timeout in tika-server's /tika/json endpoint
> ---------------------------------------------------------------
>
>                 Key: TIKA-4753
>                 URL: https://issues.apache.org/jira/browse/TIKA-4753
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
