[ https://issues.apache.org/jira/browse/TIKA-4742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18084159#comment-18084159 ]
ASF GitHub Bot commented on TIKA-4742:
--------------------------------------
Copilot commented on code in PR #2844:
URL: https://github.com/apache/tika/pull/2844#discussion_r3319658781
##########
docs/modules/ROOT/pages/pipes/troubleshooting.adoc:
##########
@@ -112,6 +143,55 @@ When the watcher fires, the child exits via `System.exit`, which runs
`AbstractExternalProcessParser`'s shutdown hook and cleans up any
in-flight external subprocesses.
+== Log levels and sensitive data
+
+Tika Pipes treats `FetchKey` and `EmitKey` values as potentially sensitive --
+they typically contain file paths, URLs, object-store keys, or other identifiers
+that may be private to the data owner. The convention across pipes core and the
+bundled plugins is:
+
+[cols="1,3"]
+|===
+|Level |What is logged
+
+|`ERROR` / `WARN`
+|Failures, exceptions, and configuration problems. *Never* the literal
+ `fetchKey`/`emitKey` or any file content. When a failure refers to a
+ specific document, it is identified by the non-sensitive `FetchEmitTuple.id`
+ (e.g. `parse exception: id=abc-123`).
+
+|`INFO`
+|Lifecycle events
> Review logging levels and configuration for 4.0.0-beta-1
> --------------------------------------------------------
>
> Key: TIKA-4742
> URL: https://issues.apache.org/jira/browse/TIKA-4742
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Minor
>
> There are a number of places where we've hit a stable state and we should
> downgrade from info to debug.
> We are also still including fetchkeys and emitkeys in logging in fetchers and
> emitters, which is not great from a security standpoint. We should demote
> those to {{trace()}}. We might consider adding MDC to inject
> fetchEmitTuple ids in fetchers+emitters.
> Then there are places where we have info level for actual problems.
> We should do a review of logging levels before 4.0.0-beta-1.
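
The convention discussed above (identify a failing document by its id at WARN/ERROR, and keep the literal key at trace level only) could look roughly like the sketch below. It is not taken from the Tika code base: the class `KeySafeLogging`, the methods `reportFailure` and `run`, and the sample id and key are all hypothetical, and `java.util.logging` is used only so the example has no dependencies, with `Level.FINEST` standing in for trace. The exception message is deliberately not logged at WARNING, on the assumption that it may echo the key.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class KeySafeLogging {

    // Hypothetical logger name; a real component would use its own class name.
    static final Logger LOG = Logger.getLogger("pipes.sketch");

    // Report a failure without exposing the key at WARNING level.
    static void reportFailure(String id, String fetchKey, Exception e) {
        // Only the non-sensitive id and the exception type are logged here;
        // the exception message is skipped because it may contain the key.
        LOG.log(Level.WARNING, "fetch failed: id={0} ({1})",
                new Object[] {id, e.getClass().getSimpleName()});
        // The literal key appears only at the most verbose level.
        LOG.log(Level.FINEST, "fetch detail: id={0} fetchKey={1}",
                new Object[] {id, fetchKey});
    }

    // Runs one simulated failure at the given logger level and returns
    // the formatted messages that were actually emitted.
    static List<String> run(Level level) {
        List<String> lines = new ArrayList<>();
        Handler capture = new Handler() {
            @Override
            public void publish(LogRecord r) {
                lines.add(new SimpleFormatter().formatMessage(r));
            }

            @Override
            public void flush() {
            }

            @Override
            public void close() {
            }
        };
        LOG.setUseParentHandlers(false); // keep the demo off the console handler
        LOG.setLevel(level);
        LOG.addHandler(capture);
        try {
            String key = "s3://private-bucket/payroll.pdf"; // made-up sensitive key
            reportFailure("abc-123", key, new IOException("cannot read " + key));
        } finally {
            LOG.removeHandler(capture);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println("INFO level: " + run(Level.INFO));
        System.out.println("ALL levels: " + run(Level.ALL));
    }
}
```

At the `INFO` level only the id-based line is emitted; the key shows up only when the logger is opened up to `FINEST`. An MDC-based variant would attach the id once per tuple instead of passing it to every call, but that depends on the logging backend in use.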
--
This message was sent by Atlassian Jira
(v8.20.10#820010)