nddipiazza opened a new pull request, #2919: URL: https://github.com/apache/tika/pull/2919
## Summary

Fixes two bugs reported in [TIKA-4735](https://issues.apache.org/jira/browse/TIKA-4735).

## Changes

### Bug 1: `-h` option conflict (`TikaAsyncCLI.java`)

`-h` was bound to `--help`, so using it as a handler shorthand (as documented in the CLI docs) triggered the help menu instead of setting the handler type.

**Fix:** removed the `-h` short option from `--help`; `--help` still works as a long option.

### Bug 2: `--content-only` produced JSON output instead of raw content (`ParseHandler.java`)

`ParseHandler.parseWithStream()` resolves `ParseMode` using a fallback to `defaultParseMode` (loaded from the pipes config). However, it never wrote the resolved mode back into `ParseContext`. `EmitHandler.emit()` checks `parseContext.get(ParseMode.class)` directly, with no fallback, so it always received `null` when `parseMode` was set only as a config default (e.g. via `--content-only` on the CLI). This caused it to fall through to full JSON emit.

**Fix:** one line in `ParseHandler.parseWithStream()`:

```java
parseContext.set(ParseMode.class, parseMode); // Ensure EmitHandler sees the resolved mode
```

## Critical Files

- `tika-pipes/tika-pipes-core/src/main/java/org/apache/tika/pipes/core/server/ParseHandler.java`: core fix
- `tika-pipes/tika-async-cli/src/main/java/org/apache/tika/async/cli/TikaAsyncCLI.java`: remove `-h` short option

## Testing Instructions

```bash
mvn test -pl tika-pipes/tika-pipes-core,tika-pipes/tika-async-cli
```

## Review Checklist

- [x] Reproducer tests added for both bugs
- [x] Integration test `AsyncProcessorTest.testContentOnlyFromConfigDefault` confirms the runtime fix
- [x] All 65 tests in affected modules pass
- [x] No unrelated changes

## Potential Concerns

The `ParseMode` is now always written into `ParseContext` inside `ParseHandler.parseWithStream()`. This is safe because `ParseHandler` is the authoritative resolver of parse mode (it already has the `defaultParseMode` fallback), and writing it into the context makes the state explicit for all downstream consumers (`EmitHandler`, `PipesWorker`, etc.).
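
For reviewers who want to see the Bug 2 hand-off in isolation, here is a minimal, self-contained sketch of the failure mode. It does not use Tika's real classes: `Context`, `parse`, `emit`, `run`, and the `ParseMode` constant names are hypothetical stand-ins. The only behaviour assumed is what the description above states: the parse step resolves the mode with a config fallback, and the emit step reads the context with no fallback.

```java
import java.util.HashMap;
import java.util.Map;

public class ParseModeFallbackSketch {

    // Hypothetical stand-in for Tika's ParseMode; the constant names are assumptions.
    enum ParseMode { RMETA, CONTENT_ONLY }

    // Hypothetical, simplified stand-in for ParseContext: a type-keyed map.
    static final class Context {
        private final Map<Class<?>, Object> values = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            values.put(key, value);
        }

        <T> T get(Class<T> key) {
            // Class.cast returns null when the stored value is absent.
            return key.cast(values.get(key));
        }
    }

    // Parse step: use the per-request mode if present, else the config default.
    // With writeBack == false the resolved mode stays local (the reported bug).
    static void parse(Context context, ParseMode defaultParseMode, boolean writeBack) {
        ParseMode requested = context.get(ParseMode.class);
        ParseMode parseMode = requested != null ? requested : defaultParseMode;
        if (writeBack) {
            context.set(ParseMode.class, parseMode);
        }
    }

    // Emit step: reads the context directly, with no fallback to the default.
    static String emit(Context context) {
        ParseMode mode = context.get(ParseMode.class);
        return mode == ParseMode.CONTENT_ONLY ? "raw content" : "full JSON";
    }

    // Runs parse then emit with CONTENT_ONLY set only as the config default.
    static String run(boolean writeBack) {
        Context context = new Context();
        parse(context, ParseMode.CONTENT_ONLY, writeBack);
        return emit(context);
    }

    public static void main(String[] args) {
        System.out.println("without write-back: " + run(false)); // full JSON
        System.out.println("with write-back:    " + run(true));  // raw content
    }
}
```

In this sketch, the emit step only honours the configured default once the parse step has written the resolved mode back into the shared context, which is the effect the one-line fix is meant to have.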
