nddipiazza opened a new pull request, #2919:
URL: https://github.com/apache/tika/pull/2919

   ## Summary
   
   Fixes two bugs reported in 
[TIKA-4735](https://issues.apache.org/jira/browse/TIKA-4735).
   
   ## Changes
   
   ### Bug 1 — `-h` option conflict (`TikaAsyncCLI.java`)
   
   `-h` was bound to `--help`, so using it as the handler shorthand (as 
documented in the CLI docs) printed the help menu instead of setting the 
handler type.

   **Fix:** removed the `-h` short option from `--help`; `--help` still works 
as a long option.
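
The collision described above can be modeled with a small, self-contained sketch. It does not use the option-parsing code in `TikaAsyncCLI`; the class `ShortOptionSketch`, its methods, and the long name `--handler` are hypothetical names chosen for illustration. It only shows that a short token can route to one option, so whichever option owns `-h` wins.

```java
import java.util.HashMap;
import java.util.Map;

public class ShortOptionSketch {

    // Hypothetical registry mapping a command-line token to the option it selects.
    // The long name "--handler" is an assumption made for this illustration.
    static Map<String, String> registry(boolean helpOwnsShortH) {
        Map<String, String> tokens = new HashMap<>();
        tokens.put("--help", "help");
        tokens.put("--handler", "handler");
        // A short token can route to only one option, so the owner of "-h" decides
        // what the user gets when typing it.
        tokens.put("-h", helpOwnsShortH ? "help" : "handler");
        return tokens;
    }

    // Look up the option selected by a single argument.
    static String dispatch(Map<String, String> tokens, String arg) {
        return tokens.getOrDefault(arg, "unknown");
    }

    public static void main(String[] args) {
        // Before the fix: "-h" selects help, so the handler shorthand is unreachable.
        System.out.println("before, -h     -> " + dispatch(registry(true), "-h"));
        // After the fix: "-h" selects the handler, and help keeps its long form.
        System.out.println("after,  -h     -> " + dispatch(registry(false), "-h"));
        System.out.println("after,  --help -> " + dispatch(registry(false), "--help"));
    }
}
```

Dropping the short form from `--help`, rather than renaming the handler shorthand, keeps the behavior aligned with what the CLI docs already describe.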
   
   ### Bug 2 — `--content-only` produced JSON output instead of raw content (`ParseHandler.java`)
   
   `ParseHandler.parseWithStream()` resolved `ParseMode` by falling back to 
`defaultParseMode` (loaded from the pipes config), but never wrote the resolved 
mode back into `ParseContext`. `EmitHandler.emit()` reads 
`parseContext.get(ParseMode.class)` directly, with no fallback, so it always 
received `null` when the parse mode was set only as a config default (e.g. via 
`--content-only` on the CLI) and fell through to the full JSON emit.
   
   **Fix:** one line in `ParseHandler.parseWithStream()`:
   
   ```java
   // Ensure EmitHandler sees the resolved mode
   parseContext.set(ParseMode.class, parseMode);
   ```
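
To make the effect of the write-back concrete, here is a minimal, self-contained sketch. It does not use Tika's classes: `Context` and `Mode` are hypothetical stand-ins for `ParseContext` and `ParseMode`, and `resolve` and `emit` only model the behavior described above (resolution with a config default, and a downstream read with no fallback).

```java
import java.util.HashMap;
import java.util.Map;

public class ParseModeWriteBackSketch {

    // Hypothetical stand-in for ParseMode; only two values are modeled here.
    enum Mode { RMETA, CONTENT_ONLY }

    // Hypothetical stand-in for ParseContext: a map keyed by class.
    static final class Context {
        private final Map<Class<?>, Object> values = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            values.put(key, value);
        }

        <T> T get(Class<T> key) {
            return key.cast(values.get(key)); // returns null when the key is absent
        }
    }

    // Models the resolution step: use the per-request mode if present,
    // otherwise fall back to the config default. The write-back is optional
    // so that both the old and the fixed behavior can be compared.
    static Mode resolve(Context ctx, Mode defaultMode, boolean writeBack) {
        Mode mode = ctx.get(Mode.class);
        if (mode == null) {
            mode = defaultMode;
        }
        if (writeBack) {
            ctx.set(Mode.class, mode);
        }
        return mode;
    }

    // Models a downstream consumer that reads the context directly, with no fallback.
    static String emit(Context ctx) {
        return ctx.get(Mode.class) == Mode.CONTENT_ONLY ? "raw content" : "full JSON";
    }

    public static void main(String[] args) {
        Context before = new Context();
        resolve(before, Mode.CONTENT_ONLY, false);
        System.out.println("without write-back: " + emit(before)); // full JSON

        Context after = new Context();
        resolve(after, Mode.CONTENT_ONLY, true);
        System.out.println("with write-back: " + emit(after)); // raw content
    }
}
```

In this model the resolver returns the right mode in both runs; only the consumer's view differs, which is why the bug surfaced at emit time rather than at parse time.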
   
   ## Critical Files
   
   - `tika-pipes/tika-pipes-core/src/main/java/org/apache/tika/pipes/core/server/ParseHandler.java`: core fix
   - `tika-pipes/tika-async-cli/src/main/java/org/apache/tika/async/cli/TikaAsyncCLI.java`: remove `-h` short option
   
   ## Testing Instructions
   
   ```bash
   mvn test -pl tika-pipes/tika-pipes-core,tika-pipes/tika-async-cli
   ```
   
   ## Review Checklist
   
   - [x] Reproducer tests added for both bugs
   - [x] Integration test `AsyncProcessorTest.testContentOnlyFromConfigDefault` 
confirms the runtime fix
   - [x] All 65 tests in affected modules pass
   - [x] No unrelated changes
   
   ## Potential Concerns
   
   `ParseMode` is now always written into `ParseContext` inside 
`ParseHandler.parseWithStream()`. This is safe because `ParseHandler` is the 
authoritative resolver of the parse mode (it already applies the 
`defaultParseMode` fallback), and writing the mode into the context makes the 
state explicit for all downstream consumers (`EmitHandler`, `PipesWorker`, etc.).


