nddipiazza opened a new pull request, #2672: URL: https://github.com/apache/tika/pull/2672

## Summary

Adds HTTP/2 (h2c cleartext) support to tika-server by including the `org.eclipse.jetty.http2:http2-server` jar on the classpath. When this jar is present, CXF's Jetty transport automatically negotiates HTTP/2 alongside HTTP/1.1 on the existing port (default 9998). Existing HTTP/1.1 clients are completely unaffected.

This implements [TIKA-4679](https://issues.apache.org/jira/browse/TIKA-4679).

The core dependency change was originally contributed by Lawrence Moorehead ([@elemdisc](https://github.com/elemdisc)); see [elemdisc/tika PR#1](https://github.com/elemdisc/tika/pull/1). It is cherry-picked here with full author credit.

---

## Changes

### tika-parent/pom.xml

- Added `http2-server` to the dependency management block alongside the existing `http2-hpack`, `http2-client`, and `http2-common` entries (all at `${jetty.http2.version}`)

### tika-server/tika-server-core/pom.xml _(Lawrence Moorehead's commit)_

- Added `org.eclipse.jetty.http2:http2-server` runtime dependency (version from parent BOM)

### tika-server/tika-server-core/src/test/.../TikaServerIntegrationTest.java _(Lawrence Moorehead's commit)_

- Added `testH2c()` unit test that sends a request via `HttpClient.Version.HTTP_2` and asserts the response was served over HTTP/2

### tika-e2e-tests/tika-server/ _(new module)_

- New e2e module that starts the actual fat-jar process and validates HTTP/2 (h2c) end-to-end
- Tests are skipped by default; run with `-Pe2e`
- Wired into `tika-e2e-tests/pom.xml`

---

## How it works

Adding `http2-server` to the classpath is sufficient for h2c (HTTP/2 cleartext) support. CXF's `JettyHTTPServerEngineFactory` detects the jar at startup and wires in `HTTP2CServerConnectionFactory`. No startup code changes are required.

For h2 over TLS (recommended for production), configure `TlsConfig` in `tika-server.json`. Java 17's built-in ALPN handles protocol negotiation automatically; no separate ALPN agent is needed.
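
The kind of check that `testH2c()` is described as performing can be sketched with the JDK's `java.net.http.HttpClient`. This is a minimal illustration, not the test code from this PR: the class name `H2cProbe` and the method `negotiatedVersion` are hypothetical, and the URL is whatever is passed on the command line. For `http://` URIs the JDK client requests HTTP/2 through the HTTP/1.1 upgrade mechanism and silently falls back to HTTP/1.1 if the server does not support it, so the negotiated version has to be read from the response.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of Tika: reports which HTTP version a server answered with.
public class H2cProbe {

    /** Sends a GET with an HTTP/2 preference and returns the version the response actually used. */
    public static HttpClient.Version negotiatedVersion(URI uri)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2) // a preference only; the client may fall back to HTTP/1.1
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
        // The body is irrelevant for this check, so it is discarded.
        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        return response.version();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java H2cProbe.java <url>");
            return;
        }
        URI uri = URI.create(args[0]);
        System.out.println(uri + " answered over " + negotiatedVersion(uri));
    }
}
```

Assuming a server is listening on the default port, running `java H2cProbe.java http://localhost:9998/tika` should report `HTTP_2` when h2c is active and `HTTP_1_1` otherwise.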

---

## Port management

- Single port (9998 by default) continues to serve both HTTP/1.1 and HTTP/2
- No second port added; Docker `EXPOSE 9998` and health-check are unchanged
- The fat-jar grows by ~500 KB from the new jar

---

## Shutdown note

HTTP/2 multiplexes multiple requests over a single TCP connection. The current `shutdownNow()` path does not send a GOAWAY frame before closing. Under moderate load this is acceptable for h2c, but a future improvement could add a drain timeout for graceful HTTP/2 shutdown.

---

## Backward compatibility

Purely additive classpath change:

- Does **not** change the default port
- Does **not** require TLS (TLS remains opt-in)
- Does **not** break any existing HTTP/1.1 client
- Does **not** change the REST API surface

---

## Testing Instructions

```bash
# Unit test (no external process)
mvn test -pl tika-server/tika-server-core -Dtest=TikaServerIntegrationTest#testH2c

# E2E test (requires fat-jar to be built first)
mvn package -pl tika-server/tika-server-standard -DskipTests
mvn test -pl tika-e2e-tests/tika-server -Pe2e
```

Manually with curl (after starting the server):

```bash
# HTTP/2 cleartext (h2c)
curl --http2-prior-knowledge http://localhost:9998/tika

# HTTP/1.1 (unchanged behavior)
curl http://localhost:9998/tika
```

---

## Review Checklist

- [ ] `http2-server` version comes from `${jetty.http2.version}` in parent BOM (not hardcoded)
- [ ] Existing HTTP/1.1 tests still pass
- [ ] `TikaServerIntegrationTest#testH2c` passes
- [ ] E2E module compiles and tests pass with `-Pe2e`
- [ ] No second port introduced

---

## Potential Concerns

- **h2c vs h2**: This PR enables h2c (cleartext). For h2 over TLS an additional `jetty-alpn-java-server` dependency may be needed depending on the Jetty version and JVM. This can be addressed in a follow-up.
- **Reverse proxies**: Most reverse proxies (nginx, AWS ALB, GCP LB) do not support h2c; they require h2 over TLS.
  For internal service-to-service use, h2c is fine; for edge deployments, TLS is recommended.
- **Fat-jar size**: The `http2-server` jar adds ~500 KB to `tika-server-standard`. This also increases the `apache/tika` Docker image slightly.
