[
https://issues.apache.org/jira/browse/HTTPCLIENT-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sneha Murganoor updated HTTPCLIENT-2422:
----------------------------------------
Description:
In 5.2, DecompressingEntity.getContent() returned a
LazyDecompressingInputStream that deferred GZIPInputStream creation to the
first read() call. Responses with Content-Encoding: gzip but an empty or
non-gzip body were therefore handled gracefully: either the stream was never
read, or the error surfaced in read(), where callers could handle it.
In 5.4+, DecompressingEntity (moved to
org.apache.hc.client5.http.entity.compress) was rewritten to eagerly call
decoder.apply(super.getContent()) in getContent(). This immediately creates a
GZIPInputStream, whose constructor reads the gzip header (including the magic
bytes). If the body is empty (e.g., a chunked transfer with a zero-length body)
or not actually compressed, getContent() itself throws ZipException: Not in
GZIP format, before the caller has read a single byte.
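The eager failure can be seen with the JDK alone. The following is a minimal sketch, not HttpClient code: EagerGzipDemo and tryConstruct are hypothetical names, and a ByteArrayInputStream stands in for the response body. Note that for a truly empty input the JDK constructor typically reports java.io.EOFException rather than ZipException; both are IOExceptions raised from the constructor, so the sketch treats them alike.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical demo class; it only exercises java.util.zip, not HttpClient.
final class EagerGzipDemo {

    /**
     * Tries to construct a GZIPInputStream over the given body.
     * Returns the exception that was raised, or null if construction succeeded.
     */
    static IOException tryConstruct(byte[] body) {
        try {
            // No read() is issued here: the constructor itself parses the gzip header.
            GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(body));
            in.close();
            return null;
        } catch (IOException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        System.out.println("empty body:    " + tryConstruct(new byte[0]));
        System.out.println("non-gzip body: "
                + tryConstruct("plain text".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Any code path that builds the decoder inside getContent() inherits this behavior, which is why the exception appears before the first read().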
Reproduction:
A backend sends:
{quote}
HTTP/1.1 200 OK
Content-Encoding: gzip
Transfer-Encoding: chunked
0\r\n\r\n
{quote}
(An empty chunked body with a Content-Encoding: gzip header.)
In 5.2: entity.getContent() succeeds, returns LazyDecompressingInputStream.
Caller reads EOF without error.
In 5.4+: entity.getContent() throws java.util.zip.ZipException: Not in GZIP
format.
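For local testing, a backend like the one described can be approximated with a raw socket. This is a rough sketch using only the JDK: EmptyGzipBackend and serveOnce are hypothetical names, the server handles a single connection, and the Connection: close header is an addition of this sketch (it is not part of the response shown above) so that the client does not try to reuse the socket.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical test backend: answers one request with an empty chunked body
// that is nevertheless labelled Content-Encoding: gzip.
final class EmptyGzipBackend {

    static final String RESPONSE =
            "HTTP/1.1 200 OK\r\n"
          + "Content-Encoding: gzip\r\n"
          + "Transfer-Encoding: chunked\r\n"
          + "Connection: close\r\n"   // assumption of this sketch, see lead-in
          + "\r\n"
          + "0\r\n"                   // zero-length last chunk
          + "\r\n";

    /** Starts a one-shot server on an ephemeral loopback port and returns that port. */
    static int serveOnce() throws IOException {
        ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
        Thread worker = new Thread(() -> {
            try (ServerSocket s = server; Socket client = s.accept()) {
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(client.getInputStream(), StandardCharsets.US_ASCII));
                // Consume the request line and headers up to the blank line.
                String line;
                while ((line = in.readLine()) != null && !line.isEmpty()) {
                    // ignored: the sketch answers every request the same way
                }
                OutputStream out = client.getOutputStream();
                out.write(RESPONSE.getBytes(StandardCharsets.US_ASCII));
                out.flush();
            } catch (IOException ignored) {
                // one-shot test server: nothing useful to do on failure
            }
        });
        worker.setDaemon(true);
        worker.start();
        return server.getLocalPort();
    }

    public static void main(String[] args) throws Exception {
        int port = serveOnce();
        System.out.println("Serving one response on http://127.0.0.1:" + port + "/");
        Thread.sleep(60_000); // keep the JVM alive long enough to send one request
    }
}
```

Pointing a client at the printed URL and calling entity.getContent() on the response should then exercise the code path described in this report.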
Stack trace:
{quote}
java.util.zip.ZipException: Not in GZIP format
    at java.base/java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:197)
    at java.base/java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:81)
    at org.apache.hc.client5.http.entity.compress.DecompressingEntity.getContent(DecompressingEntity.java:63)
{quote}
Context:
HTTPCLIENT-1690 reported the same class of issue (ZipException on 304 responses
with Content-Encoding: gzip). It was fixed in 4.5.5 and 5.0 Beta1 by using
LazyDecompressingInputStream. The 5.4 rewrite of DecompressingEntity removed
lazy initialization, reintroducing this failure mode.
While the backend is arguably misbehaving by sending Content-Encoding: gzip
with no body, this is common in practice: some web servers add the header
unconditionally, whether or not compression occurred. The 5.2 behavior was
more resilient to this.
Suggested fix:
Restore lazy stream initialization in DecompressingEntity.getContent(): either
defer decoder.apply() to the first read(), or detect an empty underlying stream
before attempting decompression.
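Both options can be sketched with the JDK alone. This is an illustration under stated assumptions, not the actual HttpClient implementation: LazyDecodingInputStream, its nested Decoder interface, and gzipUnlessEmpty are hypothetical names, with Decoder standing in for whatever function DecompressingEntity applies to the raw stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical lazy wrapper: the decoder runs on the first read, not at construction.
final class LazyDecodingInputStream extends InputStream {

    /** Hypothetical stand-in for the decoder function; it may throw IOException. */
    @FunctionalInterface
    interface Decoder {
        InputStream apply(InputStream raw) throws IOException;
    }

    private final InputStream raw;
    private final Decoder decoder;
    private InputStream decoded; // created on first use

    LazyDecodingInputStream(InputStream raw, Decoder decoder) {
        this.raw = raw;
        this.decoder = decoder;
    }

    private InputStream decoded() throws IOException {
        if (decoded == null) {
            decoded = decoder.apply(raw); // header parsing happens here, on first read
        }
        return decoded;
    }

    @Override
    public int read() throws IOException {
        return decoded().read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return decoded().read(b, off, len);
    }

    @Override
    public int available() throws IOException {
        return decoded().available();
    }

    @Override
    public void close() throws IOException {
        // Never create the decoder just to close it.
        if (decoded != null) {
            decoded.close();
        } else {
            raw.close();
        }
    }

    /** Second option: peek one byte and skip gzip decoding when the body is empty. */
    static InputStream gzipUnlessEmpty(InputStream raw) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(raw, 1);
        int first = pushback.read();
        if (first == -1) {
            return pushback; // already at EOF; further reads keep returning -1
        }
        pushback.unread(first);
        return new GZIPInputStream(pushback);
    }

    public static void main(String[] args) throws IOException {
        InputStream empty = new ByteArrayInputStream(new byte[0]);
        try (InputStream in = new LazyDecodingInputStream(
                empty, LazyDecodingInputStream::gzipUnlessEmpty)) {
            System.out.println("first read on empty body: " + in.read());
        }
    }
}
```

Deferring alone only moves the exception from getContent() to the first read(); combining it with the empty-body check is what lets a caller read EOF without error. A body that is non-empty but not gzip still fails, only later, in read().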
> DecompressingEntity in 5.4+ eagerly creates decompression stream, causing
> ZipException on empty/invalid bodies (regression from 5.2 lazy behavior)
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HTTPCLIENT-2422
> URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2422
> Project: HttpComponents HttpClient
> Issue Type: Bug
> Components: HttpClient (classic)
> Affects Versions: 5.4, 5.5, 5.6
> Reporter: Sneha Murganoor
> Priority: Critical