I have found that the OpenJDK net team is very open to receiving patches.

Have you filed an issue that has been accepted? This is usually the first step.

> On Jul 29, 2024, at 5:20 PM, Andy Boothe <andy.boo...@gmail.com> wrote:
> 
> First, thank you both for the responses. I know how busy everyone is, and I 
> really appreciate the time.
> 
> We can talk about use cases and architecture and such, but I think we all 
> agree that a developer should be able to make an HTTP request with HttpClient 
> without worrying about whether or not it will cause an OOM. Or, at least, 
> that whether or not it causes an OOM should be fully within their control. 
> And that's not where this implementation is right now.
> 
> I’d be very happy to work on a fix for this. Would it be out of order for me 
> to propose a patch?
> 
> Andy Boothe
> Email: andy.boo...@gmail.com
> Mobile: (979) 574-1089
> 
> 
> On Mon, Jul 29, 2024 at 3:57 PM robert engels <reng...@ix.netcom.com> wrote:
> Yes, but normally you fork a worker process that tracks progress and scrapes 
> N sites. If the worker process dies processing a site, the site is marked 
> “bad” and only periodically scraped after a retry/backoff period.
> 
> There are probably a lot of ways to crash a worker process, intentionally or 
> accidentally - a robust design is called for.
> 
> As an aside, if I were writing a large-scale scraper I don’t think I would use 
> HttpClient anyway - I think a custom URL accessor would be easier to monitor, 
> etc.
> 
>> On Jul 29, 2024, at 3:43 PM, Ethan McCue <et...@mccue.dev> wrote:
>> 
>> Scraping of unknown/untrusted websites is a common task in certain...fields? 
>> I don't want to comment on it too deeply, but I know that is something folks 
>> would do.
>> 
>> Imagine a site where someone inputs a URL, clicks submit, and then with the 
>> power of funding they return a summary of the page.
>> 
>> On Mon, Jul 29, 2024, 3:52 PM robert engels <reng...@ix.netcom.com> wrote:
>> Isn’t the HttpClient almost always used to access other services?
>> 
>> Why would a developer access a malicious service?
>> 
>> I also think there are lots of ways for a service to crash the client - e.g., 
>> it could attempt to return a very large response - if the client buffers the 
>> whole response in memory, that will cause an OOM as well.
>> 
>>> On Jul 29, 2024, at 2:42 PM, Andy Boothe <andy.boo...@gmail.com> wrote:
>>> 
>>> Following up here.
>>> 
>>> I believe I have discovered that it is possible to craft a malicious HTTP 
>>> response that can cause the built-in HttpURLConnection and HttpClient 
>>> implementations to throw exceptions. Specifically, HttpURLConnection can be 
>>> made to throw a NegativeArraySizeException, and HttpClient can be made to 
>>> throw an OutOfMemoryError. Proof of this behavior is in the attached (very 
>>> simple) Java programs.
>>> 
>>> This seems like A Bad Thing to me.
>>> 
>>> I've moved from the dev list to this list based on a recommendation from 
>>> that list. Is this the right list? If not, can you point me in the right 
>>> direction? Perhaps a security list?
>>> 
>>> Thank you,
>>> 
>>> Andy Boothe
>>> Email: andy.boo...@gmail.com
>>> Mobile: (979) 574-1089
>>> On Wed, Jul 24, 2024 at 4:47 PM Andy Boothe <andy.boo...@gmail.com> wrote:
>>> Hello,
>>> 
>>> I'm moving this thread from jdk-dev to this list on the sage advice of 
>>> Pavel Rappo.
>>> 
>>> As a brief recap, it looks like HttpClient and HttpURLConnection do not 
>>> currently support a way to set the maximum acceptable response header 
>>> length. As a result, sending HTTP requests with these classes that result 
>>> in a response with very long headers causes an OutOfMemoryError and a 
>>> NegativeArraySizeException, respectively. (Simple programs for reproducing 
>>> the issue are attached.) This seems like A Bad Thing. There is a (very 
>>> brief) discussion in the thread about how to handle it, but of course you guys 
>>> are the experts.
>>> 
>>> If my head is on straight and this turns out to be a real issue as opposed 
>>> to a mistake on my part, I'm keen to help however I can. 
>>> 
>>> Andy Boothe
>>> Email: andy.boo...@gmail.com
>>> Mobile: (979) 574-1089
>>> 
>>> 
>>> ---------- Forwarded message ---------
>>> From: Pavel Rappo <pavel.ra...@oracle.com>
>>> Date: Wed, Jul 24, 2024 at 4:30 PM
>>> Subject: Re: Very long response headers and java.net.http.HttpClient?
>>> To: Andy Boothe <andy.boo...@gmail.com>
>>> Cc: jdk-...@openjdk.org <jdk-...@openjdk.org>
>>> 
>>> 
>>> A proper list would be net-dev at openjdk.java.net.
>>> 
>>> > On 24 Jul 2024, at 21:13, Andy Boothe <andy.boo...@gmail.com> wrote:
>>> > 
>>> > Hello,
>>> > 
>>> > I'm documenting some guidelines for using java.net.http.HttpClient 
>>> > defensively for my team. For example: "Always set a request timeout", 
>>> > "Don't assume HTTP response entities are small and/or will fit in 
>>> > memory", etc.
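The two guidelines quoted above can be sketched as follows. This is a minimal illustration, not code from the thread: the class name `DefensiveFetch`, the helper `readCapped`, and the 5 and 10 second timeouts are all assumptions chosen for the example. Note that neither setting bounds the size of the response headers, which is the gap this thread is about.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class DefensiveFetch {

    /** Reads at most maxBytes from the stream and fails if more data follows. */
    static byte[] readCapped(InputStream in, int maxBytes) throws IOException {
        // Ask for one byte more than allowed so an oversized body is detectable
        // without ever buffering more than maxBytes + 1 bytes.
        // (Assumes maxBytes < Integer.MAX_VALUE.)
        byte[] data = in.readNBytes(maxBytes + 1);
        if (data.length > maxBytes) {
            throw new IOException("Response body exceeds " + maxBytes + " bytes");
        }
        return data;
    }

    /** Fetches a URI with explicit timeouts and a cap on the body size. */
    static byte[] fetch(URI uri, int maxBodyBytes)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))   // illustrative value
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(10))         // illustrative value
                .GET()
                .build();
        // Stream the body instead of buffering it into a String or byte[].
        HttpResponse<InputStream> response =
                client.send(request, HttpResponse.BodyHandlers.ofInputStream());
        try (InputStream body = response.body()) {
            return readCapped(body, maxBodyBytes);
        }
    }
}
```

A caller might use `DefensiveFetch.fetch(URI.create("http://localhost:3000"), 1_000_000)` and treat the `IOException` as "response too large".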
>>> > 
>>> > One guideline I'd like to document is "Set a maximum for HTTP response 
>>> > header size." However, I can't seem to find a way to set that limit, 
>>> > either in documentation or in OpenJDK code.
>>> > 
>>> > I tried my best to search the archives for this mailing list for any 
>>> > mentions, but came up empty.
>>> > 
>>> > To make sure my head is on straight and there isn't an undocumented limit 
>>> > set by default, I wrote the attached (very quick and dirty) client and 
>>> > server programs. LongResponseHeaderDemoServer opens a raw server socket 
>>> > and reads (what it assumes is) a well-formed HTTP request, and then 
>>> > prints an HTTP response which includes a response header of infinite 
>>> > length. LongResponseHeaderDemoHttpClient uses java.net.http.HttpClient to 
>>> > make a request and print the response body.
>>> > 
>>> > When I run LongResponseHeaderDemoServer in one terminal and make a curl 
>>> > request to the server in another terminal, this is what curl spits out:
>>> > 
>>> > $ curl -vvv -D - http://localhost:3000
>>> > * Host localhost:3000 was resolved.
>>> > * IPv6: ::1
>>> > * IPv4: 127.0.0.1
>>> > *   Trying [::1]:3000...
>>> > * Connected to localhost (::1) port 3000
>>> > > GET / HTTP/1.1
>>> > > Host: localhost:3000
>>> > > User-Agent: curl/8.6.0
>>> > > Accept: */*
>>> > > 
>>> > < HTTP/1.1 200 OK
>>> > HTTP/1.1 200 OK
>>> > < Content-Type: text/plain
>>> > Content-Type: text/plain
>>> > < Connection: close
>>> > Connection: close
>>> > < Content-Length: 3
>>> > Content-Length: 3
>>> > * Closing connection
>>> > curl: (100) A value or data field grew larger than allowed
>>> > 
>>> > So curl detects the long response header and bails out. Safe and sane.
>>> > 
>>> > However, when I run LongResponseHeaderDemoServer in one terminal and run 
>>> > LongResponseHeaderDemoHttpClient in another terminal, this is what 
>>> > happens:
>>> > 
>>> > $ java LongResponseHeaderDemoHttpClient       
>>> > Exception in thread "main" java.io.IOException: Requested array size exceeds VM limit
>>> > at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:966)
>>> > at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:133)
>>> > at LongResponseHeaderDemoHttpClient.main(LongResponseHeaderDemoHttpClient.java:13)
>>> > Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>>> > at java.base/java.util.Arrays.copyOf(Arrays.java:3541)
>>> > at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:242)
>>> > at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:806)
>>> > at java.base/java.lang.StringBuilder.append(StringBuilder.java:246)
>>> > at java.net.http/jdk.internal.net.http.Http1HeaderParser.readResumeHeader(Http1HeaderParser.java:250)
>>> > at java.net.http/jdk.internal.net.http.Http1HeaderParser.parse(Http1HeaderParser.java:124)
>>> > at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.handle(Http1Response.java:605)
>>> > at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.handle(Http1Response.java:536)
>>> > at java.net.http/jdk.internal.net.http.Http1Response$Receiver.accept(Http1Response.java:527)
>>> > at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.tryAsyncReceive(Http1Response.java:583)
>>> > at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.flush(Http1AsyncReceiver.java:233)
>>> > at java.net.http/jdk.internal.net.http.Http1AsyncReceiver$$Lambda/0x00000008010dbd50.run(Unknown Source)
>>> > at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:182)
>>> > at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
>>> > at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:207)
>>> > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
>>> > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
>>> > at java.base/java.lang.Thread.runWith(Thread.java:1596)
>>> > at java.base/java.lang.Thread.run(Thread.java:1583)
>>> > 
>>> > Apparently, HttpClient just keeps on reading the never-ending header 
>>> > until it OOMs. This seems to confirm that there is no default limit to 
>>> > header size. It also seems like A Very Bad Thing to me. This suggests 
>>> > that any program that makes an HTTP request to an untrusted source 
>>> > using HttpClient, for example when crawling the web, is at risk of 
>>> > an OOM.
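For illustration, the kind of limit being asked for can be sketched as a reader that gives up once the header block passes a fixed size. This is not how the JDK parses headers and not a proposed patch; `BoundedHeaderReader`, `readHeaderBlock`, and the `maxHeaderBytes` parameter are hypothetical names for this sketch, and a real implementation would sit inside the client's own parser rather than on a raw stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BoundedHeaderReader {

    // The status line and headers end with an empty line: CR LF CR LF.
    private static final byte[] END = {'\r', '\n', '\r', '\n'};

    /**
     * Reads the status line and headers, up to and including the blank line
     * that ends them. Fails instead of growing without bound if the block
     * is longer than maxHeaderBytes.
     */
    static String readHeaderBlock(InputStream in, int maxHeaderBytes)
            throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int matched = 0; // number of END bytes matched so far
        int b;
        while ((b = in.read()) != -1) {
            if (buf.size() >= maxHeaderBytes) {
                throw new IOException(
                        "Response headers exceed " + maxHeaderBytes + " bytes");
            }
            buf.write(b);
            if (b == END[matched]) {
                matched++;
                if (matched == END.length) {
                    return new String(buf.toByteArray(),
                            StandardCharsets.ISO_8859_1);
                }
            } else {
                // A stray CR may still be the start of the terminator.
                matched = (b == '\r') ? 1 : 0;
            }
        }
        throw new IOException("Connection closed before end of headers");
    }
}
```

The sketch reads one byte at a time for clarity, so the stream should be wrapped in a `BufferedInputStream` in practice. With a never-ending header, the caller gets an `IOException` after `maxHeaderBytes` bytes rather than an `OutOfMemoryError`.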
>>> > 
>>> > For grins, I also wrote an application 
>>> > LongResponseHeaderDemoHttpURLConnection that does the same thing as 
>>> > LongResponseHeaderDemoHttpClient, just using HttpURLConnection instead of 
>>> > HttpClient. When I run LongResponseHeaderDemoServer in one terminal and 
>>> > LongResponseHeaderDemoHttpURLConnection in another terminal, this is what 
>>> > happens:
>>> > 
>>> > $ java LongResponseHeaderDemoHttpURLConnection
>>> > Exception in thread "main" java.lang.NegativeArraySizeException: -1610612736
>>> > at java.base/sun.net.www.MessageHeader.mergeHeader(MessageHeader.java:526)
>>> > at java.base/sun.net.www.MessageHeader.parseHeader(MessageHeader.java:481)
>>> > at java.base/sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:804)
>>> > at java.base/sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:726)
>>> > at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1688)
>>> > at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589)
>>> > at java.base/java.net.URL.openStream(URL.java:1161)
>>> > at LongResponseHeaderDemoHttpURLConnection.main(LongResponseHeaderDemoHttpURLConnection.java:12)
>>> > 
>>> > So HttpURLConnection doesn't handle things gracefully either, but at 
>>> > least it doesn't OOM. That seems like a bug, too, but perhaps less severe.
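The size in that exception, -1610612736, is what `int` arithmetic produces when a capacity that starts at 10 is doubled repeatedly: 10 * 2^27 = 1342177280 still fits in an `int`, but the next doubling wraps around to -1610612736. That the header buffer starts at 10 elements and grows by doubling is an assumption made here to explain the number; it is not confirmed anywhere in this thread. The class and method names below are hypothetical.

```java
public class DoublingOverflow {

    /**
     * Doubles an int capacity until it is no longer positive and returns
     * the first non-positive value, i.e. the point where doubling overflows.
     */
    static int firstOverflow(int initialCapacity) {
        int capacity = initialCapacity;
        while (capacity > 0) {
            capacity *= 2; // int arithmetic wraps silently in Java
        }
        return capacity;
    }

    public static void main(String[] args) {
        // Assumed starting capacity of 10 (an assumption, see the text above).
        System.out.println(firstOverflow(10)); // prints -1610612736
    }
}
```

If that reading is right, HttpURLConnection fails only after buffering more than a billion header characters, so it also consumes a great deal of memory before it throws.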
>>> > 
>>> > For reference, here's my java version:
>>> > 
>>> > $ java -version
>>> > openjdk version "21.0.2" 2024-01-16 LTS
>>> > OpenJDK Runtime Environment Corretto-21.0.2.13.1 (build 21.0.2+13-LTS)
>>> > OpenJDK 64-Bit Server VM Corretto-21.0.2.13.1 (build 21.0.2+13-LTS, mixed 
>>> > mode, sharing)
>>> > 
>>> > Can anyone check my work, and maybe reproduce? And ideally, can someone 
>>> > with more knowledge than me about java.net.http.HttpClient and/or 
>>> > java.net.HttpURLConnection please comment? Is this real, or have I made a 
>>> > mistake somewhere along the way? If it's real, what's next? A bug report?
>>> > 
>>> > Andy Boothe
>>> > Email: andy.boo...@gmail.com
>>> > Mobile: (979) 574-1089
>>> 
>>> <LongResponseHeaderDemoHttpClient.java><LongResponseHeaderDemoHttpURLConnection.java><LongResponseHeaderDemoServer.java>
>> 
> 
