I have found that the OpenJDK net team is very open to receiving patches. Have you filed an issue that has been accepted? This is usually the first step.
> On Jul 29, 2024, at 5:20 PM, Andy Boothe <andy.boo...@gmail.com> wrote:
>
> First, thank you both for the responses. I know how busy everyone is, and I
> really appreciate the time.
>
> We can talk about use cases and architecture and such, but I think we all
> agree that a developer should be able to make an HTTP request with HttpClient
> without worrying about whether or not it will cause an OOM. Or, at least,
> that whether or not it causes an OOM should be fully within their control.
> And that's not where this implementation is right now.
>
> I’d be very happy to work on a fix for this. Would it be out of order for me
> to propose a patch?
>
> Andy Boothe
> Email: andy.boo...@gmail.com
> Mobile: (979) 574-1089
>
>
> On Mon, Jul 29, 2024 at 3:57 PM robert engels <reng...@ix.netcom.com> wrote:
> Yes, but normally you fork a worker process that tracks progress and scrapes
> N sites. If the worker process dies processing a site, the site is marked
> “bad” and only periodically scraped after a retry/backoff period.
>
> There are probably a lot of ways to crash a worker process, intentionally or
> accidentally - a robust design is called for.
>
> As an aside, if I was writing a large scale scraper I don’t think I would use
> HttpClient anyway - I think a custom url accessor would be easier to monitor,
> etc.
>
>> On Jul 29, 2024, at 3:43 PM, Ethan McCue <et...@mccue.dev> wrote:
>>
>> Scraping of unknown/untrusted websites is a common task in certain...fields?
>> I don't want to comment on it too deeply, but I know that is something folks
>> would do.
>>
>> Imagine a site where someone inputs a URL, clicks submit, and then with the
>> power of funding they return a summary of the page.
>>
>> On Mon, Jul 29, 2024, 3:52 PM robert engels <reng...@ix.netcom.com> wrote:
>> Isn’t the HttpClient almost always used to access other services?
>>
>> Why would a developer access a malicious service?
>>
>> I also think there are lots of ways for a service to crash the client - e.g.
>> it could attempt to return a very large response - if the client uses a
>> memory buffered reader, it will cause an OOM as well.
>>
>>> On Jul 29, 2024, at 2:42 PM, Andy Boothe <andy.boo...@gmail.com> wrote:
>>>
>>> Following up here.
>>>
>>> I believe I have discovered that it is possible to craft a malicious HTTP
>>> response that can cause the built-in HttpURLConnection and HttpClient
>>> implementations to throw exceptions. Specifically, HttpURLConnection can be
>>> made to throw a NegativeArraySizeException, and HttpClient can be made to
>>> throw an OutOfMemoryError. Proof of this behavior is in the attached (very
>>> simple) Java programs.
>>>
>>> This seems like A Bad Thing to me.
>>>
>>> I've moved from the dev list to this list based on a recommendation from
>>> that list. Is this the right list? If not, can you point me in the right
>>> direction? Perhaps a security list?
>>>
>>> Thank you,
>>>
>>> Andy Boothe
>>>
>>> On Wed, Jul 24, 2024 at 4:47 PM Andy Boothe <andy.boo...@gmail.com> wrote:
>>> Hello,
>>>
>>> I'm moving this thread from jdk-dev to this list on the sage advice of
>>> Pavel Rappo.
>>>
>>> As a brief recap, it looks like HttpClient and HttpURLConnection do not
>>> currently support a way to set the maximum acceptable response header
>>> length. As a result, sending HTTP requests with these classes that result
>>> in a response with very long headers causes an OutOfMemoryError and a
>>> NegativeArraySizeException, respectively. (Simple programs for reproducing
>>> the issue are attached.) This seems like A Bad Thing. There is a (very
>>> brief) discussion in the thread about how to handle it, but of course you guys
>>> are the experts.
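Robert's point above, that a service could simply return a very large response and OOM a client that buffers it in memory, is something a caller can already bound with the public HttpClient API. A minimal sketch, assuming an illustrative 1 MiB cap and illustrative timeout values (the class name `BoundedBodyFetch` is hypothetical, not from the thread): stream the body through `HttpResponse.BodyHandlers.ofInputStream()` and give up as soon as the cap is exceeded. Note that this does nothing for the header problem Andy describes, because the response headers are parsed before any `BodyHandler` is invoked.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: fetch a URL but refuse response bodies larger than maxBytes.
class BoundedBodyFetch {

    /** Reads the whole stream, failing as soon as more than maxBytes have arrived. */
    static byte[] readAtMost(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            // long arithmetic so the check itself cannot overflow
            if ((long) out.size() + n > maxBytes) {
                throw new IOException("response body exceeds " + maxBytes + " bytes");
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    static byte[] fetch(HttpClient client, URI uri, int maxBytes)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(10)) // per-request timeout (illustrative value)
                .GET()
                .build();
        // ofInputStream() exposes the body as a stream instead of buffering all of it.
        HttpResponse<InputStream> response =
                client.send(request, HttpResponse.BodyHandlers.ofInputStream());
        // Closing the stream (also when readAtMost throws) releases the response.
        try (InputStream body = response.body()) {
            return readAtMost(body, maxBytes);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java BoundedBodyFetch <url>");
            return;
        }
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5)) // illustrative value
                .build();
        byte[] body = fetch(client, URI.create(args[0]), 1 << 20); // 1 MiB cap
        System.out.println(body.length + " bytes");
    }
}
```

With this shape, an oversized body surfaces as an ordinary `IOException` the caller can handle, rather than as heap exhaustion.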
>>>
>>> If my head is on straight and this turns out to be a real issue as opposed
>>> to a mistake on my part, I'm keen to help however I can.
>>>
>>> Andy Boothe
>>>
>>>
>>> ---------- Forwarded message ---------
>>> From: Pavel Rappo <pavel.ra...@oracle.com>
>>> Date: Wed, Jul 24, 2024 at 4:30 PM
>>> Subject: Re: Very long response headers and java.net.http.HttpClient?
>>> To: Andy Boothe <andy.boo...@gmail.com>
>>> Cc: jdk-...@openjdk.org
>>>
>>>
>>> A proper list would be net-dev at openjdk.java.net.
>>>
>>> > On 24 Jul 2024, at 21:13, Andy Boothe <andy.boo...@gmail.com> wrote:
>>> >
>>> > Hello,
>>> >
>>> > I'm documenting some guidelines for using java.net.http.HttpClient
>>> > defensively for my team. For example: "Always set a request timeout",
>>> > "Don't assume HTTP response entities are small and/or will fit in
>>> > memory", etc.
>>> >
>>> > One guideline I'd like to document is "Set a maximum for HTTP response
>>> > header size." However, I can't seem to find a way to set that limit,
>>> > either in documentation or in OpenJDK code.
>>> >
>>> > I tried my best to search the archives for this mailing list for any
>>> > mentions, but came up empty.
>>> >
>>> > To make sure my head is on straight and there isn't an undocumented limit
>>> > set by default, I wrote the attached (very quick and dirty) client and
>>> > server programs. LongResponseHeaderDemoServer opens a raw server socket
>>> > and reads (what it assumes is) a well-formed HTTP request, and then
>>> > prints an HTTP response which includes a response header of infinite
>>> > length. LongResponseHeaderDemoHttpClient uses java.net.http.HttpClient to
>>> > make a request and print the response body.
>>> >
>>> > When I run LongResponseHeaderDemoServer in one terminal and make a curl
>>> > request to the server in another terminal, this is what curl spits out:
>>> >
>>> > $ curl -vvv -D - http://localhost:3000
>>> > * Host localhost:3000 was resolved.
>>> > * IPv6: ::1
>>> > * IPv4: 127.0.0.1
>>> > * Trying [::1]:3000...
>>> > * Connected to localhost (::1) port 3000
>>> > > GET / HTTP/1.1
>>> > > Host: localhost:3000
>>> > > User-Agent: curl/8.6.0
>>> > > Accept: */*
>>> > >
>>> > < HTTP/1.1 200 OK
>>> > HTTP/1.1 200 OK
>>> > < Content-Type: text/plain
>>> > Content-Type: text/plain
>>> > < Connection: close
>>> > Connection: close
>>> > < Content-Length: 3
>>> > Content-Length: 3
>>> > * Closing connection
>>> > curl: (100) A value or data field grew larger than allowed
>>> >
>>> > So curl detects the long response header and bails out. Safe and sane.
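The curl behavior above is the usual defense: count header bytes as they arrive and abort once a fixed budget is exceeded, rather than growing a buffer without bound. A minimal sketch of that idea over a raw `InputStream` follows. It is not how HttpClient's `Http1HeaderParser` works; the class name `BoundedHeaderReader` and the 64 KiB default are hypothetical, and it only recognizes the CRLF CRLF terminator.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: read an HTTP/1.1 response head (status line + header
// fields) and fail fast once it exceeds a fixed byte budget.
class BoundedHeaderReader {

    static final int DEFAULT_MAX_HEADER_BYTES = 64 * 1024; // illustrative budget

    /**
     * Reads bytes up to and including the CRLF CRLF that ends the header
     * section. Throws IOException if the budget is exceeded or the stream
     * ends before the header section does.
     */
    static String readHead(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream head = new ByteArrayOutputStream();
        int matched = 0; // how many bytes of "\r\n\r\n" have been seen in a row
        while (true) {
            int b = in.read();
            if (b == -1) {
                throw new IOException("stream ended before end of headers");
            }
            if (head.size() >= maxBytes) {
                throw new IOException("response headers exceed " + maxBytes + " bytes");
            }
            head.write(b);
            // Even positions of the terminator expect '\r', odd positions expect '\n'.
            if ((matched % 2 == 0 && b == '\r') || (matched % 2 == 1 && b == '\n')) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0; // a stray '\r' may start a new terminator
            }
            if (matched == 4) {
                return head.toString(StandardCharsets.ISO_8859_1);
            }
        }
    }
}
```

Reading one byte at a time guarantees nothing past the header section is consumed, so the body stays on the stream. On a socket, wrapping the stream in a `BufferedInputStream` avoids a system call per byte, as long as the body is then read from that same wrapper.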
>>> >
>>> > However, when I run LongResponseHeaderDemoServer in one terminal and run
>>> > LongResponseHeaderDemoHttpClient in another terminal, this is what
>>> > happens:
>>> >
>>> > $ java LongResponseHeaderDemoHttpClient
>>> > Exception in thread "main" java.io.IOException: Requested array size exceeds VM limit
>>> > at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:966)
>>> > at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:133)
>>> > at LongResponseHeaderDemoHttpClient.main(LongResponseHeaderDemoHttpClient.java:13)
>>> > Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>>> > at java.base/java.util.Arrays.copyOf(Arrays.java:3541)
>>> > at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:242)
>>> > at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:806)
>>> > at java.base/java.lang.StringBuilder.append(StringBuilder.java:246)
>>> > at java.net.http/jdk.internal.net.http.Http1HeaderParser.readResumeHeader(Http1HeaderParser.java:250)
>>> > at java.net.http/jdk.internal.net.http.Http1HeaderParser.parse(Http1HeaderParser.java:124)
>>> > at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.handle(Http1Response.java:605)
>>> > at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.handle(Http1Response.java:536)
>>> > at java.net.http/jdk.internal.net.http.Http1Response$Receiver.accept(Http1Response.java:527)
>>> > at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.tryAsyncReceive(Http1Response.java:583)
>>> > at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.flush(Http1AsyncReceiver.java:233)
>>> > at java.net.http/jdk.internal.net.http.Http1AsyncReceiver$$Lambda/0x00000008010dbd50.run(Unknown Source)
>>> > at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:182)
>>> > at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
>>> > at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:207)
>>> > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
>>> > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
>>> > at java.base/java.lang.Thread.runWith(Thread.java:1596)
>>> > at java.base/java.lang.Thread.run(Thread.java:1583)
>>> >
>>> > Ostensibly, HttpClient just keeps on reading the never-ending header
>>> > until it OOMs. This seems to confirm that there is no default limit to
>>> > header size. It also seems like A Very Bad Thing to me. This suggests
>>> > that any time a program makes an HTTP request to an untrusted source
>>> > using HttpClient, for example when crawling the web, they are at risk of
>>> > an OOM.
>>> >
>>> > For grins, I also wrote an application
>>> > LongResponseHeaderDemoHttpURLConnection that does the same thing as
>>> > LongResponseHeaderDemoHttpClient, just using HttpURLConnection instead of
>>> > HttpClient. When I run LongResponseHeaderDemoServer in one terminal and
>>> > LongResponseHeaderDemoHttpURLConnection in another terminal, this is what
>>> > happens:
>>> >
>>> > $ java LongResponseHeaderDemoHttpURLConnection
>>> > Exception in thread "main" java.lang.NegativeArraySizeException: -1610612736
>>> > at java.base/sun.net.www.MessageHeader.mergeHeader(MessageHeader.java:526)
>>> > at java.base/sun.net.www.MessageHeader.parseHeader(MessageHeader.java:481)
>>> > at java.base/sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:804)
>>> > at java.base/sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:726)
>>> > at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1688)
>>> > at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589)
>>> > at java.base/java.net.URL.openStream(URL.java:1161)
>>> > at LongResponseHeaderDemoHttpURLConnection.main(LongResponseHeaderDemoHttpURLConnection.java:12)
>>> >
>>> > So HttpURLConnection doesn't handle things gracefully either, but at
>>> > least it doesn't OOM. That seems like a bug, too, but perhaps less severe.
>>> >
>>> > For reference, here's my java version:
>>> >
>>> > $ java -version
>>> > openjdk version "21.0.2" 2024-01-16 LTS
>>> > OpenJDK Runtime Environment Corretto-21.0.2.13.1 (build 21.0.2+13-LTS)
>>> > OpenJDK 64-Bit Server VM Corretto-21.0.2.13.1 (build 21.0.2+13-LTS, mixed mode, sharing)
>>> >
>>> > Can anyone check my work, and maybe reproduce? And ideally, can someone
>>> > with more knowledge than me about java.net.http.HttpClient and/or
>>> > java.net.HttpURLConnection please comment? Is this real, or have I made a
>>> > mistake somewhere along the way? If it's real, what's next? A bug report?
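About the -1610612736 in the HttpURLConnection trace above: one plausible reading, not checked against the `MessageHeader` source, is that an `int` buffer-size computation grew past `Integer.MAX_VALUE` and wrapped around. The arithmetic is at least consistent with that, since -1610612736 is exactly what 2,684,354,560 bytes (2.5 GiB) becomes when truncated to 32 bits. A quick check, with `Math.addExact` shown as the overflow-detecting alternative (`OverflowCheck` is only an illustrative name):

```java
// Checks the int-overflow arithmetic behind the value -1610612736.
// Assumption (not verified against MessageHeader): a size computation
// exceeded Integer.MAX_VALUE and wrapped around.
class OverflowCheck {
    public static void main(String[] args) {
        long intended = 2_684_354_560L;   // 2.5 GiB = 2.5 * 2^30 bytes
        int wrapped = (int) intended;     // keeps only the low 32 bits
        System.out.println(wrapped);      // -1610612736

        // The same value arises from plain int addition that overflows.
        int half = 1_342_177_280;         // 1.25 GiB, still below Integer.MAX_VALUE
        System.out.println(half + half);  // -1610612736

        // Math.addExact throws instead of silently wrapping.
        try {
            Math.addExact(half, half);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```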
>>> >
>>> > Andy Boothe
>>>
>>> <LongResponseHeaderDemoHttpClient.java><LongResponseHeaderDemoHttpURLConnection.java><LongResponseHeaderDemoServer.java>
>>
>
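The three attachments are not included in this archive. For anyone trying to reproduce the behavior, here is a rough stand-in for LongResponseHeaderDemoServer, written only from the description in the thread and not Andy's actual program. It listens on port 3000, reads the request head up to the blank line, sends the status line and the three headers visible in the curl transcript, then streams one header (named `X-Long` here, an assumption) whose value never ends. The `headerBudget` parameter and the class name `EndlessHeaderServer` are additions for illustration, so that the response writer can be exercised without a socket.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for LongResponseHeaderDemoServer, reconstructed from
// the thread's description only. Intended for local testing.
class EndlessHeaderServer {

    /**
     * Writes a response whose last header never terminates. A negative
     * headerBudget means "forever" (until the client disconnects and the write
     * fails); a non-negative value stops after that many filler bytes.
     */
    static void writeResponse(OutputStream out, long headerBudget) throws IOException {
        String head = "HTTP/1.1 200 OK\r\n"
                + "Content-Type: text/plain\r\n"
                + "Connection: close\r\n"
                + "Content-Length: 3\r\n"
                + "X-Long: ";
        out.write(head.getBytes(StandardCharsets.US_ASCII));
        byte[] filler = new byte[8192];
        Arrays.fill(filler, (byte) 'a');
        long written = 0;
        while (headerBudget < 0 || written < headerBudget) {
            int n = headerBudget < 0 ? filler.length
                    : (int) Math.min(filler.length, headerBudget - written);
            out.write(filler, 0, n); // throws IOException once the client goes away
            written += n;
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(3000)) {
            while (true) {
                try (Socket socket = server.accept()) {
                    BufferedReader reader = new BufferedReader(new InputStreamReader(
                            socket.getInputStream(), StandardCharsets.US_ASCII));
                    // Consume the request line and headers, up to the empty line.
                    String line;
                    while ((line = reader.readLine()) != null && !line.isEmpty()) {
                        // contents are ignored
                    }
                    writeResponse(socket.getOutputStream(), -1);
                } catch (IOException e) {
                    System.err.println("connection ended: " + e.getMessage());
                }
            }
        }
    }
}
```

Start it with `java EndlessHeaderServer` and point curl or one of the client programs at http://localhost:3000. Results may vary by JDK version and by which client is used.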