Hello!

While working on a piece of PHP software which uses curl functions, I have 
reached a point where I need some feedback. Either I have found a problem in 
libcurl, or I am not using it the right way, and need a bit of advice. I am 
writing to _this_ list, because the issue is within libcurl (or how it is 
used), and not limited to curl functions in PHP.

I have seen the behavior with curl 7.64.1 on Linux and macOS (curl from 
Homebrew), in a program linked to libcurl directly and in a PHP script which 
uses the same logic.

What I am trying to do: I would like to use libcurl to download a resource
over HTTP. The resource is reached after an HTTP redirect. During the download
of the target resource, I would like to limit the number of bytes downloaded.
That is, I would like to abort the transfer from a progress function once a
certain number of HTTP payload bytes has been downloaded.
CURLOPT_XFERINFOFUNCTION is used, and CURLOPT_NOPROGRESS is set to 0L.

I am using a local web server which returns an HTTP 302 response for a GET 
request to the original URI. This response has a large payload (more than 5 kB) 
– it may be a loooong message telling users about the redirection. It includes 
a Location header field with a URI reference to the location of the target 
resource. When the server receives an HTTP GET request for the _target_ 
resource, it responds with HTTP 200 and a document which is also larger than 5 
kB. (Yes, I know about the libcurl receive buffer and its default size, but am 
under the impression that the problem is unrelated.)

For testing, I am simply using docs/examples/progressfunc.c, which I have 
modified a tiny bit: I am adding the following line:

    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);

A full patch against curl-7_64_1 is attached, but I doubt the rest of my 
changes matters much. I have tried with different download size limits 
(STOP_DOWNLOAD_AFTER_THIS_MANY_BYTES).

My testing shows that libcurl does not follow the redirection, but stops 
processing entirely when the progress function returns nonzero. Well, we can 
argue that this is OK, of course. If the message body in the first HTTP 
response is small enough, libcurl continues and everything works as intended.

But for now I cannot understand the necessity of reading the HTTP response 
body for a 302 response _at all_(*) when there is a usable Location header 
field. My current stance is that, if CURLOPT_FOLLOWLOCATION is enabled, libcurl 
should follow the redirection as soon as it has reached the line separating 
header fields from the message-body, that is, when header processing has ended. 
Does that sound reasonable?

Or am I using the wrong means for accomplishing my goal described above? Do I 
need to implement redirection following myself?

I have not tried it yet, but I can imagine using a header callback function
which monitors the headers for 3xx responses and for Location header fields. As
soon as _both_ of them have been seen, I could set a flag which modifies the
progress function logic until a new HTTP response is seen (which starts with
"HTTP/"). This will still download the first big response payload entirely,
won’t it?
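
To make the header callback idea concrete, here is a rough and untested sketch
of the flag logic only, written without any libcurl types so that it can be
read on its own. The names redirect_state, note_header_line and should_abort
are made up for illustration. I assume that note_header_line would be called
once per header line from a CURLOPT_HEADERFUNCTION callback, that should_abort
would supply the return value of the CURLOPT_XFERINFOFUNCTION callback, and
that dlnow counts only the bytes of the current response (this would need to
be checked).

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical per-transfer state shared by both callbacks. */
struct redirect_state {
  int is_3xx;        /* the current response has a 3xx status line */
  int has_location;  /* the current response has a Location field */
  long long limit;   /* byte limit for a non-redirecting response body */
};

/* Process one header line. The line is not assumed to be
   NUL-terminated, so only the first len bytes are inspected. */
void note_header_line(struct redirect_state *st,
                      const char *line, size_t len)
{
  if(len >= 5 && !memcmp(line, "HTTP/", 5)) {
    /* A new response starts: reset the flags, then look at the first
       digit of the status code, which follows the first space. */
    const char *sp = memchr(line, ' ', len);
    st->has_location = 0;
    st->is_3xx = (sp && (size_t)(sp - line) + 1 < len && sp[1] == '3');
  }
  else if(len >= 9 && !strncasecmp(line, "Location:", 9)) {
    st->has_location = 1;
  }
}

/* Decision for the progress callback: nonzero means "abort". */
int should_abort(const struct redirect_state *st, long long dlnow)
{
  if(st->is_3xx && st->has_location)
    return 0;  /* body of a redirecting response: exempt from the limit */
  return dlnow > st->limit;
}
```

Note that this only exempts the redirecting response from the limit. It does
nothing to avoid reading that response's body, which is the part I am asking
about.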

(*) Having thought about it a little more – is this behavior related to 
keep-alive connections and not wanting to close them/open new ones?

Cheers
-- 
Nico

Nicolas Roeser
kiz – Information Systems Department, Ulm University
From dcd20097467a33770fd73a07451b22d76f7dec30 Mon Sep 17 00:00:00 2001
From: Nicolas Roeser <nicolas.roe...@uni-ulm.de>
Date: Sun, 7 Apr 2019 22:24:38 +0200
Subject: Redirection following in progressfunction test

It seems that if an xferfunction is used *and* CURLOPT_FOLLOWLOCATION is
enabled, the redirection is not taken after the 'Location:' header field
has been seen, but after the redirect*ing* document has been read. This
behavior is not in line with libcurl returning the redirected-*to*
document (instead of the redirect*ing* one or everything) on redirects.

The latter behavior is fine (see also
<https://curl.haxx.se/mail/lib-2014-08/0171.html>), but the former is
not.
---
 docs/examples/progressfunc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/examples/progressfunc.c b/docs/examples/progressfunc.c
index 86ad0d9ca..8bb17d886 100644
--- a/docs/examples/progressfunc.c
+++ b/docs/examples/progressfunc.c
@@ -42,7 +42,7 @@
 #define MINIMAL_PROGRESS_FUNCTIONALITY_INTERVAL     3
 #endif
 
-#define STOP_DOWNLOAD_AFTER_THIS_MANY_BYTES         6000
+#define STOP_DOWNLOAD_AFTER_THIS_MANY_BYTES         5000
 
 struct myprogress {
   TIMETYPE lastruntime; /* type depends on version, see above */
@@ -108,7 +108,7 @@ int main(void)
     prog.lastruntime = 0;
     prog.curl = curl;
 
-    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/");
+    curl_easy_setopt(curl, CURLOPT_URL, "http://127.0.0.1:8102/reallylong");
 
 #if LIBCURL_VERSION_NUM >= 0x072000
     /* xferinfo was introduced in 7.32.0, no earlier libcurl versions will
@@ -132,6 +132,8 @@ int main(void)
 #endif
 
     curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0L);
+    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
+    //curl_easy_setopt(curl, CURLOPT_HEADER, 1L);
     res = curl_easy_perform(curl);
 
     if(res != CURLE_OK)
-- 
2.21.0
