Hello again!

I am still writing my code in PHP, but looking at the code of the PHP curl extension, I have found my question to be a general libcurl question; this is why I am writing to this list.

My code uses a write callback function, a header callback function, and a progress callback function. The latter may cause a download to be canceled (see my earlier question in thread <mid:20190407213813.gd11...@imap.uni-ulm.de>). I have enabled CURLOPT_FOLLOWLOCATION and CURLOPT_HEADER. (The code also enables CURLOPT_RETURNTRANSFER, but that is specific to the curl extension of PHP, and merely causes the internal buffer to be output if there is no error.) The write callback function appends the received data to a dynamic buffer.

I would like to parse the received data (or the first part of it) even if the download has been aborted.

My problem is that I do not know where the boundary between header and body is if the download has been aborted. To make things worse, I suspect that this boundary may be difficult to detect reliably.

With the prerequisites listed above, consider the following scenario:
1. client sends an HTTP GET request to the server,
2. server responds with 3xx, Location header field, no Content-Length, and a body with chunked transfer coding,
3. client reads the chunked body and then follows the redirection,
4. server responds with 200 and sends a huge document (which _might_ contain parts that look like message/http content 😉),
5. client starts reading the resource, but aborts after a certain number of bytes.

I would like to clear the receive buffer each time the client starts reading a new resource. But I am not sure when this can safely be done. From the man pages for CURLOPT_WRITEFUNCTION and CURLOPT_HEADERFUNCTION, I can see that while the header callback function is called once per header line (to simplify their handling), the write callback function may be called with big blocks of data. So I assume that it is _not_ safe to clear the receive buffer as soon as I see an HTTP status-line.

Guessing where the document starts is, of course, not an option.

I first thought that I might disable CURLOPT_HEADER and handle some headers differently from what is done now. But as long as CURLOPT_FOLLOWLOCATION is on, this does not seem to solve my problem of identifying when to clear my receive buffer.

Now I am a bit lost, and assume that I am missing something here. This is why I would like to ask for help:

How can I extract the body of my target resource which has been partially received? Are the man pages or my interpretation of them too strict? Do I need to switch to a completely different approach?

I have a feeling that the write callback function will never be called with data from two HTTP responses at once (that is, will never cross redirections). Is my guess correct? If yes, is this guaranteed/will this stay?
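
If that guess holds, the buffer reset could be tied to the header callback instead of the write callback. The following is only a minimal sketch in C under stated assumptions, not a confirmed libcurl guarantee: it assumes CURLOPT_HEADER is disabled (so the write callback receives body data only), and that all header lines of a response reach the header callback before any body data of that response reaches the write callback. The names recv_buf, on_header and on_body are hypothetical; only the callback signatures and the option names in the comment come from libcurl.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical receive buffer; not part of libcurl. */
struct recv_buf {
    char *data;   /* NUL-terminated body of the current response */
    size_t len;   /* number of body bytes stored */
};

/* Same signature as a CURLOPT_HEADERFUNCTION callback.
 * Assumption: a line starting with "HTTP/" is the status line of a
 * new response, so the body kept from the previous one is dropped. */
static size_t on_header(char *ptr, size_t size, size_t nitems, void *userdata)
{
    struct recv_buf *buf = userdata;
    size_t n = size * nitems;

    assert(buf != NULL);
    if (n >= 5 && memcmp(ptr, "HTTP/", 5) == 0) {
        buf->len = 0;
        if (buf->data != NULL)
            buf->data[0] = '\0';
    }
    return n;
}

/* Same signature as a CURLOPT_WRITEFUNCTION callback.
 * Appends the block to the buffer; returning a short count makes
 * libcurl abort the transfer. */
static size_t on_body(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    struct recv_buf *buf = userdata;
    size_t n = size * nmemb;
    char *grown;

    assert(buf != NULL);
    grown = realloc(buf->data, buf->len + n + 1);
    if (grown == NULL)
        return 0;
    memcpy(grown + buf->len, ptr, n);
    buf->data = grown;
    buf->len += n;
    buf->data[buf->len] = '\0';
    return n;
}

/* Registration on an easy handle h (requires <curl/curl.h>):
 *   curl_easy_setopt(h, CURLOPT_HEADER, 0L);
 *   curl_easy_setopt(h, CURLOPT_FOLLOWLOCATION, 1L);
 *   curl_easy_setopt(h, CURLOPT_HEADERFUNCTION, on_header);
 *   curl_easy_setopt(h, CURLOPT_HEADERDATA, &buf);
 *   curl_easy_setopt(h, CURLOPT_WRITEFUNCTION, on_body);
 *   curl_easy_setopt(h, CURLOPT_WRITEDATA, &buf);
 */
```

With this arrangement, body data that merely looks like a status line never triggers a reset, because only the header callback inspects for "HTTP/". After an aborted transfer, the buffer would then hold whatever part of the last response's body had arrived.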

Cheers
--
Nico

Nicolas Roeser
kiz – Information Systems Department, Ulm University
