Hi libcurl devs,

I'm writing an application that uses libcurl, and I have no prior expertise with HTTP, so I'd like to make sure I got things right.
I'm working on the internal http client of casync [1]. This client is simple: it basically has a list of files to download, and its job is to download them efficiently. We're talking about small chunks of data (around 64KB), but the list is possibly huge (60,000 chunks is very possible). And we talk to only ONE server. Since we live in a modern world, I explicitly enable `CURL_HTTP_VERSION_2_0` and `CURLPIPE_MULTIPLEX`, and I assume that the server supports it.

In a **first implementation**, I just create a curl easy handle for each chunk I need to download (so, possibly 60k easy handles), add it to the curl multi, and then I let curl deal with it (a rough sketch is appended at the end of this mail). I also make sure to set `CURLMOPT_MAX_TOTAL_CONNECTIONS` so that the whole thing doesn't go crazy (I used 64 at first, but after more reading I wonder if I should lower that to 8). It works well this way. Even too well.

My issue, during local tests (with both client and server on my machine), is that the client isn't fast enough to handle all the incoming chunks. The client needs to give the chunks to another co-process through a custom IPC, and this proved to be the bottleneck. So what happened is that all my chunks were downloaded very quickly, and then sat in RAM until the client had time to forward them to its co-process. Even though it works, it can use a lot of RAM, which isn't nice. Of course, this doesn't happen in "real life", when the server is far away and the latency is higher. Then the client has time to handle the chunks, and everything works beautifully.

I didn't find a way to tell libcurl to pause or slow down in case things go too fast, so I went for a **second implementation**, slightly different. Instead of creating one easy handle per chunk request and feeding them all to the curl multi handle, I only create a small number of easy handles (let's say 8) and give them to the curl multi. Only when a chunk has been downloaded and handled by the client do I re-use the easy handle (i.e. remove it from the multi handle, set a new URL, and give it back to the curl multi for processing). This is also sketched at the end of this mail. It works well too.

Now that I've taken a bit of time to think, I wonder if this second implementation is really the smart thing to do. More precisely: by feeding handles one by one (even though we might have 8 active handles in the curl multi at the same time), do I prevent internal optimizations within libcurl? How can libcurl multiplex efficiently if I don't tell it in advance the list of chunks I want to download? So basically, I think that my first implementation was better than the second one. Do you agree or disagree, based on your knowledge of libcurl internals?

I also take this chance to ask a second question, out of curiosity: with HTTP/2 multiplexing enabled, will libcurl also attempt to open concurrent connections and multiplex on all of them, or does it stick to one connection?

Thanks!

Arnaud

----

[1]: http://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html
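In case a concrete picture helps, here is roughly what the **first implementation** does. This is a minimal sketch under my own assumptions, not the actual casync code: error checking, `curl_global_init()` and the cleanup of finished handles are omitted, and `chunk_urls`, `n_chunks` and `write_chunk` are placeholder names.

```c
#include <curl/curl.h>

/* Placeholder write callback: in the real client the data is handed over
 * to the co-process through our custom IPC. */
static size_t write_chunk(char *data, size_t size, size_t nmemb, void *userdata)
{
        (void)data; (void)userdata;
        return size * nmemb;
}

static void download_all(char **chunk_urls, size_t n_chunks)
{
        CURLM *multi = curl_multi_init();
        int still_running = 0;

        /* cap the connection count so 60k easy handles don't go crazy */
        curl_multi_setopt(multi, CURLMOPT_MAX_TOTAL_CONNECTIONS, 64L);
        curl_multi_setopt(multi, CURLMOPT_PIPELINING, CURLPIPE_MULTIPLEX);

        /* one easy handle per chunk, all added up front */
        for (size_t i = 0; i < n_chunks; i++) {
                CURL *e = curl_easy_init();
                curl_easy_setopt(e, CURLOPT_URL, chunk_urls[i]);
                curl_easy_setopt(e, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
                curl_easy_setopt(e, CURLOPT_WRITEFUNCTION, write_chunk);
                curl_multi_add_handle(multi, e);
        }

        do {
                curl_multi_perform(multi, &still_running);
                curl_multi_wait(multi, NULL, 0, 1000, NULL);
                /* curl_multi_info_read() + removal/cleanup of finished
                 * handles omitted for brevity */
        } while (still_running);

        curl_multi_cleanup(multi);
}
```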
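And here is a rough sketch of the **second implementation**, with a fixed pool of 8 easy handles that get recycled. `next_chunk_url()` is a stand-in for "give me the URL of the next chunk to fetch"; again, error handling is left out and this isn't the real casync code. The multi handle is assumed to be configured as in the first sketch.

```c
#include <curl/curl.h>

#define POOL_SIZE 8

/* Hypothetical helper: returns the next chunk URL, or NULL when done. */
extern const char *next_chunk_url(void);

/* Start the next chunk on this handle. Returns 1 if a transfer was started,
 * 0 if the chunk list is exhausted (the handle is then freed). */
static int start_next(CURLM *multi, CURL *e)
{
        const char *url = next_chunk_url();
        if (!url) {
                curl_easy_cleanup(e);
                return 0;
        }
        curl_easy_setopt(e, CURLOPT_URL, url);
        curl_easy_setopt(e, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
        /* CURLOPT_WRITEFUNCTION etc. set as in the first sketch */
        curl_multi_add_handle(multi, e);
        return 1;
}

static void download_pooled(CURLM *multi)
{
        int still_running = 0, msgs_left, active = 0;
        CURLMsg *msg;

        /* fill the pool with the first 8 chunks */
        for (int i = 0; i < POOL_SIZE; i++)
                active += start_next(multi, curl_easy_init());

        while (active > 0) {
                curl_multi_perform(multi, &still_running);
                curl_multi_wait(multi, NULL, 0, 100, NULL);

                while ((msg = curl_multi_info_read(multi, &msgs_left)) != NULL) {
                        if (msg->msg != CURLMSG_DONE)
                                continue;
                        CURL *e = msg->easy_handle;
                        /* the chunk is complete: hand it over to the
                         * co-process here, then recycle the handle */
                        curl_multi_remove_handle(multi, e);
                        active--;
                        active += start_next(multi, e);
                }
        }
}
```

The point of the pool is simply that its size caps how many chunks can be in flight (and thus sitting in RAM) at any moment, which is what the first implementation couldn't give me.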