On Mon, 22 Jul 2013, Michael Dowling wrote:
> I've noticed that there appears to be a significant performance hit when
> using CURLOPT_READFUNCTION. This issue seems to be platform dependent, as
> I've only been able to get poor performance on Linux (Amazon Linux m1.large
> 64-bit ami-0358ce33) across multiple versions of cURL and PHP. I've not seen
> any performance issues on my Mac running PHP 5.3.15 and cURL 7.21.4.

Hi Michael,

Thanks for your email and detailed report. I'm having some trouble sorting it
all out, and the many layers of different software with unknown behaviours
don't really make things easier. Let me start out with a bunch of questions...
So this version pair has problems on Linux but not on the Mac? And if you run
another version set on your Mac, do you see the same problems? Which versions
are fine on the Mac?
> When sending PUT requests containing a 10 byte body (testing123) to a node.js
> server (others have reported issues with Jetty as well) using
> CURLOPT_READFUNCTION, the upload and download speeds returned from
> CURLINFO_SPEED_UPLOAD and CURLINFO_SPEED_DOWNLOAD are very poor: ~833 upload
> and ~1333 download.

Doing transfer performance measurements on 10 bytes is going to be very shaky
and unreliable. Send 10 million bytes or something and you can start getting
something to measure!
Also, I'm not convinced both ways will count the numbers exactly the same
internally since the postfields approach will send the body as part of the
initial request send.
I suggest you use an external measuring method!
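To illustrate what I mean, here's a rough sketch of timing a whole transfer
externally with clock_gettime() (the URL is just a placeholder, and a single
tiny POST like this is still too little - loop it and grow the payload for
real numbers):

#include <stdio.h>
#include <time.h>
#include <curl/curl.h>

int main(void)
{
  CURL *curl = curl_easy_init();
  if(curl) {
    struct timespec start, end;
    curl_easy_setopt(curl, CURLOPT_URL, "http://127.0.0.1:8999/1");
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "testing123");

    clock_gettime(CLOCK_MONOTONIC, &start);
    curl_easy_perform(curl);  /* response body goes to stdout here */
    clock_gettime(CLOCK_MONOTONIC, &end);

    printf("wall time: %.6f seconds\n",
           (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9);
    curl_easy_cleanup(curl);
  }
  return 0;
}

(Link with -lcurl, and with -lrt on older glibc versions to get
clock_gettime.)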
> I wrote a very simple test script that demonstrates the performance issue:
> https://gist.github.com/mtdowling/6059009. You'll need to have a node.js
> server running to handle the requests. I've written up a simple bash script
> that will install PHP, node.js, start the test server, and run the
> performance test: https://gist.github.com/anonymous/6059035.

I would really prefer a test case that requires nothing other than a
libcurl-using application on the client side. I don't want PHP in there; it
makes my life far too complicated and things are much harder to follow. For
the server side, we can just send the data to whatever can eat what we send
to it.
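Something as dumb as this sketch would do as the server side (plain C with
POSIX sockets, untested and without error checking; it hard-wires the 10 byte
body of this particular test and does one connection per request, so it's
only good for eyeballing, not a general HTTP server):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
  static const char resp[] =
    "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nOK";
  struct sockaddr_in addr;
  int on = 1;
  int s = socket(AF_INET, SOCK_STREAM, 0);

  setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons(8999);
  addr.sin_addr.s_addr = htonl(INADDR_ANY);
  bind(s, (struct sockaddr *)&addr, sizeof(addr));
  listen(s, 16);

  for(;;) {
    char buf[4096];
    size_t total = 0;
    ssize_t n;
    int c = accept(s, NULL, NULL);

    /* eat the request: read until we have seen the end of the headers
       plus the 10 byte body this particular test sends */
    while((n = read(c, buf + total, sizeof(buf) - 1 - total)) > 0) {
      const char *hdr_end;
      total += (size_t)n;
      buf[total] = 0;
      hdr_end = strstr(buf, "\r\n\r\n");
      if(hdr_end && (buf + total) - (hdr_end + 4) >= 10)
        break;
    }
    write(c, resp, sizeof(resp) - 1);
    close(c);
  }
  return 0;
}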
> Thinking that this might be an issue with a specific version of cURL or PHP,
> I manually compiled different versions of PHP and cURL and ran the
> performance tests. There was no improvement using the version combination I
> had success with on my Mac or using the latest version of cURL (7.31) and
> PHP (5.5.1). This does not appear to be version dependent. Here are the
> results of that testing:
> https://github.com/guzzle/guzzle/issues/349#issuecomment-21284834

For the plain HTTP (without SSL) POST case, there's basically no difference
between the Mac and the Linux version. They run the same code. But if you saw
a machine-specific difference, then surely you'd see the same difference even
when you run other versions.
> I ran strace on the PHP script and found that using CURLOPT_POSTFIELDS
> appears to send the headers and the entire payload before receiving anything
> from the server, while CURLOPT_READFUNCTION appears to send the request
> headers, receive the response headers, and then send the body afterwards.

Yes, and that seems quite natural to me. If you send a small POST with
POSTFIELDS, you get away with fewer system calls and less checking on the
socket since everything is sent off in a single go.

When using the callback approach, we don't have the data around, so it has to
be split up into multiple writes.
> The loop used to execute the curl_multi handles is very simple and can be
> found in the test script at
> https://gist.github.com/mtdowling/6059009#file-readfuction_perf-php-L5.

It isn't exactly following best practices when it comes to using libcurl's
API, but I doubt it matters a lot in your case. (It is written to use an
older libcurl, it has no timeout in the curl_multi_select use, and if
curl_multi_exec returns something other than OK it'll busy-loop, etc.)
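For the record, here's roughly how I would drive the multi interface in plain
C with a recent libcurl (7.28.0 or later for curl_multi_wait); a sketch,
minus the easy handle setup and most error handling:

CURLM *multi = curl_multi_init();
int still_running = 0;

curl_multi_add_handle(multi, curl); /* 'curl' is an already set up
                                       easy handle */
do {
  int numfds;
  CURLMcode mc = curl_multi_perform(multi, &still_running);
  if(mc != CURLM_OK)
    break;                  /* bail out instead of busy-looping */

  /* wait for activity, but never longer than 1000 milliseconds */
  mc = curl_multi_wait(multi, NULL, 0, 1000, &numfds);
  if(mc != CURLM_OK)
    break;
} while(still_running);

curl_multi_remove_handle(multi, curl);
curl_multi_cleanup(multi);

The PHP equivalent would be to pass a timeout to curl_multi_select and to
check what curl_multi_exec returns instead of looping blindly.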
I converted your test case to plain C and used the plain easy API instead [1],
and then I had it send the POST 10000 times and measured how long it took
on my old laptop, sending the data to the curl test suite's HTTP server (which
certainly isn't in any way a fast server implementation). The response to the
request is very small, just a bunch of headers and a couple of bytes of body.
My results contradict your results quite significantly:
$ time ./debugit
real 0m9.412s
user 0m1.752s
sys 0m1.732s
$ time ./debugit 1
runs fixed string version
real 0m9.457s
user 0m1.528s
sys 0m1.712s
Roughly 1000 requests per second with both solutions.
This test ran on a dual-core 1.83GHz thing, Linux kernel 3.9.8 in 32bit mode.
curl -V:
curl 7.32.0-DEV (i686-pc-linux-gnu) libcurl/7.32.0-DEV OpenSSL/1.0.1e
zlib/1.2.8 c-ares/1.9.2-DEV libidn/1.25 libssh2/1.4.3_DEV librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3
pop3s rtmp rtsp scp sftp smtp smtps telnet tftp
Features: AsynchDNS Debug TrackMemory IDN IPv6 Largefile NTLM NTLM_WB SSL libz
TLS-SRP
Can you modify this test code to show the differences you saw?
[1] = I chose the easy interface just out of laziness since it meant writing
less code. We can of course make it use the multi API instead to mimic your
code even more closely, but I seriously doubt it would make any performance
difference.
--
/ daniel.haxx.se
#include <stdio.h>
#include <string.h>
#include <curl/curl.h>
const char data[]="0123456789";
struct WriteThis {
  const char *readptr;
  long sizeleft;
};
static size_t
write_callback(void *contents, size_t size, size_t nmemb, void *userp)
{
  /* discard the response body, just report it all as consumed */
  (void)contents;
  (void)userp;
  return size*nmemb;
}
static size_t read_callback(void *ptr, size_t size, size_t nmemb, void *userp)
{
  struct WriteThis *pooh = (struct WriteThis *)userp;
  if(size*nmemb < sizeof(data) - 1)
    return 0;
  if(pooh->sizeleft) {
    /* copy the payload, excluding the terminating NUL byte */
    memcpy(ptr, data, sizeof(data) - 1);
    pooh->sizeleft -= (long)(sizeof(data) - 1);
    return sizeof(data) - 1;
  }
  return 0; /* no more data left to deliver */
}
static int runonce(CURL *curl,
                   struct WriteThis *p)
{
  CURLcode res;
  struct curl_slist *chunk = NULL;
  p->sizeleft = (long)strlen(data);
  /* First set the URL that is about to receive our POST. */
  curl_easy_setopt(curl, CURLOPT_URL, "http://127.0.0.1:8999/1");
  /* Now specify we want to POST data */
  curl_easy_setopt(curl, CURLOPT_POST, 1L);
  /* we want to use our own read function */
  curl_easy_setopt(curl, CURLOPT_READFUNCTION, read_callback);
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);
  /* pointer to pass to our read function */
  curl_easy_setopt(curl, CURLOPT_READDATA, p);
  /* get verbose debug output please */
  curl_easy_setopt(curl, CURLOPT_VERBOSE, 0L);
  /* Set the expected POST size. If you want to POST large amounts of data,
     consider CURLOPT_POSTFIELDSIZE_LARGE */
  curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, p->sizeleft);
  /* Using POST with HTTP 1.1 implies the use of an "Expect: 100-continue"
     header, which we disable with CURLOPT_HTTPHEADER here. NOTE: if you want
     chunked transfer too, you need to combine the two headers in one list
     since you can only set a single list with CURLOPT_HTTPHEADER. A less
     good option would be to enforce HTTP 1.0, but that might also have other
     implications. */
  chunk = curl_slist_append(chunk, "Expect:");
  curl_easy_setopt(curl, CURLOPT_HTTPHEADER, chunk);
  /* Perform the request, res will get the return code */
  res = curl_easy_perform(curl);
  /* clear the custom headers from the handle and free the list again */
  curl_easy_setopt(curl, CURLOPT_HTTPHEADER, NULL);
  curl_slist_free_all(chunk);
  /* Check for errors */
  if(res != CURLE_OK)
    fprintf(stderr, "curl_easy_perform() failed: %s\n",
            curl_easy_strerror(res));
  return (int)res;
}
static int runfixed(CURL *curl,
                    struct WriteThis *p)
{
  CURLcode res;
  struct curl_slist *chunk = NULL;
  (void)p; /* unused in this variant */
  /* First set the URL that is about to receive our POST. */
  curl_easy_setopt(curl, CURLOPT_URL, "http://127.0.0.1:8999/1");
  curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "0123456789");
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);
  /* get verbose debug output please */
  curl_easy_setopt(curl, CURLOPT_VERBOSE, 0L);
  /* Set the expected POST size. If you want to POST large amounts of data,
     consider CURLOPT_POSTFIELDSIZE_LARGE */
  curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, 10L);
  /* disable the "Expect: 100-continue" header here too, so both versions
     send the same request headers */
  chunk = curl_slist_append(chunk, "Expect:");
  curl_easy_setopt(curl, CURLOPT_HTTPHEADER, chunk);
  /* Perform the request, res will get the return code */
  res = curl_easy_perform(curl);
  /* clear the custom headers from the handle and free the list again */
  curl_easy_setopt(curl, CURLOPT_HTTPHEADER, NULL);
  curl_slist_free_all(chunk);
  /* Check for errors */
  if(res != CURLE_OK)
    fprintf(stderr, "curl_easy_perform() failed: %s\n",
            curl_easy_strerror(res));
  return (int)res;
}
#define LOOPS 10000

int main(int argc, char **argv)
{
  CURL *curl;
  CURLcode res;
  struct WriteThis pooh;
  int alt = 0;
  if(argc > 1) {
    alt = 1;
    printf("runs fixed string version\n");
  }
  pooh.readptr = data;
  /* In windows, this will init the winsock stuff */
  res = curl_global_init(CURL_GLOBAL_DEFAULT);
  /* Check for errors */
  if(res != CURLE_OK) {
    fprintf(stderr, "curl_global_init() failed: %s\n",
            curl_easy_strerror(res));
    return 1;
  }
  /* get a curl handle */
  curl = curl_easy_init();
  if(curl) {
    int i;
    if(alt) {
      for(i = 0; i < LOOPS; i++)
        runfixed(curl, &pooh);
    }
    else {
      for(i = 0; i < LOOPS; i++)
        runonce(curl, &pooh);
    }
    /* always cleanup */
    curl_easy_cleanup(curl);
  }
  curl_global_cleanup();
  return 0;
}
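For completeness: assuming the libcurl development files are installed, the
program above should build and run with something like

  cc debugit.c -o debugit -lcurl
  time ./debugit       # read callback version
  time ./debugit 1     # fixed string version

with a server listening on port 8999 (the sink sketch from earlier in this
mail works).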