Hi,

On Tue, Nov 24, 2020 at 5:37 PM Tomalak Geret'kal via curl-library
<curl-library@cool.haxx.se> wrote:

> On 23/11/2020 20:16, James Read via curl-library wrote:
> > I have attempted to make two minimal codes that
> > demonstrate my problem.
> >
> > The first can be downloaded from
> > https://github.com/JamesRead5737/fast
> > It basically recursively downloads http://www.google.com,
> > http://www.yahoo.com and http://www.bing.com
> > I am able to achieve download speeds of up to 7 Gbps with
> > this simple program.
> >
> > The second can be downloaded from
> > https://github.com/JamesRead5737/slow
> > The program extends the first with an asynchronous DNS
> > component and, instead of recursively downloading the same
> > URLs over and over again, downloads from a list of URLs
> > provided in the http001 file. Full instructions are in the
> > README. What's troubling me is that this second version of
> > the program only achieves an average download speed of
> > 16 Mbps.
> >
> > I have no idea why this is happening. Shouldn't the second
> > program run just as fast as the first?
> >
> > Any ideas what I'm doing wrong?
>
> That's a lot of code you're asking us to debug.
>
>
Sorry, I've tried my best to produce a minimal reproducer. The code is
largely based on the example at https://curl.se/libcurl/c/ephiperfifo.html
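
For reference, both programs follow that example's structure. Condensed,
the heart of it looks like this (a sketch only: error handling and the
CURLMOPT_SOCKETFUNCTION/CURLMOPT_TIMERFUNCTION plumbing that actually adds
and removes fds are omitted):

#include <sys/epoll.h>
#include <curl/curl.h>

typedef struct {
  int epfd;          /* epoll instance */
  CURLM *multi;      /* libcurl multi handle */
  int still_running; /* transfers currently in flight */
} GlobalInfo;

/* Called when epoll reports activity on a socket that libcurl
   registered through CURLMOPT_SOCKETFUNCTION. */
static void event_cb(GlobalInfo *g, int fd, int revents)
{
  int action = ((revents & EPOLLIN) ? CURL_CSELECT_IN : 0) |
               ((revents & EPOLLOUT) ? CURL_CSELECT_OUT : 0);
  curl_multi_socket_action(g->multi, fd, action, &g->still_running);
  /* check_multi_info(g) runs here to reap finished transfers */
}

int main(void)
{
  GlobalInfo g = {0, NULL, 0};
  struct epoll_event events[64];
  int i, n;

  curl_global_init(CURL_GLOBAL_ALL);
  g.epfd = epoll_create1(0);
  g.multi = curl_multi_init();
  /* socket/timer callbacks (not shown) add and remove fds in g.epfd
     and arm a timerfd for libcurl's timeouts */

  for(;;) {
    n = epoll_wait(g.epfd, events, 64, 1000);
    for(i = 0; i < n; i++)
      event_cb(&g, events[i].data.fd, events[i].events);
    if(n == 0) /* timeout: kick libcurl's internal timer handling */
      curl_multi_socket_action(g.multi, CURL_SOCKET_TIMEOUT, 0,
                               &g.still_running);
  }
}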


> Have you profiled it?


The fast program produces the following flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 26.72      0.04     0.04        1    40.08   125.25  crawler_init
 23.38      0.08     0.04 11051513     0.00     0.00  event_cb
 23.38      0.11     0.04 11072333     0.00     0.00  check_multi_info
  6.68      0.12     0.01 11083187     0.00     0.00  mcode_or_die
  6.68      0.13     0.01                             _curl_easy_getinfo_err_curl_off_t
  3.34      0.14     0.01    21722     0.00     0.00  timer_cb
  3.34      0.14     0.01                             multi_timer_cb
  3.34      0.15     0.01                             write_cb
  0.00      0.15     0.00    24830     0.00     0.00  print_progress
  0.00      0.15     0.00    22447     0.00     0.00  remsock
  0.00      0.15     0.00    10854     0.00     0.00  new_conn
  0.00      0.15     0.00    10854     0.00     0.00  transfers_dec
  0.00      0.15     0.00    10854     0.00     0.00  transfers_inc
  0.00      0.15     0.00     1561     0.00     0.00  concurrent_connections_dec
  0.00      0.15     0.00     1561     0.00     0.00  concurrent_connections_inc
  0.00      0.15     0.00     1561     0.00     0.00  setsock
  0.00      0.15     0.00     1224     0.00     0.00  addsock

The slow program produces the following flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 38.51      0.05     0.05        1    50.06   115.13  crawler_init
 15.40      0.07     0.02  6491517     0.00     0.00  check_multi_info
 15.40      0.09     0.02  6479151     0.00     0.00  event_cb
  7.70      0.10     0.01  6500971     0.00     0.00  mcode_or_die
  7.70      0.11     0.01    13729     0.00     0.00  timer_cb
  7.70      0.12     0.01                             multi_timer_cb
  3.85      0.13     0.01    11581     0.00     0.00  remsock
  3.85      0.13     0.01     6041     0.00     0.00  new_body_conn
  0.00      0.13     0.00    31665     0.00     0.00  starts_with
  0.00      0.13     0.00    29448     0.00     0.00  print_progress
  0.00      0.13     0.00     9454     0.00     0.00  get_host_from_url
  0.00      0.13     0.00     9454     0.00     0.00  transfers_dec
  0.00      0.13     0.00     9454     0.00     0.00  transfers_inc
  0.00      0.13     0.00     5270     0.00     0.00  concurrent_connections_dec
  0.00      0.13     0.00     5270     0.00     0.00  concurrent_connections_inc
  0.00      0.13     0.00     5270     0.00     0.00  setsock
  0.00      0.13     0.00     4633     0.00     0.00  addsock
  0.00      0.13     0.00     3413     0.00     0.00  new_head_conn
  0.00      0.13     0.00      416     0.00     0.00  parsed_sites_inc




> Have you tried narrowing down the
> problem to a smaller testcase?


I've done my best to cut this down. My full program is much larger. If I
were to cut anything else out, the epoll event loop wouldn't work and I
wouldn't be able to illustrate the performance problem I'm seeing.


> I find it hard to believe
> that these are minimal.
>
> Also, there is no recursion here.
>
>
My mistake. I meant repeatedly. The fast program repeatedly downloads the
same URLs, so I guess there is a slight speed-up from reusing connections,
but nothing near the order of magnitude of the difference I am seeing. I
suspect there may be a problem with the way libcurl handles many new
connections, but I am hoping there is some kind of mistake on my end.
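
To test the reuse theory I could disable reuse in the fast program and
compare throughput. An untested sketch using documented libcurl options
(easy being the per-transfer handle):

#include <curl/curl.h>

/* Untested sketch: make each transfer open a fresh connection so the
   fast program cannot benefit from connection reuse. */
static void disable_reuse(CURL *easy)
{
  curl_easy_setopt(easy, CURLOPT_FRESH_CONNECT, 1L); /* don't reuse */
  curl_easy_setopt(easy, CURLOPT_FORBID_REUSE, 1L);  /* don't keep alive */
}

/* After CURLMSG_DONE: how many new connections the transfer opened
   (0 means an existing connection was reused). */
static long connects_used(CURL *easy)
{
  long n = 0;
  curl_easy_getinfo(easy, CURLINFO_NUM_CONNECTS, &n);
  return n;
}

If the fast program slows dramatically with reuse disabled, the gap is
mostly connection setup; if not, the problem is likely elsewhere.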

James Read


> Cheers
>
-------------------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
Etiquette:   https://curl.se/mail/etiquette.html
