Hi,

On Tue, Nov 24, 2020 at 5:37 PM Tomalak Geret'kal via curl-library
<curl-library@cool.haxx.se> wrote:
> On 23/11/2020 20:16, James Read via curl-library wrote:
> > I have attempted to make two minimal codes that
> > demonstrate my problem.
> >
> > The first can be downloaded from
> > https://github.com/JamesRead5737/fast
> > It basically recursively downloads http://www.google.com,
> > http://www.yahoo.com and http://www.bing.com
> > I am able to achieve download speeds of up to 7Gbps with
> > this simple program
> >
> > The second can be downloaded from
> > https://github.com/JamesRead5737/slow
> > The program extends the first program with an asynchronous
> > DNS component and instead of recursively downloading the
> > same URLs over and over again downloads from a list of
> > URLs provided in the http001 file. Full instructions are
> > in the README. What's troubling me is that this second
> > version of the program only achieves an average download
> > speed of 16Mbps.
> >
> > I have no idea why this is happening. Shouldn't the second
> > program run just as fast as the first?
> >
> > Any ideas what I'm doing wrong?
>
> That's a lot of code you're asking us to debug.

Sorry, I've tried my best to produce a minimal reproducer. The code is
largely based on the example at https://curl.se/libcurl/c/ephiperfifo.html

> Have you profiled it?

The fast program produces the following flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds      calls  ms/call  ms/call  name
 26.72      0.04     0.04          1    40.08   125.25  crawler_init
 23.38      0.08     0.04   11051513     0.00     0.00  event_cb
 23.38      0.11     0.04   11072333     0.00     0.00  check_multi_info
  6.68      0.12     0.01   11083187     0.00     0.00  mcode_or_die
  6.68      0.13     0.01                               _curl_easy_getinfo_err_curl_off_t
  3.34      0.14     0.01      21722     0.00     0.00  timer_cb
  3.34      0.14     0.01                               multi_timer_cb
  3.34      0.15     0.01                               write_cb
  0.00      0.15     0.00      24830     0.00     0.00  print_progress
  0.00      0.15     0.00      22447     0.00     0.00  remsock
  0.00      0.15     0.00      10854     0.00     0.00  new_conn
  0.00      0.15     0.00      10854     0.00     0.00  transfers_dec
  0.00      0.15     0.00      10854     0.00     0.00  transfers_inc
  0.00      0.15     0.00       1561     0.00     0.00  concurrent_connections_dec
  0.00      0.15     0.00       1561     0.00     0.00  concurrent_connections_inc
  0.00      0.15     0.00       1561     0.00     0.00  setsock
  0.00      0.15     0.00       1224     0.00     0.00  addsock

The slow program produces the following flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds      calls  ms/call  ms/call  name
 38.51      0.05     0.05          1    50.06   115.13  crawler_init
 15.40      0.07     0.02    6491517     0.00     0.00  check_multi_info
 15.40      0.09     0.02    6479151     0.00     0.00  event_cb
  7.70      0.10     0.01    6500971     0.00     0.00  mcode_or_die
  7.70      0.11     0.01      13729     0.00     0.00  timer_cb
  7.70      0.12     0.01                               multi_timer_cb
  3.85      0.13     0.01      11581     0.00     0.00  remsock
  3.85      0.13     0.01       6041     0.00     0.00  new_body_conn
  0.00      0.13     0.00      31665     0.00     0.00  starts_with
  0.00      0.13     0.00      29448     0.00     0.00  print_progress
  0.00      0.13     0.00       9454     0.00     0.00  get_host_from_url
  0.00      0.13     0.00       9454     0.00     0.00  transfers_dec
  0.00      0.13     0.00       9454     0.00     0.00  transfers_inc
  0.00      0.13     0.00       5270     0.00     0.00  concurrent_connections_dec
  0.00      0.13     0.00       5270     0.00     0.00  concurrent_connections_inc
  0.00      0.13     0.00       5270     0.00     0.00  setsock
  0.00      0.13     0.00       4633     0.00     0.00  addsock
  0.00      0.13     0.00       3413     0.00     0.00  new_head_conn
  0.00      0.13     0.00        416     0.00     0.00  parsed_sites_inc

> Have you tried narrowing down the
> problem to a smaller testcase?

I've done my best to cut this down. My program is much larger. If I were to
cut anything else out, the epoll handling wouldn't work and I wouldn't be
able to illustrate the performance problem I'm getting.
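In case it helps narrow this down, here is roughly the instrumentation I
intend to add to check_multi_info(), where the CURLMSG_DONE messages are
handled, so I can see where each transfer in the slow program spends its
time. This is an untested sketch (the helper name and printf layout are just
my own choices); curl_easy_getinfo() and these CURLINFO_* values are the
standard libcurl ones:

  #include <stdio.h>
  #include <curl/curl.h>

  /* Call this on an easy handle right after curl_multi_info_read()
   * reports CURLMSG_DONE for it, before the handle is cleaned up. */
  static void report_timings(CURL *easy)
  {
    double dns = 0, conn = 0, firstbyte = 0, total = 0;
    long new_conns = 0;
    char *url = NULL;

    curl_easy_getinfo(easy, CURLINFO_EFFECTIVE_URL, &url);
    curl_easy_getinfo(easy, CURLINFO_NAMELOOKUP_TIME, &dns);          /* time until name resolved */
    curl_easy_getinfo(easy, CURLINFO_CONNECT_TIME, &conn);            /* time until TCP connected */
    curl_easy_getinfo(easy, CURLINFO_STARTTRANSFER_TIME, &firstbyte); /* time until first byte */
    curl_easy_getinfo(easy, CURLINFO_TOTAL_TIME, &total);
    curl_easy_getinfo(easy, CURLINFO_NUM_CONNECTS, &new_conns);       /* new connections created */

    fprintf(stderr, "%s dns=%.3f connect=%.3f firstbyte=%.3f total=%.3f new_conns=%ld\n",
            url ? url : "?", dns, conn, firstbyte, total, new_conns);
  }

If the slow program shows most of its time in dns/connect and new_conns is
always 1, then it really is paying the full connection setup cost for every
URL, whereas the fast program gets to reuse connections to the same three
hosts.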
> I find it hard to believe
> that these are minimal.
>
> Also, there is no recursion here.

My mistake. I meant repeatedly. The fast program repeatedly downloads the
same URLs, so I guess there is a slight speedup from reusing connections,
but nothing on the order of magnitude of the difference I am seeing. I
suspect there may be a problem with the way libcurl handles many new
connections, but I am hoping there is some kind of mistake on my end.

James Read

> Cheers
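PS: to test the connection reuse theory from my side, I'm also going to try
forcing the fast program to open a brand new connection for every transfer,
so that it pays the same setup cost as the slow one. Something like this in
new_conn(), where the easy handle is configured (untested sketch; both
options are standard libcurl easy options):

  /* Make the fast program behave like the slow one connection-wise:
   * never reuse an existing connection and never keep one around. */
  curl_easy_setopt(easy, CURLOPT_FRESH_CONNECT, 1L); /* force a new connection */
  curl_easy_setopt(easy, CURLOPT_FORBID_REUSE, 1L);  /* close it when done */

If the fast program's throughput collapses with these set, the gap between
the two programs is mostly connection setup; if it stays high, the problem
is more likely in my DNS handling or in how I feed new transfers into the
multi handle.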