So, it looks to me like 1 million URLs will spin up 1 million goroutines. That seems unnecessary, and likely to be a problem. I usually default to the simplest model unless there is a reason to do otherwise. IMHO, here that would be to spin up as many goroutines as you want working at once (500), then feed data through a channel. Untested pseudo-code:

func fetchAll(ctx context.Context) {
    // min of two ints: a built-in in Go 1.21+, otherwise a small helper.
    workerCount := min(500, len(urls))
    c := make(chan string)

    // Spin up workers.
    for i := 0; i < workerCount; i++ {
        go doWork(ctx, c)
    }

    // Feed them work.
    for _, u := range urls {
        if u == "" {
            continue
        }
        _, err := url.Parse(u)
        if err != nil {
            log.Printf("%s returned an error - %v", u, err)
            continue
        }
        c <- u
    }

    // All work sent, signal workers to die. Note, workers may still be
    // processing at this point.
    close(c)
}

func doWork(ctx context.Context, c chan string) {
    // The loop ends when the channel is closed and drained: no more work.
    for u := range c {
        fetch(ctx, u)
    }
}

func fetch(ctx context.Context, u string) {
    req, err := http.NewRequest(http.MethodGet, u, nil)
    if err != nil {
        log.Printf("%s returned an error while creating a request - %v", u, err)
        return
    }
    req = req.WithContext(ctx)
    res, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Printf("%s returned an error while performing a request - %v", u, err)
        return
    }
    // Close the response body as soon as the function returns to prevent
    // resource leakage. https://golang.org/pkg/net/http/#Response
    defer res.Body.Close()
}

I find this basic model simple and easy to reason about. And it should avoid wasted goroutines.

I have not really paid much attention to "fetch". I assume it was already correct code.

I'm not sure if you are expecting to handle a cancel on the Context. If so, then you will want to add a check for that in your main fetchAll() loop, and bail early if it was canceled.

On Saturday, May 26, 2018 at 11:48:28 AM UTC-4, Karthik Rao wrote:
>
> I am writing an application that has to fetch data from - say a million
> URLs. I currently have an implementation which looks like the code below
>
> // Make sure that we have just 500 or fewer goroutines fetching from URLs.
> sem := make(chan struct{}, min(500, len(urls)))
>
> // Check that each URL in the request is valid and, if so, spawn a
> // goroutine to fetch data.
> for _, u := range urls {
>     _, err := url.Parse(u)
>     if err != nil {
>         log.Printf("%s returned an error - %v", u, err)
>         continue
>     }
>     go fetch(ctx, sem, u)
> }
>
> func fetch(ctx context.Context, sem chan struct{}, u string) {
>     sem <- struct{}{}
>     defer func() { <-sem }()
>     req, err := http.NewRequest(http.MethodGet, u, nil)
>     if err != nil {
>         log.Printf("%s returned an error while creating a request - %v", u, err)
>         return
>     }
>     req = req.WithContext(ctx)
>     res, err := http.DefaultClient.Do(req)
>     if err != nil {
>         log.Printf("%s returned an error while performing a request - %v", u, err)
>         return
>     }
>     // Close response body as soon as function returns to prevent resource
>     // leakage. https://golang.org/pkg/net/http/#Response
>     defer res.Body.Close()
> }
>
> Would this application choke when a million goroutines are spawned and are
> waiting for a place on the sem channel? I have profiled my code using
> pprof and see no problems when I tested it with 50k URLs.
>
> What is the cost of a goroutine waiting on the semaphore channel? Would it
> be ~2KB?
>
> Is using a worker pool like the one mentioned here
> <http://marcio.io/2015/07/handling-1-million-requests-per-minute-with-golang/>
> better? What would be the advantages? I am of the opinion that the runtime
> scheduler is a better judge when it comes to managing goroutines.
>
> Another question is - would it be better to acquire the semaphore within
> the loop, so that I limit the number of goroutines spawned? Mr. Dave Cheney
> suggested otherwise in his talk here
> <https://www.youtube.com/watch?time_continue=1336&v=yKQOunhhf4A>.
>
> Any other suggestions are also welcome.
>
> TIA!

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.