I am writing an application that has to fetch data from, say, a million 
URLs. My current implementation looks like the code below:

// Make sure that at most 500 goroutines are fetching from URLs at once.
sem := make(chan struct{}, min(500, len(urls)))
// Check that each URL in the request is valid and, if so, spawn a goroutine
// to fetch its data.
for _, u := range urls {
	_, err := url.Parse(u)
	if err != nil {
		log.Printf("%s returned an error- %v", u, err)
		continue
	}
	go fetch(ctx, sem, u)
}

func fetch(ctx context.Context, sem chan struct{}, u string) {
	// Block until one of the semaphore slots is free.
	sem <- struct{}{}
	defer func() { <-sem }()
	req, err := http.NewRequest(http.MethodGet, u, nil)
	if err != nil {
		log.Printf("%s returned an error while creating a request- %v", u, err)
		return
	}
	req = req.WithContext(ctx)
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Printf("%s returned an error while performing a request - %v", u, err)
		return
	}
	// Close the response body as soon as the function returns to prevent
	// resource leakage.
	// https://golang.org/pkg/net/http/#Response
	defer res.Body.Close()
}

Would this application choke when a million goroutines are spawned and are 
waiting for a place on the sem channel? I profiled my code using pprof and 
saw no problems when I tested it with 50k URLs.

What is the cost of a goroutine waiting on the semaphore channel? Would it 
be ~2KB?
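
One way I could try to measure this myself is sketched below. It is only a rough estimate under my own assumptions: parkedCost is a hypothetical helper, it looks only at the stack memory reported by runtime.MemStats (StackInuse) and ignores the per-goroutine bookkeeping kept on the heap, and the goroutines block on a plain channel receive rather than on the real semaphore.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parkedCost (hypothetical helper) starts n goroutines that block on a
// channel receive and returns the approximate growth in stack memory,
// in bytes per goroutine, as reported by runtime.MemStats.StackInuse.
func parkedCost(n int) uint64 {
	var before, after runtime.MemStats
	block := make(chan struct{})
	var started, done sync.WaitGroup

	runtime.GC()
	runtime.ReadMemStats(&before)

	started.Add(n)
	done.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer done.Done()
			started.Done()
			<-block // park here until the channel is closed
		}()
	}
	started.Wait() // every goroutine has started running
	runtime.ReadMemStats(&after)

	close(block) // release all goroutines and wait for them to exit
	done.Wait()

	if after.StackInuse < before.StackInuse {
		return 0
	}
	return (after.StackInuse - before.StackInuse) / uint64(n)
}

func main() {
	fmt.Printf("approx. stack bytes per parked goroutine: %d\n", parkedCost(100000))
}
```

The number this prints will depend on the Go version and platform, so I would treat it as an order-of-magnitude check only.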

Is using a worker pool like the one mentioned here 
<http://marcio.io/2015/07/handling-1-million-requests-per-minute-with-golang/> 
better? 
What would be the advantages? I am of the opinion that the runtime 
scheduler is a better judge when it comes to managing goroutines.

Another question: would it be better to acquire the semaphore within the 
loop, so that I limit the number of goroutines spawned? Mr Dave Cheney 
suggested otherwise in his talk here 
<https://www.youtube.com/watch?time_continue=1336&v=yKQOunhhf4A>.
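
For clarity, the variant I am asking about would look roughly like the sketch below. runBounded and the work callback are hypothetical names standing in for my loop and fetch, and the peak counter is there only to show that the bound holds.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runBounded (hypothetical helper) acquires a semaphore slot in the loop,
// before spawning each goroutine, so at most `limit` goroutines exist at
// any time. It returns the peak number of concurrently running work calls.
func runBounded(items []string, limit int, work func(string)) int64 {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var cur, peak int64

	for _, it := range items {
		sem <- struct{}{} // blocks the loop while `limit` goroutines are running
		wg.Add(1)
		go func(it string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when done

			// Track the highest concurrency observed.
			n := atomic.AddInt64(&cur, 1)
			for {
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			work(it)
			atomic.AddInt64(&cur, -1)
		}(it)
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	items := make([]string, 1000)
	peak := runBounded(items, 10, func(string) {})
	fmt.Println("peak concurrency:", peak)
}
```

Here the loop itself stalls once the limit is reached, whereas in my current code the loop finishes immediately and the waiting happens inside the spawned goroutines.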

Any other suggestions are also welcome.

TIA!
