So, it looks to me like 1 million URLs will spin up 1 million goroutines. 
That seems unnecessary, and likely to be a problem. I usually default to 
the simplest model unless there is a reason to do otherwise. IMHO, here 
that would be to spin up as many goroutines as you want to be working at 
once (500), then feed data through a channel. Untested pseudo-code:
func fetchAll(ctx context.Context) {
    workerCount := min(500, len(urls))
    c := make(chan string)

    // Spin up workers
    for i := 0; i < workerCount; i++ {
        go doWork(ctx, c)
    }
    // Feed them work
    for _, u := range urls {
        if u == "" {
            continue
        }
        _, err := url.Parse(u)
        if err != nil {
            log.Printf("%s returned an error - %v", u, err)
            continue
        }
        c <- u
    }
    // All work sent, signal workers to die. Note, workers may still be
    // processing at this point.
    close(c)
}

func doWork(ctx context.Context, c chan string) {
    // range exits once the channel is closed and drained, so no
    // sentinel value is needed to detect "no more work". (Testing
    // for u == "" would conflate a closed channel with an empty
    // string sent as work.)
    for u := range c {
        fetch(ctx, u)
    }
}

func fetch(ctx context.Context, u string) {
    req, err := http.NewRequest(http.MethodGet, u, nil)
    if err != nil {
        log.Printf("%s returned an error while creating a request - %v", u, err)
        return
    }
    req = req.WithContext(ctx)
    res, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Printf("%s returned an error while performing a request - %v", u, err)
        return
    }
    // Close the response body when the function returns to prevent
    // resource leakage. https://golang.org/pkg/net/http/#Response
    defer res.Body.Close()
}

I find this basic model simple and easy to reason about, and it should 
avoid wasted goroutines. I have not paid much attention to "fetch"; I 
assume it was already correct code. I'm not sure whether you are expecting 
to handle a cancel on the Context. If so, you will want to add a check 
for that in your main fetchAll() loop and bail early if it was canceled. 


On Saturday, May 26, 2018 at 11:48:28 AM UTC-4, Karthik Rao wrote:
>
> I am writing an application that has to fetch data from - say a million 
> URLs. I currently have an implementation which looks like the code below 
>
> //Make sure that we just have 500 or less goroutines fetching from URLs
> sem := make(chan struct{}, min(500, len(urls)))
> //Check if all URLs in the request are valid and if so spawn a goroutine 
> to fetch data.
> for _, u := range urls {
>     _, err := url.Parse(u)
>     if err != nil {
>         log.Printf("%s returned an error- %v", u, err)
>         continue
>     }
>     go fetch(ctx, sem, u)
> }
>
> func fetch(ctx context.Context, sem chan struct{}, u string) {
>     sem <- struct{}{}
>     defer func() { <-sem }()
>     req, err := http.NewRequest(http.MethodGet, u, nil)
>     if err != nil {
>         log.Printf("%s returned an error while creating a request- %v", u, err)
>         return
>     }
>     req = req.WithContext(ctx)
>     res, err := http.DefaultClient.Do(req)
>     if err != nil {
>         log.Printf("%s returned an error while performing a request - %v", u, err)
>         return
>     }
>     //Close response body as soon as function returns to prevent resource
>     //leakage. https://golang.org/pkg/net/http/#Response
>     defer res.Body.Close()
> }
>
> Would this application choke when a million goroutines are spawned and are 
> waiting for a place on the sem channel?  I have profiled my code using 
> pprof and see no problems when I tested it with 50k URLs.
>
> What is the cost of a goroutine waiting on the semaphore channel? Would it 
> be ~2KB?
>
> Is using a worker pool like the one mentioned here 
> <http://marcio.io/2015/07/handling-1-million-requests-per-minute-with-golang/>
>  better? 
> What would be the advantages? I am of the opinion that the runtime 
> scheduler is a better judge when it comes to managing goroutines.
>
> Another question is - would it be better to acquire the semaphore within 
> the loop, so that I limit the number of goroutines spawned? Mr Dave Cheney 
> suggested otherwise in his talk here 
> <https://www.youtube.com/watch?time_continue=1336&v=yKQOunhhf4A>.
>
> Any other suggestions are also welcome.
>
> TIA!
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.