Here is some code I wrote which demonstrates a solution. It is a
recursive concurrent directory lister, which I think is an identical
problem to yours: you start with one directory, you discover new
directories in the course of listing which you also want to list, but
you want to limit the concurrency and not return until they are all
done.

https://github.com/ncw/rclone/blob/master/dircache/list.go#L27

I used another WaitGroup: whenever I put anything into the job queue I
added 1 to it, and whenever I finished processing an item I called
.Done() on it. This gives you something to wait on at the end.

Note that I use an extra goroutine to put the new directories into the
queue - this avoids a deadlock when all the workers are busy trying to
do the same.

On 29/06/16 17:52, Inspectre Gadget wrote:
> Hey everyone,
> 
> Here’s my issue, I will try to keep this short and concise:
> 
> I have written a program that will accept a URL, spider that URL’s
> domain and scheme (http/https), and return back all input fields found
> throughout to the console. The purpose is largely for web application
> security testing, as input fields are the most common vulnerability
> entry points (sinks), and this program automates that part of the
> reconnaissance phase.
> 
> Here is the problematic
> code: 
> https://github.com/insp3ctre/input-field-finder/blob/ce7983bd336ad59b2e2b613868e49dfb44110d09/main.go
> 
> 
> The issue lies in the last for loop in the main() function. If you were
> to run this program, it would check the queue and workers so frequently
> that it is bound to find a point where there are both no workers
> working and no URLs in the queue (as shown by the console output
> statements before it exits). Never mind that the problem is exacerbated
> by network latency. The number of URLs actually checked varies on every
> run, which causes serious inconsistencies and prevents the program from
> being reliable at all.
> 
> The issue was fixed
> here: 
> https://github.com/insp3ctre/input-field-finder/blob/f0032bb550ced0b323e63be9c4f40d644257abcd/main.go
> 
> 
> I fixed it by removing all concurrency from network requests, leaving it
> only in the internal HTML processing functions.
> 
> So, the question is: how does one run efficient concurrent code when the
> number of tasks to wait on is dynamic, and unknown at program initialization?
> 
> I have tried:
> 
>   * Using “worker pools”, which consist of channels of workers. The for
>     loop checks the length of the URL queue and the number of workers
>     available. If the URL queue is empty and all the workers are
>     available, then it exits the loop.
>   * Dynamically adding wait groups (wg.Add(1)) every time I pull a URL
>     from the URL queue. *I can’t set the wait group numbers before the
>     loop, because I can never know how many URLs are going to be checked.*
> 
> 
> So I have tried using both channels and wait groups, checked alongside
> the URL queue length, to determine whether more concurrent network
> requests are needed. In both cases, the for loop checks the values so
> fast that it eventually stumbles upon a non-satisfied loop condition
> and exits. This usually results in either the program hanging as it
> waits on a WaitGroup that never finishes, or exiting prematurely, as
> more URLs are added to the queue after the fact.
> 
> I would really like to know if there is a way to actually do this well
> in Go.
> 
> Cheers,
> 
> Inspectre 
> 
> -- 
> You received this message because you are subscribed to the Google
> Groups "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to golang-nuts+unsubscr...@googlegroups.com
> <mailto:golang-nuts+unsubscr...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.


-- 
Nick Craig-Wood <n...@craig-wood.com> -- http://www.craig-wood.com/nick
