I want to set up Executor thread pools spawned by threads in other Executor thread pools and to scope certain variables to these nested pools. I wish something I would call "thread-pool scope" were easily achieved in Clojure. Is it?
I'm writing a Web crawler (let's pretend) that has three top-level functions:

- *start-crawlers* will spawn several crawlers by invoking (crawl) several times in an Executor pool, and it'll give each crawler its own http user-agent to crawl with
- *crawl*, each time it's invoked, will construct a queue of hyperlinks and a new Executor thread pool; it'll then invoke (visit) on several threads in this pool, and it'll expect them to use this queue and the user-agent that was supplied when (crawl) was invoked
- *visit* will consume its pool's queue, using its pool's user-agent to visit Web pages

So there are nested thread pools, and the question is how to scope the data structures holding user-agents and queues to them. How does start-crawlers tell each (crawl) invocation which user-agent to use? And how does crawl pass this value to each (visit) invocation in the thread pool it's spawned and share its queue with these visitors?

These are the options I've come up with (three "can't"s and one lonely "can"):

1. Reference passing: why can't start-crawlers simply pass a user-agent when it invokes (crawl)? And why can't crawl pass this user-agent and a reference to a queue when it invokes (visit)? Because Executors can execute no-arguments functions only. If they want to be threaded, crawl and visit cannot take arguments. (In the land of broken threading interfaces, Python: 1; Java: 0.)
2. Dynamic var bindings: these won't work, because var bindings are thread-local. If start-crawlers created a dynamic binding for user-agent, it wouldn't be visible to the (crawl)-invoking threads that needed it.
3. Global vars and atoms: these would work only if my thread pools weren't nested. But I can't have a single global var called user-agent or a single global atom called queue. Each crawl-spawned thread pool needs its own queue and its own user-agent, so globals won't work.
4. Closures: these will work.
Rather than have start-crawlers evaluate (.execute pool crawl), it can do (.execute pool (get-crawl-fn user-agent)), where (get-crawl-fn) returns a zero-arguments crawl-like function that closes over the user-agent supplied. Likewise, crawl can do (.execute pool (get-visit-fn user-agent queue)), where get-visit-fn serves up a zero-arguments visitor function.

But I wish I didn't have to use closures. It's more abstraction than I think this problem should require. Or the wrong kind of abstraction. It's equivalent to what I imagine the Java solution would be: to have a Runnable implementor capture some arguments in its constructor call and then access these in its no-arguments run() method.

Conceptually, it isn't what I want. I don't want lexical, thread-local, or global scope; my program consists of nested thread pools, and I want thread-pool scope. Is there a nice Clojure way to get this? Would you use dynamically-generated namespaces or something else I haven't dug into yet?

Thanks for your help, and thanks for Clojure,
Dan
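
For concreteness, the closure approach from option 4 might be sketched roughly as follows. This is only an illustration under assumptions, not a finished crawler: get-crawl-fn and get-visit-fn are the names used above, but the extra parameters seed-urls and n-visitors, the visited atom, the pool sizes, and the 10-second timeouts are all made up for the example, and "visiting" a page is stubbed out as recording a [user-agent url] pair.

```clojure
(import '(java.util.concurrent Executors TimeUnit LinkedBlockingQueue))

;; Hypothetical stand-in for real page fetching: every "visit" just
;; appends a [user-agent url] pair to this atom.
(def visited (atom []))

(defn get-visit-fn
  "Returns a zero-argument fn that closes over user-agent and queue
  and drains the queue until it is empty."
  [user-agent ^LinkedBlockingQueue queue]
  (fn []
    (loop []
      (when-let [url (.poll queue)]        ; .poll returns nil when empty
        (swap! visited conj [user-agent url])
        (recur)))))

(defn get-crawl-fn
  "Returns a zero-argument fn that closes over user-agent. When run, it
  builds its own queue and its own nested pool of visitors."
  [user-agent seed-urls n-visitors]
  (fn []
    (let [queue (LinkedBlockingQueue. ^java.util.Collection seed-urls)
          pool  (Executors/newFixedThreadPool n-visitors)]
      (dotimes [_ n-visitors]
        (.execute pool (get-visit-fn user-agent queue)))
      (.shutdown pool)                     ; accept no new tasks
      (.awaitTermination pool 10 TimeUnit/SECONDS))))

(defn start-crawlers
  "Runs one crawler per user-agent in an outer pool and waits for all
  of them to finish."
  [user-agents seed-urls]
  (let [pool (Executors/newFixedThreadPool (count user-agents))]
    (doseq [ua user-agents]
      (.execute pool (get-crawl-fn ua seed-urls 2)))
    (.shutdown pool)
    (.awaitTermination pool 10 TimeUnit/SECONDS)))

;; Example call (the user-agent strings and URLs are placeholders):
;; (start-crawlers ["agent-a" "agent-b"]
;;                 ["http://example.com/1" "http://example.com/2"])
```

Each crawler's queue and user-agent are visible only to the visitor functions created inside that crawler's (get-crawl-fn ...) call, which is the "thread-pool scope" effect obtained through lexical closure rather than through a dedicated scoping mechanism.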