thread-pool scope

Dan Fichter Mon, 20 Jul 2009 16:47:22 -0700

I want to set up Executor thread pools spawned by threads in other Executor
thread pools and to scope certain variables to these nested pools.  I wish
something I would call "thread-pool scope" were easily achieved in Clojure.
Is it?


I'm writing a Web crawler (let's pretend) that has three top-level
functions:

- *start-crawlers* will spawn several crawlers by invoking (crawl) several
times in an Executor pool, and it'll give each crawler its own http
user-agent to crawl with

- *crawl*, each time it's invoked, will construct a queue of hyperlinks and
a new Executor thread pool; it'll then invoke (visit) on several threads in
this pool, and it'll expect them to use this queue and the user-agent that
was supplied when (crawl) was invoked

- *visit* will consume its pool's queue, using its pool's user-agent to
visit Web pages

So there are nested thread pools, and the question is how to scope the data
structures holding user-agents and queues to them.  How does start-crawlers
tell each (crawl) invocation which user-agent to use?  And how does crawl
pass this value to each (visit) invocation in the thread pool it's spawned
and share its queue with these visitors?

These are the options I've come up with (three "can't"s and one lonely
"can"):

1. Reference passing: why can't start-crawlers simply pass a user-agent when
it invokes (crawl)? and why can't crawl pass this user-agent and a reference
to a queue when it invokes (visit)?  Because Executors can execute
no-arguments functions only.  If they want to be threaded, crawl and visit
cannot take arguments.  (In the land of broken threading interfaces, Python:
1; Java: 0.)

2. Dynamic var bindings: these won't work, because var bindings are
thread-local.  If start-crawlers created a dynamic binding for user-agent,
it wouldn't be visible to the (crawl)-invoking threads that needed it.

3. Global vars and atoms: these would work only if my thread pools weren't
nested.  But I can't have a single global var called user-agent or a single
global atom called queue.  Each crawl-spawned thread pool needs its own
queue and its own user-agent, so globals won't work.

4. Closures: these will work.  Rather than have start-crawlers evaluate
(.execute pool crawl), it can do (.execute pool (get-crawl-fn user-agent)),
where (get-crawl-fn) returns a zero-arguments crawl-like function that
closes over the user-agent supplied.  Likewise, crawl can do (.execute pool
(get-visit-fn user-agent queue)), where get-visit-fn serves up a
zero-arguments visitor function.
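
The behavior described in option 2 can be checked with a small sketch. It assumes a recent Clojure (1.3 or later, where dynamic vars must be marked ^:dynamic); the var *user-agent* and the function user-agent-seen-in-pool are hypothetical names used only for illustration.

```clojure
(import '(java.util.concurrent Executors ExecutorService))

;; Hypothetical dynamic var standing in for the crawler's user-agent.
(def ^:dynamic *user-agent* "default-agent")

(defn user-agent-seen-in-pool
  "Submits a task to a pool from inside a binding form and returns the
  value of *user-agent* that the pool thread read."
  []
  (let [^ExecutorService pool (Executors/newFixedThreadPool 1)
        seen (promise)]
    (try
      (binding [*user-agent* "crawler-1"]
        ;; The binding is established on the submitting thread only.
        (.execute pool (fn [] (deliver seen *user-agent*)))
        ;; Wait here so the task runs while the binding is still active.
        (deref seen 5000 :timed-out))
      (finally
        (.shutdown pool)))))
```

Under these assumptions the pool thread reads the var's root value, "default-agent", rather than "crawler-1", because a plain function handed to .execute does not carry the submitting thread's bindings with it.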

But I wish I didn't have to use closures.  It's more abstraction than I
think this problem should require.  Or the wrong kind of abstraction.  It's
equivalent to what I imagine the Java solution would be: to have a Runnable
implementor capture some arguments in its constructor call and then access
these in its no-arguments run() method.  Conceptually, it isn't what I want.
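
For concreteness, the closure approach from option 4 might look roughly like the sketch below. The names get-crawl-fn and get-visit-fn come from the description above; everything else is an assumption made for illustration: the pool sizes, the results atom used in place of real HTTP requests, the extra results parameter, and the map of user-agents to starting URLs passed to start-crawlers.

```clojure
(import '(java.util.concurrent Executors ExecutorService
                               LinkedBlockingQueue TimeUnit))

(defn get-visit-fn
  "Returns a zero-argument visitor closing over its pool's user-agent and queue.
  Instead of fetching pages, it records [user-agent url] pairs in results."
  [user-agent ^LinkedBlockingQueue queue results]
  (fn []
    (loop []
      ;; .poll returns nil once the queue is empty, which ends the loop.
      (when-let [url (.poll queue)]
        (swap! results conj [user-agent url])
        (recur)))))

(defn get-crawl-fn
  "Returns a zero-argument crawl function closing over one user-agent.
  When run, it builds its own queue and its own nested pool of visitors."
  [user-agent urls results]
  (fn []
    (let [queue (LinkedBlockingQueue. ^java.util.Collection urls)
          ^ExecutorService pool (Executors/newFixedThreadPool 2)]
      (dotimes [_ 2]
        (.execute pool (get-visit-fn user-agent queue results)))
      (.shutdown pool)
      (.awaitTermination pool 5 TimeUnit/SECONDS))))

(defn start-crawlers
  "Runs one crawler per user-agent in an outer pool and returns the recorded
  [user-agent url] pairs once every nested pool has finished."
  [agent->urls]
  (let [results (atom [])
        ^ExecutorService pool (Executors/newFixedThreadPool
                               (max 1 (count agent->urls)))]
    (doseq [[user-agent urls] agent->urls]
      (.execute pool (get-crawl-fn user-agent urls results)))
    (.shutdown pool)
    (.awaitTermination pool 10 TimeUnit/SECONDS)
    @results))
```

Calling (start-crawlers {"agent-a" ["u1" "u2"], "agent-b" ["u3"]}) should pair every URL with the user-agent of the crawler that owns its queue, though the order of the pairs depends on thread scheduling. A real crawler would also add newly found links to the queue, which this sketch leaves out.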

I don't want lexical, thread-local, or global scope; my program consists of
nested thread pools, and I want thread-pool scope.  Is there a nice Clojure
way to get this?  Would you use dynamically-generated namespaces or
something else I haven't dug into yet?

Thanks for your help, and thanks for Clojure,

Dan
