Deadlocks are tough. When I have had similar problems in the past (not related to bigints), it turned out that one thread in the thread pool had a stack trace that was different from the others. If there are a lot of threads in the pool, it can be easy to miss. If there is a stack trace that is different, that'll be a big hint.
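For what it's worth, here is a minimal sketch of one way to hunt for the odd thread out: group all live threads by identical stack trace and print the smallest groups first. `StackGroups` and `group` are hypothetical names made up for illustration; the only JDK call it relies on is `Thread.getAllStackTraces()`, and it has to run inside the JVM being inspected. A thread dump from `jstack` gives the same raw data if you would rather compare by eye.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of Clojure or the JDK.
final class StackGroups {

    // Groups thread names by the full text of their stack trace, so that
    // threads parked in exactly the same place end up in the same bucket.
    static Map<String, List<String>> group(Map<Thread, StackTraceElement[]> traces) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : traces.entrySet()) {
            String key = Arrays.stream(e.getValue())
                    .map(StackTraceElement::toString)
                    .collect(Collectors.joining("\n    "));
            groups.computeIfAbsent(key, k -> new ArrayList<>())
                  .add(e.getKey().getName());
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = group(Thread.getAllStackTraces());
        // Smallest groups first: a group of one is the thread worth reading.
        groups.entrySet().stream()
              .sorted((a, b) -> Integer.compare(a.getValue().size(), b.getValue().size()))
              .forEach(e -> System.out.println(
                      e.getValue().size() + " thread(s) " + e.getValue()
                      + "\n    " + e.getKey()));
    }
}
```

In an idle pool every worker should land in one group, parked in `LinkedBlockingQueue.take()` like the pool threads in the stacks quoted below. A group of size one with some other trace is the place to look.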
On Oct 3, 6:59 am, Lee Spector <lspec...@hampshire.edu> wrote:
> Although this isn't yet making any real sense to me, I believe I MAY have
> traced an elusive problem in code that I ported from Clojure 1.1 to 1.2 to
> the way in which big integers are handled in 1.2. I've seen (and
> participated in) some conversations about handling bignums but I don't
> recall what particular changes were slated to happen in the change from
> 1.1 to 1.2 or to 1.3-in-progress -- can someone point me to a summary?
>
> Regardless of what is supposed to change with each version (and if I
> recall correctly some of the changes are things that I won't love...) I
> think that the behavior that I've observed may point to a bug because I'm
> not getting numeric exceptions or incorrect numeric results but rather
> hung processes awaiting agent send results. The computations being
> performed in the forms that I send to the agents aren't stuck in infinite
> computations -- the CPU usage goes to near zero and Eclipse's debugger
> shows the thread pool threads suspended (but I don't get any "an agent had
> errors" messages). They look to me like they're deadlocked, but I can't
> understand how they could be, since none of the computations in the send
> forms uses any concurrency primitives that might block.
>
> I don't have any direct evidence that it's the numerics at all, and I have
> not yet been able to produce a repeatable example that reliably produces
> the problem -- this emerges only after a long time in a system that
> involves a lot of randomness -- but after exploring many other options I
> found that I could prevent the problem by lowering one of my numeric
> limits. I think I've ruled out other obvious candidates for this behavior,
> and lowering the limits does seem to fix the problem, so I'm beginning to
> think that it really might be the numerics.
>
> Does anyone have an idea of anything in the 1.2 numerics that might be
> responsible for this? If so, then I hope I'm correct in assuming that that
> should be fixed. Producing an exception or an incorrect result are both
> bad enough -- I'd personally have automatic promotion to prevent such
> things even at a small cost for all math -- but mysteriously hung
> processes are certainly worse.
>
> Here are the stacks as shown in the Eclipse debugger when I'm in the hung
> state. Each of my thread pool threads looks like this:
>
> Thread [pool-1-thread-2] (Suspended)
>     Unsafe.park(boolean, long) line: not available [native method]
>     LockSupport.park(Object) line: 158
>     AbstractQueuedSynchronizer$ConditionObject.await() line: 1925 [local variables unavailable]
>     LinkedBlockingQueue<E>.take() line: 399 [local variables unavailable]
>     ThreadPoolExecutor.getTask() line: 947 [local variables unavailable]
>     ThreadPoolExecutor$Worker.run() line: 907 [local variables unavailable]
>     Thread.run() line: 637 [local variables unavailable]
>
> And my main thread -- the one that is stuck at "await" -- looks like this:
>
> Thread [main] (Suspended)
>     Unsafe.park(boolean, long) line: not available [native method]
>     LockSupport.park(Object) line: 158
>     CountDownLatch$Sync(AbstractQueuedSynchronizer).parkAndCheckInterrupt() line: 747
>     CountDownLatch$Sync(AbstractQueuedSynchronizer).doAcquireSharedInterruptibly(int) line: 905
>     CountDownLatch$Sync(AbstractQueuedSynchronizer).acquireSharedInterruptibly(int) line: 1217
>     CountDownLatch.await() line: 207 [local variables unavailable]
>     core.clj line: 2485
>     core$await(RestFn).applyTo(ISeq) line: 138
>     core.clj line: 540
>     clojush.clj line: 1549
>     clojush$pushgp(RestFn).invoke(Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object) line: 1178
>     regression.clj line: 281
>     Compiler.eval(Object, boolean) line: 5424
>     Compiler.load(Reader, String, String) line: 5857
>     RT.loadResourceScript(Class, String, boolean) line: 340
>     RT.loadResourceScript(String, boolean) line: 327
>     RT.loadResourceScript(String) line: 319
>     main.clj line: 220
>     repl_ln.clj line: 107
>     repl_ln.clj line: 117
>     repl_ln.clj line: 144
>     main.clj line: 193
>     main.clj line: 192
>     main$repl(RestFn).invoke(Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object) line: 906
>     repl_ln.clj line: 263
>     repl_ln$repl(RestFn).invoke(Object, Object) line: 422
>     repl_ln.clj line: 140
>     repl_ln$_main(RestFn).applyTo(ISeq) line: 138
>     repl_ln.main(String[]) line: not available
>
> Any help would be appreciated!
>
> Thanks,
> -Lee