Deadlocks are tough.  When I have had similar problems in the past
(not related to bigints), it turned out that one thread in the thread
pool had a stack trace that differed from all the others.  If there
are a lot of threads in the pool, that is easy to miss.  If one stack
trace is different, that'll be a big hint.
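
Comparing dozens of traces by eye is error-prone, so here is a minimal
sketch of how the comparison could be automated, assuming you can run a
bit of Java (or the equivalent Clojure interop) inside the hung JVM. The
class name StackGrouper and the method groupByStack are made up for
illustration; Thread.getAllStackTraces() is the standard JDK call. It
groups threads whose traces are identical and prints the smallest groups
first, so an odd one out should show up at the top.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Clojure or the JDK.
final class StackGrouper {

    /** Groups thread names by their full stack trace rendered as text. */
    static Map<String, List<String>> groupByStack(Map<Thread, StackTraceElement[]> dump) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : dump.entrySet()) {
            StringBuilder key = new StringBuilder();
            for (StackTraceElement frame : entry.getValue()) {
                key.append("    at ").append(frame).append('\n');
            }
            // Threads with an identical rendered trace end up in the same list.
            groups.computeIfAbsent(key.toString(), k -> new ArrayList<>())
                  .add(entry.getKey().getName());
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = groupByStack(Thread.getAllStackTraces());

        // Smallest groups first: a lone thread with a unique trace is the hint.
        List<Map.Entry<String, List<String>>> sorted = new ArrayList<>(groups.entrySet());
        sorted.sort((a, b) -> Integer.compare(a.getValue().size(), b.getValue().size()));

        for (Map.Entry<String, List<String>> group : sorted) {
            System.out.println(group.getValue().size() + " thread(s): " + group.getValue());
            System.out.print(group.getKey());
            System.out.println();
        }
    }
}
```

Run from a plain command line this only shows the JVM's own housekeeping
threads; it becomes useful when called from inside the process that is
hung. A thread dump from jstack gives you the same raw data if you would
rather compare the text by other means.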


On Oct 3, 6:59 am, Lee Spector <lspec...@hampshire.edu> wrote:
> Although this isn't yet making any real sense to me I believe I MAY have 
> traced an elusive problem in code that I ported from clojure 1.1 to 1.2 to 
> the way in which big integers are handled in 1.2. I've seen (and participated 
> in) some conversations about handling bignums but I don't recall what 
> particular changes were slated to happen in the change from 1.1 to 1.2 or to 
> 1.3-in-progress -- can someone point me to a summary?
>
> Regardless of what is supposed to change with each version (and if I recall 
> correctly some of the changes are things that I won't love...) I think that 
> the behavior that I've observed may point to a bug because I'm not getting 
> numeric exceptions or incorrect numeric results but rather hung processes 
> awaiting agent send results. The computations being performed in the forms 
> that I send to the agents aren't stuck in infinite computations -- the CPU 
> usage goes to near zero and eclipse's debugger shows the thread pool threads 
> suspended (but I don't get any "an agent had errors" messages). They look to 
> me like they're deadlocked, but I can't understand how they could be, since 
> none of the computations in the send forms uses any concurrency primitives 
> that might block.
>
> I don't have any direct evidence that it's the numerics at all, and I have 
> not yet been able to produce a repeatable example that reliably produces the 
> problem -- this emerges only after a long time in a system that involves a 
> lot of randomness -- but after exploring many other options I found that I 
> could prevent the problem by lowering one of my numeric limits. I think I've 
> ruled out other obvious candidates for this behavior, and lowering the limits 
> does seem to fix the problem, so I'm beginning to think that it really might 
> be the numerics.
>
> Does anyone have an idea of anything in the 1.2 numerics that might be 
> responsible for this? If so, then I hope I'm correct in assuming that that 
> should be fixed. Producing an exception or an incorrect result are both bad 
> enough -- I'd personally have automatic promotion to prevent such things even 
> at a small cost for all math -- but mysteriously hung processes are certainly 
> worse.
>
> Here are the stacks as shown in the eclipse debugger when I'm in the hung 
> state. Each of my thread pool threads looks like this:
>
> Thread [pool-1-thread-2] (Suspended)    
>         Unsafe.park(boolean, long) line: not available [native method]  
>         LockSupport.park(Object) line: 158      
>         AbstractQueuedSynchronizer$ConditionObject.await() line: 1925 [local variables unavailable]
>         LinkedBlockingQueue<E>.take() line: 399 [local variables unavailable]
>         ThreadPoolExecutor.getTask() line: 947 [local variables unavailable]
>         ThreadPoolExecutor$Worker.run() line: 907 [local variables unavailable]
>         Thread.run() line: 637 [local variables unavailable]    
>
> And my main thread -- the one that is stuck at "await" -- looks like this:
>
> Thread [main] (Suspended)      
>         Unsafe.park(boolean, long) line: not available [native method]  
>         LockSupport.park(Object) line: 158      
>         CountDownLatch$Sync(AbstractQueuedSynchronizer).parkAndCheckInterrupt() line: 747
>         CountDownLatch$Sync(AbstractQueuedSynchronizer).doAcquireSharedInterruptibly(int) line: 905
>         CountDownLatch$Sync(AbstractQueuedSynchronizer).acquireSharedInterruptibly(int) line: 1217
>         CountDownLatch.await() line: 207 [local variables unavailable]  
>         core.clj line: 2485    
>         core$await(RestFn).applyTo(ISeq) line: 138      
>         core.clj line: 540      
>         clojush.clj line: 1549  
>         clojush$pushgp(RestFn).invoke(Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object) line: 1178
>         regression.clj line: 281        
>         Compiler.eval(Object, boolean) line: 5424      
>         Compiler.load(Reader, String, String) line: 5857        
>         RT.loadResourceScript(Class, String, boolean) line: 340
>         RT.loadResourceScript(String, boolean) line: 327        
>         RT.loadResourceScript(String) line: 319
>         main.clj line: 220      
>         repl_ln.clj line: 107  
>         repl_ln.clj line: 117  
>         repl_ln.clj line: 144  
>         main.clj line: 193      
>         main.clj line: 192      
>         main$repl(RestFn).invoke(Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, Object) line: 906
>         repl_ln.clj line: 263  
>         repl_ln$repl(RestFn).invoke(Object, Object) line: 422  
>         repl_ln.clj line: 140  
>         repl_ln$_main(RestFn).applyTo(ISeq) line: 138  
>         repl_ln.main(String[]) line: not available      
>
> Any help would be appreciated!
>
> Thanks, -Lee
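
One note on reading these traces: a pool thread parked in
LinkedBlockingQueue.take() under ThreadPoolExecutor.getTask() is idle and
waiting for its next task, not blocked on a lock, and the main thread is
parked in CountDownLatch.await(). That combination is what you would see
if whatever was supposed to count the latch down never ran to completion.
The sketch below is plain Java, not Clojure's agent code, and I am not
claiming this is what happened in your program; LatchHangDemo and
runAndAwait are names invented for the illustration. It submits one task
that may fail before reaching countDown() and waits with a timeout so the
demo itself cannot hang.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it only imitates the shape of the reported traces.
final class LatchHangDemo {

    /** Returns true if the latch was released within the timeout. */
    static boolean runAndAwait(boolean failBeforeCountDown, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch latch = new CountDownLatch(1);
        try {
            pool.submit(() -> {
                if (failBeforeCountDown) {
                    // submit() stores this in the returned Future, which nobody
                    // inspects, so nothing is printed and the worker goes idle.
                    throw new IllegalStateException("simulated failure");
                }
                latch.countDown();
            });
            // An await() without a timeout would park here indefinitely.
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("normal task released latch:  " + runAndAwait(false, 2000));
        System.out.println("failing task released latch: " + runAndAwait(true, 500));
    }
}
```

In the failing case the worker thread goes straight back to waiting in
the queue and the waiter never gets its signal, with near-zero CPU and no
error output. If your hang has the same shape, the thing to look for is
an action that ended abnormally rather than a lock cycle.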
