On Jun 15, 6:27 am, Simon Marlow <[email protected]> wrote:
> On 15/06/2010 06:09, braver wrote:
>
> > In fact, the tag cafe2, when run on the full dataset, gets stuck at 11
> > days, with RAM slowly getting into 50 GB; a previous version caused
> > ghc 6.12.1 to segfault around day 12 -- -debug showing an assert
> > failure in Storage.c. ghc 6.10 got stuck at 30 days for good, and
> > when profiling crashed twice with a "strange closure" or a stack
> > overflow. So allocation is a problem still.
>
> I'd be happy to help you track this down, but I don't have a machine big
> enough. Do you have any runs that display a problem with a smaller heap
> (< 16GB)?
>
> If the program is apparently hung, try connecting to it with 'gdb
> --pid=<pid>' and doing 'info thread' and 'where'. That might give me
> enough clues to find out where the problem is.
>
> Is this with -threaded, BTW? With residency on that scale, I'd expect
> the parallel GC to help quite a lot. But obviously getting it to not
> crash/hang is the first priority :)

Simon - thanks for the tips, this is what gdb says when it's stuck at
45 GB when limited with -A5G -M40G:

...
0x00000000004c3c21 in free_mega_group ()
(gdb) info thread
* 1 Thread 0x2b21c1da4dc0 (LWP 10210)  0x00000000004c3c21 in free_mega_group ()
(gdb) where
#0  0x00000000004c3c21 in free_mega_group ()
#1  0x00000000004c3ff9 in freeChain ()
#2  0x00000000004c5ab0 in GarbageCollect ()
#3  0x00000000004bff96 in scheduleDoGC ()
#4  0x00000000004c0b25 in scheduleWaitThread ()
#5  0x00000000004bea09 in real_main ()
#6  0x00000000004beb17 in hs_main ()
#7  0x00000037d5a1d974 in __libc_start_main () from /lib64/libc.so.6
#8  0x0000000000402ca9 in _start ()

I'll also supply heap profiles for small runs shortly.

-- Alexy
