I found this email thanks to the Parallel Haskell Digest, thanks Eric & Nick!

On 01/04/2011 11:37, Herbert Valerio Riedel wrote:
I'm experiencing an unexpected behaviour with GHC-7.0.3/x86_64's
multithreading runtime when running some more demanding single-thread
computations.

I've isolated the part of the larger application which exposes the
problem down to the parsing of a ~8MiB JSON file which causes quite a
bit of memory allocations (source at end of email). The program is
compiled with

ghc --make -O2 -threaded -eventlog -rtsopts parse_aeson.hs

and I ran it with and w/o the -N12 option on a 12-core machine (the
problem is visible on a 4-core machine as well):

$ time ./parse_aeson test.json
0.000392s after readFile
0.001016s parse was a success...
6.050836s NFed parse-tree...
6.050968s NFed parse-tree...
no exception

real    0m6.100s
user    0m5.680s
sys     0m0.419s


$ time ./parse_aeson test.json +RTS -N12
0.001249s after readFile
0.001777s parse was a success...
10.048607s NFed parse-tree...
10.048738s NFed parse-tree...
no exception

real    0m10.125s
user    0m57.496s
sys     0m39.927s


When observing top(1)'s output, I saw that all HECs consumed a
substantial amount of cpu cycle, so I loaded up threadscope to see what
the remaining 11 HECs were doing all the time.


I've put the resulting eventlog files and screenshots of threadscope's
visualization online at

http://www.narf.at/~hvr/Haskell/parse_aeson.N0.eventlog.png
http://www.narf.at/~hvr/Haskell/parse_aeson.N0.eventlog.xz

http://www.narf.at/~hvr/Haskell/parse_aeson.N12.eventlog.png
http://www.narf.at/~hvr/Haskell/parse_aeson.N12.eventlog.xz


What's happening there? The actual processing work seems to be done in a
single HEC... but what are the remaining 11 HECs doing exactly? Am I
doing something wrong?

The answer is, they're all doing GC. When you say -N, the parallel GC is turned on, so every GC requires the cooperation of all cores. When you're running parallel code this is a big win, but for sequential code it could well be a slowdown.

You probably want to turn off parallel GC for the young generation with "+RTS -qg1". If that doesn't help, try turning it off entirely: "+RTS -qg".

I have experimented with making this happen automatically, but so far haven't managed to find a heuristic that improves performance consistently.

What does it mean when the GC-bars in threadscope becomes green as in
the middle of the screenshot below?

http://www.narf.at/~hvr/Haskell/parse_aeson.N4.eventlog.pdf

The green bit is where the GC is traversing the heap, as opposed to the other administrative tasks that the GC does (such as waiting for the other HECs to stop, dealing with weak pointers, freeing memory, etc.). The green bars let you see when one core is doing most of the work during parallel GC, which can be an indication that your data structures are mainly linear (like lists) rather than tree-like.

Cheers,
        Simon

_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to