The parallel GC currently doesn't behave well with concurrent programs that use multiple capabilities (aka OS threads), and the behaviour you see is a known symptom of this. I believe that Simon Marlow has some fixes in hand that may go into 6.12.2.
Are you saying that you see two different classes of undesirable performance, one with -qg and one without? How are the threads in your real program communicating with each other? We've seen problems there when there's a lot of contention for e.g. IORefs among thousands of threads.

On Mon, Mar 1, 2010 at 7:59 AM, Michael Lesniak <mlesn...@uni-kassel.de> wrote:

> Hello haskell-cafe,
>
> Sorry for this long post, but I can't think of a way to describe and
> explain the problem in a shorter way.
>
> I've (again) run into very strange behaviour with the parallel GC and
> would be glad if someone could either reproduce (and explain) it or
> provide a solution. A similar but unrelated problem has been described
> in [1].
>
>
> EXAMPLE CODE
> The following demonstration program, which is a much smaller and
> single-threaded version of my real problem, behaves like my real program.
> It does some number crunching by calculating pi to a definable precision:
>
> > -- File Pi.hs
> > -- you need the numbers package from hackage.
> > module Main where
> > import Data.Number.CReal
> > import System.Environment
> > import GHC.Conc
> >
> > main = do
> >   digits <- (read . head) `fmap` getArgs :: IO Int
> >   calcPi digits
> >
> > calcPi digits = showCReal (fromEnum digits) pi `pseq` return ()
>
> Compile it with
>
>   ghc --make -threaded -O2 Pi.hs -o pi
>
>
> BENCHMARKS
> On my two-core machine I get the following quite strange and
> unpredictable results:
>
> * Using one thread:
>
>   $ for i in `seq 1 5`;do time pi 5000 +RTS -N1;done
>
>   real 0m1.441s
>   user 0m1.390s
>   sys  0m0.020s
>
>   real 0m1.449s
>   user 0m1.390s
>   sys  0m0.000s
>
>   real 0m1.399s
>   user 0m1.370s
>   sys  0m0.010s
>
>   real 0m1.401s
>   user 0m1.380s
>   sys  0m0.000s
>
>   real 0m1.404s
>   user 0m1.380s
>   sys  0m0.000s
>
>
> * Using two threads, hence the parallel GC is used:
>
>   $ for i in `seq 1 5`;do time pi 5000 +RTS -N2;done
>
>   real 0m2.540s
>   user 0m2.490s
>   sys  0m0.010s
>
>   real 0m1.527s
>   user 0m1.530s
>   sys  0m0.010s
>
>   real 0m1.966s
>   user 0m1.900s
>   sys  0m0.010s
>
>   real 0m5.670s
>   user 0m5.620s
>   sys  0m0.010s
>
>   real 0m2.966s
>   user 0m2.910s
>   sys  0m0.020s
>
>
> * Using two threads, but disabling the parallel GC:
>
>   $ for i in `seq 1 5`;do time pi 5000 +RTS -N2 -qg;done
>
>   real 0m1.383s
>   user 0m1.380s
>   sys  0m0.010s
>
>   real 0m1.420s
>   user 0m1.360s
>   sys  0m0.010s
>
>   real 0m1.406s
>   user 0m1.360s
>   sys  0m0.010s
>
>   real 0m1.421s
>   user 0m1.380s
>   sys  0m0.000s
>
>   real 0m1.360s
>   user 0m1.360s
>   sys  0m0.000s
>
>
> THREADSCOPE
> I've additionally attached the threadscope profile of a really bad run,
> started with
>
>   $ time pi 5000 +RTS -N2 -ls
>
>   real 0m15.594s
>   user 0m15.490s
>   sys  0m0.010s
>
> as file pi.pdf
>
>
> FURTHER INFORMATION/QUESTION
> Just disabling the parallel GC leads to very bad performance in my
> original code, which forks threads with forkIO and does a lot of
> communication. Hence, using -qg is not a real option for me.
>
> Have I overlooked some crucial aspect of this problem? If you've
> read this far, thank you for reading ... this far ;-)
>
> Cheers,
>   Michael
>
>
> [1] http://osdir.com/ml/haskell-cafe@haskell.org/2010-02/msg00850.html
>
>
> --
> Dipl.-Inf. Michael C. Lesniak
> University of Kassel
> Programming Languages / Methodologies Research Group
> Department of Computer Science and Electrical Engineering
>
> Wilhelmshöher Allee 73
> 34121 Kassel
>
> Phone: +49-(0)561-804-6269
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
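
P.S. To make the IORef contention question concrete, here is a minimal sketch of the two communication patterns I have in mind. It is not taken from your program: the names forkAll, bump, contended and batched are made up for illustration, the thread and iteration counts are arbitrary, and it only uses forkIO, MVar and atomicModifyIORef from base.

```haskell
module Main where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, replicateM_)
import Data.IORef (IORef, atomicModifyIORef, newIORef, readIORef)

-- Hypothetical helper: fork one lightweight thread per action and
-- block until every one of them has signalled completion.
forkAll :: [IO ()] -> IO ()
forkAll actions = do
  dones <- forM actions $ \act -> do
    done <- newEmptyMVar
    _ <- forkIO (act >> putMVar done ())
    return done
  mapM_ takeMVar dones

-- Hypothetical helper: atomically add n to the shared counter and
-- force the new value so that no chain of thunks builds up.
bump :: IORef Int -> Int -> IO ()
bump ref n = do
  new <- atomicModifyIORef ref (\x -> let x' = x + n in (x', x'))
  new `seq` return ()

-- Contended pattern: every single increment from every thread goes
-- through the one shared IORef.
contended :: Int -> Int -> IO Int
contended nThreads perThread = do
  ref <- newIORef 0
  forkAll (replicate nThreads (replicateM_ perThread (bump ref 1)))
  readIORef ref

-- Batched pattern: each thread does its counting privately and
-- touches the shared IORef exactly once.
batched :: Int -> Int -> IO Int
batched nThreads perThread = do
  ref <- newIORef 0
  forkAll (replicate nThreads (worker ref))
  readIORef ref
  where
    worker ref = do
      let local = sum (replicate perThread 1)  -- stand-in for real work
      bump ref local

main :: IO ()
main = do
  a <- contended 1000 100
  b <- batched 1000 100
  print (a, b)  -- prints (100000,100000)
```

Both versions compute the same total; they differ only in how often the threads meet at the shared reference. If your real program looks more like contended than like batched, that would be useful to know, since it may be a separate issue from the parallel GC behaviour.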