Hello, In one of my example programs I have a strange behaviour: it is a very simple taskpool using STM; in pseudocode it's
1. generate data structures 2. initialize data structures 3. fork threads 4. wait (using STM) until the pool is empty and all threads are finished 5. print a final message In very few cases, which depend on the number of threads spawned, the program hangs *after* the final message of step 5 has been printed. "Few cases" means, for example, 50.000 good, terminating runs before it hangs. If you increment the number of spawned threads (to a few hundred or thousands), it hangs much faster. Since forked threads terminate after the main thread terminates (which it should after printing the message), this behaviour is quite unexpected. Since I've experienced strange behaviour in the past which was the fault of my system configuration[1], I am a bit cautious before reporting a bug on GHC's bugtracker, especially since its reproduction is so difficult and random. So my question is how much circumspection is expected/needed before one should enter a bug in the bug tracker? I've tested the attached code on three different systems (with different linux systems, but always GHC 6.12.1 (since it's a bit costly to install the older versions)) and observed the mentioned behaviour. Is this enough to justify a bug report? Or, on the other hand, could someone spot the error in the attached code. Given my history with strange parallel behaviour, I am much more sure that it's the fault of my code, but I can't spot the error and the described behaviour (halting *after* the final message) is really strange. Cheers, Michael [1] http://www.haskell.org/pipermail/haskell-cafe/2010-March/073938.html
-- minimal STM taskpool example which behaves strange: sometimes it hangs -- after "finished" has been printed, i.e. after all worker are finished -- (and the forked threads should be terminated anyway since the main -- thread has finished). -- -- I'm testing the implementation in bash with -- -- i=0; while true;do -- printf "%4d:\n" $((i=$i+1)) -- pool 8 64 1 +RTS -N8 -- done -- -- on an 8-core machine. The parameters are -- -- - the number of forked (light-weighted) threads -- - the number of tasks in the pool -- - the size of the tasks. for easy testing, this value should be -- very small, e.g. 1 .. 10 -- -- This is just a simplified version of a more complex problem. module Main where import Control.Concurrent import GHC.Conc import Control.Monad import qualified Data.Set as Set import Data.Set (Set) import Control.Concurrent.STM import Data.Number.CReal import System.Environment main :: IO () main = do [threads,num,len] <- map read `fmap` getArgs -- generate data structures and fill channel with tasks chan <- newTChanIO working <- newTVarIO Set.empty atomically $ mapM (writeTChan chan) $ replicate num len -- fork threads and WAIT until all threads are finished. replicateM_ threads (forkIO $ thread chan working) wait chan working putStrLn "finished" -- execute a task until the channel is empty and all workers are finished. thread chan working = do task <- get chan working case task of Nothing -> return () Just t -> do calcPi t thread chan working where calcPi digits = showCReal (fromEnum digits) pi `pseq` return () wait chan working = do atomically $ do work <- readTVar working empty <- isEmptyTChan chan check (Set.null work && empty) get chan working = do tid <- myThreadId -- atomically commit that this thread is not working anymore (since we -- try to get a task we must be quasi-idle! atomically $ do work <- Set.delete tid `fmap` readTVar working writeTVar working work -- waits for a new task. if all threads are idle and the pool is empty, -- return. atomically $ do empty <- isEmptyTChan chan work <- readTVar working if (not empty) then do task <- readTChan chan writeTVar working (Set.insert tid work) return (Just task) else if Set.null work then return Nothing else retry
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe