Hello,

In one of my example programs I have a strange behaviour: it is a very
simple taskpool using STM; in pseudocode it's

1. generate data structures
2. initialize data structures
3. fork threads
4. wait (using STM) until the pool is empty and all threads are finished
5. print a final message

In very few cases, which depend on the number of threads spawned, the
program hangs *after* the final message of step 5 has been printed.
"Few cases" means, for example, 50.000 good, terminating runs before
it hangs. If you increment the number of spawned threads (to a few
hundred or thousands), it hangs much faster. Since forked threads
terminate after the main thread terminates (which it should after
printing the message), this behaviour is quite unexpected.

Since I've experienced strange behaviour in the past which was the
fault of my system configuration[1], I am a bit cautious before
reporting a bug on GHC's bugtracker, especially since its reproduction
is so difficult and random.

So my question is how much circumspection is expected/needed before
one should enter a bug in the bug tracker? I've tested the attached
code on three different systems (with different linux systems, but
always GHC 6.12.1 (since it's a bit costly to install the older
versions)) and observed the mentioned behaviour. Is this enough to
justify a bug report? Or, on the other hand, could someone spot the
error in the attached code. Given my history with strange parallel
behaviour, I am much more sure that it's the fault of my code, but I
can't spot the error and the described behaviour (halting *after* the
final message) is really strange.


Cheers,
  Michael

[1] http://www.haskell.org/pipermail/haskell-cafe/2010-March/073938.html
-- minimal STM taskpool example which behaves strange: sometimes it hangs
-- after "finished" has been printed, i.e. after all worker are finished 
-- (and the forked threads should be terminated anyway since the main
-- thread has finished).
--
-- I'm testing the implementation in bash with
--
--   i=0; while true;do 
--      printf "%4d:\n" $((i=$i+1))
--      pool 8 64 1 +RTS -N8
--   done
--
-- on an 8-core machine. The parameters are
--
--   - the number of forked (light-weighted) threads
--   - the number of tasks in the pool
--   - the size of the tasks. for easy testing, this value should be
--     very small, e.g. 1 .. 10
--
-- This is just a simplified version of a more complex problem.

module Main where
import Control.Concurrent
import GHC.Conc
import Control.Monad
import qualified Data.Set as Set
import Data.Set (Set)
import Control.Concurrent.STM
import Data.Number.CReal
import System.Environment


main :: IO ()
main = do
    [threads,num,len] <- map read `fmap` getArgs 

    -- generate data structures and fill channel with tasks
    chan    <- newTChanIO
    working <- newTVarIO Set.empty
    atomically $ mapM (writeTChan chan) $ replicate num len

    -- fork threads and WAIT until all threads are finished. 
    replicateM_ threads (forkIO $ thread chan working)
    wait chan working 
    putStrLn "finished"


-- execute a task until the channel is empty and all workers are finished.
thread chan working = do
    task <- get chan working
    case task of
        Nothing -> return ()
        Just t  -> do 
            calcPi t 
            thread chan working
  where calcPi digits = showCReal (fromEnum digits) pi `pseq` return ()


wait chan working = do 
    atomically $ do
        work  <- readTVar working
        empty <- isEmptyTChan chan
        check (Set.null work && empty)


get chan working = do
    tid <- myThreadId

    -- atomically commit that this thread is not working anymore (since we 
    -- try to get a task we must be quasi-idle!
    atomically $ do
        work  <- Set.delete tid `fmap` readTVar working
        writeTVar working work

    -- waits for a new task. if all threads are idle and the pool is empty,
    -- return.
    atomically $ do
        empty <- isEmptyTChan chan
        work  <- readTVar working

        if (not empty) 
            then do 
                task <- readTChan chan
                writeTVar working (Set.insert tid work)
                return (Just task)
            else if Set.null work
                    then return Nothing
                    else retry
             
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to