| In an idle moment I wrote the attached program, which prints the
| song "1 man went to mow".  However if I try it on Hugs
| (version of March 1999, running on a Sparc station with 128MB),
| it is unable to print more than about 91 verses, after which it
| prints
|    ERROR: Garbage collection fails to reclaim sufficient space
| OK, so I suppose the reason is Hugs doesn't do tail recursion elimination,

If that was the case, then you would have seen an error message about
stack depth.  But this error message is about the garbage collected
heap...

| though giving up on a total stack depth of less than 200 seems a bit
| pathetic.  But what does surprise me is that after I've had that ERROR,
| I can't do anything.  If I try (say)
| Main> verse 10
| Hugs bombs out after only a few lines with the same ERROR.  Why doesn't
| it collect all the space it used in the previous step?  Is there a space
| leak somewhere?

Now you're on the right track.  Yes, there is a space leak in your program.
You've asked it compute a rather large expression:

   main = mow 1 >> mow 2 >> mow 3 >> mow 4 >> ...

Hugs does this, and remembers the result, as you would expect with lazy
evaluation.  (Don't evaluate unless you have to, but if you do evaluate
something, remember the result.)  But of course it takes space to store
an infinite expression.  An infinite amount of space, in fact.  Hence
your program leaks space, and the garbage collector eventually gives up.
Fortunately, the garbage collector was designed to give up a little
earlier than the absolute limit, while it still has at least a little
space left.  And that is why your second attempt to run main doesn't
just fail straight away.

Incidentally, although you probably can't tell the difference with a
fast computer and an IO-bound program, Hugs is actually running main
*faster* the second time around because it didn't have to do all the
work to unwind the recursion a second time.

In effect, the program you have written is largely equivalent to the
following:

  while (malloc(1000))       /* Allocate a block of memory  */
    ;                        /* ... until there's none left */
  if (malloc(1000)==NULL)    /* ... and then complain ...   */
    printf("ERROR: strangely enough there's no memory left!\n");

I'd be worried if my program *didn't* display the error message here!

The point at which Hugs gives up will depend on your heap size setting,
and on your patience in waiting for it to reach that point.  Even if
Hugs was able to expand its heap during execution, your program would
still run out of space in the end, although it might well have exhausted
virtual memory by that point, and have the machine crawling along with
enough thrashing to make it completely unusable.

Next you might suggest that Hugs is wrong to cache top-level variables.
After all, you say, other Haskell compilers don't do that.  But Hugs is
an interpreter, not a compiler, and I've long maintained that this should
(and does) make a difference.  In a compiler, it makes sense to let
the garbage collector start reclaiming space for main as the computation
proceeds ... after all, you're never going to evaluate main again.
(Although consider what might happen if your definition for main was
main = mow 0 >> main...)  In other words: it is ok to treat such top-level
definitions as special cases when there is no way for the user to
observe that any special cases are being applied.  Strictness analysis,
for example, is justified by similar principles.  In Hugs, top-level
bindings are cached, just like any other bindings; doing otherwise would
create an inconsistency in behavior.  In particular, Hugs has to cache
the result it computes for main because, for all the interpreter knows,
you might want to run it a second time within the same session.  (Just
as you did, in fact!)  What's more, this feature can even be quite useful
for some applications as a mechanism for memoizing the results of certain
computations.

This is an old debate, and the type of definitions that we're talking
about even have a special name: CAF's, or constant applicative forms.
It is easy enough to arrange for Hugs to reset all CAFs at the beginning
of each evaluation; your program would exhibit the same behavior as it
does now, but the second run would go for just as long as the first before
it stopped.  It would take a little more work to allow space for CAFs to
be recovered during evaluation if the garbage collector can prove that
they are no longer referenced, but that is also feasible.  But Hugs
doesn't take either of these steps because, it my opinion, neither one
is the correct behavior.

This doesn't mean that there is no way to reset the CAFs --- i.e., to
to recover the memory that you have used.  Of course, you could :quit
Hugs and restart, but that's not satisfactory.  Instead, you just have
to tell Hugs that you want to start a fresh session with your program,
and you do this by using the :load command.  Try:

    :load mow
    :gc
    main
    :load mow
    :gc

You should see almost exactly the same number of free cells after both
uses of :gc ... all the memory you had to begin with is yours to use
again ...

Finally, you might suggest that it is sometimes hard for programmers to
understand how and why space is being used in their Haskell programs.
On that point, I would strongly agree.  We like to think declaratively,
and free ourselves from concerns about `mundane trivia' like memory
allocation.  But as long as memory remains finite, this will always be
an issue.  We need a better model for Haskell programs so that
programmers can understand and predict how their code will use memory
at a suitably abstract level.  But it's a hard nut to crack, and it
still seems a long way off right now.

Staying with that last theme, a much more interesting question comes
from observing that we wouldn't be having this conversation at all if
you had just written:

  mow n = do verse n
             mow(n+1)

instead of:

  mow n = verse n >> mow(n+1)

Or even if you had just added an extra line to your definition of main:

  main = do putStr "George's song:\n\n"
            mow 1 :: IO Int

If you use these "do" forms, your program will not leak space.  I won't
explain why this happens here, and will leave it as a puzzle for anyone
who has read this far to ponder and enjoy!

All the best,
Mark

Reply via email to