Re: wip-cse

2012-04-25 Thread Andy Wingo
Hi,

On Wed 25 Apr 2012 01:14, l...@gnu.org (Ludovic Courtès) writes:

> Would be nice to check with the micro-benchmarks in vlists.bm as well.

Forgot to mention that.

Before:

  ;; running guile version 2.0.5.94-a8004d
  ;; calibrating the benchmarking framework...
  ;; calibration: ("empty initialization benchmark" 1000 real 0.342058814 
real/iteration 3.42058814e-8 run/iteration 3.4155491e-8 core/iteration 0.0 gc 
0.0)
  ("vlists.bm: constructors: cons (srfi-1)" 2 real 0.057408929 real/iteration 
0.0287044645 run/iteration 0.028649658 core/iteration 0.028649623844509 gc 
0.015796477)
  ("vlists.bm: constructors: cons (vlist)" 2 real 0.284835105 real/iteration 
0.1424175525 run/iteration 0.1421727855 core/iteration 0.142172751344509 gc 
0.027867707)
  ("vlists.bm: constructors: acons (srfi-1)" 2 real 0.14062694 real/iteration 
0.07031347 run/iteration 0.070181649 core/iteration 0.070181614844509 gc 
0.035069774)
  ("vlists.bm: constructors: acons (vlist)" 2 real 0.84986149 real/iteration 
0.424930745 run/iteration 0.424175853 core/iteration 0.424175818844509 gc 
0.049258546)
  ("vlists.bm: iteration: fold (srfi-1)" 2 real 0.04667501 real/iteration 
0.023337505 run/iteration 0.0232955625 core/iteration 0.023295528344509 gc 0.0)
  ("vlists.bm: iteration: fold (vlist)" 2 real 0.117599712 real/iteration 
0.058799856 run/iteration 0.0585624875 core/iteration 0.058562453344509 gc 0.0)
  ("vlists.bm: iteration: assoc (srfi-1)" 70 real 3.308754515 real/iteration 
0.0472679216428571 run/iteration 0.0471808222857143 core/iteration 
0.0471807881302233 gc 0.0)
  ("vlists.bm: iteration: assoc (vhash)" 70 real 0.0021592 real/iteration 
3.08457142857143e-5 run/iteration 3.07982571428571e-5 core/iteration 
3.07641016518571e-5 gc 0.0)

After:

  ;; running guile version 2.0.5.123-g4bd53c1
  ;; calibrating the benchmarking framework...
  ;; calibration: ("empty initialization benchmark" 1000 real 0.352669089 
real/iteration 3.52669089e-8 run/iteration 3.51752466e-8 core/iteration 0.0 gc 
0.0)
  ("vlists.bm: constructors: cons (srfi-1)" 2 real 0.0531704 real/iteration 
0.0265852 run/iteration 0.0265315645 core/iteration 0.0265315293247534 gc 
0.012732576)
  ("vlists.bm: constructors: cons (vlist)" 2 real 0.250039641 real/iteration 
0.1250198205 run/iteration 0.1247150905 core/iteration 0.124715055324753 gc 
0.025619954)
  ("vlists.bm: constructors: acons (srfi-1)" 2 real 0.134855313 real/iteration 
0.0674276565 run/iteration 0.067306533 core/iteration 0.0673064978247534 gc 
0.041115138)
  ("vlists.bm: constructors: acons (vlist)" 2 real 0.549644456 real/iteration 
0.27488 run/iteration 0.2741667145 core/iteration 0.274166679324753 gc 
0.016484469)
  ("vlists.bm: iteration: fold (srfi-1)" 2 real 0.0454016 real/iteration 
0.0227008 run/iteration 0.022658765 core/iteration 0.0226587298247534 gc 0.0)
  ("vlists.bm: iteration: fold (vlist)" 2 real 0.086939778 real/iteration 
0.043469889 run/iteration 0.043402648 core/iteration 0.0434026128247534 gc 0.0)
  ("vlists.bm: iteration: assoc (srfi-1)" 70 real 3.325209262 real/iteration 
0.0475029894571429 run/iteration 0.0474030040428571 core/iteration 
0.0474029688676105 gc 0.0)
  ("vlists.bm: iteration: assoc (vhash)" 70 real 0.00121 real/iteration 
1.73174571428571e-5 run/iteration 1.72831142857143e-5 core/iteration 
1.72479390391143e-5 gc 0.0)

I don't think it's useful to run srfi-1 and vlist tests the same number
of times when their complexity varies, as in the assoc case.  Anyway,
those are the numbers!
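
(A quick way to read these numbers is to divide the core/iteration
columns of the two runs. A Python sketch, with the figures copied by
hand from the output above; the selection of rows is mine, not part of
the original output:)

```python
# Per-iteration "core" times copied by hand from the before/after runs.
before = {
    "constructors: acons (vlist)": 0.424175818844509,
    "iteration: fold (vlist)": 0.058562453344509,
    "iteration: assoc (vhash)": 3.07641016518571e-5,
}
after = {
    "constructors: acons (vlist)": 0.274166679324753,
    "iteration: fold (vlist)": 0.0434026128247534,
    "iteration: assoc (vhash)": 1.72479390391143e-5,
}

# Speedup = old time / new time, per benchmark.
speedups = {name: before[name] / after[name] for name in before}
for name, s in speedups.items():
    print(f"{name}: {s:.2f}x")
```

So the vhash assoc case improved by roughly 1.8x per iteration.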

Regards,

Andy
-- 
http://wingolog.org/



Re: wip-cse

2012-04-25 Thread Ludovic Courtès
Hi,

Andy Wingo  skribis:

> I don't think it's useful to run srfi-1 and vlist tests the same number
> of times when their complexity varies, as in the assoc case.

Right, and the ‘vhash’ bench is too short to draw any sort of conclusion.

Could you try with an appropriate value of ‘--iteration-factor’?

Unfortunately, vlists.bm doesn’t appear at
.

Thanks!

Ludo’.



Let's Talk About Backtraces and Stacks

2012-04-25 Thread Noah Lavine
Hello all,

There has been some talk on this list about letting Guile show useful
backtraces for tail-recursive functions. I was thinking about how to
optimize local variable allocation, and realized that this question is
related to that, and to other things too. So I'd like to ask people
now how we want to handle this set of related issues, so I can keep
working on them.

The standard implementation of this keeps a ribcage structure, where
the backbone is the standard stack and the ribs are bounded lists of
the tail calls made from each stack frame - say, the last 50 tail
calls. I can think of two ways we might do this: the ribcage could be
part of the standard Scheme stack, with a rib hanging off of each
stack frame, or there could be a separate ribcage structure. A
separate ribcage would duplicate information and might be slower (in
the debug VM at least), but would also give us more freedom for
variable allocation in the regular stack (see issue 1).
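
(To make the ribcage idea concrete, here is a toy model in Python.
It is purely illustrative - all names are hypothetical, and Guile's
real frames of course live in the VM, not in a Python list:)

```python
from collections import deque

RIB_LIMIT = 50  # remember at most the last 50 tail calls per frame

class Frame:
    """One backbone entry: the live procedure plus its bounded rib."""
    def __init__(self, proc):
        self.proc = proc
        self.rib = deque(maxlen=RIB_LIMIT)  # oldest entries fall off

class Stack:
    def __init__(self):
        self.frames = []

    def call(self, proc):
        # Non-tail call: push a fresh frame onto the backbone.
        self.frames.append(Frame(proc))

    def tail_call(self, proc):
        # Tail call: reuse the top frame, but note the replaced
        # procedure in the frame's rib so backtraces can show it.
        top = self.frames[-1]
        top.rib.append(top.proc)
        top.proc = proc

    def backtrace(self):
        for f in self.frames:
            yield " -> ".join(list(f.rib) + [f.proc])

stack = Stack()
stack.call("main")
stack.call("loop")
for i in range(3):
    stack.tail_call(f"loop@{i}")
print(list(stack.backtrace()))
# → ['main', 'loop -> loop@0 -> loop@1 -> loop@2']
```

The point is that the tail calls cost one bounded-buffer append each,
and the backbone stays the same size no matter how many are made.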

Here are the issues:

1) Variable allocation.

I had an idea to allocate lexical variables to slots on the stack in a
way that would conserve our stack space as much as possible. It isn't
too hard to implement, but if we free lexical variables aggressively,
it will make debugging harder because local variable information won't
be around. We could provide an option to turn smart allocation off,
and choose this option at the REPL. However, if we implement a ribcage
separate from the Scheme stack, then we can allocate as aggressively
as we want because the debugging information will still be there when
we need it.

2) Backtraces for tail calls.

In order to give useful backtraces for tail calls, we need to record
information about them. We could accomplish this equally well with
either of the two implementation strategies.

3) Backtraces from the evaluator.

Ideally, I'd like backtraces from the evaluator not to show the
evaluator's own functions, unless the user asks for them. That would
make backtraces the same in evaluated and compiled code, and would
also be easier for the user to understand. If we choose a separate
ribcage, I had thought of having the evaluator write to its own
ribcage structure, which would be separate from the standard one. The
standard one would still be around, but it would only appear on
request. If we choose to make the ribcage part of the Scheme stack,
the other solution is to let the evaluator provide a filter over the
frames on the stack, making its function calls appear different from
what they really are, and let the user optionally remove this filter
if they want to debug the evaluator itself.
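
(A sketch of that filtering approach, with hypothetical frame names -
Guile's evaluator does not literally expose these:)

```python
# Frames the evaluator would register as "its own"; made-up names.
EVAL_FRAMES = {"eval", "eval-body", "apply-closure"}

def visible_backtrace(frames, show_evaluator=False):
    """Hide evaluator frames unless the user asks to see them."""
    if show_evaluator:
        return list(frames)
    return [f for f in frames if f not in EVAL_FRAMES]

frames = ["main", "eval", "eval-body", "my-proc", "apply-closure", "helper"]
print(visible_backtrace(frames))        # the user's view
print(visible_backtrace(frames, True))  # debugging the evaluator itself
```

With the filter on, only "main", "my-proc" and "helper" survive, which
is the backtrace a user of evaluated code would expect.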

It seems to me that it's best for the ribcage to be part of the Scheme
stack. Issues 2 and 3 can be addressed reasonably well in either
implementation. Issue 1 is easier if the ribcage is separate, but if
you're recording information about every variable separately, then
you're still using extra storage space no matter how cleverly the
regular Scheme stack is allocated, so you might as well just use naive
allocation in the regular stack. The one thing that makes me hesitate
is that this probably means reserving an extra word in every stack
frame (to hold the rib), even if that stack frame is internal to Guile
and not meant to be debuggable. Also, I think that MIT Scheme uses a
separate ribcage, and they have a working implementation of this.

What does everyone else think?
Noah



Re: wip-cse

2012-04-25 Thread Andy Wingo
On Wed 25 Apr 2012 16:10, l...@gnu.org (Ludovic Courtès) writes:

>> I don't think it's useful to run srfi-1 and vlist tests the same number
>> of times when their complexity varies, as in the assoc case.
>
> Right, and the ‘vhash’ bench is too short to draw any sort of conclusion.
>
> Could you try with an appropriate value of ‘--iteration-factor’?

What is an appropriate value?  If I did 100, for example, then the
srfi-1 alist test would take 350 seconds.

Why not take this opportunity to adjust the iterations specified in the
benchmarks, as I proposed in my mail "our benchmarking suite"?

We could also make the benchmarking suite automatically set the
iteration-factor to an appropriate value based on the calibration.
WDYT?
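
(One possible shape for that automatic scaling, sketched in Python
with made-up names: derive the factor from a short trial run so each
benchmark lands near a target duration:)

```python
# Minimal sketch: pick an iteration factor so a benchmark whose trial
# run took trial_time seconds ends up near TARGET_SECONDS.
TARGET_SECONDS = 0.5

def auto_iteration_factor(trial_time):
    # trial_time: wall-clock seconds for one run at the declared count.
    # Never scale below 1: too-long benchmarks need their declared
    # counts reduced instead.
    return max(1, round(TARGET_SECONDS / trial_time))

# The vhash assoc run above took ~0.00121s for its 70 iterations:
print(auto_iteration_factor(0.00121))      # scaled up a lot
# ...while the srfi-1 assoc run already took ~3.3s:
print(auto_iteration_factor(3.325209262))  # left alone
```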

Andy
-- 
http://wingolog.org/



Re: our benchmark-suite

2012-04-25 Thread Ludovic Courtès
Hi Andy!

Andy Wingo  skribis:

> For what it's worth, the current overhead of the benchmark appears to be
> about 35 microseconds per iteration, on my laptop.  If we inline the
> iteration into the benchmark itself, rather than calling a thunk
> repeatedly, we can bring that down to around 13 microseconds.

There are a few benchmarks doing it already.  See, for instance,
‘repeat’ in ‘arithmetic.bm’.
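
(The comparison Andy measured can be illustrated in miniature - a
Python sketch, where the absolute numbers are nothing like the quoted
35/13 microsecond figures; only the shape of the comparison carries
over:)

```python
import timeit

def thunk():
    pass

def via_thunk(n):
    # One call per iteration: every pass pays function-call overhead.
    for _ in range(n):
        thunk()

def inlined(n):
    # Iteration inlined into the benchmark body: loop overhead only.
    for _ in range(n):
        pass

n = 100_000
t_thunk = timeit.timeit(lambda: via_thunk(n), number=10)
t_inline = timeit.timeit(lambda: inlined(n), number=10)
print(t_thunk > t_inline)  # the thunk version pays extra per-call cost
```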

> So, those are the problems: benchmarks running for inappropriate,
> inconsistent durations;

I don’t really see such a problem.  It doesn’t matter to me if
‘arithmetic.bm’ takes 2 minutes while ‘vlists.bm’ takes 40 seconds,
since I’m not comparing them.

> inappropriate benchmarks;

I agree that things like ‘if.bm’ are not very relevant now.  But there
are also appropriate benchmarks, and benchmarks are always better than
a wild guess.  ;-)

> and benchmarks being optimized out.

That should be fixed.

> My proposal is to rebase the iteration count in 0-reference.bm to run
> for 0.5s on some modern machine, and adjust all benchmarks to match,
> removing those benchmarks that do not measure anything useful.

Sounds good.  However, adjusting iteration counts of the benchmarks
themselves should be done rarely, as it breaks performance tracking like
.

> Finally we should perhaps enable automatic scaling of the iteration
> count.  What do folks think about that?
>
> On the positive side, all of our benchmarks are explicit about being a
> time per number of iterations, and so this change should not affect
> users who measure time per iteration.

If the reported time is divided by the global iteration count, then
automatic scaling of the global iteration count would be good, yes.

Thanks,
Ludo’.




Re: Problems with compilation on Trisquel 5.5

2012-04-25 Thread Ludovic Courtès
Hi,

Sjoerd van Leent  skribis:

> When compiling, I get to the ‘GEN guile-procedures.texi’ generation
> stage, which ends with a segmentation fault.  I have been debugging the
> lt_guile process with gdb, and some interesting things happened: the
> process received SIGPWR and SIGXCPU signals (in a loop),

These signals are used by libgc.  In GDB, you should type:

  handle SIGPWR nostop noprint pass
  handle SIGXCPU nostop noprint pass
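
(For repeated sessions, the same commands can go into a ~/.gdbinit
file, which GDB reads at startup; `handle` works there as well:)

```
# ~/.gdbinit - let libgc's signals pass through without stopping GDB
handle SIGPWR nostop noprint pass
handle SIGXCPU nostop noprint pass
```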

> and eventually ends up with a SIGSEGV in the procedure
> GC_generic_malloc_inner.

Could you show the backtrace?

This is with libgc 7.1, right?

Thanks,
Ludo’.