Re: wip-ports-refactor

Andy Wingo Sun, 17 Apr 2016 01:51:00 -0700

Hi :)

On Thu 14 Apr 2016 16:03, l...@gnu.org (Ludovic Courtès) writes:


> Andy Wingo <wi...@pobox.com> skribis:
>
>> I have been working on a refactor to ports.  The goal is to have a
>> better concurrency story.  Let me tell that story then get down to the
>> details.
>
> In addition to concurrency and thread-safety, I’m very much interested
> in the impact of this change on the API (I’ve always found the port API
> in C to be really bad), on the flexibility it would provide, and on
> performance—‘read-char’ and ‘get-u8’ are currently prohibitively slow!

Yeah.  Of course improving the port internals is technically a breaking
change, but I think probably the set of people that have implemented
ports using the C API can be counted on two hands, and I hope to find
everyone and help them adapt :)

From the speed side, I think that considering read-char to be
prohibitively slow is an incorrect diagnosis.  First let's define a
helper:

    (define-syntax-rule (do-times n exp)
      (let lp ((i 0)) (let ((res exp)) (if (< i n) (lp (1+ i)) res))))

I want to test four things.

    ;; 1. How long a loop up to 10 million takes (baseline measurement).
    (let ((port (open-input-string "s"))) (do-times #e1e7 1))

    ;; 2. A call to a simple Scheme function.
    (define (foo port) 42)
    (let ((port (open-input-string "s"))) (do-times #e1e7 (foo port)))

    ;; 3. A call to a port subr.
    (let ((port (open-input-string "s"))) (do-times #e1e7 (port-line port)))

    ;; 4. A call to a port subr that touches the buffer.
    (let ((port (open-input-string "s"))) (do-times #e1e7 (peek-char port)))
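
For anyone who wants to rerun these tests from a script rather than at the REPL, here is a minimal timing sketch.  `time-thunk` is a hypothetical helper, not something from the original measurements; it reports wall-clock seconds using Guile's `get-internal-real-time`, so absolute numbers will differ from those obtained another way.

```scheme
;; The helper from above, repeated so this block stands alone.
(define-syntax-rule (do-times n exp)
  (let lp ((i 0)) (let ((res exp)) (if (< i n) (lp (1+ i)) res))))

;; Hypothetical helper: run THUNK once and return elapsed wall-clock
;; seconds as an inexact number.
(define (time-thunk thunk)
  (let ((start (get-internal-real-time)))
    (thunk)
    (exact->inexact
     (/ (- (get-internal-real-time) start)
        internal-time-units-per-second))))

;; Test 4 from above, wrapped in a thunk so it can be timed.
(define (peek-char-test)
  (let ((port (open-input-string "s")))
    (do-times #e1e7 (peek-char port))))
```

Calling `(time-thunk peek-char-test)` then gives one sample; as with the REPL numbers, a single run is only a rough indication.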

The results:

                       | baseline | foo    | port-line | peek-char
    -------------------+----------+--------+-----------+----------
    guile 2.0          | 0.269s   | 0.845s | 1.067s    | 1.280s
    guile master       | 0.058s   | 0.224s | 0.225s    | 0.433s
    wip-ports-refactor | 0.058s   | 0.220s | 0.226s    | 0.375s

These were single measurements at the REPL on my i7-5600U, run with
--no-debug.  The results were fairly consistent.  Note that because this
is a loop, Guile 2.2's compiler gets some "unfair" advantages related to
loop-invariant code motion and peeling; but real parsers etc. written on
top of read-char will also have loops, so to a degree it's OK.
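
To turn the table into rough per-call costs, one can subtract the baseline loop time and divide by the iteration count.  The sketch below does that arithmetic; `ns-per-call` is a hypothetical helper, and the inputs are just the single measurements from the table, so the outputs are only approximate.

```scheme
;; Number of loop iterations used in the tests above.
(define iterations #e1e7)

;; Hypothetical helper: approximate cost of one call in nanoseconds,
;; given the total loop time and the empty-loop baseline (in seconds).
(define (ns-per-call total baseline)
  (/ (* (- total baseline) 1e9) iterations))

(ns-per-call 0.224 0.058)   ; guile master, foo
(ns-per-call 0.433 0.058)   ; guile master, peek-char
(ns-per-call 0.375 0.058)   ; wip-ports-refactor, peek-char
```

With these inputs that comes to roughly 17 ns for foo and roughly 38 ns and 32 ns for peek-char on master and on the refactor branch respectively.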

Conclusions:

  1. Guile 2.2 makes calling a subr just as cheap as calling a Scheme
     function.

  2. The overhead of using Guile 2.0 is much greater than the overhead
     of calling peek-char.

  3. peek-char is slower than other leaf functions in Guile 2.2 but only
     by 2x or so; I am sure it can be faster but I don't know by how
     much.  Consider that it has to:

       1. type-check the argument
       2. get the port buffer and cursors
       3. if there is enough data in the buffer to decode a char, do
          it; otherwise, take the slow path.

     If we consider implementing this in Scheme, it might get slower
     than it currently is in 2.2, because of the switch from C->C calls
     (internal to ports.c and other C files) to Scheme->Scheme calls,
     probably with some additional subr calls to get state from the
     port.  We might gain some of that back by removing the lock; dunno.
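
The three steps above can be sketched in Scheme over a toy model.  This is not Guile's port implementation: `<toy-port>`, `toy-peek-char` and `toy-peek-char-slow` are hypothetical names, the buffer is a plain bytevector, and the encoding is assumed to be ISO-8859-1 so that one byte is always one char.

```scheme
(use-modules (srfi srfi-9)
             (rnrs bytevectors)
             ((rnrs io ports) #:select (eof-object)))

;; Hypothetical toy port: a bytevector buffer, a read cursor, and the
;; index one past the last valid byte.
(define-record-type <toy-port>
  (make-toy-port buf cur end)
  toy-port?
  (buf toy-port-buf)
  (cur toy-port-cur)
  (end toy-port-end))

;; Stand-in slow path.  A real port would refill the buffer (or call
;; out to iconv) here; this toy simply reports end of file.
(define (toy-peek-char-slow port)
  (eof-object))

(define (toy-peek-char port)
  ;; 1. Type-check the argument.
  (unless (toy-port? port)
    (error "not a toy port" port))
  ;; 2. Get the port buffer and cursors.
  (let ((buf (toy-port-buf port))
        (cur (toy-port-cur port))
        (end (toy-port-end port)))
    ;; 3. Fast path if a whole char is buffered (one byte, under the
    ;;    ISO-8859-1 assumption); otherwise take the slow path.
    (if (< cur end)
        (integer->char (bytevector-u8-ref buf cur))
        (toy-peek-char-slow port))))
```

Even in this stripped-down form, each call makes several accessor calls before it touches a byte, which is the kind of overhead a Scheme implementation would have to keep small.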

It would be nice to be able to decode chars from UTF-8 or ISO-8859-1
ports from Scheme.  But we always have to be able to call out to iconv
too.  Mark has mused on making the port buffer always UTF-8, but I don't
quite see how this could work.  I guess you could have a second port
buffer for decoded UTF-8 chars, but that starts to look quite
complicated to me.

Anyway.  I think that given the huge performance window opened up to us
by the 2.0->2.2 switch, we should treat speed as important but not
primary -- when given a choice between speed and maintainability, or
between speed and the ability to suspend a port, we shouldn't choose
speed.

That said, the real way to make port operations fast is (1) to buffer
the port, and (2) to operate on the buffer directly instead of fetching
data octet-by-octet.  Exposing the port buffer to Scheme allows this
kind of punch-through optimization to be implemented where needed.
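
As a rough illustration of point (2), the sketch below counts newlines in a binary port two ways: one port operation per octet, and one port operation per chunk followed by a scan of the returned bytevector.  It does not touch the port's own buffer, since that interface is not described here; `get-bytevector-n` from `(ice-9 binary-ports)` stands in for bulk access, and the procedure names and the 4096-byte chunk size are arbitrary choices for the example.

```scheme
(use-modules (rnrs bytevectors)
             (ice-9 binary-ports))

;; Octet-by-octet: one port operation per byte.
(define (count-newlines/get-u8 port)
  (let lp ((n 0))
    (let ((b (get-u8 port)))
      (cond ((eof-object? b) n)
            ((= b 10) (lp (1+ n)))
            (else (lp n))))))

;; Chunked: one port operation per chunk of up to 4096 bytes, then a
;; plain loop over the bytevector with no further port calls.
(define (count-newlines/chunked port)
  (let lp ((n 0))
    (let ((bv (get-bytevector-n port 4096)))
      (if (eof-object? bv)
          n
          (let scan ((i 0) (n n))
            (if (< i (bytevector-length bv))
                (scan (1+ i)
                      (if (= (bytevector-u8-ref bv i) 10) (1+ n) n))
                (lp n)))))))
```

Both procedures return the same count for the same input; the second just amortizes the per-call port overhead over a whole chunk.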

Cheers,

Andy
