> On 29-12-11 10:02, Marijn wrote: > >>> What I found was that it is much slower to treat a string as a > >>> port and then read-char from that then it is to directly index > >>> the string. > > > >> That string input ports are often noticeably slower than string > >> indexing is one of the banes of my existence. Most reading and > >> parsing operations you implement, you want to work on both ports > >> and strings. But, if you first write a procedure that works on a > >> port, and then write a wrapper procedure that works on a string > >> (by doing an "open-input-string" and calling your procedure that > >> works on ports), the string one can be noticeably slower than if > >> you'd handwritten the string one. But having to write two > >> separate procedures has big development cost, and I always just > >> take the performance hit on strings instead, or write a string > >> procedure and then not have a port procedure when I need it > >> later. One approach that might help is to design a macro that > >> lets people define processing on strings and ports, and expands > >> to produce two closure definitions -- one that works on ports, > >> and one on strings, and avoids a lot of port-related overhead in > >> the string one. > > > > Matthew, any comments on this? Is there a fundamental reason that > > treating a string as a port is so much slower than direct indexing > > or is there something that can be done about it? Or should we look > > into automatically duplicating code with macros?
The biggest overhead in port access compared to byte/string access comes from maintaining the port state, an indirection to support different kinds of ports, and, in the case of comparing strings, potentially decoding characters from UTF-8. Another effect (equally large for byte strings) is that `bytes-ref' and `string-ref' are JIT-inlined, while `read-byte' and `read-char' are not. See the example program below. The `port-like' variants demonstrate the work that `read-byte' and `read-char' actually do in the simple case that line-counting is disabled, no UTF-8 decoding turns out to be needed, etc. On large strings: 'bytes cpu time: 78 real time: 77 gc time: 0 'bytes/no-inline cpu time: 220 real time: 220 gc time: 0 'bytes-port-like cpu time: 430 real time: 429 gc time: 0 'bytes-port cpu time: 373 real time: 372 gc time: 0 'string cpu time: 81 real time: 81 gc time: 0 'string/no-inline cpu time: 201 real time: 201 gc time: 0 'string-port-like cpu time: 560 real time: 560 gc time: 17 'string-port cpu time: 524 real time: 523 gc time: 1 The difference in the last case was bigger prior to v5.2, where we cut the overhead of maintaining port state roughly in half. That improvement is reflected in the `simple-ok?' field of a `port-like' structure that guards the simple case. Currently, I don't see how to improve that further. As strings get smaller, then the cost of creating a port becomes more important; in the case of strings, that creation involves a conversion of the string to UTF-8. If I shift 3 zeros from N to M in the example program (strings 1/1000th of the old size, 1000 times as many iterations): 'bytes cpu time: 77 real time: 78 gc time: 0 'bytes/no-inline cpu time: 226 real time: 225 gc time: 0 'bytes-port-like cpu time: 551 real time: 552 gc time: 21 'bytes-port cpu time: 670 real time: 672 gc time: 21 'string cpu time: 101 real time: 102 gc time: 0 'string/no-inline cpu time: 258 real time: 258 gc time: 0 'string-port-like cpu time: 759 real time: 760 gc time: 8 'string-port cpu time: 902 real time: 904 gc time: 22 I think the `port-like' variants fare better here because the actual implementation has a more complex structure to set up each time. I found one minor improvement (around 10%) that's reflected above, but I don't see other easy improvements. ---------------------------------------- #lang racket (define N 10000) (define bstr (bytes->immutable-bytes (make-bytes N (char->integer #\x)))) (define str (make-string N #\x)) (define M 1000) 'bytes (time (for ([i (in-range M)]) (for ([j (in-range N)]) (bytes-ref bstr j)))) 'bytes/no-inline (define bytes-ref/not-inlined #f) (set! bytes-ref/not-inlined bytes-ref) (time (for ([i (in-range M)]) (for ([j (in-range N)]) (bytes-ref/not-inlined bstr j)))) 'bytes-port-like (struct port-like (proc [simple-ok? #:mutable])) (define (make-port-like-bytes bstr) (define pos 0) (port-like (lambda () (let ([b (bytes-ref bstr pos)]) (set! pos (add1 pos)) b)) #t)) (define (like-read-byte pl) (if (port-like? pl) (if (port-like-simple-ok? pl) ((port-like-proc pl)) (error "not implemented")) (error "bad"))) (time (for ([i (in-range M)]) (define p (make-port-like-bytes bstr)) (for ([j (in-range N)]) (like-read-byte p)))) 'bytes-port (time (for ([i (in-range M)]) (define p (open-input-bytes bstr)) (for ([j (in-range N)]) (read-byte p)))) 'string (time (for ([i (in-range M)]) (for ([j (in-range N)]) (string-ref str j)))) 'string/no-inline (define string-ref/not-inlined #f) (set! string-ref/not-inlined string-ref) (time (for ([i (in-range M)]) (for ([j (in-range N)]) (string-ref/not-inlined str j)))) 'string-port-like (define (make-port-like-string str) (define pos 0) (define bstr (string->bytes/utf-8 str)) (port-like (lambda () (let ([b (bytes-ref bstr pos)]) (if (b . > . 127) (error "complicated case not implemented") (begin (set! pos (add1 pos)) (integer->char b))))) #t)) (define (like-read-char pl) (if (port-like? pl) (if (port-like-simple-ok? pl) ((port-like-proc pl)) (error "not implemented")) (error "bad"))) (time (for ([i (in-range M)]) (define p (make-port-like-string str)) (for ([j (in-range N)]) (like-read-char p)))) 'string-port (time (for ([i (in-range M)]) (define p (open-input-string str)) (for ([j (in-range N)]) (read-char p)))) ____________________ Racket Users list: http://lists.racket-lang.org/users