[PATCH] Move slow path out of 'scm_get_byte_or_eof' et al
Andy suggested this. Okay to push to stable-2.0? Mark >From 0b70eba61ee26654f442199e7cc49838c5b0030e Mon Sep 17 00:00:00 2001 From: Mark H Weaver Date: Tue, 2 Apr 2013 05:33:24 -0400 Subject: [PATCH] Move slow path out of 'scm_get_byte_or_eof' et al. Suggested by Andy Wingo. * libguile/inline.h (scm_get_byte_or_eof, scm_peek_byte_or_eof): Keep only the fast path here, with fallback to 'scm_i_get_byte_or_eof' and 'scm_i_peek_byte_or_eof'. * libguile/ports.c (scm_i_get_byte_or_eof, scm_i_peek_byte_or_eof): New internal functions. * libguile/ports.h (scm_i_get_byte_or_eof, scm_i_peek_byte_or_eof): Add prototypes. --- libguile/inline.h | 44 ++-- libguile/ports.c | 41 + libguile/ports.h |2 ++ 3 files changed, 53 insertions(+), 34 deletions(-) diff --git a/libguile/inline.h b/libguile/inline.h index 88ba7f7..17d8a0c 100644 --- a/libguile/inline.h +++ b/libguile/inline.h @@ -96,50 +96,26 @@ scm_is_string (SCM x) SCM_INLINE_IMPLEMENTATION int scm_get_byte_or_eof (SCM port) { - int c; scm_t_port *pt = SCM_PTAB_ENTRY (port); - if (pt->rw_active == SCM_PORT_WRITE) -/* may be marginally faster than calling scm_flush. */ -scm_ptobs[SCM_PTOBNUM (port)].flush (port); - - if (pt->rw_random) -pt->rw_active = SCM_PORT_READ; - - if (pt->read_pos >= pt->read_end) -{ - if (SCM_UNLIKELY (scm_fill_input (port) == EOF)) - return EOF; -} - - c = *(pt->read_pos++); - - return c; + if (SCM_LIKELY ((pt->rw_active == SCM_PORT_READ || !pt->rw_random) + && pt->read_pos < pt->read_end)) +return *pt->read_pos++; + else +return scm_i_get_byte_or_eof (port); } /* Like `scm_get_byte_or_eof' but does not change PORT's `read_pos'. */ SCM_INLINE_IMPLEMENTATION int scm_peek_byte_or_eof (SCM port) { - int c; scm_t_port *pt = SCM_PTAB_ENTRY (port); - if (pt->rw_active == SCM_PORT_WRITE) -/* may be marginally faster than calling scm_flush. */ -scm_ptobs[SCM_PTOBNUM (port)].flush (port); - - if (pt->rw_random) -pt->rw_active = SCM_PORT_READ; - - if (pt->read_pos >= pt->read_end) -{ - if (SCM_UNLIKELY (scm_fill_input (port) == EOF)) - return EOF; -} - - c = *pt->read_pos; - - return c; + if (SCM_LIKELY ((pt->rw_active == SCM_PORT_READ || !pt->rw_random) + && pt->read_pos < pt->read_end)) +return *pt->read_pos; + else +return scm_i_peek_byte_or_eof (port); } SCM_INLINE_IMPLEMENTATION void diff --git a/libguile/ports.c b/libguile/ports.c index becdbed..76ec83f 100644 --- a/libguile/ports.c +++ b/libguile/ports.c @@ -1439,6 +1439,47 @@ scm_fill_input (SCM port) return scm_ptobs[SCM_PTOBNUM (port)].fill_input (port); } +/* Slow-path fallback for 'scm_get_byte_or_eof' in inline.h */ +int +scm_i_get_byte_or_eof (SCM port) +{ + scm_t_port *pt = SCM_PTAB_ENTRY (port); + + if (pt->rw_active == SCM_PORT_WRITE) +scm_flush (port); + + if (pt->rw_random) +pt->rw_active = SCM_PORT_READ; + + if (pt->read_pos >= pt->read_end) +{ + if (SCM_UNLIKELY (scm_fill_input (port) == EOF)) + return EOF; +} + + return *pt->read_pos++; +} + +/* Slow-path fallback for 'scm_peek_byte_or_eof' in inline.h */ +int +scm_i_peek_byte_or_eof (SCM port) +{ + scm_t_port *pt = SCM_PTAB_ENTRY (port); + + if (pt->rw_active == SCM_PORT_WRITE) +scm_flush (port); + + if (pt->rw_random) +pt->rw_active = SCM_PORT_READ; + + if (pt->read_pos >= pt->read_end) +{ + if (SCM_UNLIKELY (scm_fill_input (port) == EOF)) + return EOF; +} + + return *pt->read_pos; +} /* scm_lfwrite * diff --git a/libguile/ports.h b/libguile/ports.h index 53d5081..54bf595 100644 --- a/libguile/ports.h +++ b/libguile/ports.h @@ -328,6 +328,8 @@ scm_i_default_port_conversion_handler (void); /* Use HANDLER as the default conversion strategy for future ports. */ SCM_INTERNAL void scm_i_set_default_port_conversion_handler (scm_t_string_failed_conversion_handler); +SCM_INTERNAL int scm_i_get_byte_or_eof (SCM port); +SCM_INTERNAL int scm_i_peek_byte_or_eof (SCM port); SCM_API SCM scm_port_conversion_strategy (SCM port); SCM_API SCM scm_set_port_conversion_strategy_x (SCM port, SCM behavior); -- 1.7.10.4
Re: array-copy! is slow & array-map.c (was: Extremly slow for format & string-join)
On Apr 1, 2013, at 19:15, Daniel Llorens wrote: > Third, none of the above are causing the slowness of array-copy!. I noticed > that there's a double indirection in racp(). The second patch removes it. > Actually this double indirection goes on all over array-map.c and I don't > understand why it's needed... This patch does the same for array-fill!. The improvement is similar. Before: scheme@(guile-user)> (define a (make-typed-array 'a *unspecified* 100 10)) scheme@(guile-user)> ,time (array-fill! a #\v) ;; 1.211000s real time, 1.20s run time. 0.00s spent in GC. After: scheme@(guile-user)> ,time (array-fill! a #\v) ;; 0.855000s real time, 0.85s run time. 0.00s spent in GC. Regards Daniel 0004-Remove-double-indirection-in-array-fill.patch Description: Binary data
Re: [PATCH] Move slow path out of 'scm_get_byte_or_eof' et al
> From: Mark H Weaver > >Andy suggested this. Okay to push to stable-2.0? Hi Mark, I can't imagine any possible way that this patch could make performance worse, but, it is in the very core of the ports. I don't think you can get away without at least a bit of profiling. -Mike
Re: array-copy! is slow & array-map.c (was: Extremly slow for format & string-join)
On Apr 2, 2013, at 12:19, Daniel Llorens wrote: > On Apr 1, 2013, at 19:15, Daniel Llorens wrote: > >> Third, none of the above are causing the slowness of array-copy!. I noticed >> that there's a double indirection in racp(). The second patch removes it. >> Actually this double indirection goes on all over array-map.c and I don't >> understand why it's needed... > > This patch does the same for array-fill!. The improvement is similar. These two patches do it for array-map!. The first patch avoids cons for the 1-argument case. The second patch removes the double indirection for the first two arguments in every case. Since there's some work done inside the loop, the improvement is smaller than for array-copy! or array-fill!. Before the first patch: scheme@(guile-user)> (define a (make-array 0. 100 10)) scheme@(guile-user)> (define b (make-array 0. 100 10)) scheme@(guile-user)> (define c (make-array *unspecified* 100 10)) scheme@(guile-user)> (define d (transpose-array (make-array *unspecified* 10 100) 1 0)) scheme@(guile-user)> ,time (array-map! c (const 0.)) ;; 0.558498s real time, 0.556974s run time. 0.00s spent in GC. scheme@(guile-user)> ,time (array-map! c values a) ;; 1.467717s real time, 1.463470s run time. 0.269786s spent in GC. scheme@(guile-user)> ,time (array-map! c values d) ;; 1.500785s real time, 1.496482s run time. 0.254156s spent in GC. scheme@(guile-user)> ,time (array-map! c (lambda (a b) (+ a b)) a b) ;; 2.278655s real time, 2.271948s run time. 0.528379s spent in GC. After the first patch: scheme@(guile-user)> (define a (make-array 0. 100 10)) scheme@(guile-user)> (define b (make-array 0. 100 10)) scheme@(guile-user)> (define c (make-array *unspecified* 100 10)) scheme@(guile-user)> (define d (transpose-array (make-array *unspecified* 10 100) 1 0)) scheme@(guile-user)> ,time (array-map! c (const 0.)) ;; 0.559337s real time, 0.557811s run time. 0.00s spent in GC. scheme@(guile-user)> ,time (array-map! c values a) ;; 1.038836s real time, 1.035859s run time. 0.134621s spent in GC. scheme@(guile-user)> ,time (array-map! c values d) ;; 1.041937s real time, 1.039000s run time. 0.125787s spent in GC. scheme@(guile-user)> ,time (array-map! c (lambda (a b) (+ a b)) a b) ;; 2.294594s real time, 2.287883s run time. 0.518576s spent in GC. After both patches: scheme@(guile-user)> (define a (make-array 0. 100 10)) scheme@(guile-user)> (define b (make-array 0. 100 10)) scheme@(guile-user)> (define c (make-array *unspecified* 100 10)) scheme@(guile-user)> (define d (transpose-array (make-array *unspecified* 10 100) 1 0)) scheme@(guile-user)> ,time (array-map! c (const 0.)) ;; 0.467621s real time, 0.466343s run time. 0.00s spent in GC. scheme@(guile-user)> ,time (array-map! c values a) ;; 0.823515s real time, 0.821152s run time. 0.135106s spent in GC. scheme@(guile-user)> ,time (array-map! c values d) ;; 0.825138s real time, 0.822763s run time. 0.124970s spent in GC. scheme@(guile-user)> ,time (array-map! c (lambda (a b) (+ a b)) a b) ;; 2.066735s real time, 2.060635s run time. 0.525089s spent in GC. 0005-Avoid-per-element-cons-for-1-arg-case-of-array-map.patch Description: Binary data 0006-Remove-double-indirection-in-array-map-with-2-args.patch Description: Binary data
Re: array-copy! is slow & array-map.c (was: Extremly slow for format & string-join)
On Apr 2, 2013, at 12:19, Daniel Llorens wrote: > On Apr 1, 2013, at 19:15, Daniel Llorens wrote: > >> Third, none of the above are causing the slowness of array-copy!. I noticed >> that there's a double indirection in racp(). The second patch removes it. >> Actually this double indirection goes on all over array-map.c and I don't >> understand why it's needed... > > This patch does the same for array-fill!. The improvement is similar. These two patches do it for array-map!. The first patch avoids cons for the 1-argument case. The second patch removes the double indirection for the first two arguments in every case. Since there's some work done inside the loop, the improvement is smaller than for array-copy! or array-fill!. Before the first patch: scheme@(guile-user)> (define a (make-array 0. 100 10)) scheme@(guile-user)> (define b (make-array 0. 100 10)) scheme@(guile-user)> (define c (make-array *unspecified* 100 10)) scheme@(guile-user)> (define d (transpose-array (make-array *unspecified* 10 100) 1 0)) scheme@(guile-user)> ,time (array-map! c (const 0.)) ;; 0.558498s real time, 0.556974s run time. 0.00s spent in GC. scheme@(guile-user)> ,time (array-map! c values a) ;; 1.467717s real time, 1.463470s run time. 0.269786s spent in GC. scheme@(guile-user)> ,time (array-map! c values d) ;; 1.500785s real time, 1.496482s run time. 0.254156s spent in GC. scheme@(guile-user)> ,time (array-map! c (lambda (a b) (+ a b)) a b) ;; 2.278655s real time, 2.271948s run time. 0.528379s spent in GC. After the first patch: scheme@(guile-user)> (define a (make-array 0. 100 10)) scheme@(guile-user)> (define b (make-array 0. 100 10)) scheme@(guile-user)> (define c (make-array *unspecified* 100 10)) scheme@(guile-user)> (define d (transpose-array (make-array *unspecified* 10 100) 1 0)) scheme@(guile-user)> ,time (array-map! c (const 0.)) ;; 0.559337s real time, 0.557811s run time. 0.00s spent in GC. scheme@(guile-user)> ,time (array-map! c values a) ;; 1.038836s real time, 1.035859s run time. 0.134621s spent in GC. scheme@(guile-user)> ,time (array-map! c values d) ;; 1.041937s real time, 1.039000s run time. 0.125787s spent in GC. scheme@(guile-user)> ,time (array-map! c (lambda (a b) (+ a b)) a b) ;; 2.294594s real time, 2.287883s run time. 0.518576s spent in GC. After both patches: scheme@(guile-user)> (define a (make-array 0. 100 10)) scheme@(guile-user)> (define b (make-array 0. 100 10)) scheme@(guile-user)> (define c (make-array *unspecified* 100 10)) scheme@(guile-user)> (define d (transpose-array (make-array *unspecified* 10 100) 1 0)) scheme@(guile-user)> ,time (array-map! c (const 0.)) ;; 0.467621s real time, 0.466343s run time. 0.00s spent in GC. scheme@(guile-user)> ,time (array-map! c values a) ;; 0.823515s real time, 0.821152s run time. 0.135106s spent in GC. scheme@(guile-user)> ,time (array-map! c values d) ;; 0.825138s real time, 0.822763s run time. 0.124970s spent in GC. scheme@(guile-user)> ,time (array-map! c (lambda (a b) (+ a b)) a b) ;; 2.066735s real time, 2.060635s run time. 0.525089s spent in GC. 0005-Avoid-per-element-cons-for-1-arg-case-of-array-map.patch Description: Binary data 0006-Remove-double-indirection-in-array-map-with-2-args.patch Description: Binary data
Re: array-copy! is slow & array-map.c (was: Extremly slow for format & string-join)
This patch is for array-for-each. Only the first argument. Before: scheme@(guile-user)> (define a (make-array 1 100 10)) scheme@(guile-user)> ,time (let ((x 0)) (array-for-each (lambda (a) (set! x (+ a x))) a) x) $8 = 1000 ;; 0.621887s real time, 0.620181s run time. 0.00s spent in GC. After: scheme@(guile-user)> (define a (make-array 1 100 10)) scheme@(guile-user)> ,time (let ((x 0)) (array-for-each (lambda (a) (set! x (+ a x))) a) x) $2 = 1000 ;; 0.529748s real time, 0.528301s run time. 0.00s spent in GC. Regards, Daniel 0007-Remove-double-indirection-for-1st-arg-of-array-for-e.patch Description: Binary data
Re: array-copy! is slow & array-map.c (was: Extremly slow for format & string-join)
Hello, sorry for the repeated message earlier. Ok, i think it's clear that all these double indirections can be eliminated globally by composing the strides of the array descriptor with the stride of SCM_I_ARRAY_V (array) and using that directly in the array descriptor. Is there any reason not to do this? Regards, Daniel
Re: Extremly slow for format & string-join
Mark H Weaver skribis: > Indeed, the implementation of 'string-join' was very bad: about O(n^2) [...] > Before: > > scheme@(guile-user)> ,time (define s (string-join (make-list 1 "test") > "-")) > ;; 0.998800s real time, 0.996677s run time. 0.984885s spent in GC. > scheme@(guile-user)> ,time (define s (string-join (make-list 10 "test") > "-")) > ;; 98.006569s real time, 97.817077s run time. 97.795970s spent in GC. > > After: > > scheme@(guile-user)> ,time (define s (string-join (make-list 1 "test") > "-")) > ;; 0.006362s real time, 0.006351s run time. 0.00s spent in GC. > scheme@(guile-user)> ,time (define s (string-join (make-list 10 "test") > "-")) > ;; 0.028513s real time, 0.028457s run time. 0.022235s spent in GC. > scheme@(guile-user)> ,time (define s (string-join (make-list 100 "test") > "-")) > ;; 0.303098s real time, 0.302543s run time. 0.289639s spent in GC. > scheme@(guile-user)> ,time (define s (string-join (make-list 1000 "test") > "-")) > ;; 3.288105s real time, 3.281922s run time. 3.174460s spent in GC. Nice! Ludo’.
Re: [PATCH] Move slow path out of 'scm_get_byte_or_eof' et al
Mike Gran skribis: >> From: Mark H Weaver >> >>Andy suggested this. Okay to push to stable-2.0? > > Hi Mark, > > I can't imagine any possible way that this patch could make > performance worse, but, it is in the very core of the ports. > > I don't think you can get away without at least a bit of > profiling. Yes, it’s most likely the Right Thing, but figures to confirm this would be nice. Ludo’.
Re: [PATCH] Move slow path out of 'scm_get_byte_or_eof' et al
Mike Gran writes: > I can't imagine any possible way that this patch could make > performance worse, but, it is in the very core of the ports. > > I don't think you can get away without at least a bit of > profiling. Fair enough. First of all, this patch reduces the size of the built libguile.so by about 42 kilobytes, and libguile.a by about 96 kilobytes. As for performance: the updated patches (attached below) slows things down by about one quarter of one percent on my machine. The specific benchmark I did was to call 'read-string' on an 11 megabyte ASCII text file, with the port-encoding set to UTF-8. In this case, all the reads are done using 'scm_get_byte_or_eof' (called from 'get_utf8_codepoint'). So I think this is an acceptable cost for such a great reduction in code size. What do you think? Thanks for nudging me to do the measurement. To be honest, the original patch I posted slowed things down by 4.5%, which I found extremely surprising. I fixed it by adding an internal 'static' version of 'scm_fill_input'. So for those who doubt the cost of function calls though the shared library PLT (which, within a shared library, is *every* function call to a non-static function), let this be a lesson. That's the only thing that changed here, and it made more than a 4% difference, even though the procedure call in question was done only once per 4 kilobyte buffer! Here's the raw data: before: 12767060 libguile-2.0.a 6999271 libguile-2.0.so.22.6.0 after: 12671162 libguile-2.0.a (96 kbytes smaller) 6956989 libguile-2.0.so.22.6.0 (42 kbytes smaller) test file (the contents of 'psyntax.scm' repeated many times): -rw-r--r-- 1 mhw mhw 11503780 Apr 2 12:46 /home/mhw/BIG-FILE before: scheme@(guile-user)> ,time (define s (call-with-input-file "/home/mhw/BIG-FILE" read-string)) ;; 2.953214s real time, 2.951423s run time. 0.027767s spent in GC. scheme@(guile-user)> ,time (define s (call-with-input-file "/home/mhw/BIG-FILE" read-string)) ;; 2.942170s real time, 2.940581s run time. 0.011873s spent in GC. scheme@(guile-user)> ,time (define s (call-with-input-file "/home/mhw/BIG-FILE" read-string)) ;; 2.954309s real time, 2.950164s run time. 0.011952s spent in GC. avg: 2.94738936 after: scheme@(guile-user)> ,time (define s (call-with-input-file "/home/mhw/BIG-FILE" read-string)) ;; 2.964777s real time, 2.957597s run time. 0.012116s spent in GC. scheme@(guile-user)> ,time (define s (call-with-input-file "/home/mhw/BIG-FILE" read-string)) ;; 2.971099s real time, 2.968656s run time. 0.012020s spent in GC. scheme@(guile-user)> ,time (define s (call-with-input-file "/home/mhw/BIG-FILE" read-string)) ;; 2.942377s real time, 2.940015s run time. 0.012085s spent in GC. avg: 2.95542263 0.27% slower I've attached the new patches. Okay to push now? Thanks, Mark >From 187fa0b9e7ff9b2d6204517a9daa9009245c7511 Mon Sep 17 00:00:00 2001 From: Mark H Weaver Date: Tue, 2 Apr 2013 13:33:14 -0400 Subject: [PATCH 1/2] Add a static version of 'scm_fill_input' to ports.c. * libguile/ports.c (scm_i_fill_input): New static function, containing the code that was previously in 'scm_fill_input'. (scm_fill_input): Simply call 'scm_i_fill_input'. (scm_c_read): Use 'scm_i_fill_input'. --- libguile/ports.c | 24 +++- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/libguile/ports.c b/libguile/ports.c index becdbed..13a993e 100644 --- a/libguile/ports.c +++ b/libguile/ports.c @@ -1419,8 +1419,8 @@ scm_getc (SCM port) /* this should only be called when the read buffer is empty. it tries to refill the read buffer. it returns the first char from the port, which is either EOF or *(pt->read_pos). */ -int -scm_fill_input (SCM port) +static int +scm_i_fill_input (SCM port) { scm_t_port *pt = SCM_PTAB_ENTRY (port); @@ -1439,6 +1439,12 @@ scm_fill_input (SCM port) return scm_ptobs[SCM_PTOBNUM (port)].fill_input (port); } +int +scm_fill_input (SCM port) +{ + return scm_i_fill_input (port); +} + /* scm_lfwrite * @@ -1547,8 +1553,8 @@ scm_c_read (SCM port, void *buffer, size_t size) if (size == 0) return n_read; - /* Now we will call scm_fill_input repeatedly until we have read the - requested number of bytes. (Note that a single scm_fill_input + /* Now we will call scm_i_fill_input repeatedly until we have read the + requested number of bytes. (Note that a single scm_i_fill_input call does not guarantee to fill the whole of the port's read buffer.) */ if (pt->read_buf_size <= 1 && pt->encoding == NULL) @@ -1556,12 +1562,12 @@ scm_c_read (SCM port, void *buffer, size_t size) /* The port that we are reading from is unbuffered - i.e. does not have its own persistent buffer - but we have a buffer, provided by our caller, that is the right size for the data - that is wanted. For the following scm_fill_input calls, + that is wanted. For the following scm_i_
Re: [PATCH] Move slow path out of 'scm_get_byte_or_eof' et al
> From: Mark H Weaver > Thanks for nudging me to do the measurement. To be honest, the original > patch I posted slowed things down by 4.5%, which I found extremely > surprising. I fixed it by adding an internal 'static' version of > 'scm_fill_input'. So for those who doubt the cost of function calls > though the shared library PLT (which, within a shared library, is > *every* function call to a non-static function), let this be a lesson. > That's the only thing that changed here, and it made more than a 4% > difference, even though the procedure call in question was done only > once per 4 kilobyte buffer! > I've attached the new patches. Okay to push now? I don't think I get a vote, but, I vote yes anyway. -Mike