-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 27-12-11 18:59, Sam Phillips wrote:
> On Tue, Dec 27, 2011 at 09:55, Marijn <hk...@gentoo.org> wrote:
>> Sometimes I wish for python-like string-strip to be a standard
>> function. For example when I want to parse a list of numbers from
>> a text-field, I could get the string value of the text-field,
>> strip space from the beginning and end, split on whitespace, map
>> string->number over it and check the result with `every'.
>>
>> Is there a better way? If not then perhaps some form of
>> string-strip could be added (perhaps under regexp-strip name?)
>> since it isn't exactly extremely simple unless you're a regexp
>> wizard.
>
> string-trim-both in srfi-13 does that.
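For reference, the trim / split / map / check pipeline described in the quoted message could look roughly like the sketch below. This is not code from the thread: the name string->number-list/simple is made up for illustration, and andmap stands in for srfi-1's `every'. Note that string->number accepts more than unsigned integers (for example "1.5" or "1/2"), so this is looser than the regexp-based versions discussed further down.

```racket
#lang racket/base
(require (only-in srfi/13 string-trim-both))

;; Hypothetical helper: returns a list of numbers, or #f if any
;; whitespace-separated token is not a number.
(define (string->number-list/simple str)
  ;; string-trim-both strips whitespace from both ends by default.
  (define trimmed (string-trim-both str))
  (if (string=? trimmed "")
      '()
      (let ((nums (map string->number (regexp-split #px"\\s+" trimmed))))
        ;; string->number yields #f for unparsable tokens, so check
        ;; that every element is a true value.
        (and (andmap values nums) nums))))
```

With these definitions, (string->number-list/simple "  1 2 3 ") evaluates to '(1 2 3), and a string containing a non-numeric token gives #f.
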
Hi Sam,

now that you remind me of it I do recall that I figured this same
thing out a long time ago. Thanks for that! Perhaps the docs could
put a cross-reference under string-split for srfi-13's
string-trim-both.

> http://docs.racket-lang.org/srfi-std/srfi-13.html#string-trim
>
> Cheers, Sam

On 27-12-11 20:56, Neil Van Dyke wrote:
> Tangential comment: These string operations are convenient to have
> around, and they can be expedient, but composing with them can make
> for inefficient evaluation. For example, my first attempt to
> implement "string->number" (after briefly considering just using
> "read"), was 17 times faster in a shootout on my computer, compared
> to the one that was built using "string-strip" and other string
> operations. (It also helped that I used Racket regexps rather than
> pregexps.)
>
> (define (string->number-list/mk2 str)
>   (let loop ((start 0))
>     (cond
>       ((regexp-match-positions
>         #rx"^[ \t\r\n\f]*([0-9]+(?:[0-9]+)?)" str start)
>        => (lambda (m)
>             (let* ((range (cadr m))
>                    (end (cdr range)))
>               (cons (string->number
>                      (substring str (car range) (cdr range)))
>                     (loop end)))))
>       (else
>        (if (regexp-match? #rx"^[ \t\r\n\f]*$" str start)
>            '()
>            (error 'string->number-list
>                   "could not parse string ~S at start position ~S"
>                   str start))))))
>
> If I was writing performance-sensitive code, I would also see
> whether a procedure that scanned a character at a time, and didn't
> need "string->number", was faster.

Thank you, Neil, that was quite inspirational. Your use of
regexp-match-positions is instructive (I was never sure how to use
that function and didn't consider it for this use case). I did not
measure a noticeable difference between pregexp and regexp btw and I
changed your first regexp to either

  #rx"^[ \t\r\n\f]*([1-9][0-9]*)"

or

  #px"^[[:space:]]*([1-9][0-9]*)"

which I think is equivalent.

> As everyone knows, if the code is not performance-sensitive, do
> whatever is easiest to develop in a correct and maintainable way.
> But, when performance matters -- such as in code that takes a
> significant percentage of time in serving a HTTP request, or is in
> a PLaneT package that other people might use in
> performance-sensitive context -- we also have to think about
> algorithmic efficiency, as well as costs specific to the language
> implementation.

I don't think my use of this code is very performance-sensitive, but
I couldn't help myself, so I looked into making it faster according
to your suggestion. What I found was that it is much slower to treat
a string as a port and read-char from that than it is to index the
string directly. Also, peek-char seems to take almost as much time as
read-char, so you're basically reading everything twice, since there
is no such thing as putting a char back into a port (doesn't C/C++
have that?).

Also, while for/fold is extremely nice, it isn't as efficient as a
named let in the cases where both work. I found that out when my code
was still collecting chars into a list and later walking that list to
construct a number. Of course it's better to construct the number
straight away, so you don't have to cons that list at all.
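To make the comparison above concrete, here is a minimal sketch of the two scanning styles: one peeks and reads from a string port, the other indexes the string with a named let and builds the number as it goes. Both function names are invented for this illustration and are not taken from the thread, and both only parse a leading run of digits, which is much less than the full list parser does.

```racket
#lang racket/base

;; Port-based: peek to decide, then read to consume, so each digit
;; is touched twice.
(define (leading-integer/port str)
  (define in (open-input-string str))
  (let loop ((result 0))
    (define c (peek-char in))
    (if (and (char? c) (char<=? #\0 c #\9))
        (begin
          (read-char in)
          (loop (+ (* 10 result)
                   (- (char->integer c) (char->integer #\0)))))
        result)))

;; Index-based: a named let over string-ref, with no port and no
;; intermediate list of characters.
(define (leading-integer/index str)
  (define len (string-length str))
  (let loop ((pos 0) (result 0))
    (if (< pos len)
        (let ((c (string-ref str pos)))
          (if (char<=? #\0 c #\9)
              (loop (+ pos 1)
                    (+ (* 10 result)
                       (- (char->integer c) (char->integer #\0))))
              result))
        result)))
```

Wrapping repeated calls to each in (time ...) is one way to compare them on a given machine; the ratio will depend on the Racket version and the input.
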
In the end I was able to construct code that is another factor of 5.5
faster than your version:

(define (string->number-list/mk5 str)
  (define length (string-length str))

  ;; Skip whitespace from pos, then read one unsigned integer without
  ;; a leading zero. Returns the integer (or #f) and the next position.
  (define (match-unsigned-integer str pos)
    ;; Accumulate digits directly into result; no intermediate list.
    (define (match-digits result pos)
      (if (< pos length)
          (let* ((char (string-ref str pos))
                 (pos (+ pos 1)))
            (case char
              ((#\0) (match-digits (* 10 result) pos))
              ((#\1) (match-digits (+ (* 10 result) 1) pos))
              ((#\2) (match-digits (+ (* 10 result) 2) pos))
              ((#\3) (match-digits (+ (* 10 result) 3) pos))
              ((#\4) (match-digits (+ (* 10 result) 4) pos))
              ((#\5) (match-digits (+ (* 10 result) 5) pos))
              ((#\6) (match-digits (+ (* 10 result) 6) pos))
              ((#\7) (match-digits (+ (* 10 result) 7) pos))
              ((#\8) (match-digits (+ (* 10 result) 8) pos))
              ((#\9) (match-digits (+ (* 10 result) 9) pos))
              ;; Leave the delimiter unconsumed so that the next call
              ;; checks that it really is whitespace.
              (else (values result (- pos 1)))))
          (values result pos)))

    (define (match-first-digit pos)
      (if (< pos length)
          (let* ((char (string-ref str pos))
                 (pos (+ pos 1)))
            (case char
              ((#\1) (match-digits 1 pos))
              ((#\2) (match-digits 2 pos))
              ((#\3) (match-digits 3 pos))
              ((#\4) (match-digits 4 pos))
              ((#\5) (match-digits 5 pos))
              ((#\6) (match-digits 6 pos))
              ((#\7) (match-digits 7 pos))
              ((#\8) (match-digits 8 pos))
              ((#\9) (match-digits 9 pos))
              ((#\space #\tab #\return #\newline #\page)
               (match-first-digit pos))
              ;; Report the position of the offending character, so
              ;; that the caller sees pos < length and rejects the
              ;; input even when that character is the last one.
              (else (values #f (- pos 1)))))
          (values #f pos)))

    (match-first-digit pos))

  ;; Collect integers until the end of the string; return #f if
  ;; anything other than integers and whitespace is found.
  (let loop ((pos 0) (result '()))
    (let-values (((unt pos) (match-unsigned-integer str pos)))
      (if unt
          (loop pos (cons unt result))
          (if (< pos length)
              #f
              (reverse result))))))

Marijn
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk77S0EACgkQp/VmCx0OL2wOMwCfbEXpEd+sEXw33tCSYMRDWHii
H8EAn3WKNBjxiqBuSKFfxMhg7w1US1s2
=I4Mo
-----END PGP SIGNATURE-----
____________________
  Racket Users list:
  http://lists.racket-lang.org/users