Re: [racket-users] performance, json

WarGrey Gyoudmon Ju Fri, 22 Feb 2019 14:00:35 -0800

I have tried my best to find the "best practice" for Racket IO.


Here are some tips I found while writing a CSV reader:
https://github.com/wargrey/schema/blob/master/digitama/exchange/csv/reader/port.rkt
On a 2013 15-inch MacBook Pro, it takes 3.5s to read a 70MB file.

I agree that `read-char` is the first choice, but `peek-char` may be slow
for some reason.
Instead of peeking, just read the char or chars you would have peeked and
pass them as the leading ones to the parsing routine.
This strategy may require a redesign of your parsing workflow, since every
subroutine has to accept one more input argument and return one more value.
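
A minimal sketch of that shape, assuming a toy comma-separated format with no
quoting. The names `read-field` and `read-row` are hypothetical and are not
taken from the reader linked above. Each routine takes the char that was
already read as an extra argument and returns the char that ended its input as
an extra value, so nothing ever has to be peeked:

```racket
#lang racket/base

;; Hypothetical helper, not the linked CSV reader.
;; `leading` is the char the caller has already read from `in` (or eof).
;; Returns two values: the field text, and the char (or eof) that ended it.
(define (read-field in leading)
  (define out (open-output-string))
  (let loop ([c leading])
    (cond
      [(or (eof-object? c) (char=? c #\,) (char=? c #\newline))
       (values (get-output-string out) c)]
      [else
       (write-char c out)
       (loop (read-char in))])))

;; Reads one row of fields, stopping at a newline or at eof.
(define (read-row in)
  (let loop ([fields '()] [c (read-char in)])
    (define-values (field end) (read-field in c))
    (cond
      ;; A comma means another field follows: read its leading char.
      [(and (char? end) (char=? end #\,))
       (loop (cons field fields) (read-char in))]
      [else (reverse (cons field fields))])))
```

For example, `(read-row (open-input-string "a,bc,d\nx"))` consumes the first
line only and leaves the port positioned at `x`.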


On Sat, Feb 23, 2019 at 5:34 AM Jon Zeppieri <zeppi...@gmail.com> wrote:

> On a related (but not too related) note: is there an efficient way to skip
> multiple bytes in an input stream? It looks like there are two choices:
>   - You can read the bytes you want to skip, but that implies either
> allocating a useless byte array or keeping one around for this very purpose.
>   - You can use (I think?) port-commit-peeked, but given the API, it seems
> like that was designed with a particular (and more complicated) use in mind.
>
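
The first choice above (keeping one byte string around) can be sketched as
follows. This is only an illustration: `skip-bytes!` is a hypothetical name,
and the scratch buffer is assumed to be non-empty and owned by the caller. It
does not claim to be the most efficient way to skip.

```racket
#lang racket/base

;; Hypothetical helper: discards up to `n` bytes from `in`, reading them
;; into one caller-supplied scratch buffer so that no byte string is
;; allocated per skip. Assumes `scratch` has length > 0.
;; Returns the number of bytes actually skipped (less than `n` at eof).
(define (skip-bytes! in n scratch)
  (let loop ([remaining n])
    (cond
      [(zero? remaining) n]
      [else
       (define got
         (read-bytes! scratch in 0 (min remaining (bytes-length scratch))))
       (if (eof-object? got)
           (- n remaining)
           (loop (- remaining got)))])))
```

The same scratch buffer can be reused for every skip on the same thread, for
example `(skip-bytes! in 7 scratch)` with `scratch` made once by
`(make-bytes 4096)`.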
>
> On Fri, Feb 22, 2019 at 3:35 PM Matthew Flatt <mfl...@cs.utah.edu> wrote:
>
>> I think the bigger bottleneck is the main parsing loop, which uses
>> `regexp-try-match` even more. Although `regexp-try-match` is
>> convenient, it's much slower than using `peek-char` directly to check
>> for one character. I'll experiment with improvements there.
>>
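
A rough sketch of what checking one character with `peek-char` could look
like, in place of a `regexp-try-match` call. The helpers `try-char` and
`skip-whitespace` below are hypothetical and are not the json library's code.
The whitespace set is assumed to be the four characters JSON allows, which is
narrower than the `\s` class used in the quoted code.

```racket
#lang racket/base

;; Hypothetical helper: consumes the next char of `in` only if it is `ch`.
;; Returns #t if the char was consumed, #f otherwise (including at eof).
(define (try-char in ch)
  (define c (peek-char in))
  (and (char? c)
       (char=? c ch)
       (begin (read-char in) #t)))

;; Skips JSON whitespace (space, tab, newline, return) one char at a time.
(define (skip-whitespace in)
  (let loop ()
    (define c (peek-char in))
    (when (and (char? c) (memv c '(#\space #\tab #\newline #\return)))
      (read-char in)
      (loop))))
```

With these, a test such as `(regexp-try-match #rx#"^," i)` would become
`(try-char i #\,)`.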
>> At 22 Feb 2019 13:36:20 -0500, "'John Clements' via Racket Users" wrote:
>> > I’m not that surprised :).
>> >
>> > My guess is that our json reader could be sped up quite a bit. This looks
>> > like the heart of the read-json implementation:
>> >
>> > (define (read-json* who i jsnull)
>> >   ;; Follows the specification (eg, at json.org) -- no extensions.
>> >   ;;
>> >   (define (err fmt . args)
>> >     (define-values [l c p] (port-next-location i))
>> >     (raise-read-error (format "~a: ~a" who (apply format fmt args))
>> >                       (object-name i) l c p #f))
>> >   (define (skip-whitespace) (regexp-match? #px#"^\\s*" i))
>> >   ;;
>> >   ;; Reading a string *could* have been nearly trivial using the racket
>> >   ;; reader, except that it won't handle a "\/"...
>> >   (define (read-string)
>> >     (define result (open-output-bytes))
>> >     (let loop ()
>> >       (define esc
>> >         (let loop ()
>> >           (define c (read-byte i))
>> >           (cond
>> >             [(eof-object? c) (err "unterminated string")]
>> >             [(= c 34) #f]               ;; 34 = "
>> >             [(= c 92) (read-bytes 1 i)] ;; 92 = \
>> >             [else (write-byte c result) (loop)])))
>> >       (cond
>> >         [(not esc) (bytes->string/utf-8 (get-output-bytes result))]
>> >         [(case esc
>> >            [(#"b") #"\b"]
>> >            [(#"n") #"\n"]
>> >            [(#"r") #"\r"]
>> >            [(#"f") #"\f"]
>> >            [(#"t") #"\t"]
>> >            [(#"\\") #"\\"]
>> >            [(#"\"") #"\""]
>> >            [(#"/") #"/"]
>> >            [else #f])
>> >          => (λ (m) (write-bytes m result) (loop))]
>> >         [(equal? esc #"u")
>> >          (let* ([e (or (regexp-try-match #px#"^[a-fA-F0-9]{4}" i)
>> >                        (err "bad string \\u escape"))]
>> >                 [e (string->number (bytes->string/utf-8 (car e)) 16)])
>> >            (define e*
>> >              (if (<= #xD800 e #xDFFF)
>> >                  ;; it's the first part of a UTF-16 surrogate pair
>> >                  (let* ([e2 (or (regexp-try-match #px#"^\\\\u([a-fA-F0-9]{4})" i)
>> >                                 (err "bad string \\u escape, ~a"
>> >                                      "missing second half of a UTF16 pair"))]
>> >                         [e2 (string->number (bytes->string/utf-8 (cadr e2)) 16)])
>> >                    (if (<= #xDC00 e2 #xDFFF)
>> >                        (+ (arithmetic-shift (- e #xD800) 10) (- e2 #xDC00) #x10000)
>> >                        (err "bad string \\u escape, ~a"
>> >                             "bad second half of a UTF16 pair")))
>> >                  e)) ; single \u escape
>> >            (write-string (string (integer->char e*)) result)
>> >            (loop))]
>> >         [else (err "bad string escape: \"~a\"" esc)])))
>> >   ;;
>> >   (define (read-list what end-rx read-one)
>> >     (skip-whitespace)
>> >     (if (regexp-try-match end-rx i)
>> >         '()
>> >         (let loop ([l (list (read-one))])
>> >           (skip-whitespace)
>> >           (cond [(regexp-try-match end-rx i) (reverse l)]
>> >                 [(regexp-try-match #rx#"^," i) (loop (cons (read-one) l))]
>> >                 [else (err "error while parsing a json ~a" what)]))))
>> >   ;;
>> >   (define (read-hash)
>> >     (define (read-pair)
>> >       (define k (read-json))
>> >       (unless (string? k) (err "non-string value used for json object key"))
>> >       (skip-whitespace)
>> >       (unless (regexp-try-match #rx#"^:" i)
>> >         (err "error while parsing a json object pair"))
>> >       (list (string->symbol k) (read-json)))
>> >     (apply hasheq (apply append (read-list 'object #rx#"^}" read-pair))))
>> >   ;;
>> >   (define (read-json [top? #f])
>> >     (skip-whitespace)
>> >     (cond
>> >       [(and top? (eof-object? (peek-char i))) eof]
>> >       [(regexp-try-match #px#"^true\\b"  i) #t]
>> >       [(regexp-try-match #px#"^false\\b" i) #f]
>> >       [(regexp-try-match #px#"^null\\b"  i) jsnull]
>> >       [(regexp-try-match
>> >         #rx#"^-?(?:0|[1-9][0-9]*)(?:\\.[0-9]+)?(?:[eE][+-]?[0-9]+)?" i)
>> >        => (λ (bs) (string->number (bytes->string/utf-8 (car bs))))]
>> >       [(regexp-try-match #rx#"^[\"[{]" i)
>> >        => (λ (m)
>> >             (let ([m (car m)])
>> >               (cond [(equal? m #"\"") (read-string)]
>> >                     [(equal? m #"[")  (read-list 'array #rx#"^\\]" read-json)]
>> >                     [(equal? m #"{")  (read-hash)])))]
>> >       [else (err (format "bad input~n ~e" (peek-bytes (sub1 (error-print-width)) 0 i)))]))
>> >   ;;
>> >   (read-json #t))
>> >
>> >
>> > … and my guess is that the JS performance would be similar, if the json
>> > reader in JS was written in JS. I think there are probably a lot of
>> > provably-unneeded checks, and you could probably get rid of the
>> > byte-at-a-time reading.
>> >
>> > It would be interesting to see how much faster (if at all) it is to run
>> > the TR version of this code.
>> >
>> > John
>> >
>> >
>> > > On Feb 22, 2019, at 9:47 AM, Brian Craft <craft.br...@gmail.com> wrote:
>> > >
>> > > I'm doing a few performance tests, just to get an idea of racket
>> > > performance. The following result surprised me a bit. Parsing 1M
>> > > strings from a json array, like
>> > >
>> > > (define samples (time (read-json (open-input-file "test.json"))))
>> > >
>> > > running with 'racket test.rkt'
>> > >
>> > > Comparing to js, java, and clojure:
>> > >
>> > > js 0.128s
>> > > java 0.130s
>> > > clojure 1.3s
>> > > racket 10s
>> > >
>> > > This is pretty slow. Is this typical? Are there other steps I should be
>> > > taking, for performance?
>> > >
>> > > --
>> > > You received this message because you are subscribed to the Google
>> > > Groups "Racket Users" group.
>> > > To unsubscribe from this group and stop receiving emails from it, send
>> > > an email to racket-users+unsubscr...@googlegroups.com.
>> > > For more options, visit https://groups.google.com/d/optout.
>> >
>>
>>
>

