(Welcome to Racket v8.0.0.1 [cs]. ) All results are measured on my laptop on the 10x file with `$ time racket <file.rkt>`, thus including the Racket VM.
* Bogdan's version with #lang racket/base: 1s. * Dominik's version with vectors of length 256 (instead of 26) and splitting on spaces/return/newline only and #lang racket/base: 1.6s But Bogdan's printing takes only ~100ms while Dominik's takes ~400ms. * I also implemented a variant of Dominik's 'discrimination tree' to use a hasheqv instead of a vector at each node: 6.2s (but the memory footprint is likely nicer :-p ) This also uses `in-lines` instead of Bogdan's buffers so there may still be something to gain here. * Replacing a hasheqv with an assoc: 4.2s. * Starting with an assoc and switching to a hasheqv when there are too many elements (n=20, couldn't do better): 3.9s Code is here: https://gist.github.com/Metaxal/ae0a6937d8f388f3f40ec7396041be55 I also noticed that `dict-ref` is *really* slow (35s) compared to `assv`. @Bogdan: You can use `#:key` in `sort`. On Fri, Mar 19, 2021 at 12:08 PM Bogdan Popa <[email protected]> wrote: > Nice! It's worth pointing out, though, that by limiting yourself to > alpha chars, you're processing about 8% less data and the results don't > pass the tests. :P > > $ wc kjvbible_x10.txt > 998170 8211330 43325060 > > $ sed 's/[a-zA-Z ]//g' < kjvbible_x10.txt | wc > 998170 739310 3600800 > > I think your version would still be faster, but I'm curious what the > numbers would look like if only whitespace chars were considered word > separators. > > Dominik Pantůček writes: > > > Another attack of [1]. But yeah, why not do some [2]. > > > > Trees to the rescue [3]. > > > > $ racket --version > > Welcome to Racket v8.0 [cs]. > > > > $ racket countwords-bogdan2.rkt <kjvbible.txt |tail -1 > > cpu time: 135 real time: 135 gc time: 8 > > > > $ racket countwords-dzoe2.rkt <kjvbible.txt | tail -1 > > cpu time: 69 real time: 69 gc time: 3 > > > > I just changed (countwords) to (time (countwords)) in Bogdan's code to > > measure the running time. > > > > The difference is that I am positively defining which letters form words > > (a-z, A-Z) and that all others are treated as word separators. The > > buffer size is the same - and honestly, the speedup between 1024 and > > 1024^2 bytes buffer is barely measurable. > > > > The only option for further speedup I can immediately think of is to > > allocate a huge vector of wtnodes and change chld field to be a starting > > index into this big vector (should reduce allocations). > > > > Btw, making it unsafe does not speed it up at all (probably CS > > recognizes the vectors and all those refs are inlined anyway). > > > > > > Cheers, > > Dominik > > > > [1] https://xkcd.com/386/ > > [2] http://phdcomics.com/comics/archive.php?comicid=1735 > > [3] https://gist.github.com/dzoep/0e081d0544afac539a4829179c601e0e > > > > On 19. 03. 21 11:18, Bogdan Popa wrote: > >> I updated the gist with some cleanups and additional improvements that > >> get the runtime down to a little over 1s (vs ~350ms for the optimized C > >> and Rust code) on my maxed-out 2019 MBP and ~600ms on my M1 Mac Mini. > >> > >> Pawel Mosakowski writes: > >> > >>> Hi Bogdan, > >>> > >>> This is a brilliant solution and also completely over my head. It > finishes > >>> in ~3.75s on my PC and is faster than the Python version which > basically > >>> delegates all the work to C. I will need to spend some time on > >>> understanding it but I am looking forward to learning something new. > >>> > >>> Many thanks, > >>> Pawel > >>> > >>> On Thursday, March 18, 2021 at 7:22:10 PM UTC bogdan wrote: > >>> > >>>> I managed to get it about as fast as Python by making it really > >>>> imperative and rolling my own hash: > >>>> > >>>> https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571 > >>>> > >>>> Sam Tobin-Hochstadt writes: > >>>> > >>>>> Here are several variants of the code: > >>>>> https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9 > >>>>> > >>>>> The enabled version is about the fastest I can get without using > >>>>> `unsafe` (which the rules said not to do). It's possible to optimize > a > >>>>> tiny bit more by avoiding sorting, but only a few milliseconds -- it > >>>>> would be more significant if there were more different words. > >>>>> > >>>>> Switching to bytes works correctly for the given task, but wouldn't > >>>>> always work in the case of general UTF8 input. But those versions > >>>>> appeared not be faster for me. Also, writing my own string-downcase > >>>>> didn't help. And using a big buffer and doing my own newline > splitting > >>>>> didn't help either. > >>>>> > >>>>> The version using just a regexp matching on a port (suggested by > >>>>> Robby) turned out not to be faster either, so my suspicion is that > the > >>>>> original slowness is just using regexps for splitting words. > >>>>> > >>>>> Sam > >>>>> > >>>>> On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt > >>>>> <[email protected]> wrote: > >>>>>> > >>>>>> Here's a somewhat-optimized version of the code: > >>>>>> > >>>>>> #lang racket/base > >>>>>> (require racket/string racket/vector racket/port) > >>>>>> > >>>>>> (define h (make-hash)) > >>>>>> > >>>>>> (time > >>>>>> (for* ([l (in-lines)] > >>>>>> [w (in-list (string-split l))] > >>>>>> [w* (in-value (string-downcase w))]) > >>>>>> (hash-update! h w* add1 0))) > >>>>>> > >>>>>> (define v > >>>>>> (time > >>>>>> (for/vector #:length (hash-count h) > >>>>>> ([(k v) (in-hash h)]) > >>>>>> (cons k v)))) > >>>>>> (time (vector-sort! v > #:key cdr)) > >>>>>> (define p (current-output-port) #;(open-output-nowhere)) > >>>>>> (time > >>>>>> (for ([pair (in-vector v)]) > >>>>>> (write-string (car pair) p) > >>>>>> (write-string (number->string (cdr pair)) p) > >>>>>> (newline p))) > >>>>>> > >>>>>> It's much more imperative, but also pretty nice and compact. The > >>>>>> `printf` optimization is significant for that portion of the > program, > >>>>>> but that isn't much of the running time. The overall running time > for > >>>>>> 10 copies of the KJV is about 9 seconds on my laptop. > >>>>>> > >>>>>> I think the remaining difference between Racket and other languages > is > >>>>>> likely the `string-split` and `string-downcase` functions, plus the > >>>>>> relatively-inefficient string representation that Racket uses. > >>>>>> > >>>>>> Sam > >>>>>> > >>>>>> > >>>>>> On Thu, Mar 18, 2021 at 10:28 AM Pawel Mosakowski < > [email protected]> > >>>> wrote: > >>>>>>> > >>>>>>> Hi David, > >>>>>>> > >>>>>>> Yes, the 21 seconds includes the interpreter startup time. I have > >>>> done a simple test to see how long it takes: > >>>>>>> > >>>>>>> $ time racket -e '(displayln "Hello, world")' > >>>>>>> Hello, world > >>>>>>> > >>>>>>> real 0m0.479s > >>>>>>> user 0m0.449s > >>>>>>> sys 0m0.030s > >>>>>>> > >>>>>>> I have also put my code inside a main function and profiled it: > >>>>>>> > >>>>>>> Profiling results > >>>>>>> ----------------- > >>>>>>> Total cpu time observed: 20910ms (out of 20970ms) > >>>>>>> Number of samples taken: 382 (once every 55ms) > >>>>>>> (Hiding functions with self<1.0% and local<2.0%: 1 of 12 hidden) > >>>>>>> > >>>>>>> ============================================================== > >>>>>>> Caller > >>>>>>> Idx Total Self Name+src Local% > >>>>>>> ms(pct) ms(pct) Callee > >>>>>>> ============================================================== > >>>>>>> [1] 20910(100.0%) 0(0.0%) [running body] > >>>> ...word-occurences-profile.rkt":##f > >>>>>>> profile-thunk [2] 100.0% > >>>>>>> -------------------------------------------------------------- > >>>>>>> [running body] [1] 100.0% > >>>>>>> [2] 20910(100.0%) 0(0.0%) profile-thunk > >>>> ...ket/pkgs/profile-lib/main.rkt:9:0 > >>>>>>> run [3] 100.0% > >>>>>>> -------------------------------------------------------------- > >>>>>>> profile-thunk [2] 100.0% > >>>>>>> [3] 20910(100.0%) 0(0.0%) run > >>>> ...share/racket/pkgs/profile-lib/main.rkt:39:2 > >>>>>>> main [4] 100.0% > >>>>>>> -------------------------------------------------------------- > >>>>>>> run [3] 100.0% > >>>>>>> [4] 20910(100.0%) 50(0.2%) main > >>>> ...cket/count-word-occurences-profile.rkt:5:0 > >>>>>>> read-from-stdin-it [5] 98.5% > >>>>>>> ??? [6] 0.2% > >>>>>>> -------------------------------------------------------------- > >>>>>>> main [4] 100.0% > >>>>>>> [5] 20606(98.5%) 11796(56.4%) read-from-stdin-it > >>>> ...-occurences-profile.rkt:19:6 > >>>>>>> internal-split [7] 42.8% > >>>>>>> -------------------------------------------------------------- > >>>>>>> main [4] 100.0% > >>>>>>> [6] 51(0.2%) 0(0.0%) ??? > >>>> ...cket/collects/racket/private/sort.rkt:369:3 > >>>>>>> generic-sort/key [8] 100.0% > >>>>>>> -------------------------------------------------------------- > >>>>>>> read-from-stdin-it [5]100.0% > >>>>>>> [7] 8810(42.1%) 3528(16.9%) internal-split > >>>> ...collects/racket/string.rkt:117:0 > >>>>>>> regexp-split [9] 59.9% > >>>>>>> -------------------------------------------------------------- > >>>>>>> ??? [6] 100.0% > >>>>>>> [8] 51(0.2%) 0(0.0%) generic-sort/key > >>>> .../racket/private/sort.rkt:156:2 > >>>>>>> copying-mergesort [10]100.0% > >>>>>>> -------------------------------------------------------------- > >>>>>>> internal-split [7] 100.0% > >>>>>>> [9] 5282(25.3%) 2810(13.4%) regexp-split > >>>> ...ts/racket/private/string.rkt:338:2 > >>>>>>> loop [11] 46.8% > >>>>>>> -------------------------------------------------------------- > >>>>>>> generic-sort/key [8] 10.0% > >>>>>>> copying-mergesort [10] 90.0% > >>>>>>> [10] 51(0.2%) 51(0.2%) copying-mergesort > >>>> ...racket/private/sort.rkt:129:8 > >>>>>>> copying-mergesort [10] 90.0% > >>>>>>> -------------------------------------------------------------- > >>>>>>> regexp-split [9] 100.0% > >>>>>>> [11] 2471(11.8%) 2471(11.8%) loop > >>>> ...t/collects/racket/private/string.rkt:169:7 > >>>>>>> -------------------------------------------------------------- > >>>>>>> > >>>>>>> Kind regards, > >>>>>>> Pawel > >>>>>>> > >>>>>>> > >>>>>>> On Thursday, March 18, 2021 at 2:09:35 PM UTC [email protected] > >>>> wrote: > >>>>>>>> > >>>>>>>> Hi Pawel, > >>>>>>>> > >>>>>>>> I'll take a look at the code later, but did that 21 seconds > include > >>>> startup time for the interpreter? > >>>>>>>> > >>>>>>>> On Thu, Mar 18, 2021, 9:24 AM Pawel Mosakowski < > [email protected]> > >>>> wrote: > >>>>>>>>> > >>>>>>>>> Hello, > >>>>>>>>> > >>>>>>>>> I am a Racket beginner and I have come across this article: > >>>>>>>>> > >>>>>>>>> https://benhoyt.com/writings/count-words/ > >>>>>>>>> > >>>>>>>>> This is my attempt at solving the challenge: > >>>>>>>>> > >>>>>>>>> https://pastebin.com/kL16w5Hc > >>>>>>>>> > >>>>>>>>> However when I have benchmarked it, it takes ~21 seconds to run > >>>> compared to the Python and Ruby versions which take around 3-4 > seconds. > >>>>>>>>> > >>>>>>>>> I understand that both Ruby and Python probably have the string > >>>> operations and hash tables implemented in optimized C but is there > anything > >>>> I can do to improve performance of my program? > >>>>>>>>> > >>>>>>>>> Many thanks for all help and suggestions. > >>>>>>>>> > >>>>>>>>> -- > >>>>>>>>> You received this message because you are subscribed to the > Google > >>>> Groups "Racket Users" group. > >>>>>>>>> To unsubscribe from this group and stop receiving emails from it, > >>>> send an email to [email protected]. > >>>>>>>>> To view this discussion on the web visit > >>>> > https://groups.google.com/d/msgid/racket-users/118c1340-66d1-421d-92a4-6b66c56cb88fn%40googlegroups.com > >>>> . > >>>>>>> > >>>>>>> -- > >>>>>>> You received this message because you are subscribed to the Google > >>>> Groups "Racket Users" group. > >>>>>>> To unsubscribe from this group and stop receiving emails from it, > >>>> send an email to [email protected]. > >>>>>>> To view this discussion on the web visit > >>>> > https://groups.google.com/d/msgid/racket-users/09c58a34-bd2d-49e7-bfbd-d3253c1e6dd1n%40googlegroups.com > >>>> . > >>>> > >> > > -- > You received this message because you are subscribed to the Google Groups > "Racket Users" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To view this discussion on the web visit > https://groups.google.com/d/msgid/racket-users/m2o8ff2vnt.fsf%40192.168.0.142 > . > -- You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/CABNTSaEWoK7ePvD%2BQBo4qHBi-ogu65qFrg0oiAE1ey3Zp2gPXg%40mail.gmail.com.

