string-split always uses regex. I wonder if a fast path when the splitter
is a regular string will be worth it.

On Fri, Mar 19, 2021 at 4:19 AM Pawel Mosakowski <[email protected]>
wrote:

> Hi Bogdan,
>
> This is a brilliant solution and also completely over my head. It finishes
> in ~3.75s on my PC and is faster than the Python version which basically
> delegates all the work to C. I will need to spend some time on
> understanding it but I am looking forward to learning something new.
>
> Many thanks,
> Pawel
>
> On Thursday, March 18, 2021 at 7:22:10 PM UTC bogdan wrote:
>
>> I managed to get it about as fast as Python by making it really
>> imperative and rolling my own hash:
>>
>> https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571
>>
>> Sam Tobin-Hochstadt writes:
>>
>> > Here are several variants of the code:
>> > https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9
>> >
>> > The enabled version is about the fastest I can get without using
>> > `unsafe` (which the rules said not to do). It's possible to optimize a
>> > tiny bit more by avoiding sorting, but only a few milliseconds -- it
>> > would be more significant if there were more different words.
>> >
>> > Switching to bytes works correctly for the given task, but wouldn't
>> > always work in the case of general UTF8 input. But those versions
>> > appeared not be faster for me. Also, writing my own string-downcase
>> > didn't help. And using a big buffer and doing my own newline splitting
>> > didn't help either.
>> >
>> > The version using just a regexp matching on a port (suggested by
>> > Robby) turned out not to be faster either, so my suspicion is that the
>> > original slowness is just using regexps for splitting words.
>> >
>> > Sam
>> >
>> > On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt
>> > <[email protected]> wrote:
>> >>
>> >> Here's a somewhat-optimized version of the code:
>> >>
>> >> #lang racket/base
>> >> (require racket/string racket/vector racket/port)
>> >>
>> >> (define h (make-hash))
>> >>
>> >> (time
>> >> (for* ([l (in-lines)]
>> >> [w (in-list (string-split l))]
>> >> [w* (in-value (string-downcase w))])
>> >> (hash-update! h w* add1 0)))
>> >>
>> >> (define v
>> >> (time
>> >> (for/vector #:length (hash-count h)
>> >> ([(k v) (in-hash h)])
>> >> (cons k v))))
>> >> (time (vector-sort! v > #:key cdr))
>> >> (define p (current-output-port) #;(open-output-nowhere))
>> >> (time
>> >> (for ([pair (in-vector v)])
>> >> (write-string (car pair) p)
>> >> (write-string (number->string (cdr pair)) p)
>> >> (newline p)))
>> >>
>> >> It's much more imperative, but also pretty nice and compact. The
>> >> `printf` optimization is significant for that portion of the program,
>> >> but that isn't much of the running time. The overall running time for
>> >> 10 copies of the KJV is about 9 seconds on my laptop.
>> >>
>> >> I think the remaining difference between Racket and other languages is
>> >> likely the `string-split` and `string-downcase` functions, plus the
>> >> relatively-inefficient string representation that Racket uses.
>> >>
>> >> Sam
>> >>
>> >>
>> >> On Thu, Mar 18, 2021 at 10:28 AM Pawel Mosakowski <
>> [email protected]> wrote:
>> >> >
>> >> > Hi David,
>> >> >
>> >> > Yes, the 21 seconds includes the interpreter startup time. I have
>> done a simple test to see how long it takes:
>> >> >
>> >> > $ time racket -e '(displayln "Hello, world")'
>> >> > Hello, world
>> >> >
>> >> > real 0m0.479s
>> >> > user 0m0.449s
>> >> > sys 0m0.030s
>> >> >
>> >> > I have also put my code inside a main function and profiled it:
>> >> >
>> >> > Profiling results
>> >> > -----------------
>> >> > Total cpu time observed: 20910ms (out of 20970ms)
>> >> > Number of samples taken: 382 (once every 55ms)
>> >> > (Hiding functions with self<1.0% and local<2.0%: 1 of 12 hidden)
>> >> >
>> >> > ==============================================================
>> >> > Caller
>> >> > Idx Total Self Name+src Local%
>> >> > ms(pct) ms(pct) Callee
>> >> > ==============================================================
>> >> > [1] 20910(100.0%) 0(0.0%) [running body]
>> ...word-occurences-profile.rkt":##f
>> >> > profile-thunk [2] 100.0%
>> >> > --------------------------------------------------------------
>> >> > [running body] [1] 100.0%
>> >> > [2] 20910(100.0%) 0(0.0%) profile-thunk
>> ...ket/pkgs/profile-lib/main.rkt:9:0
>> >> > run [3] 100.0%
>> >> > --------------------------------------------------------------
>> >> > profile-thunk [2] 100.0%
>> >> > [3] 20910(100.0%) 0(0.0%) run
>> ...share/racket/pkgs/profile-lib/main.rkt:39:2
>> >> > main [4] 100.0%
>> >> > --------------------------------------------------------------
>> >> > run [3] 100.0%
>> >> > [4] 20910(100.0%) 50(0.2%) main
>> ...cket/count-word-occurences-profile.rkt:5:0
>> >> > read-from-stdin-it [5] 98.5%
>> >> > ??? [6] 0.2%
>> >> > --------------------------------------------------------------
>> >> > main [4] 100.0%
>> >> > [5] 20606(98.5%) 11796(56.4%) read-from-stdin-it
>> ...-occurences-profile.rkt:19:6
>> >> > internal-split [7] 42.8%
>> >> > --------------------------------------------------------------
>> >> > main [4] 100.0%
>> >> > [6] 51(0.2%) 0(0.0%) ???
>> ...cket/collects/racket/private/sort.rkt:369:3
>> >> > generic-sort/key [8] 100.0%
>> >> > --------------------------------------------------------------
>> >> > read-from-stdin-it [5]100.0%
>> >> > [7] 8810(42.1%) 3528(16.9%) internal-split
>> ...collects/racket/string.rkt:117:0
>> >> > regexp-split [9] 59.9%
>> >> > --------------------------------------------------------------
>> >> > ??? [6] 100.0%
>> >> > [8] 51(0.2%) 0(0.0%) generic-sort/key
>> .../racket/private/sort.rkt:156:2
>> >> > copying-mergesort [10]100.0%
>> >> > --------------------------------------------------------------
>> >> > internal-split [7] 100.0%
>> >> > [9] 5282(25.3%) 2810(13.4%) regexp-split
>> ...ts/racket/private/string.rkt:338:2
>> >> > loop [11] 46.8%
>> >> > --------------------------------------------------------------
>> >> > generic-sort/key [8] 10.0%
>> >> > copying-mergesort [10] 90.0%
>> >> > [10] 51(0.2%) 51(0.2%) copying-mergesort
>> ...racket/private/sort.rkt:129:8
>> >> > copying-mergesort [10] 90.0%
>> >> > --------------------------------------------------------------
>> >> > regexp-split [9] 100.0%
>> >> > [11] 2471(11.8%) 2471(11.8%) loop
>> ...t/collects/racket/private/string.rkt:169:7
>> >> > --------------------------------------------------------------
>> >> >
>> >> > Kind regards,
>> >> > Pawel
>> >> >
>> >> >
>> >> > On Thursday, March 18, 2021 at 2:09:35 PM UTC [email protected]
>> wrote:
>> >> >>
>> >> >> Hi Pawel,
>> >> >>
>> >> >> I'll take a look at the code later, but did that 21 seconds include
>> startup time for the interpreter?
>> >> >>
>> >> >> On Thu, Mar 18, 2021, 9:24 AM Pawel Mosakowski <
>> [email protected]> wrote:
>> >> >>>
>> >> >>> Hello,
>> >> >>>
>> >> >>> I am a Racket beginner and I have come across this article:
>> >> >>>
>> >> >>> https://benhoyt.com/writings/count-words/
>> >> >>>
>> >> >>> This is my attempt at solving the challenge:
>> >> >>>
>> >> >>> https://pastebin.com/kL16w5Hc
>> >> >>>
>> >> >>> However when I have benchmarked it, it takes ~21 seconds to run
>> compared to the Python and Ruby versions which take around 3-4 seconds.
>> >> >>>
>> >> >>> I understand that both Ruby and Python probably have the string
>> operations and hash tables implemented in optimized C but is there anything
>> I can do to improve performance of my program?
>> >> >>>
>> >> >>> Many thanks for all help and suggestions.
>> >> >>>
>> >> >>> --
>> >> >>> You received this message because you are subscribed to the Google
>> Groups "Racket Users" group.
>> >> >>> To unsubscribe from this group and stop receiving emails from it,
>> send an email to [email protected].
>> >> >>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/racket-users/118c1340-66d1-421d-92a4-6b66c56cb88fn%40googlegroups.com.
>>
>> >> >
>> >> > --
>> >> > You received this message because you are subscribed to the Google
>> Groups "Racket Users" group.
>> >> > To unsubscribe from this group and stop receiving emails from it,
>> send an email to [email protected].
>> >> > To view this discussion on the web visit
>> https://groups.google.com/d/msgid/racket-users/09c58a34-bd2d-49e7-bfbd-d3253c1e6dd1n%40googlegroups.com.
>>
>>
> --
> You received this message because you are subscribed to the Google Groups
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/racket-users/ca365c42-d698-4e43-863e-7fc95849c31en%40googlegroups.com
> <https://groups.google.com/d/msgid/racket-users/ca365c42-d698-4e43-863e-7fc95849c31en%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/racket-users/CADcuegueAq32wBKKpCFiBwbv%3DZkyQELL00mFodvqDMkj0_hO1A%40mail.gmail.com.

Reply via email to