Can you provide the comparable Python code? Perhaps even the data used for testing?
Since you are evaluating Julia, there are two important points to remember:

1) Because Julia is fast enough that basic functionality can be implemented in Julia itself, the distinction between Base Julia and additional packages is small. Opting to use "just" the core makes less sense: the core is itself just a precompiled package.
2) The community is part of the language, so it should be taken into account when making such considerations.

On Monday, November 30, 2015 at 4:21:51 PM UTC+2, Attila Zséder wrote:
> Hi,
>
> Thank you all for the responses.
>
> 1. I tried simple profiling, but its output was difficult for me to interpret; maybe it would be clearer if I put more time into it. I will try ProfileView later.
> 2. FastAnonymous gave me a serious speedup (20-30%). (But since it is an external dependency, it's kind of cheating, given the purpose of this small word count test.)
> 3. Using ASCIIString is not a good option right now, since there are Unicode characters in the data. I am trying both UTF8String and AbstractString, and I don't see any difference in performance right now.
> 4. Using ht_keyindex() is out of scope for me right now: this is a pet project, and I just wanted to see how fast the current implementation is without these kinds of solutions.
>
> I think I will keep trying with later versions of Julia, but sticking to the standard library only, without using any external packages.
>
> Attila
>
> On Sunday, November 29, 2015 at 5:59:42 PM UTC+1, Yichao Yu wrote:
>>
>> On Sun, Nov 29, 2015 at 11:42 AM, Milan Bouchet-Valat <nali...@club.fr> wrote:
>> > On Sunday, November 29, 2015 at 08:28 -0800, Cedric St-Jean wrote:
>> >> What I would try:
>> >>
>> >> 1. ProfileView to pinpoint the bottleneck further
>> >> 2. FastAnonymous to fix the lambda
>> >> 3. http://julia-demo.readthedocs.org/en/latest/manual/performance-tips.html In particular, you may check `code_typed`. I don't have experience with `split` and `eachline`. It's possible that they are not type stable (the compiler can't predict their output's type). I would try `for w::ASCIIString in ...`
>> >> 4. Dict{ASCIIString, Int}()
>> >> 5. Your loop will hash each string twice. I don't know how to fix that, anyone?
>> > You can use the unexported Base.ht_keyindex() function like this:
>> > https://github.com/nalimilan/FreqTables.jl/blob/7884c000e6797d7ec621e07b8da58e7939e39867/src/freqtable.jl#L36
>> >
>> > But this is at your own risk, as it may change without warning in a future Julia release.
>> >
>> > We really need a public API for it.
>>
>> IIUC, https://github.com/JuliaLang/julia/issues/12157
>>
>> >
>> > Regards
>> >
>> >>
>> >> Good luck,
>> >>
>> >> Cédric
>> >>
>> >> On Saturday, November 28, 2015 at 8:08:49 PM UTC-5, Lampkld wrote:
>> >> > Maybe it's the lambda? These are slow in Julia right now.