[Pharo-dev] Re: Array sum. is very slow

Guillermo Polito Fri, 07 Jan 2022 02:01:35 -0800

Yes, I also just saw that I used an interval instead of an array… I need to 
sleep more ^^


Anyway, even with a 28k-element array, whether it holds small integers or 
floats, I get "reasonable" results (where reasonable = taking neither hours 
nor minutes, but roughly half a millisecond per sum :P)

randarray := Array new: 28800 withAll: 0.
[randarray sum] bench "'2059.176 per second'"

randarray2 := Array new: 28800 withAll: 0.1234567.
[randarray2 sum] bench "'1771.737 per second'"
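
To make the interval-versus-array difference concrete, here is a minimal playground sketch. It assumes, as appears to be the case in current Pharo images, that Interval answers #sum arithmetically instead of iterating, which would explain the very high rate in the interval test quoted further down. The variable names are only for illustration.

```smalltalk
"Sketch: the same numbers held as an Interval and as a real Array."
interval := 1 to: 28000.
array := interval asArray.

"Both hold 1..28000, so both sums are 28000 * 28001 / 2."
interval sum = array sum. "true"

"Benchmark each receiver separately; only the Array is expected to iterate."
[ interval sum ] bench.
[ array sum ] bench.
```

If the assumption holds, the first bench should report a far higher rate than the second, while the second should be in the same range as the two Array results above.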

I join John’s request to see the Python code…
Is that possible?
G

> On 6 Jan 2022, at 23:35, Jimmie Houchin <[email protected]> wrote:
> 
> No, it is an array of floats. The only integers in the test are in the 
> indexes of the loops.
> 
> Number random. "generates a float, e.g. 0.8188008774329387"
> 
> So in the randarray below it is an array of 28800 floats.
> 
> It just felt so wrong to me that Python3 was so much faster. I don't care if 
> Nim, Crystal, Julia are faster. But...
> 
> 
> I am new to Iceberg and have never shared anything on Github so this is all 
> new to me. I uploaded my language test so you can see what it does. It is a 
> micro-benchmark. It does things that are not realistic in an app. But it does 
> stress a language in areas important to my app.
> 
> 
> https://github.com/jlhouchin/LanguageTestPharo
> 
> 
> Let me know if there is anything else I can do to help solve this problem.
> 
> I am a lone developer in my spare time. So my apologies for any ugly code.
> 
> 
> Thanks for your help.
> 
> Jimmie
> 
> 
> On 1/6/22 15:07, Guillermo Polito wrote:
>> Hi Jimmie,
>> 
>> Is it possible that your program is computing a lot of **very** large 
>> integers?
>> 
>> I’m just trying the following with small numbers, and I don’t see the issue. 
>> #sum executes on a 28k large collection around 20 million times per second 
>> on my old 2015 i5.
>> 
>> a := (1 to: 28000).
>> [a sum] bench "'20256552.490 per second'"
>> 
>> If you could share with us more data, we could take a look.
>> Now I’m curious.
>> 
>> Thanks,
>> G
>> 
>>> On 6 Jan 2022, at 21:37, Jimmie Houchin <[email protected]> wrote:
>>> 
>>> I have written a micro benchmark which stresses a language in areas which 
>>> are crucial to my application.
>>> 
>>> I have written this micro benchmark in Pharo, Crystal, Nim, Python, 
>>> PicoLisp, C, C++, Java and Julia.
>>> 
>>> On my i7 laptop Julia completes it in about 1 minute and 15 seconds, 
>>> amazing magic they have done.
>>> 
>>> Crystal and Nim do it in about 5 minutes. Python in about 25 minutes. Pharo 
>>> takes over 2 hours. :(
>>> 
>>> In my benchmark, if I comment out the sum and average of the array, it 
>>> completes in 3.5 seconds.
>>> And when I sum the array it gives the correct results. So I can verify its 
>>> validity.
>>> 
>>> To illustrate, below is some sample code of what I am doing. I iterate over 
>>> the array, do calculations on each value, update the array, and compute 
>>> the sum and average at each step, simply to stress array access, sum and 
>>> average.
>>> 
>>> 28800 is simply the number of one-minute time series values in 4 weeks 
>>> of 5 days each (1440 * 5 * 4).
>>> 
>>> randarray := Array new: 28800.
>>> 
>>> 1 to: randarray size do: [ :i | randarray at: i put: Number random ].
>>> 
>>> randarrayttr := [ 1 to: randarray size do: [ :i | "other calculations 
>>> here." randarray sum. randarray average ]] timeToRun.
>>> 
>>> randarrayttr. "0:00:00:36.135"
>>> 
>>> 
>>> I do 2 loops with 100 iterations each.
>>> 
>>> randarrayttr * 200. "0:02:00:27"
>>> 
>>> 
>>> I learned early on in this adventure when dealing with compiled languages 
>>> that if you don’t do a lot, the test may not last long enough to give any 
>>> times.
>>> 
>>> Pharo is my preference. But this is an awful big gap in performance. When 
>>> doing backtesting this is huge. Does my backtest take minutes, hours or 
>>> days?
>>> 
>>> I am not a computer scientist nor expert in Pharo or Smalltalk. So I do not 
>>> know if there is anything which can improve this.
>>> 
>>> 
>>> However I have played around with several experiments of my #sum: method.
>>> 
>>> This implementation reduces the time on the above randarray in half.
>>> 
>>> sum: col
>>> | sum |
>>> sum := 0.
>>> 1 to: col size do: [ :i |
>>>      sum := sum + (col at: i) ].
>>> ^ sum
>>> 
>>> randarrayttr2 := [ 1 to: randarray size do: [ :i | "other calculations 
>>> here."
>>>     ltsa sum: randarray. ltsa sum: randarray ]] timeToRun.
>>> randarrayttr2. "0:00:00:18.563"
>>> 
>>> And this one reduces it a little more.
>>> 
>>> sum10: col
>>> | sum |
>>> sum := 0.
>>> 1 to: ((col size quo: 10) * 10) by: 10 do: [ :i |
>>>      sum := sum + (col at: i) + (col at: (i + 1)) + (col at: (i + 2))
>>>          + (col at: (i + 3)) + (col at: (i + 4)) + (col at: (i + 5))
>>>          + (col at: (i + 6)) + (col at: (i + 7)) + (col at: (i + 8))
>>>          + (col at: (i + 9)) ].
>>> ((col size quo: 10) * 10 + 1) to: col size do: [ :i |
>>>      sum := sum + (col at: i)].
>>> ^ sum
>>> 
>>> randarrayttr3 := [ 1 to: randarray size do: [ :i | "other calculations 
>>> here."
>>>     ltsa sum10: randarray. ltsa sum10: randarray ]] timeToRun.
>>> randarrayttr3. "0:00:00:14.592"
>>> 
>>> It closes the gap with plain Python 3 (no NumPy). But that is a pretty low 
>>> standard.
>>> 
>>> Any ideas, thoughts, wisdom, or directions to pursue?
>>> 
>>> Thanks
>>> 
>>> Jimmie
>>>
