Thanks, Dan! Indeed my "x" vector is sorted and your suggestion is really fast!

Best,

Charles

On 30 December 2015 at 12:08, Dan <getz...@gmail.com> wrote:

> Thanks Kristoffer, turns out there is always interesting stuff in the bag
> of optimization tricks.
> Regarding the original function, a cheat could make it faster: The `x`
> vector is sorted, which means:
>
> function calcSum4(x::Array{Float64,1}, y::Array{Float64,1}, Ei::Float64,
> Ef::Float64, N::Int64)
>     mysum=0.0::Float64;
>     i=1
>     @inbounds while i<=N && x[i]<=Ei i+=1 ; end
>     j=i
>     @inbounds while j<=N && x[j]<=Ef j+=1 ; end
>     @inbounds @simd for k=i:(j-1) mysum += y[k] ; end
>     return(mysum);
> end
>
> returns the same answer.
> there are always more options ;)
>
> On Wednesday, December 30, 2015 at 12:19:41 PM UTC+2, Charles Santana
> wrote:
>>
>> The magic of @inbounds and @simd :)
>>
>> Thanks, Kristoffer!
>>
>> Charles
>>
>>
>> On Wednesday, December 30, 2015, Kristoffer Carlsson <kcarl...@gmail.com>
>> wrote:
>>
>>> If you want to get an even faster version you could do something like:
>>>
>>> function calcSum_simd{T}(x::Vector{T}, y::Vector{T}, Ei::T, Ef::T)
>>>     mysum = zero(T)
>>>     @inbounds @simd for i in eachindex(x, y)
>>>         mysum += ifelse(Ei < x[i] <= Ef, y[i], zero(T))
>>>
>>>     end
>>>     return mysum
>>> end
>>>
>>> which would use SIMD instructions.
>>>
>>> Timing difference:
>>>
>>> N = 10000000
>>> y = rand(N);
>>> x = rand(N)
>>> Ei = 0.2;
>>> Ef = 0.7;
>>>
>>> julia> @time calcSum_simd(x,y,Ei, Ef);
>>> 0.021155 seconds (5 allocations: 176 bytes)
>>>
>>>
>>> julia> @time calcSum(x,y,Ei, Ef)
>>> 0.069911 seconds (5 allocations: 176 bytes)
>>>
>>>
>>> Regarding map being slow. That is worked on here
>>> https://github.com/JuliaLang/julia/pull/13412
>>>
>>>
>>> On Wednesday, December 30, 2015 at 3:05:47 AM UTC+1, Charles Santana
>>> wrote:
>>>>
>>>> Sorry, there was a typo in the function calcSum2. Please consider the
>>>> following code:
>>>>
>>>> function calcSum2(x::Array{Float64,1}, y::Array{Float64,1},
>>>> Ei::Float64, Ef::Float64, N::Int64)
>>>>
>>>>     return sum(y[map(v -> Ei < v <= Ef, x)]);
>>>> end
>>>>
>>>>
>>>> And so the results of the calls for this function change a bit (but not
>>>> the performance):
>>>>
>>>> @time calcSum2(x,y,Ei,Ef,N)
>>>> 0.000110 seconds (1.01 k allocations: 20.969 KB)
>>>> 246.1975746121703
>>>>
>>>> @time calcSum2(x,y,Ei,Ef,N)
>>>> 0.000079 seconds (1.01 k allocations: 20.969 KB)
>>>> 246.1975746121703
>>>>
>>>> @time calcSum2(x,y,Ei,Ef,N)
>>>> 0.000051 seconds (1.01 k allocations: 20.969 KB)
>>>> 246.1975746121703
>>>>
>>>>
>>>> Thanks again, sorry for this inconvenience!
>>>>
>>>> Charles
>>>>
>>>> On 30 December 2015 at 03:00, Charles Novaes de Santana <
>>>> charles...@gmail.com> wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> In a project I am developing a @profile shows me that the slowest part
>>>>> of the code is the sum of elements of an Array that follow some
>>>>> conditions.
>>>>>
>>>>> Please consider the following code:
>>>>>
>>>>> y = rand(1000);
>>>>> x = collect(0.0:0.001:0.999);
>>>>> Ei = 0.2;
>>>>> Ef = 0.7;
>>>>> N = length(x)
>>>>>
>>>>> I want to calculate the sum of elements in "y" for which elements the
>>>>> respective values in "x" are between "Ei" and "Ef". If I was using R, for
>>>>> example, I would use something like:
>>>>>
>>>>> mysum = sum(y[which((x < Ef)&&(x > Ei))]); #(not tested in R, but I
>>>>> suppose that is the way to do it)
>>>>>
>>>>> In Julia, I can think in at least two ways to calculate it:
>>>>>
>>>>> function calcSum(x::Array{Float64,1}, y::Array{Float64,1},
>>>>> Ei::Float64, Ef::Float64, N::Int64)
>>>>>     mysum=0.0::Float64;
>>>>>     for(i in 1:N)
>>>>>         if( Ei < x[i] <= Ef)
>>>>>             mysum += y[i];
>>>>>         end
>>>>>     end
>>>>>     return(mysum);
>>>>> end
>>>>>
>>>>> function calcSum2(x::Array{Float64,1}, y::Array{Float64,1},
>>>>> Ei::Float64, Ef::Float64, N::Int64)
>>>>>     return sum(y[map(v -> Ei < v < Ef, x)]);
>>>>> end
>>>>>
>>>>> As you can see below, for the first function (calcSum) I got a much
>>>>> better performance than for the second one (minimum 10x faster).
>>>>>
>>>>>
>>>>> @time calcSum(x,y,Ei,Ef,N)
>>>>> 0.003986 seconds (2.56 k allocations: 125.168 KB)
>>>>> 246.19757461217014
>>>>>
>>>>> @time calcSum(x,y,Ei,Ef,N)
>>>>> 0.000003 seconds (5 allocations: 176 bytes)
>>>>> 246.19757461217014
>>>>>
>>>>> @time calcSum(x,y,Ei,Ef,N)
>>>>> 0.000002 seconds (5 allocations: 176 bytes)
>>>>> 246.19757461217014
>>>>>
>>>>> @time calcSum2(x,y,Ei,Ef,N)
>>>>> 0.003762 seconds (1.61 k allocations: 53.743 KB)
>>>>> 245.48156534879303
>>>>>
>>>>> @time calcSum2(x,y,Ei,Ef,N)
>>>>> 0.000050 seconds (1.01 k allocations: 20.969 KB)
>>>>> 245.48156534879303
>>>>>
>>>>> @time calcSum2(x,y,Ei,Ef,N)
>>>>> 0.000183 seconds (1.01 k allocations: 20.969 KB)
>>>>> 245.48156534879303
>>>>>
>>>>> Does any one have an idea about how to improve the performance here?
>>>>>
>>>>> Many thanks for any help! Happy new year to all of you!
>>>>>
>>>>> Charles
>>>>>
>>>>>
>>>>> --
>>>>> Um axé! :)
>>>>>
>>>>> --
>>>>> Charles Novaes de Santana, PhD
>>>>> http://www.imedea.uib-csic.es/~charles
>>>>>
>>>>
>>>>
>>>> --
>>>> Um axé! :)
>>>>
>>>> --
>>>> Charles Novaes de Santana, PhD
>>>> http://www.imedea.uib-csic.es/~charles
>>>>
>>>
>>
>> --
>> Um axé! :)
>>
>> --
>> Charles Novaes de Santana, PhD
>> http://www.imedea.uib-csic.es/~charles
>>

--
Um axé! :)

--
Charles Novaes de Santana, PhD
http://www.imedea.uib-csic.es/~charles