I’m trying to optimize my code, and since one of the bottlenecks is garbage collection, I’ve been trying to get rid of unnecessary memory allocation.
In the process, I ended up dealing with this case:

```julia
let X = Int64[1,2,3,14]

function sub2(X::Vector{Int64}, nNodes::Int64)
    N = length(X)::Int64
    index = X[N]::Int64
    stride = 1::Int64
    for k in (N-1):-1:1
        stride = stride::Int64 * nNodes::Int64
        index += ((X[k]-1)::Int64 * stride::Int64)::Int64
    end
    return index::Int64
end

#force compiling
sub2(X,1)

#profile
@time for j in 1:1e7 sub2(X,21) end
@time for j in 1:1e7 sub2(X,22) end

end
```

The result is:

```
elapsed time: 0.318801734 seconds (96 bytes allocated)
elapsed time: 0.434234715 seconds (160000096 bytes allocated, 11.08% gc time)
```

I’m trying to understand why calling the function with nNodes = 21 (or lower) allocates almost no memory, whereas calling it with nNodes = 22 (or higher) causes a lot of memory to be allocated. (With the value of X I picked, sub2(X,21) == 497 and sub2(X,22) == 542. I tried different values of X, and whenever the return value is larger than 512 the call ends up allocating more memory; when it is less than 512, it does not.)

I did my best to assert types everywhere to avoid type instability. None of this has a large effect in the big picture, but I am trying to wrap my head around how and when memory is allocated.

FWIW, here is the contents of test.jl.mem after running with `--track-allocation=all`:

```
        - let X = Int64[1,2,3,14]
        -
        -
        - function sub2(X::Vector{Int64}, nNodes::Int64)
160151688     N = length(X)::Int64
        0     index = X[N]::Int64
        0     stride = 1::Int64
        0     for k in (N-1):-1:1
        0         stride = stride::Int64 * nNodes::Int64
        0         index += ((X[k]-1)::Int64 * stride::Int64)::Int64
        -     end
        0     return index::Int64
        - end
        -
        - #force compiling
        - sub2(X,1)
        -
        - #profile
        - @time for j in 1:1e7 sub2(X,21) end
        - @time for j in 1:1e7 sub2(X,22) end
        -
        - end
```
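One more data point: the difference between the two runs works out to exactly 16 bytes per call, i.e. (160000096 - 96) / 10^7 == 16.0. Below is a minimal sketch of how I’d re-run the timing with the loop inside a function, to rule out any effect from the loop running at global scope. This assumes sub2 and X are defined at top level rather than inside the let block, `bench` is just a throwaway name I made up, and the exact byte counts may differ across Julia versions:

```julia
# Time the same calls with the loop inside a function, so nothing in
# the measurement happens at global scope. Accumulating the results
# keeps the calls from being optimized away as dead code.
function bench(X::Vector{Int64}, nNodes::Int64, reps::Int)
    s = 0
    for j in 1:reps
        s += sub2(X, nNodes)
    end
    return s
end

bench(X, 21, 1)           # force compiling
@time bench(X, 21, 10^7)  # sub2(X,21) == 497, below the 512 boundary
@time bench(X, 22, 10^7)  # sub2(X,22) == 542, above the 512 boundary
```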