This is a simplified example of a Monte Carlo simulation where random vectors (here 2D vectors, which are all zero) are summed (the result is in r1 and r2 or r, respectively):
def case1(): import numpy as np M = 100000 N = 10000 r1 = np.zeros(M) r2 = np.zeros(M) s1 = np.zeros(N) s2 = np.zeros(N) for n in range(1000): ind = np.random.random_integers(N, size=M) - 1 r1 += s1[ind] r2 += s2[ind] def case2(): import numpy as np M = 100000 N = 10000 r = np.zeros((M, 2)) s = np.zeros((N, 2)) for n in range(1000): ind = np.random.random_integers(N, size=M) - 1 r += s[ind] import timeit print("case1:", timeit.timeit( "case1()", setup="from __main__ import case1", number=1)) print("case2:", timeit.timeit( "case2()", setup="from __main__ import case2", number=1)) Resulting in: case1: 2.6224704339983873 case2: 4.374910838028882 Why is case2 significantly slower (almost by a factor of 2) than case1? There should be the same number of operations (additions) in both cases; the main difference is the indexing. Is there another (faster) way to avoid the separate component arrays r1 and r2? (I also tried r = np.zeros(M, dtype='2d'), which was comparable to case2.) Olaf -- https://mail.python.org/mailman/listinfo/python-list