Hi Magali, (It is a slightly edited version I sent you. Since the aim of Outreachy is also to interact with Community, let enjoy the French proverb: «more crazy people we are, more fun we have». :-))
On Thu, 28 Jan 2021 at 00:53, Magali <magalileme...@gmail.com> wrote: > Another thing is that the command is a bit slower than 'git log' itself. > Thoughts on how that could be improved? The command is “slow”. A first quick analysis about the meaning of “slow”. Basically, I have run twice: --8<---------------cut here---------------start------------->8--- for n in 60000 10000 5000 1000 500 100 50 10 5 1; do time ./pre-inst-env guix git log --oneline \ | head -n $n > /dev/null ; done --8<---------------cut here---------------end--------------->8--- to have kind of warm cache. And again for the equivalent Git command. Then, bit of Emacs edit processing to transform the output in the buffer of ’M-x shell’ to something in the Python file: tguix = np.array([2.871, … tgit = np.array([0.013, … (Well, to be correct, it is not twice but a couple of times to have an average.) Let normalize by removing the additive constant and run a classic linear regression: t ~ B n ^ a => log(t) ~ a log(n) + b where ’a’ and ’b’ have to be estimated. Well, I am surprised by ’a ~ 0.5’ which means that “guix git log” runs with a sublinear time complexity. However, “git log” is linear. Maybe I am doing something wrong. Initially I thought the tree was badly walked but a quick: --8<---------------cut here---------------start------------->8--- $ ./pre-inst-env guix git log --channel-cache-path guix /home/simon/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq $ git -C /home/simon/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq \ log --oneline | wc -l 72791 $ ./pre-inst-env guix git log --oneline | wc -l 72791 --8<---------------cut here---------------end--------------->8--- shows it is correct. Hum?! Well, I suspect noise on the data and the normalization is bad here. Running the experiment in a batch of 10 times then averaging them should give an analysis more meaningful. Hey, that’s a quick one. :-) The conclusion here, it scales well enough… for now. Therefore, the real the question is about the «additive constant». On my machine, it is ~2.9s and this is where there is room of improvement, I guess. I exported ’get-commits’ to have it in the REPL and I did: --8<---------------cut here---------------start------------->8--- $ ./pre-inst-env guix repl GNU Guile 3.0.5 Copyright (C) 1995-2021 Free Software Foundation, Inc. Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'. This program is free software, and you are welcome to redistribute it under certain conditions; type `,show c' for details. Enter `,help' for help. scheme@(guix-user)> ,use(guix scripts git log) scheme@(guix-user)> (define (compute) (begin (get-commits) 'ok)) scheme@(guix-user)> ,time (compute) $1 = ok ;; 2.533936s real time, 3.099156s run time. 0.901027s spent in GC. scheme@(guix-user)> --8<---------------cut here---------------end--------------->8--- And I let you run “,profile (compute)”. Well, this function should be optimized. IMHO. Initially, I thought about “stream” but I do no think it is the issue here. Well, I think that the ’repo’ is open at each commit when folding. Instead, it should be open before, keep alive and close at the end of the ’fold’. Somehow. WDYT? I will give a closer look to ’commit-closure’ because I am not convinced it is useful here, neither. Bonus, attached the Python script and the plot. :-) Cheers, simon
# coding: utf-8 # import numpy as np import matplotlib.pyplot as plt n = np.array([1., 5., 10., 50., 100., 500., 1000., 5000., 10000., 60000.]) tguix = np.array([2.871, 3.006, 2.987, 2.892, 2.991, 3.038, 3.088, 3.315, 3.565, 5.634]) tgit = np.array([0.013, 0.006, 0.013, 0.016, 0.014, 0.029, 0.047, 0.12, 0.233, 1.283]) normalize = lambda x: x - x[0] logify = lambda x: np.log10(x[3:]) x = logify(n) y = logify(normalize(tguix)) z = logify(normalize(tgit)) X = np.concatenate((x.reshape((len(x), 1)), np.ones((len(x), 1))), axis=1) guix, res, rank, s = np.linalg.lstsq(X, y, rcond=None) git, res, rank, s = np.linalg.lstsq(X, z, rcond=None) predict = lambda sol: sol[0] * x + sol[1] plt.plot(x, y, 'bo', label='guix') plt.plot(x, z, 'ro', label='git') plt.plot(x, predict(guix), 'bx-') plt.plot(x, predict(git), 'rx-') plt.legend(loc='lower right') plt.xlabel("log(N) where N in `| head -n N`") plt.ylabel("log(real time)") plt.show()