On 17May2022 22:45, Marco Sulla <marco.sulla.pyt...@gmail.com> wrote:
>Well, I've done a benchmark.
>>>> timeit.timeit("tail('/home/marco/small.txt')", globals={"tail":tail},
>>>> number=100000)
>1.5963431186974049
>>>> timeit.timeit("tail('/home/marco/lorem.txt')", globals={"tail":tail},
>>>> number=100000)
>2.5240604374557734
>>>> timeit.timeit("tail('/home/marco/lorem.txt', chunk_size=1000)",
>>>> globals={"tail":tail}, number=100000)
>1.8944984432309866
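Marco's tail() itself isn't shown in this excerpt. Here is a minimal sketch of a chunked tail of the kind being benchmarked. It assumes the function reads fixed-size blocks backwards from the end of the file until it has enough newlines. The function body, its `n` parameter and its bytes return type are hypothetical; only the `chunk_size` name comes from the benchmark above.

```python
import os


def tail(path, n=10, chunk_size=4096):
    """Return the last n lines of the file at path, as bytes without newlines.

    Hypothetical sketch: the real tail() from the thread is not shown.
    Reads fixed-size chunks backwards from the end of the file until more
    than n newlines have been seen (or the start of the file is reached).
    """
    if n <= 0:
        return []  # avoid lines[-0:], which would return every line
    with open(path, "rb") as f:
        pos = f.seek(0, os.SEEK_END)  # seek() returns the new absolute offset
        chunks = []
        newlines = 0
        # n + 1 newlines guarantee the first of the last n lines is complete.
        while pos > 0 and newlines <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            chunk = f.read(step)
            newlines += chunk.count(b"\n")
            chunks.append(chunk)
    # Chunks were collected back to front; join them in file order.
    data = b"".join(reversed(chunks))
    # No decoding is done here: the raw bytes of each line are returned.
    return data.splitlines()[-n:]
```

Because it stops as soon as it has seen more than `n` newlines, the amount of data read depends on the length of the last few lines rather than on the file size. That would be consistent with a 1.3 KB file and a 1.2 GB file timing similarly.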
This suggests that the file size does not dominate your runtime. Ah. _Or_
that the files have a similar ratio of newlines to text, so tail() reads
similar amounts of data from the end. If the "line density" of the files
were similar you would hope that the runtimes would be similar.

>small.txt is a text file of 1.3 KB. lorem.txt is a lorem ipsum of 1.2
>GB. It seems the performance is good, thanks to the chunk suggestion.
>
>But the time of Linux tail surprise me:
>
>marco@buzz:~$ time tail lorem.txt
>[text]
>
>real 0m0.004s
>user 0m0.003s
>sys 0m0.001s
>
>It's strange that it's so slow. I thought it was because it decodes
>and print the result, but I timed

You're measuring different things. timeit() tries hard to measure just the
code snippet you provide. It doesn't measure the startup cost of the whole
Python interpreter. Try:

    time python3 your-tail-prog.py /home/marco/lorem.txt

BTW, does your `tail()` print output? If not, you're again not measuring
the same thing.

If you have the source of tail(1) to hand, consider getting to the core and
measuring `time()` immediately before and immediately after the central
tail operation and printing the result.

Also: does tail(1) do character set / encoding stuff? Does your Python
code do that? It might be apples and oranges.

Cheers,
Cameron Simpson <c...@cskk.id.au>
-- 
https://mail.python.org/mailman/listinfo/python-list
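The timeit totals above are for number=100000 calls, which works out to roughly 16 to 25 µs per call. By contrast, `time tail lorem.txt` reports one complete process run. Below is a minimal sketch, using only the standard library, that times the two kinds of thing separately. The helper names `time_call` and `time_process` are hypothetical.

```python
import subprocess
import sys
import time
import timeit


def time_call(func, *args, repeat=1000):
    """Average wall-clock seconds per call, measured in-process (like timeit)."""
    return timeit.timeit(lambda: func(*args), number=repeat) / repeat


def time_process(argv):
    """Wall-clock seconds to run argv as a whole new process.

    This includes process creation and, for Python, interpreter startup,
    which is what `time some-command` in a shell measures. Output is sent
    to DEVNULL, so writing to a terminal is not part of the figure.
    """
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Cost of just starting and stopping a Python interpreter, doing no work.
    startup = time_process([sys.executable, "-c", "pass"])
    print(f"python startup: {startup:.4f}s")
    # Cost of one trivial in-process call, for scale.
    per_call = time_call(len, "abc")
    print(f"in-process call: {per_call:.2e}s")
```

On a system with tail(1), `time_process(["tail", "/home/marco/lorem.txt"])` should be roughly comparable to the shell's `real` figure. With the benchmarked function in scope, `time_call(tail, "/home/marco/lorem.txt")` should be comparable to the per-call timeit figures. The interpreter startup number alone shows how much of a whole-process timing has nothing to do with the tail operation itself.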