Re: tail

Cameron Simpson Wed, 18 May 2022 14:33:21 -0700

On 17May2022 22:45, Marco Sulla <[email protected]> wrote:
>Well, I've done a benchmark.
>>>> timeit.timeit("tail('/home/marco/small.txt')", globals={"tail":tail}, 
>>>> number=100000)
>1.5963431186974049
>>>> timeit.timeit("tail('/home/marco/lorem.txt')", globals={"tail":tail}, 
>>>> number=100000)
>2.5240604374557734
>>>> timeit.timeit("tail('/home/marco/lorem.txt', chunk_size=1000)", 
>>>> globals={"tail":tail}, number=100000)
>1.8944984432309866


This suggests that the file size does not dominate your runtime. Ah.  
_Or_ that there are similar numbers of newlines vs text in the files, so 
you are reading similar amounts of data from the end. If the "line 
density" of the files were similar you would hope that the runtimes 
would be similar.
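
To illustrate the line density point, here is a minimal sketch of a 
chunked tail. It is not your code, which I haven't seen: `tail_bytes`, 
its parameters and the byte count it returns are made up for this 
example. It reads fixed size chunks backwards from the end of the file 
until it has seen enough newlines, and reports how many bytes it read:

```python
import os

def tail_bytes(path, n=10, chunk_size=4096):
    """Return (last n lines as bytes, number of bytes read from the file).

    Hypothetical helper for illustration only.
    """
    if n <= 0:
        return [], 0
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        bytes_read = 0
        # Read backwards until we hold more than n newlines (so the first
        # of the last n lines is complete) or we reach the start of the file.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            chunk = f.read(step)
            bytes_read += len(chunk)
            data = chunk + data
    lines = data.splitlines(keepends=True)
    return lines[-n:], bytes_read
```

With a helper like this, two files with similar line lengths should 
report the same byte count regardless of their total size, which would 
explain runtimes that are close to each other.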

>small.txt is a text file of 1.3 KB. lorem.txt is a lorem ipsum of 1.2
>GB. It seems the performance is good, thanks to the chunk suggestion.
>
>But the time of Linux tail surprise me:
>
>marco@buzz:~$ time tail lorem.txt
>[text]
>
>real    0m0.004s
>user    0m0.003s
>sys    0m0.001s
>
>It's strange that it's so slow. I thought it was because it decodes
>and print the result, but I timed

You're measuring different things. timeit() tries hard to measure just 
the code snippet you provide. It doesn't measure the startup cost of the 
whole Python interpreter. Try:

    time python3 your-tail-prog.py /home/marco/lorem.txt

BTW, does your `tail()` print output? If not, again not measuring the 
same thing.
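
Note also that the timeit figures quoted above are totals over 100000 
calls, so roughly 16 microseconds per call for small.txt, against 4 
milliseconds for one whole tail(1) process. To get a feel for the two 
kinds of measurement from inside Python, something like the following 
rough sketch could help. The function names and the sample statement are 
placeholders of mine, not anything from your program:

```python
import subprocess
import sys
import time
import timeit

def startup_seconds(runs=5):
    """Best wall-clock time to launch the interpreter and exit at once."""
    best = float("inf")
    for _ in range(runs):
        t0 = time.perf_counter()
        # Runs the same interpreter as this script with an empty program.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - t0)
    return best

def per_call_seconds(stmt="sum(range(100))", number=10000):
    """Average time of one execution of stmt, as timeit sees it."""
    return timeit.timeit(stmt, number=number) / number

if __name__ == "__main__":
    print("interpreter startup:", startup_seconds())
    print("per call (timeit):  ", per_call_seconds())
```

The first number is the sort of cost that `time` on the command line 
includes and timeit() leaves out.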

If you have the source of tail(1) to hand, consider getting to the core 
and measuring `time()` immediately before and immediately after the 
central tail operation and printing the result.
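
The same idea on the Python side, as a sketch: take a timestamp 
immediately before and after the core call and report the difference on 
stderr so it doesn't mix with the real output. `timed` is a hypothetical 
helper name, and I'm assuming your `tail()` can be called like any other 
function:

```python
import sys
import time

def timed(func, *args, **kwargs):
    """Call func, print its elapsed wall-clock time to stderr,
    and return (result, elapsed_seconds)."""
    t0 = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - t0
    print(f"{func.__name__}: {elapsed:.6f}s", file=sys.stderr)
    return result, elapsed
```

For example, `timed(tail, '/home/marco/lorem.txt')` would time one call 
of your function without interpreter startup or printing of the result.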

Also: does tail(1) do character set / encoding stuff? Does your Python 
code do that? Might be apples and oranges.
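
One way to check this on the Python side is to keep the decoding as a 
separate step from the file reading, so that each can be timed on its 
own. A small sketch, assuming your function can hand back the lines as 
bytes; `decode_lines` and its defaults are my invention:

```python
def decode_lines(raw_lines, encoding="utf-8", errors="replace"):
    """Decode a list of bytes lines to str, kept separate from the
    file reading so its cost can be measured independently."""
    return [line.decode(encoding, errors) for line in raw_lines]
```

If the timing with and without this step is about the same, decoding is 
not where the time goes.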

Cheers,
Cameron Simpson <[email protected]>
-- 
https://mail.python.org/mailman/listinfo/python-list
