[issue44262] tarfile: some content different output

2021-06-01 Thread Filipe Laíns
Change by Filipe Laíns: resolution: -> not a bug; stage: -> resolved; status: open -> closed

[issue44262] tarfile: some content different output

2021-05-31 Thread Vasco Gervasi
Vasco Gervasi added the comment: Yes, you can close it. For future reference: tar_reset = "/tmp/py_tar_reset.tar" def reset(tarinfo): tarinfo.uid = tarinfo.gid = 0 tarinfo.uname = tarinfo.gname = "root" tarinfo.mtime = 1 return tarinfo with tarfile.open(tar_reset, "w:xz") as
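The snippet above is cut off in this archive view, so here is a minimal sketch of the same idea rather than the poster's full script. It reuses the `reset` filter as quoted and passes it to `TarFile.add` through the `filter` argument. The in-memory buffer, the file name `data.txt`, and the `build` helper are assumptions made for illustration only.

```python
import hashlib
import io
import os
import tarfile
import tempfile


def reset(tarinfo):
    # Normalise the metadata that varies between runs and machines.
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = "root"
    tarinfo.mtime = 1
    return tarinfo


def build(path):
    """Hypothetical helper: tar `path` into memory and return its SHA-256."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        tar.add(path, arcname="data.txt", filter=reset)
    return hashlib.sha256(buf.getvalue()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with open(path, "w") as f:
        f.write("same content\n")

    # Simulate the file being touched between two pipeline runs.
    os.utime(path, (1_000_000, 1_000_000))
    first = build(path)
    os.utime(path, (2_000_000, 2_000_000))
    second = build(path)

print(first == second)
```

The filter in this sketch leaves the permission bits untouched, so a pipeline that checks files out with varying modes would probably want to normalise `tarinfo.mode` as well.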

[issue44262] tarfile: some content different output

2021-05-30 Thread Filipe Laíns
Filipe Laíns added the comment: Yeah, I understand. What you want is achieved by making sure the mtime of the tar archive, and of the files in the archive, is reproducible, like I demonstrated here. Can this be closed?

[issue44262] tarfile: some content different output

2021-05-30 Thread Vasco Gervasi
Vasco Gervasi added the comment: Dear Filipe, sorry I did not explain the use case; obviously this is a toy example to show my problem. I have a pipeline that generates a tar file from a repository using a Python script; if the hash of the tar file is different, it will trigger other things.

[issue44262] tarfile: some content different output

2021-05-30 Thread Filipe Laíns
Filipe Laíns added the comment: tarfile keeps the mtime from the file; the issue is that you are touching the files at the beginning of the script. When you write to the files, you change the mtime (last modified time), which produces a different TarInfo. If you comment out the code that
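To illustrate the point about mtime, here is a small sketch (not the script attached to the issue) that archives the same file twice with identical content but a different modification time. The file name `data.txt`, the timestamps, and the `tar_digest` helper are made up for the example.

```python
import hashlib
import io
import os
import tarfile
import tempfile


def tar_digest(path):
    """Hypothetical helper: SHA-256 of an uncompressed in-memory tar of `path`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(path, arcname="data.txt")
    return hashlib.sha256(buf.getvalue()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with open(path, "w") as f:
        f.write("same content\n")

    # Same bytes on disk, but two different modification times.
    os.utime(path, (1_000_000, 1_000_000))
    first = tar_digest(path)
    os.utime(path, (2_000_000, 2_000_000))
    second = tar_digest(path)

# The digests differ because the mtime is stored in the tar header.
print(first != second)
```

No compression is involved here, so the difference comes only from the tar metadata.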

[issue44262] tarfile: some content different output

2021-05-30 Thread Vasco Gervasi
Vasco Gervasi added the comment: Dear Filipe, thanks for your answer. Following your suggestion, I have tried the attached file. The output is: $ python /data/compress.py b'68963e137ced6ee2aa5a93e155b201a3c172e2683d4b15a0eab7c1d8d43e48b4 /tmp/py_gzip.tgz\n' b'68963e137ced6ee2aa5a93e155b201a3c

[issue44262] tarfile: some content different output

2021-05-29 Thread Filipe Laíns
Filipe Laíns added the comment: I modified the script to keep both Python-generated tarballs and ran diffoscope, which presents the issue very clearly: $ diffoscope py.gz py2.gz --- py.gz +++ py2.gz ├── filetype from file(1) │ @@ -1 +1 @@ │ -gzip compressed data, was "py", last modified:
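The `last modified` field that `file(1)` reports is read from the gzip header, which carries a four-byte timestamp right after the magic number, method, and flags. The sketch below is not taken from the issue's attachments; it assumes the data is compressed with `gzip.GzipFile` directly and uses a hypothetical `gz` helper to show that pinning the `mtime` argument makes the compressed bytes repeatable.

```python
import gzip
import io


def gz(data, mtime):
    """Hypothetical helper: gzip `data` in memory with an explicit header mtime."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=mtime) as f:
        f.write(data)
    return buf.getvalue()


payload = b"same content\n"

# Different header timestamps give different bytes for the same payload.
differs = gz(payload, 1) != gz(payload, 2)

# A fixed timestamp gives identical bytes on every run.
stable = gz(payload, 0) == gz(payload, 0)

# Bytes 4..7 of the gzip header hold the mtime, little-endian.
stored = int.from_bytes(gz(payload, 5)[4:8], "little")

print(differs, stable, stored)
```

When `mtime` is left unset, `GzipFile` falls back to the current time, which is one way two otherwise identical archives can end up with different hashes.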

[issue44262] tarfile: some content different output

2021-05-28 Thread Vasco Gervasi
New submission from Vasco Gervasi: Hi, I am seeing some irregularities in the tar files created using Python. Consider the attached script. This is the output from the scripts: ``` # gz b'0f2eb7b3cac63267b1cf51d2bd5e3144f53cc5b172bbad3dccd5adf4ffb2d220 /tmp/py.gz\n' 9bde8fdb44d98c5a838