New submission from C.A.M. Gerlach <cam.gerl...@gerlach.cam>:

I propose changing tarfile.DEFAULT_FORMAT to tarfile.PAX_FORMAT, rather
than the legacy tarfile.GNU_FORMAT, for Python 3.8. This would offer several
benefits:

• Removes limitations of the old GNU tar format, including limits on maximum
UID/GID values and on the number of bits in device major and minor numbers;
pax is currently the most flexible and feature-rich tar format
• Encodes all filenames as UTF-8 in a portable way, ensuring consistent and
correct handling on all platforms, avoiding errors like [this
one](https://stackoverflow.com/questions/19902544/tarfile-produce-garbled-file-name-in-the-tar-gz-archivement),
and generally providing expected, sensible defaults
• Is the current interoperable POSIX standard, used by all modern platforms 
(Linux, Unix, macOS, and third-party unarchivers on Windows) rather than a 
vendor-specific extension like GNU tar
• Backwards compatible with any unarchiver capable of reading ustar format
(unlike GNU tar), as the extended pax headers will just be ignored
• Fixes bpo-30661 (support tarfile.PAX_FORMAT in shutil.make_archive); this
change was proposed there as a fix, but it was never followed up on and the
issue remains open
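
As a rough illustration of the first two points, here is a minimal sketch that
writes one member with a non-ASCII name and a UID too large for the octal ustar
header field, then reads it back. The helper names, the file name and the UID
value are made up for the example; only the tarfile calls are real.

```python
import io
import tarfile


def write_single_member(fmt, name, uid):
    """Write one empty member to an in-memory archive using the given format."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        info = tarfile.TarInfo(name=name)
        info.uid = uid  # values above 0o7777777 do not fit the octal ustar field
        tar.addfile(info)
    return buf.getvalue()


def read_single_member(data):
    """Read the first member back; the format is detected automatically."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        return tar.getmembers()[0]


# Illustrative values only: a non-ASCII name and a UID beyond the ustar limit.
data = write_single_member(tarfile.PAX_FORMAT, "résumé.txt", 3_000_000)
member = read_single_member(data)
print(member.name, member.uid)
```

With pax, both values are carried in an extended header record, and the name is
stored there as UTF-8 regardless of the platform's locale encoding.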

This change would have no effect on reading existing archives, only writing new 
ones, and should be broadly compatible with any remotely modern system, as pax 
support is included in all the widely used libraries/systems:

* POSIX 2001 (major Unix vendors), released in 2001 (18 years ago)
* GNU tar 1.14 (Linux, etc), released in 2004 (15 years ago)
* bsdtar/libtar ~1.2.51 (BSD, macOS, etc), at least as of 2006 (13 years ago), 
with significant bug fixes up through 2011 (8 years ago)
* 7-zip (Windows) at some point before 2011 (>8 years ago), with significant 
bug fixes up to 2011 (8 years ago)
* Python 2.6, released in 2008 (11 years ago)
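
To illustrate the point that only writing is affected, the following sketch
writes the same member names in each of the three formats tarfile supports and
reads them back without ever naming a format on the read side. The function
name and the sample names are assumptions for the example.

```python
import io
import tarfile


def roundtrip_names(fmt, names):
    """Write empty members in the given format, then list them on re-read."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        for name in names:
            tar.addfile(tarfile.TarInfo(name=name))
    buf.seek(0)
    # No format argument here: the reader detects it from the archive itself.
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames()


names = ["a.txt", "dir/b.txt"]
formats = (tarfile.USTAR_FORMAT, tarfile.GNU_FORMAT, tarfile.PAX_FORMAT)
results = {fmt: roundtrip_names(fmt, names) for fmt in formats}
print(results)
```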

Furthermore, essentially every existing archiver supports ustar format, which 
would allow interoperability on very old/exotic platforms that don't support 
pax for some reason (and would certainly not support GNU). Therefore, it should 
be more than safe to make the change now, with archivers on the three major 
platforms supporting the modern standard for nearly a decade, and any esoteric 
ones at least as likely to support the POSIX standard as the vendor-specific 
GNU extension.

Is there any particular reason why we shouldn't make this change? Is there a 
particular group/list I should contact to follow up about seeing this 
implemented? It seems it should only require a one-line change 
[here](https://stackoverflow.com/questions/19902544/tarfile-produce-garbled-file-name-in-the-tar-gz-archivement),
 aside from updating the docs and presumably the NEWS file, which I would be 
willing to do (I would think it should make a fairly straightforward first 
contribution).
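
Independently of what the default ends up being, callers can already pin the
format explicitly. A minimal sketch, assuming a hypothetical helper named
open_pax_writer (not part of tarfile):

```python
import io
import tarfile


def open_pax_writer(fileobj):
    """Hypothetical helper: open a tar writer pinned to pax, whatever the default."""
    return tarfile.open(fileobj=fileobj, mode="w", format=tarfile.PAX_FORMAT)


buf = io.BytesIO()
with open_pax_writer(buf) as tar:
    pinned_format = tar.format
    # A name longer than the 100-byte ustar name field, for illustration.
    tar.addfile(tarfile.TarInfo(name="x" * 150 + ".txt"))

# DEFAULT_FORMAT depends on the Python version; the pinned format does not.
print("default:", tarfile.DEFAULT_FORMAT, "pinned:", pinned_format)
```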

----------
components: Library (Lib)
messages: 337710
nosy: CAM-Gerlach
priority: normal
severity: normal
status: open
title: Change default tar format to modern POSIX 2001 (pax) for better 
portability/interop, support and standards conformance
type: behavior
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36268>
_______________________________________