On Tue, Apr 18, 2023 at 11:38 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > Robert Haas <robertmh...@gmail.com> writes: > > We have a few different places in the code where we generate or modify > > tar headers or just read data out of them. The code in question uses > > one of my less-favorite programming things: magic numbers. The offsets > > of the various fields within the tar header are just hard-coded in > > each relevant place in our code. I think we should clean that up, as > > in the attached patch. > > Generally +1, with a couple of additional thoughts: > > 1. Is it worth inventing macros for the values of the file type, > rather than just writing the comment you did?
Might be. > 2. The header size is defined as 512 bytes, but this doesn't sum to 512: > > + TAR_OFFSET_PREFIX = 345 /* 155 byte string */ > > Either that field's length is really 167 bytes, or there's some other > field you didn't document. (It looks like you may have copied "155" > from an incorrect existing comment?) According to my research, it is neither of those, e.g. see https://www.subspacefield.org/~vax/tar_format.html https://www.ibm.com/docs/en/zos/2.4.0?topic=formats-tar-format-tar-archives https://wiki.osdev.org/USTAR I think that what happened is that whoever designed the original tar format decided on 512 byte blocks. And the header did not take up the whole block. The USTAR format is an extension of the original format which uses more of the block, but still not all of it. > Yeah, this is adding greppability (a good thing!) but little more. > However, I'm not convinced it's worth doing more. It's not like > this data structure will change anytime soon. Right. greppability is a major concern for me here, and also bug surface. Right now, to use the functions in pgtar.h, you need to know all the header offsets as well as the format and length of each header field. This centralizes constants for the header offsets, and at least provides some centralized documentation of the rest. It's not great, though, because it seems like there's some risk of someone writing new code and getting confused about whether the length of a certain field is 8 or 12, for example. A thicker abstraction layer might be able to avoid or minimize such risks better than what we have, but I don't really know how to design it, whereas this seems like an obvious improvement. -- Robert Haas EDB: http://www.enterprisedb.com