Hi, On 2019-10-31 20:15:12 +0100, Tomas Vondra wrote: > On Thu, Oct 31, 2019 at 11:48:21AM -0700, Andres Freund wrote: > > We currently align byval types such as int4/8, float4/8, timestamp *, > > date etc, even though we mostly don't need to. When tuples are deformed, > > all byval types are copied out from the tuple data into the > > corresponding Datum array, therefore the original alignment in the tuple > > data doesn't matter. This is different from byref types, where the > > Datum formed will often be a pointer into the tuple data. > > > > While there are some older systems where it could be a bit slower to > > copy data out from unaligned positions into the datum array, this is > > more than bought back by the next point: > > > > > > The fact that these types are aligned has substantial costs: > > > > For one, we often waste substantial amounts of space inside tables with > > alignment padding. It's not uncommon to see about 30% or more of space > > wasted (especially when taking alignment of the first column into > > account). > > > > For another, and this I think is less obvious, we actually waste > > substantial amounts of CPU maintaining the alignment. This is primarily > > the case because we have to perform to align the pointer to the next > > field during tuple [de]forming. Those instructions [1] have to be > > executed taking time, but what's worse, they also reduce the ability of > > out-of-order execution to hide latencies. There's a hard dependency on > > knowing the offset to the next column to be able to continue with the > > next column. > > > > Right. Reducing this overhead was one of the goals to allow "logical > ordering" of columns in a table (while arbitrarily reordering the > physical ones), but that patch got out of hand pretty quickly. Also, > it'd still keep some of the overhead, because it was not removing the > alignment/padding entirely.
Yea. It'd keep just about all the CPU overhead, because we'd still need to align as soon as there is a preceding nulled or varlena colum. There's still some benefit for logical column order, as grouping NOT NULL fixed-length columns at the start is beneficial. And it's also beneficial to have frequently accessed columns at the start. But I think this proposal gains a lot of the space related benefits, at a much lower complexity, together with a lot of other benefits. > > There's two reasons why we can't just set the alignment for these types > > to 'c'. > > 1) pg_upgrade, for fairly obvious reasons > > 2) We map catalog table rows to structs, in a *lot* of places. > > > > > > It seems to me that, despite the above, it's still worth trying to > > improve upon the current state, to benefit from reduced space and CPU > > usage. > > > > As it turns out we already separate out the alignment for a type, and a > > column, between pg_type.typalign and pg_attribute.attalign. One way to > > tackle this would be to allow to specify, for byval types only, at > > column creation time whether a column uses a 'struct-mappable' alignment > > or not. When set, set attalign to pg_type.typalign for alignment, when > > not, to 'c'. By changing pg_dump in binary upgrade mode to emit the > > necessary options, and by adding such options during bki processing, > > we'd solve 1) and 2), but otherwise gain the benefits. > > > > Alternatively we could declare such a propert on the table level, but > > that seems more restrictive, without a corresponding upside. > I don't know, but it seems entirely sufficient specifying this at the > table level, no? What would be the use case for removing padding for > only some of the columns? I don't see the use case for that. Well, if we had it on a per-table level, we'd also align a) catalog table columns that follow the first varlena column - which we don't need to align, as they can't be accessed via mapping b) columns in pg_upgraded tables that have been added after the upgrade > > It's possible that we should do something related with a few varlena > > datatypes. We currently use intalign for types like text, json, and as > > far as I can tell that does not make all that much sense. They're not > > struct mappable *anyway* (and if they were, they'd need to be 8 byte > > aligned on common platforms, __alignof__(void*) is 8). We'd have to take > > a bit of care to treat the varlena header as unaligned - but we need to > > do so anyway, because of 1byte varlenas. Short varlenas seems to make it > > less crucial to pursue this, as the average datum that'd benefit is long > > enough to make padding a non-issue. So I don't think it'd make sense to > > tackle this project at the same time. > > > > Not sure, but how come it's not failing on the picky platforms, then? On > x86 it's probably OK because it's pretty permissive, but I'd expect some > platforms (s390, parisc, itanium, powerpc, ...) to be much pickier. I'm not quite following? As I said, we already need to use alignment aware code due to caring for short varlenas. And these types aren't actually struct mappable. Greetings, Andres Freund