On Wed, Dec 9, 2015 at 4:54 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Robert Haas <robertmh...@gmail.com> writes: >> The comment in mdnblocks.c says this: > >> * Because we pass O_CREAT, we will create the >> next segment (with >> * zero length) immediately, if the last >> segment is of length >> * RELSEG_SIZE. While perhaps not strictly >> necessary, this keeps >> * the logic simple. > >> I don't really see how this "keeps the logic simple". > > My (vague) recollection is that this is defending some assumptions > elsewhere in md.c. You sure you haven't broken anything with this? > Relation extension across a segment boundary, perhaps?
It doesn't look like that particular thing gets broken here. The key to extending across a segment boundary is that _mdfd_openseg() needs to get O_CREAT as its fourth argument, but _mdfd_getseg() thankfully does not rely on mdnblocks() to have already done that: it passes O_CREAT itself. I did also test it by creating a 13GB pgbench_accounts table, and at least in that simple case it works fine. It's possible that there is some other thing I've missed, but in general I think that if there is some dependency on mdnblocks() creating new segments, that's a bad idea and we should try to get rid of it. A call that supposedly reports the number of blocks in a relation really has no business changing on-disk state. (More broadly, as Kevin was pointing out to me yesterday, md.c looks like it could do with a face lift. Keeping a linked list of 1GB segments and chasing down the list to find the length of the file may have been fine when relations over 1GB were rare, but that's now routine. Some relations may be over 1TB, and using a linked list of 1000 entries to keep track of all of those segments does not seem like an especially good choice. In fact, having no way to get the relation length other than scanning 1000 files doesn't seem like an especially good choice even if we used a better data structure. Putting a header page in the heap would make getting the length of a relation O(1) instead of O(segments), and for a bonus, we'd be able to reliably detect it if a relation file disappeared out from under us. That's a difficult project and definitely not my top priority, but this code is old and crufty all the same.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers