On Fri, Sep 20, 2019 at 11:09 AM David Steele <da...@pgmasters.net> wrote:
> Seems to me we are overdue for elog()/ereport() compatible
> error-handling in the front end. Plus mem contexts.
>
> It sucks to make that a prereq for this project but the longer we kick
> that can down the road...
There are no doubt many patches that would benefit from having more backend infrastructure exposed in frontend contexts, and I think we're slowly moving in that direction, but I generally do not believe in burdening feature patches with major infrastructure improvements. Sometimes it's necessary, as in the case of parallel query, which required upgrading a whole lot of backend infrastructure in order to have any chance of doing something useful. In most cases, however, there's a way of getting the patch done that dodges the problem.

For example, I think there's a pretty good argument that Heikki's design for relation forks was a bad one. It's proven to scale poorly and create performance problems and extra complexity in quite a few places. It would likely have been better, from a strictly theoretical point of view, to insist on a design where the FSM and VM pages got stored inside the relation itself, and the heap was responsible for figuring out how various pages were being used. When BRIN came along, we insisted on precisely that design, because it was clear that further straining the relation fork system was not a good plan. However, if we'd insisted on that when Heikki did the original work, it might have delayed the arrival of the free space map for one or more releases, and we got big benefits out of having that done sooner.

There's nothing stopping someone from writing a patch to get rid of relation forks and allow a heap AM to have multiple relfilenodes (with the extra ones used for the FSM and VM), or to multiplex all the data inside of a single file. Nobody has, though, because it's hard, and the problems with the status quo are not so bad as to justify the amount of development effort that would be required to fix it. At some point, that problem is probably going to work its way to the top of somebody's priority list, but it's already been about 10 years since that all happened, and everyone has so far dodged dealing with the problem, which in turn has enabled them to work on other things that are perhaps more important.

I think the same principle applies here. It's reasonable to ask the author of a feature patch to fix issues that are closely related to the feature in question, or even problems that are not new but would be greatly exacerbated by the addition of the feature. It's not reasonable to stack up a list of infrastructure upgrades that somebody has to do as a condition of having a feature patch accepted that does not necessarily require those upgrades. I am not convinced that JSON is actually a better format for a backup manifest (more on that below), but even if I were, I believe that getting backup manifest functionality into PostgreSQL 13, and perhaps incremental backup on top of that, is valuable enough to justify making some compromises to make that happen. And I don't mean "compromises" as in "let's commit something that doesn't work very well"; rather, I mean making design choices that are aimed at making the project something that is feasible and can be completed in reasonable time, rather than not. And saying, well, the backup manifest format *has* to be JSON because everything else suxxor, is not that.

We don't have a single other example of a file that we read and write in JSON format. Extension control files use a custom format. Backup labels and backup history files and timeline history files and tablespace map files use custom formats. postgresql.conf, pg_hba.conf, and pg_ident.conf use custom formats.
postmaster.opts and postmaster.pid use custom formats. If JSON is better and easier, at least one of the various people who coded those things up would have chosen to use it, but none of them did, and nobody's made a serious attempt to convert them to use it. That might be because we lack the infrastructure for dealing with JSON and building it is more work than anybody's willing to do, or it might be because JSON is not actually better for these kinds of use cases, but either way, it's hard to see why this particular patch should be burdened with a requirement that none of the previous ones had to satisfy.

Personally, I'd be intensely unhappy if a motion to convert postgresql.conf or pg_hba.conf to JSON format gathered enough steam to be adopted. It would be darn useful, because you could specify complex values for options instead of being limited to scalars, but it would also make the configuration files a lot harder for human beings to read and grep, and the quality of error reporting would probably decline significantly. Also, appending a setting to the file, something which is currently quite simple, would get a lot harder. Ad-hoc file formats can be problematic, but they can also have real advantages in terms of readability, brevity, and fitness for purpose.

> This talk was good fun. The largest number of tables we've seen is a
> few hundred thousand, but that still adds up to more than a million
> files to backup.

A quick survey of some of my colleagues turned up a few examples of people with 2-4 million files to back up, so a similar kind of ballpark. Probably not big enough for the manifest to hit the 1GB mark, but getting close.

> > Or we could just decide that you have to have enough memory
> > to hold the parsed version of the entire manifest file in memory all
> > at once, and if you don't, maybe you should drop some tables or buy
> > more RAM.
>
> I assume you meant "un-parsed" here?

I don't think I meant that, although it seems like you might need to store either all the parsed data or all the unparsed data, or even both, depending on exactly what you are trying to do.

> > I hear you saying that this is going to end up being just as complex
> > in the end, but I don't think I believe it. It sounds to me like the
> > difference between spending a couple of hours figuring this out and
> > spending a couple of months trying to figure it out and maybe not
> > actually getting anywhere.
>
> Maybe the initial implementation will be easier but I am confident we'll
> pay for it down the road. Also, don't we want users to be able to read
> this file? Do we really want them to need to cook up a custom parser in
> Perl, Go, Python, etc.?

Well, I haven't heard anybody complain that they can't read a backup_label file because it's too hard to cook up a parser. And I think the reason is pretty clear: such files are not hard to parse. Similarly for a pg_hba.conf file. This case is a little more complicated than those, but AFAICS, not enormously so. Actually, it seems like a combination of those two cases: it has some fixed metadata fields that can be represented with one line per field, like a backup_label, and then a bunch of entries for files that are somewhat like entries in a pg_hba.conf file, in that they can be represented by a line per record with a certain number of fields on each line.

I attach here a couple of patches. The first one does some refactoring of relevant code in pg_basebackup, and the second one adds checksum manifests using a format that I pulled out of my ear.
It probably needs some adjustment but I don't think it's crazy. Each file gets a line that looks like this:

File $FILENAME $FILESIZE $FILEMTIME $FILECHECKSUM

Right now, the file checksums are computed using SHA-256, but that could be changed to anything else for which we've got code. On my system, shasum -a256 $FILE produces the same answer that shows up here. At the bottom of the manifest there's a checksum of the manifest itself, which looks like this:

Manifest-Checksum 385fe156a8c6306db40937d59f46027cc079350ecf5221027d71367675c5f781

That's a SHA-256 checksum of the file contents excluding the final line. It can be verified by feeding all the file contents except the last line to shasum -a256. I can't help but observe that if the file were defined to be a JSONB blob, it's not very clear how you would include a checksum of the blob contents in the blob itself, but with a format based on a bunch of lines of data, it's super-easy to generate and super-easy to write tools that verify it.

This is just a prototype, so I haven't written a verification tool, and there's a bunch of testing and documentation and so forth that would need to be done, aside from whatever we've got to hammer out in terms of design issues and file formats. But I think it's cool, and perhaps some discussion of how it could be evolved will get us closer to a resolution everybody can at least live with.
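
Just to make the "super-easy to write tools that verify it" claim concrete, here is a rough sketch of what such a verifier might look like in Python. To be clear, this is purely illustrative and not part of the attached patches; it assumes that file names contain no spaces and that the only entries are "File" lines plus the trailing "Manifest-Checksum" line, which may not match the POC exactly.

#!/usr/bin/env python3
# Illustrative verifier for the line-based manifest format described above.
# Assumes "File <name> <size> <mtime> <checksum>" entries (file names
# without spaces) and a final "Manifest-Checksum <sha256>" line; any other
# metadata lines are simply ignored.

import hashlib
import os
import sys

def verify_manifest(manifest_path, backup_dir):
    with open(manifest_path, 'rb') as f:
        raw = f.read()

    # The last line is a SHA-256 of everything that precedes it.
    body, _, last_line = raw.rstrip(b'\n').rpartition(b'\n')
    body += b'\n'
    tag, _, expected = last_line.decode().partition(' ')
    if tag != 'Manifest-Checksum':
        sys.exit('manifest does not end with a Manifest-Checksum line')
    if hashlib.sha256(body).hexdigest() != expected:
        sys.exit('manifest checksum does not match')

    # Each "File" line carries a name, size, mtime, and per-file checksum;
    # here we check the size and the checksum.
    for line in body.decode().splitlines():
        fields = line.split()
        if not fields or fields[0] != 'File':
            continue
        name, size, checksum = fields[1], int(fields[2]), fields[-1]
        path = os.path.join(backup_dir, name)
        if os.path.getsize(path) != size:
            sys.exit('%s: size does not match manifest' % name)
        with open(path, 'rb') as f:
            if hashlib.sha256(f.read()).hexdigest() != checksum:
                sys.exit('%s: checksum does not match manifest' % name)

    print('backup verified')

if __name__ == '__main__':
    verify_manifest(sys.argv[1], sys.argv[2])

Invoke it with the manifest path and the backup directory as arguments. A real tool would stream large files rather than slurping them into memory, but the point is that the whole thing fits in a few dozen lines of an ordinary scripting language.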

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

0002-POC-of-backup-manifest-with-file-names-sizes-timesta.patch
Description: Binary data
0001-Refactor-some-pg_basebackup-code.patch
Description: Binary data