On Fri, Jun 05, 2015 at 08:25:34AM +0100, Simon Riggs wrote:
> This whole idea of "feature development" vs reliability is bogus. It
> implies people that work on features don't care about reliability. Given
> the fact that many of the features are actually about increasing database
> reliability in the event of crashes and corruptions it just makes no sense.
I'm contrasting work that helps to keep our existing promises ("reliability")
with work that makes new promises ("features").  In software development, we
invariably hazard old promises to make new promises; our success hinges on
electing neither too little nor too much risk.  Two years ago, PostgreSQL's
track record had placed it in a good position to invest in new, high-risk,
high-reward promises.  We did that, and we emerged solvent yet carrying an
elevated debt service ratio.  It's time to reduce risk somewhat.

You write about a different sense of "reliability."  (Had I anticipated this
misunderstanding, I might have written "Restore-probity mode.")  None of this
was about classifying people, most of whom allocate substantial time to each
kind of work.

> How will we participate in cleanup efforts? How do we know when something
> has been "cleaned up", how will we measure our success or failure? I think
> we should be clear that wasting N months on cleanup can *fail* to achieve a
> useful objective. Without a clear plan it almost certainly will do so. The
> flip side is that wasting N months will cause great amusement and dancing
> amongst those people who wish to pull ahead of our open source project and
> we should take care not to hand them a victory from an overreaction.

I agree with all that.  We should likewise take care not to become insolvent
from an underreaction.

> So lets do our normal things, not do a "total stop" for an indefinite
> period. If someone has specific things that in their opinion need to be
> addressed, list them and we can talk about doing them, together.

I recommend these four exit criteria:

1. Non-author committer review of foreign keys locks/multixact durability.
   Done when that committer certifies, as if he were committing the patch
   himself today, that the code will not eat data.

2. Non-author committer review of row-level security.
   Done when that committer certifies that the code keeps its promises and
   that the documentation bounds those promises accurately.

3. Second committer review of the src/backend/access changes for
   INSERT ... ON CONFLICT DO NOTHING/UPDATE.  (Bugs affecting folks who don't
   use the new syntax are most likely to fall in that portion.)  Unlike the
   previous two criteria, a review without certification is sufficient.

4. Non-author committer certifying that the 9.5 WAL format changes will not
   eat your data.  The patch lists Andres and Alvaro as reviewers; if they
   already reviewed it enough to make that certification, this one is easy.

That ties up four people.  For everyone else:

- Fix bugs those reviews find.  This will start slow but will grow to keep
  everyone busy.  Committers won't certify code, and thus we can't declare
  victory, until these bugs are fixed.  The rest of this list, in contrast,
  calls out topics to sample from, not topics to exhaust.

- Turn current buildfarm members green.

- Write, review and commit more automated test machinery to PostgreSQL.  Test
  whatever excites you.  If you need ideas, Craig posted some good ones
  upthread.  Here are a few more:

  - Add a debug mode that calls sched_yield() in SpinLockRelease(); see
    6322.1406219...@sss.pgh.pa.us.
  - Improve TAP suite (src/test/perl/TestLib.pm) logging.  Currently, these
    suites redirect much output to /dev/null.  Instead, log that output and
    teach the buildfarm to capture the log.
  - Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local pin
    count falls to zero.  Under CLOBBER_FREED_MEMORY, wipe a shared buffer
    when its global pin count falls to zero.
  - With assertions enabled, or perhaps in a new debug mode, have
    pg_do_encoding_conversion() and pg_server_to_any() check the data for a
    no-op conversion instead of assuming the data is valid.

- Add buildfarm members.  This entails reporting any bugs that prevent an
  initial passing run.  Once you have a passing run, schedule regular runs.
  Examples of useful additions:

  - "./configure ac_cv_func_getopt_long=no, ac_cv_func_snprintf=no ..." to
    enable all the replacement code regardless of the current platform's need
    for it.  This helps distinguish "Windows bug" from "replacement code bug."
  - --disable-integer-datetimes, --disable-float8-byval,
    --disable-float4-byval, --disable-spinlocks, --disable-atomics,
    --disable-thread-safety, --disable-largefile,
    #define RANDOMIZE_ALLOCATED_MEMORY
  - Any OS or CPU architecture other than x86 GNU/Linux, even ones already
    represented.

- Write, review and commit fixes for the bugs that come to light by way of
  these new automated tests.

- Anything else targeted to make PostgreSQL keep the promises it has already
  made to our users.
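To make the sched_yield()-in-SpinLockRelease() idea from the test-machinery list concrete, here is a rough sketch of the shape such a debug mode might take.  It is not PostgreSQL's s_lock.h code: the lock is a toy built on C11 atomics, and every name in it (toy_slock_t, toy_spin_acquire, toy_spin_release, TOY_SPIN_YIELD_DEBUG, toy_yield_count) is hypothetical.  The assumption is that giving up the CPU immediately after the release hands other processes an early chance to take the lock, so code that wrongly touches protected state after releasing should fail more visibly.

```c
/*
 * Toy spinlock with a "yield after release" debug mode.
 * Illustration only; all identifiers are hypothetical and none of this
 * is taken from PostgreSQL's actual spinlock implementation.
 */
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* A real build would set this from the build system instead. */
#define TOY_SPIN_YIELD_DEBUG 1

typedef struct
{
    atomic_flag locked;
} toy_slock_t;

#define TOY_SLOCK_INIT { ATOMIC_FLAG_INIT }

/* Counts debug yields so a test can confirm the debug path ran. */
static atomic_ulong toy_yield_count;

static inline void
toy_spin_acquire(toy_slock_t *lock)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&lock->locked,
                                             memory_order_acquire))
        ;
}

static inline void
toy_spin_release(toy_slock_t *lock)
{
    assert(lock != NULL);
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
#ifdef TOY_SPIN_YIELD_DEBUG
    /*
     * Debug mode: give up the CPU right after releasing, so that any
     * waiter gets to run before this thread's next instruction.
     */
    sched_yield();
    atomic_fetch_add(&toy_yield_count, 1);
#endif
}
```

Keeping the yield behind a compile-time symbol means ordinary builds pay nothing for it; a dedicated buildfarm member could enable it permanently.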